There is an extensive literature on microphone and speaker arrays for audio measurement and reconstruction (Ahrens, 2012; Pulkki, 2017). Consider a system with some number of microphones and some number of speakers. With one microphone and one speaker, we have monaural recording and playback, while two of each describes stereo, etc. There are many approaches to making larger microphone and/or speaker arrays (more than two microphones and/or speakers). Since only a small number of speakers is affordable in typical practice, we are normally very concerned with human perception of spatial sound (Blauert, 1997), which informs stereophonic, quadraphonic, and more generally ambisonic sound systems (Cooper, 1972; Gerzon, 1974, 1985). Ambisonics extends stereo and quad with an expansion of the soundfield in terms of spherical harmonic basis functions centered on one listening point. Such systems must deal with the psychoacoustics of direction and timbre perception in various frequency ranges and for various geometries.
Given a very large number of microphones and speakers, it is possible to approximate complete reconstruction of the soundfield in a given space. The best-known approach to this problem is Wave Field Synthesis (WFS) (Berkhout et al., 1993), also called ``acoustic holography'' or ``holophony'' (Berkhout, 1988). WFS reproduces (or synthesizes) a recorded soundfield physically, so that psychoacoustic questions can in principle be avoided. However, for best results at minimum expense, psychoacoustic considerations remain important.
The basic idea of an ``acoustic curtain'' for reconstructing soundfields was described, in principle, by Harvey Fletcher (1934); at that time, two or three speakers were considered an adequate psychoacoustic approximation (Steinberg, 1934). Generating wave propagation from spherical waves (``secondary sources'') emitted along the wavefront is the essence of Huygens' Principle (1690). Both Huygens and Fletcher called for a continuum of wavefront samples. The theory of bandlimited sampling was introduced by Nyquist (1928); together with basic wave theory, it can be considered the basis of this paper.
Deriving WFS begins with the Kirchhoff-Helmholtz integral (or, in simplified form, with the Rayleigh integral), which expresses any source-free acoustic field as a sum of contributions, called secondary sources, from any enclosing boundary surface (Firtha, 2018; Pierce, 1989; Berkhout et al., 1993; Ahrens, 2012). The same basic approach is used by the well-known Boundary Element Method (BEM) for numerically computing a wavefield from its values along a boundary surface (Kirkup, 2007). The secondary sources in WFS aim to reconstruct (in the listening zone) the same soundfield produced by the original (primary) sources ``on stage.'' In practice, the secondary sources are simplified from an enclosing surface down to (typically) a ring of loudspeakers in a line array around the listening space. The listening space should not be reverberant, and is ideally anechoic, since a major goal of Berkhout's WFS formulation was to include the reverberant as well as the direct soundfield. There are many variations on the details of deriving a practical WFS system, and some of them come close to the sampling-based point of view taken here. However, there does not appear to be a WFS paper that formulates the problem as basic soundfield sampling and reconstruction from samples (spatial analog-to-digital and digital-to-analog conversion). As a result, differences in final implementation do emerge, as will be brought out below.
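For reference, one common statement of the Kirchhoff-Helmholtz integral is sketched here. The notation (pressure spectrum P, free-space Green's function G, volume V, wavenumber k) and the sign conventions are assumptions chosen for illustration; the cited derivations may use different ones.

```latex
% Assumed conventions (not taken from the cited references):
%   V is a source-free volume with boundary \partial V,
%   \mathbf{n} is the outward unit normal on \partial V,
%   G satisfies (\nabla^2 + k^2) G = -\delta(\mathbf{x}-\mathbf{x}_0), with k = \omega/c.
\[
P(\mathbf{x},\omega) =
  \oint_{\partial V}
  \left[
    G(\mathbf{x}|\mathbf{x}_0,\omega)\,
      \frac{\partial P(\mathbf{x}_0,\omega)}{\partial n}
    - P(\mathbf{x}_0,\omega)\,
      \frac{\partial G(\mathbf{x}|\mathbf{x}_0,\omega)}{\partial n}
  \right] dS(\mathbf{x}_0),
  \qquad \mathbf{x} \in V,
\]
\[
G(\mathbf{x}|\mathbf{x}_0,\omega) = \frac{e^{-jkr}}{4\pi r},
  \qquad r = \|\mathbf{x}-\mathbf{x}_0\| .
\]
```

Under these assumptions, the first term acts as a layer of monopole secondary sources driven by the normal pressure gradient, and the second as a layer of dipole secondary sources driven by the pressure itself.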
Far-Field WFS (FFWFS) is the limiting form of WFS in which the sources are many wavelengths away from the recording mics and listening audience. By adopting this simplifying assumption, which is not restrictive in many applications, we can derive FFWFS very easily and clearly from sampling theory, and this is where we begin below.
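As a minimal sketch of the sampling point of view in the far field, the following Python computes two quantities for a uniform line array: the largest element spacing that avoids spatial aliasing, using the standard half-wavelength criterion, and the arrival-time offsets of a plane wave across the array. The function names, the uniform-line-array geometry, and the speed-of-sound value are assumptions for illustration and do not come from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air near 20 degrees C


def max_spacing(f_max_hz, c=SPEED_OF_SOUND):
    """Largest element spacing (m) that avoids spatial aliasing for plane
    waves from any direction, for frequencies up to f_max_hz.
    This is half the shortest wavelength: c / (2 * f_max)."""
    return c / (2.0 * f_max_hz)


def plane_wave_delays(num_elements, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Arrival-time offsets (s) of a far-field plane wave at each element
    of a uniform line array, relative to element 0.
    angle_deg is measured from broadside (0 = perpendicular to the array)."""
    sin_theta = math.sin(math.radians(angle_deg))
    return [n * spacing_m * sin_theta / c for n in range(num_elements)]


if __name__ == "__main__":
    d = max_spacing(1000.0)          # spacing needed for content up to 1 kHz
    delays = plane_wave_delays(4, d, 30.0)
    print(f"spacing: {d:.4f} m")
    print("delays (ms):", [round(1e3 * t, 4) for t in delays])
```

In the far-field limit, each source contributes only a pure delay per array element, as in `plane_wave_delays`, which is why the problem reduces to ordinary sampling along the array.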
http://arxiv.org/abs/1911.07575