Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction


IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 8, AUGUST 2013

Enzo De Sena, Student Member, IEEE, Hüseyin Hacıhabiboğlu, Senior Member, IEEE, and Zoran Cvetković, Senior Member, IEEE

Abstract: This paper presents a systematic framework for the analysis and design of circular multichannel surround sound systems. Objective analysis based on the concept of active intensity fields shows that for stable rendition of monochromatic plane waves it is beneficial to render each such wave by no more than two channels. Based on that finding, we propose a methodology for the design of circular microphone arrays, in the same configuration as the corresponding loudspeaker system, which aims to capture inter-channel time and intensity differences that ensure accurate rendition of the auditory perspective. The methodology is applicable to regular and irregular microphone/speaker layouts, and a wide range of microphone array radii, including the special case of coincident arrays which corresponds to intensity-based systems. Several design examples, involving first and higher-order microphones, are presented. Results of formal listening tests suggest that the proposed design methodology achieves a performance comparable to prior art in the center of the loudspeaker array and a more graceful degradation away from the center.

Index Terms: Active intensity, microphone array, microphone directivity, multichannel audio, spatial hearing, surround sound recording, tangent panning law, time-intensity trading.

I. INTRODUCTION AND MOTIVATION

Generating the experience of spatial sound can be achieved in a number of ways. From a practical standpoint one aims to provide the most convincing experience with the minimum equipment and channels.
Manuscript received August 02, 2012; revised December 30, 2012 and April 02, 2013; accepted April 10. Date of publication April 25, 2013; date of current version May 08. This work was supported by the EPSRC under Grant EP/F001142/1. This work was done while H. Hacıhabiboğlu was with the Department of Informatics, King's College London. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Boaz Rafaely. E. De Sena and Z. Cvetković are with the Institute of Telecommunications, King's College London, Strand, London, WC2R 2LS, U.K. (enzo.desena@kcl.ac.uk; zoran.cvetkovic@kcl.ac.uk). H. Hacıhabiboğlu is with the Informatics Institute, Middle East Technical University, Ankara, 06531, Turkey (huseyin@ii.metu.edu.tr). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TASL

Comparing different methods, at one extreme there are binaural techniques, which deliver a convincing experience over two channels by presenting necessary binaural cues [4]. Binaural presentation works best over headphones, but the perception of the reproduced field is severely compromised by head movements and by the mismatch between the individual HRTFs of the listener and the presented binaural cues. Alternatively, two loudspeakers with cross-talk cancellation can be used [5], but the sweet listening spot is very narrow. At the other extreme there is wave-field synthesis (WFS), which typically uses a large number of channels to accurately reproduce the wave front generated by a virtual source [6], thus providing a wide sweet listening area. Higher-order Ambisonics (HOA) [7]–[9] is capable of achieving similar effects by reconstructing spherical harmonics of the pressure field in the center of the listening area. In between these extremes there are systems with five or more channels.
Such systems do not possess a sufficient number of channels to simulate physical wave fronts without spatial aliasing, or to reconstruct ear signals accurately for listeners at multiple locations. Instead these systems must rely to a large degree on perceptual effects, most notably summing localization, to generate a spatially stable experience of the desired auditory scene. Summing localization describes the effect by which two loudspeakers radiating identical signals with given inter-channel level and time differences (ICLD and ICTD) result in a single, fused, auditory event [4]. The perceived location of this auditory event depends on both the ICLD and ICTD applied.

Current commercial multichannel recordings rely to a great extent on summing localization. However, they are yet to achieve the spatial realism that is possible with the available number of channels. That is in part due to the design legacy of sound production for the film industry, the main focus of which is creating attention-grabbing effects and providing a general ambiance feel. Auralization is usually achieved in a synthetic manner through intensity panning between pairs of channels. Ambient information is then usually presented from the rear surround channels and directional localization information from the front channels [10]. This makes perfect sense when one is expected to be viewing a frontally placed screen, since localizing away from the screen breaks the audio-visual illusion. While a pleasing listening experience can be achieved most of the time using such systems, they do not necessarily yield presentations which are coherent with the acoustics of the performance venue or the desired virtual space. Due to this incoherence, the spatial auditory experience lacks realism and fidelity.

A number of multichannel recording techniques which aim to overcome these limitations and provide a coherent auditory perspective in a wider area have been proposed [11]–[18].
Theile proposes a class of non-coincident microphone arrays for recording frontal scenes in 3/2-stereo ITU-R standard format, which is known as optimized cardioid triangle (OCT) [12, p. 255]. Williams describes more general guidelines for appropriately arranging standard first-order studio microphones [12] given a loudspeaker layout and a desired coverage angle [13]. Johnston et al. propose the perceptual sound field reconstruction (PSR) [14]–[16] scheme, which aims to render convincing auditory perspective by capturing interaural time and level differences. All these methods are still mainly a result of empirical observation and hands-on tuning.

This work develops further insight into the underlying physical and perceptual phenomena, and based on that refines Johnston's PSR approach, and extends it to a more general and systematic framework. We depart from the original PSR idea of capturing interaural time and level differences, but rather aim to capture ICLDs and ICTDs which when reproduced by the given speaker system accurately render the auditory perspective. Furthermore, we leverage recent advances in the field of higher-order microphones [19]–[22] which greatly extend the design space of directivity patterns, enabling the implementation of more sophisticated and powerful recording concepts than possible with standard first-order microphones.

Sound-field sampling performed by low channel-count systems as considered in this paper is too sparse to allow for reconstruction with any reasonable physical accuracy. However, such low channel-count systems should at least be capable of rendering meaningful approximations of some very simple sound fields, and their performance for such sound fields can provide insight into the effects of various design choices, guiding some high-level design decisions, and narrowing down the design space. Therefore, in Section II we study reproduction capabilities of circular multichannel systems for monochromatic plane waves based on the concept of active intensity [8], [23]. One of the conclusions of this analysis is that cross-channel terms have an adverse effect on the size and stability of the sweet-spot.
Guided by that observation, in Section III the microphone directivity design problem is formulated as a psychoacoustic curve-fitting problem aimed at capturing sound field cues which allow satisfactory rendition of the auditory perspective while suppressing undesirable cross-channel terms. Design examples for the case of pentagonal systems are then provided in Section IV. Results of formal subjective experiments comparing the performance of the proposed designs with prior art are reported in Section V and Section VI. Conclusions are drawn in Section VII.

II. ANALYTICAL CONSIDERATIONS

This section is concerned with the analysis of circular microphone and loudspeaker arrays for recording and reproducing monochromatic plane waves. Consider a reproduction system which comprises loudspeakers, situated on a circle, at angles, as shown in Fig. 1(a). Assume that the loudspeaker array is centered at the origin, and that its radius is large enough so that within the listening area loudspeakers can be well approximated by plane-wave sources. The pressure component of the sound field at a position within the listening area, due to a monochromatic plane wave played by loudspeaker is given by where is the complex gain of the -th loudspeaker, is the wave number, and is the speed of sound. The complex pressure and velocity of the sound field are sums of individual loudspeaker components: , and , where is the air density, and is the unit vector co-directional with the acoustic axis of the -th loudspeaker (see Fig. 1(a)). The product of pressure and complex conjugate velocity is known as the complex intensity [23]. The real part of complex intensity, referred to as active intensity, is co-directional with the wave propagation [8].

Fig. 1. Considered multichannel (a) reproduction and (b) recording system, showing two elements of the loudspeaker and microphone arrays.
The complex intensity of the considered system due to the monochromatic plane wave is given by where can be expressed as. Each component where and, with . Hence, each component, where, contributes a complex intensity field which fluctuates in amplitude across the space with frequency: and propagates in the direction, which is perpendicular to the median line between channels and. Components on the other hand contribute a spatially uniform field in the direction of the -th loudspeaker. The active intensity field can be expressed as


where and, and where we used the identities and. The first term in (7) corresponds to a spatially-uniform active intensity field, while the second term corresponds to active intensity components that fluctuate in space in different directions and with frequencies which depend both on (see (6)) and on the temporal frequency of the sound wave. The active intensity field due to a plane wave incident from the direction, which we aim to reconstruct, is uniform in space, i.e., it has no spatial fluctuations. In order to reproduce an active intensity field largely uniform in space, the cross-channel components in the second term of (7) should ideally be eliminated or at least suppressed as much as possible. Note that cross-terms in (7) cannot be eliminated completely, because that would imply only one active channel for each incident direction, prohibiting reproduction of wave directions other than the acoustic axes of the loudspeakers. Two active channels are therefore a minimum needed for continuous panoramic reproduction.

Consider the optimization problem (8) of minimizing the energy of the cross terms subject to the constraint that the uniform component is in the correct direction. Solving (8) with numerical optimization methods (e.g., the downhill simplex method [24]) reveals that solutions have only two active channels, and in particular channels and such that. Based on this observation, we focus on systems such that each plane wave direction is rendered by only the pair of adjacent loudspeakers such that. The above analysis justifies physically the design of several surround methods which minimize cross-talk between non-adjacent channels [13], [17], [18]. Perceptual studies of Lee and Rumsey [25] support this design paradigm too.
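The behaviour described above can be explored numerically. The following is a minimal sketch, not the authors' code, and it relies on assumptions that are not taken from the text: each loudspeaker is modelled as a plane-wave source in direction `angles[l]` whose wave travels towards the origin, the gains are real, the common time factor is dropped, and the product of air density and speed of sound is normalized to one (`rho_c`). All function and variable names are hypothetical.

```python
import cmath
import math


def active_intensity(x, y, gains, angles, k, rho_c=1.0):
    """Active intensity vector (Ix, Iy) at the point (x, y).

    Assumed model (not from the paper): loudspeaker l lies in direction
    angles[l] (radians) and radiates a plane wave towards the origin.
    """
    p = 0j          # total complex pressure
    vx = vy = 0j    # total complex particle velocity
    for g, a in zip(gains, angles):
        ux, uy = math.cos(a), math.sin(a)
        p_l = g * cmath.exp(1j * k * (x * ux + y * uy))
        p += p_l
        # Velocity points along the propagation direction,
        # i.e. away from the loudspeaker.
        vx -= p_l * ux / rho_c
        vy -= p_l * uy / rho_c
    # Active intensity is the real part of pressure times conjugate velocity.
    return (p * vx.conjugate()).real, (p * vy.conjugate()).real


def arrival_angle_deg(ix, iy):
    """Direction the energy arrives from (opposite to the intensity vector)."""
    return math.degrees(math.atan2(-iy, -ix))


# Two adjacent loudspeakers of a pentagonal layout, unequal gains.
angles = [0.0, math.radians(72.0)]
gains = [1.0, 0.5]
k = 2 * math.pi * 500.0 / 343.0  # wave number at 500 Hz, c = 343 m/s assumed

centre = arrival_angle_deg(*active_intensity(0.0, 0.0, gains, angles, k))
off_centre = arrival_angle_deg(*active_intensity(0.10, -0.05, gains, angles, k))
```

With a single active loudspeaker the computed field is the same at every point, while with two active loudspeakers the cross-channel term makes the result depend on position, which is the fluctuation the analysis seeks to limit.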
Auralization of sound sources in multichannel systems also employs two channels only [26], [27] for its good stability and locatedness properties. Note, however, that the technology proposed in this paper is fundamentally different from multichannel systems which employ pairwise panning, as it aims to design the system so that it records sound field cues which enable rendering sound sources and all their reflections in a manner which makes them perceptually consistent with their original directions, as well as with the acoustics of the original venue.

Assume now that the signals played back by the considered loudspeaker system are recorded by an array comprised of microphones positioned on a circle of radius, also at angles, as shown in Fig. 1(b). Assume further that each loudspeaker plays back the signal recorded by the corresponding microphone without mixing, as in Johnston's original PSR scheme [14]–[16]. In the considered recording-playback setup, the gain of the -th loudspeaker for a plane wave incident from direction is given by (9), where is the wave amplitude, and is the directivity pattern of the -th microphone, which will be assumed to be real. It is also assumed that the wave phase is zero in the center of the microphone array.

Fig. 2. Active intensity vector plots around the center of the loudspeaker array for five loudspeakers located in the far-field at angles, and source direction, i.e., the midline between two loudspeakers. Vector lengths are proportional to the amplitude of active intensity, while the grey level represents the angular error between the active intensity vectors and the source direction. The microphone array is coincident, and the directivity patterns are: omnidirectional, first-order cardioid, and third-order cardioid.
The cross-terms thus have the form, where, and (10). In the above it was concluded that a wave with an angle of incidence will be captured and reproduced by two channels only, and, while the contribution of other channels will be negligibly small. This requires minimizing outside of the sector. The impact of the suppression of the cross terms, achieved by making microphones progressively more selective, as well as the effect of the sound frequency on the uniformity of the reproduced sound field is illustrated in Fig. 2. Observe that even without a careful microphone directivity design, the active intensity field reproduced using the third-order cardioid is largely uniform.

When only two adjacent channels, and, are active for source angles, the active intensity field has the form (11). While the first term in (11) is uniform in space, the second one is a vector field in the direction of, i.e., the median angle between the two loudspeakers, the intensity of which fluctuates in space with frequency equal to. Recall that this fluctuating component is unavoidable if one aims to render directions other than the acoustic axes of the loudspeakers. It follows from (6) that the frequency of these fluctuations increases with the angular spacing between microphones and, and also increases linearly with the temporal frequency of the sound wave. For circularly symmetric arrays, which are studied later in the paper, the spatial frequency of the fluctuating term is. This quantifies the impact of the number of channels on the size of the area with approximately accurate sound field rendition. The radius of the microphone array has an effect only on term in (11), and hence a variation in causes only translation of the reproduced sound field in the direction of.

Having concluded that for physical accuracy of reconstructed monochromatic plane waves it is beneficial to minimize outside the region, in the next section we focus on the design of directivity patterns within their sectors. To this end, we turn to psychoacoustic criteria and design so that perceived directions of rendered sound sources agree with their actual directions. The reader interested more generally in acoustic scene analysis using circular microphone arrays can refer to [28].

III. MICROPHONE DIRECTIVITY DESIGN

The design should ensure first that the sound power in the center of the loudspeaker array is constant for all directions. If only two adjacent channels are active for each angle, the power constraint becomes for all, , [27], [29]. Under this constraint, directivity patterns are completely specified by corresponding inter-channel level ratios as (12). In the following we will conduct our analysis mainly in terms of, as it makes some derivations more intuitive. Directivity patterns will be further constrained to be positive for all, as reproduction of out-of-phase signals may cause an undesirable inside-the-head locatedness effect [4, p. 136].
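As a hedged illustration of how a level ratio can fix both gains, suppose the power constraint takes the unit-sum-of-squares form familiar from pairwise panning. The symbols below are introduced only for this sketch and may not match the notation of (12): $\Gamma_a$ and $\Gamma_b$ stand for the (positive) directivity patterns of the two active channels and $\rho$ for their level ratio.

```latex
\Gamma_a^2(\theta) + \Gamma_b^2(\theta) = 1, \qquad
\rho(\theta) = \frac{\Gamma_a(\theta)}{\Gamma_b(\theta)}
\quad\Longrightarrow\quad
\Gamma_a(\theta) = \frac{\rho(\theta)}{\sqrt{1+\rho^2(\theta)}}, \qquad
\Gamma_b(\theta) = \frac{1}{\sqrt{1+\rho^2(\theta)}} .
```

Under these assumptions, choosing the ratio for every source angle within a sector is enough to determine both patterns in that sector.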
Within these constraints, in the following we describe two methods for specific design of the directivity patterns. The first method is applicable to coincident arrays and it aims at rendering active intensity fields co-directional with the corresponding sources in the center of the listening area. This method will be derived based on physical considerations, but we show also that it is closely related to the tangent panning law used in intensity stereophony, where it is motivated by psychoacoustic criteria. The second method is a generalization of the first approach to non-coincident arrays, and it aims at shaping the directivity patterns so that the system operates along corresponding time-intensity psychoacoustic curves.

A. Design of Coincident Arrays

It follows from equation (11) in the case of coincident microphone arrays, that the active intensity vector in the center of the loudspeaker array, is co-directional with the active intensity vector of a plane wave incident from direction if and only if (13), which can be expressed in terms of the inter-channel level ratio as (14). This last expression then completely specifies corresponding directivity patterns according to (13). An interesting result is obtained by expressing (13) as (15). For the stereophonic case with loudspeakers at and, (15) reduces to (16), which is equivalent to the well-known tangent panning law [30] used in intensity stereophony. This panning law was originally derived on the basis of perceptual considerations, and for low frequencies. On the other hand, the above result shows that the tangent panning law and its periphonic extension, vector base amplitude panning (VBAP) [27], are also based on physical aspects of the reproduced sound field. Microphone directivity patterns implementing (14) will be referred to in the following as intensity directivity (ID).
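The link between the co-directionality condition and the tangent law can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's formulation: angles are in radians, the two active channels have real in-phase gains, the intensity direction at the center is taken to be that of the gain-weighted sum of the two loudspeaker unit vectors, and the gains are normalized to unit power. The name `pair_gains` is hypothetical.

```python
import math


def pair_gains(phi_a, phi_b, theta):
    """Gains of two channels at phi_a <= theta <= phi_b (radians).

    Chosen so that g_a*u_a + g_b*u_b is parallel to the source direction
    and g_a**2 + g_b**2 == 1 (assumed power normalization).
    """
    g_a = math.sin(phi_b - theta)
    g_b = math.sin(theta - phi_a)
    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm


# Standard stereophonic layout: loudspeakers at -30 and +30 degrees.
phi0 = math.radians(30.0)
theta = math.radians(10.0)
g_neg, g_pos = pair_gains(-phi0, phi0, theta)

# Tangent panning law: (g_pos - g_neg) / (g_pos + g_neg) = tan(theta) / tan(phi0)
lhs = (g_pos - g_neg) / (g_pos + g_neg)
rhs = math.tan(theta) / math.tan(phi0)
```

Here `lhs` and `rhs` agree to rounding error, which is consistent with the statement that the tangent law follows from requiring the reproduced intensity at the center to point along the source direction.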
Recording based solely on inter-channel level differences has been applied in a number of fields, and is usually deemed to yield sharper phantom images [11]. On the other hand, methods employing both time and level differences have regularly proven among the most naturally sounding and realistic of spatial microphone techniques [11]. Another advantage of time-intensity systems, as will become evident in the following, is that they require microphones with lower spatial selectivity, which are less challenging to design and build [19], [20], [22].

B. Design Based on Time-Intensity Psychoacoustic Laws

When there is a small time delay between two channels, the perceived direction of the auditory image shifts towards the leading loudspeaker [4]. If the delay does not exceed the summing localization threshold [4], it has a significant influence on the perceived direction of the auditory event. Fig. 3 shows the stereophonic psychoacoustic curves derived by Franssen [31], that map combinations of inter-channel level difference, denoted as, and inter-channel time difference, denoted as, to the perceived directions of corresponding auditory images. The upper curve in this figure, , represents all the pairs for which the auditory event is localized at the right loudspeaker, while the lower curve, , represents the other limit at which the auditory event is localized at the left loudspeaker. Hence, in a system with maximum ICTD of, inter-channel time and level difference pairs which traverse a curve connecting with will create auditory events which move gradually from the right to the left loudspeaker. It should be noted that Franssen's curves were obtained using the standard stereophonic arrangement, that is, two loudspeakers separated by a base angle of 60°. Williams reports similar psychoacoustic curves which can also be used in this context [13].

Fig. 3. Time-intensity psychoacoustic curves, adapted from Franssen [31]. The curves, , and represent the (ICTD, ICLD) pairs for which the auditory event is localized at the right loudspeaker, left loudspeaker, and on the midline between the loudspeakers, respectively. For a system with maximum ICTD ms, subjects localize the auditory event at the left and right loudspeaker for (ICTD, ICLD) pairs at points and, respectively. Two possible curves achieving gradual source displacement between these extremal points are plotted as solid and dashed lines.

In the considered system, loudspeaker gains are determined by the directivity functions of the corresponding microphones: where is the direction of the sound source which creates an ICTD. A simple geometric argument (see Fig. 1(b)) shows that the time delay between microphones and, for a sound wave incident from direction, is given by (17). Maximal ICTDs are obtained for sources in the directions of the two microphones, i.e., for delays and. Directivity patterns which provide corresponding ICLDs, as given by curves and, must satisfy (18). In the symmetric case these constraints become (19).

The constraints in (19) only specify end-points of the directivity pattern. One possible way of achieving gradual source displacement between these end points is to modify the tangent law in (14) according to (20), where has the role of reducing intensity differences to account for the presence of time differences. A particular value of is obtained by equating with, which yields (21). The directivity pattern defined in (20) will be referred to as time-intensity directivity (TID).

It is instructive to consider in this context also the particular case of coincident arrays, , which corresponds to. One would expect that in this case, the directivity pattern in (20) would become equal to the tangent law in (14), that is also perceptually motivated. However, in the general formula (20), parameter is non-zero, and its particular value is given by (21) with, while in the case of the tangent law, which is also obtained from (21) with. This dichotomy is reconciled by noting that while Franssen's curves give the minimal level difference needed to create an auditory event in the direction of one of the speakers, the tangent law uses the maximal level difference that achieves the same effect. Gradual displacement of the auditory event between the two speakers can be achieved by employing a monotonic function with either the minimal or maximal level differences needed for source auralization in loudspeaker directions. The difference lies in that the slope of at its extreme points and would need to be much higher in the case of the maximal level differences compared to the case of minimal level differences.

Note that while the function in (20) provides a unifying framework for intensity and time-intensity systems, there are no findings which would make it psychoacoustically more legitimate than many other monotonic curves connecting the corresponding end time-intensity pairs. The two end points in (19) can be connected also by straight lines, and that would give (22). Owing to its shape, the directivity pattern in (22) will be referred to as time-intensity linear directivity (TILD). The two types of time-intensity curves defined by (20) and (22) are illustrated in Fig. 3 for a system with a maximum ICTD of ms.

IV.
DESIGN EXAMPLES

Consider now circularly symmetric systems where, and, and such that all directivity functions are rotated versions of a single prototype, as would arise in systems where there is no preferred seating orientation. Further, let us focus on systems with channels, as that seems to be the minimum needed for satisfactory rendition of the auditory perspective [14]–[16], [32], [33] and the envelopment experience [34], and also for direct comparison with the original PSR technology and second-order ambisonics.

The methodology of Section III-B can be used to design microphone directivity patterns for any array radius, as long as the ICTD is within summing localization limits. What the optimal radius for the considered system is, if there is one, is an open question. The analysis in Section II shows that varying the array radius causes only translation of the reconstructed active intensity field (see (11)). However, Johnston et al. propose a 15.5 cm radius and support this design choice by anthropomorphic arguments [14]–[16]. Our preliminary investigation of the naturalness of combinations of inter-aural time and level differences that a listener would experience in this same setup, based on Gaik's psychoacoustic studies [35], also supports the radius proposed by Johnston et al. [36]. Hence, in this study we focus on microphone arrays with a radius of 15.5 cm. This array radius corresponds to a maximum ICTD of 0.31 ms. Note that Williams' and Franssen's curves intersect at this ICTD [13], [31], and that therefore the resulting directivity patterns would be identical regardless of which psychoacoustic curve is used.

Solutions for will be sought in the form, which is a general expression of an -order axisymmetric directivity pattern [20] that can be realized through various microphone beamforming methods [19]–[21]. Coefficients which yield a desired directivity pattern, as given in (14), (20) or (22), or an approximation of it, can be found using numerical optimization.
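The 0.31 ms figure quoted above for the 15.5 cm pentagonal array can be reproduced with a simple far-field geometric model. The sketch below is only an illustration: it assumes a plane wave, a speed of sound of 343 m/s, and microphones located on the circle at the stated angles, and the function name `ictd` is hypothetical rather than taken from the paper.

```python
import math


def ictd(radius, phi_a, phi_b, theta, c=343.0):
    """Arrival-time difference in seconds between two microphones.

    Microphones sit on a circle of the given radius (m) at angles phi_a
    and phi_b (radians); the plane wave arrives from direction theta.
    The result is positive when the microphone at phi_a is reached first.
    c = 343 m/s is an assumed speed of sound.
    """
    return radius / c * (math.cos(theta - phi_a) - math.cos(theta - phi_b))


radius = 0.155                   # 15.5 cm array radius
spacing = math.radians(72.0)     # adjacent microphones of a five-channel array

# Source in the direction of one microphone: largest delay for this pair.
tau_max_ms = 1e3 * ictd(radius, 0.0, spacing, 0.0)
# Source on the midline between the two microphones: no delay.
tau_mid_ms = 1e3 * ictd(radius, 0.0, spacing, spacing / 2)
```

Under these assumptions `tau_max_ms` evaluates to roughly 0.31, in line with the value stated in the text, and the delay vanishes for a source on the midline.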
Corresponding optimization criteria include: (i) constrain to match the desired directivity pattern at a number of equidistant angles in the region, and to have zeros at a number of equidistant angles within ; (ii) constrain to match the desired directivity pattern at a number of equidistant angles in the region, and constrain to be below a given threshold for ; (iii) jointly minimize the -distance from the desired pattern and the -norm in the rejection region: for some small and.

Fig. 4(a) shows a fifth-order approximation of the ID directivity specified by (14), obtained using method (i). A sixth-order approximation of the TILD directivity in (22), obtained using method (ii) for cm, is shown in Fig. 4(b). Although high-order microphones are or may soon become feasible owing to recent advances in the field [19], [20], [22], it is of practical interest to consider low-order approximations of desired directivity patterns. The time-intensity TID directivity (20) for cm can be well approximated by the second-order pattern with coefficients. This approximation is obtained using optimization method (iii) with and, and is shown in the same plot with the sixth-order TILD pattern. The above optimization criteria do not explicitly restrict to be positive for all; still, the amplitudes of negative lobes are negligible.

Fig. 4. Polar plots of proposed directivity patterns as described in Sections III and IV, and equivalent patterns of matching and in-phase higher-order Ambisonics (HOA). Patterns shown in (a) and (b) are designed for coincident and non-coincident arrays, respectively. Parameter denotes pattern order, with indicating the exact desired design.

Fig. 4 shows also other directivity patterns used in the subjective listening tests reported in the paper. These include Johnston's original PSR directivity [15], the specifications of which can be satisfied exactly by the second-order pattern with coefficients.
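To make the notion of an axisymmetric pattern of a given order concrete, the sketch below evaluates a polynomial in the cosine of the angle. This form, the coefficient ordering, and the interpretation of a third-order cardioid as the cube of the first-order cardioid are assumptions made for illustration; the coefficient values are not the ones reported in the paper, which are missing from this text.

```python
import math


def pattern(coeffs, theta):
    """Evaluate sum_n coeffs[n] * cos(theta)**n using Horner's rule.

    coeffs[0] is the constant term; the pattern order is len(coeffs) - 1.
    """
    c = math.cos(theta)
    value = 0.0
    for a in reversed(coeffs):
        value = value * c + a
    return value


# Illustrative coefficients only (not the designs from the paper).
cardioid_1 = [0.5, 0.5]                      # 0.5 + 0.5*cos(theta)
cardioid_3 = [0.125, 0.375, 0.375, 0.125]    # expansion of (0.5 + 0.5*cos(theta))**3

# Response in the direction of the neighbouring channel of a pentagonal array.
neighbour = math.radians(72.0)
leak_1 = pattern(cardioid_1, neighbour)
leak_3 = pattern(cardioid_3, neighbour)
```

Both patterns have unit response on axis, but the higher-order one is lower towards the neighbouring channel, which illustrates why raising the order helps suppress contributions outside the intended sector.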
Horizontal higher-order ambisonics (HOA) [8] is also included in the subjective tests. The pentagonal loudspeaker array is optimal for reconstruction of first and second-order circular harmonics [8]. This decoding criterion is commonly referred to as mode matching and is equivalent to a coincident array of second-order microphones, with directivity patterns specified by coefficients. In-phase decoding is an alternative ambisonics implementation that provides a larger sweet spot at the expense of poorer localization accuracy [7]. In the second-order case, this trade-off is achieved by the directivity pattern with coefficients.

It is instructive to briefly consider these directivity patterns in the context of reproduction of active intensity fields corresponding to monochromatic plane waves, as discussed in Section II. Fig. 5 shows active intensity fields rendered by the considered system with five channels, and recorded using microphone arrays with PSR, TID, ID and HOA-matching directivity patterns. One can observe the benefit of higher-order patterns, in particular the exact specifications (infinite order) of ID and TID patterns in terms of rendering spatially more uniform active intensity fields. It is interesting also to observe that among the second-order patterns, the TID directivity seems to yield more uniform active intensity fields than the original PSR and the HOA-matching patterns.

Fig. 5. Active intensity vector plots around the center of a loudspeaker array as described in Fig. 2. The different multichannel systems aim at reproducing a monochromatic plane wave of frequency, incident from direction.

In the next two sections, the performance of the above design examples is evaluated by means of formal listening tests. Section V is focused on the angular accuracy of rendered auditory events, while Section VI studies their perceived locatedness.

V. PERCEPTUAL EVALUATION: LOCALIZATION

The tests reported in this section aim to assess the error between the actual and perceived directions of sound sources as rendered by different systems.

1) Experimental Setup: The subjects were seated in an acoustically isolated sound booth ms of size m m m. The test setup, illustrated in Fig. 6, had two components. The first consisted of five MACKIE HR824 active monitor loudspeakers equally spaced on a circle of radius 2 m. This system was used to play back stimuli synthesized for the tested multichannel systems. The second component consisted of eight Genelec 6010 loudspeakers positioned between two adjacent channels of the five-channel system with 8° separation. These loudspeakers were used as acoustic pointers. All loudspeakers were calibrated to a nominal level of dBA [10], [37]. This nominal level is known to be somewhat uncomfortable for subjects [37, p. 287], and was therefore reduced by 3 dB. All the loudspeakers were positioned at ear level facing the subject.
The subjects' responses were recorded using a specially designed graphical user interface, which was displayed on a monitor placed in front of them. The subjects were instructed to face the monitor, but their heads were not physically restrained.

2) Methodology and Stimuli: Gains and delays of each microphone corresponding to simulated sources in the far field were calculated for eight directions corresponding to the actual directions of the acoustic pointers. The subjects' task was to listen to the simulated free-field recording over the five-channel system and respond by listening to and selecting the acoustic pointer which is closest to the perceived direction of the auditory image. This methodology is equivalent to the source identification method [38]. Three seating orientations in the center, , , and , and one position 30 cm off-center, as shown in Fig. 6, were used to test localization performance. Windowed white Gaussian noise of 0.1 s duration was used as a stimulus, and it was generated at each trial in order to eliminate bias due to a fixed stimulus spectrum. The noise stimulus was selected for its wide frequency content, providing strong ILD and ITD cues [39]. A Tukey window with 30% taper ratio was used to reduce the effect of the loudspeakers' transient response. The sampling frequency was 44.1 kHz.

Fig. 6. The test setup for the localization and locatedness tests. The large white loudspeakers constitute the five-channel reproduction system. The square markers indicate the position of the 8 acoustic pointers used in the localization test. The round hollow markers represent the 3 acoustic pointers used in the locatedness test and positioned in directions , which are playing the unprocessed signal as described in VI-A-2. Three considered seating orientations in the center are denoted by arrows , , and . The off-center position is located at m.

A. Preliminary Test

1) Subjects: A preliminary experiment was carried out with six subjects (5 male and 1 female) with no reported hearing impairments. Three of the subjects were the authors of this paper.

2) Multichannel Systems Considered: In the preliminary tests we studied microphone arrays of radius 15.5 cm with the PSR, TILD, and ID directivity patterns. HOA-in-phase was also included in this test. It was the only methodology that did not make use of ICTDs.
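The stimulus described under Methodology and Stimuli (0.1 s of white Gaussian noise, regenerated at each trial, shaped by a Tukey window with 30% taper ratio at 44.1 kHz) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `tukey_window` and `make_stimulus` are hypothetical, the noise is left at unit variance (playback level calibration is not modeled), and the window is written out in NumPy instead of relying on a signal-processing library.

```python
import numpy as np

FS = 44100          # sampling frequency in Hz (from the text)
DURATION_S = 0.1    # stimulus duration in seconds (from the text)
TAPER_RATIO = 0.3   # Tukey taper ratio (from the text)


def tukey_window(n_samples: int, taper_ratio: float) -> np.ndarray:
    """Tukey (tapered-cosine) window.

    taper_ratio is the fraction of the window occupied by the two
    cosine tapers together; 0 gives a rectangular window.
    """
    window = np.ones(n_samples)
    if taper_ratio <= 0:
        return window
    n = np.arange(n_samples)
    edge = taper_ratio * (n_samples - 1) / 2.0  # length of one taper
    rising = n < edge
    falling = n > (n_samples - 1) - edge
    window[rising] = 0.5 * (1 - np.cos(np.pi * n[rising] / edge))
    window[falling] = 0.5 * (
        1 - np.cos(np.pi * ((n_samples - 1) - n[falling]) / edge)
    )
    return window


def make_stimulus(rng: np.random.Generator) -> np.ndarray:
    """Draw a fresh windowed white Gaussian noise burst (one per trial)."""
    n_samples = int(round(FS * DURATION_S))
    noise = rng.standard_normal(n_samples)  # unit variance, assumed
    return noise * tukey_window(n_samples, TAPER_RATIO)


if __name__ == "__main__":
    burst = make_stimulus(np.random.default_rng())
    print(burst.shape)
```

Drawing a new burst for every trial, as in `make_stimulus`, is what removes the bias of a fixed stimulus spectrum mentioned in the text.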

Note that the ID directivity is derived assuming a coincident microphone array, and is therefore overly selective for the near-coincident 15.5 cm array. We used it in this preliminary test with the 15.5 cm array to assess the merits of employing what, according to psychoacoustics, would be correct time and level differences, as well as the sensitivity of the system to using their precise values. Considering one additional system, ID, which deviates from the proposed TILD directivity in the opposite direction to the PSR design, allows more credible conclusions to be drawn about the investigated issues.

Fig. 7. Results of the preliminary localization test: mean response angle as a function of stimulus angle averaged across all subjects, for the front-seating direction. Stimulus and response angles are relative to the listener facing direction. The error bars show the 95% confidence intervals.

3) Results: Altogether, 90 responses were collected for each investigated system and for each stimulus direction. In Fig. 7 the mean responses are shown as a function of the stimulus direction. It can be observed that the localization accuracy in the frontal direction obtained with the TILD directivity is not only the highest of all tested systems on average, but is also the highest for each stimulus direction. It can be further observed that in the case of the system with the ID directivity pattern, the auditory images are not rendered uniformly across the span, but tend to concentrate at the loudspeaker closer to the stimulus direction. The opposite effect can be observed with the HOA-in-phase and PSR systems, i.e., auditory images concentrate around the middle of the range.

4) Discussion: These results can be explained based on the considerations in Section III-B. It follows from (17) that for the five-channel system with cm, the ICTD of a stimulus in the direction of one of the speakers is ms.
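As a rough cross-check of the time differences discussed here, the ICTD between two adjacent microphones of a circular array can be estimated from geometry alone. Equation (17) is not reproduced in this part of the text, so the sketch below assumes a far-field plane wave, microphones on a circle of radius r separated by 72°, a source angle measured from the midline between the two microphones, and a speed of sound of 343 m/s. Under those assumptions the path difference gives ICTD = (2r/c) sin(36°) sin(θ). The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (not stated in the text)


def ictd_adjacent(radius_m: float, source_deg: float,
                  spacing_deg: float = 72.0,
                  c: float = SPEED_OF_SOUND) -> float:
    """Inter-channel time difference (seconds) between two adjacent
    microphones of a circular array for a far-field plane wave.

    source_deg is measured from the midline between the two microphones.
    """
    half_spacing = math.radians(spacing_deg) / 2.0
    theta = math.radians(source_deg)
    return 2.0 * radius_m / c * math.sin(half_spacing) * math.sin(theta)


def radius_for_ictd(ictd_s: float, source_deg: float,
                    spacing_deg: float = 72.0,
                    c: float = SPEED_OF_SOUND) -> float:
    """Array radius (metres) that yields a given ICTD; inverse of the above."""
    half_spacing = math.radians(spacing_deg) / 2.0
    theta = math.radians(source_deg)
    return ictd_s * c / (2.0 * math.sin(half_spacing) * math.sin(theta))


if __name__ == "__main__":
    # Source in the direction of one loudspeaker: 36 degrees off the midline.
    print(f"ICTD for r = 15.5 cm: {ictd_adjacent(0.155, 36.0) * 1e3:.3f} ms")
    print(f"Radius for 2 ms ICTD: {radius_for_ictd(2e-3, 36.0):.2f} m")
```

With these assumed values the 15.5 cm array gives an ICTD of roughly 0.31 ms for a source in the direction of one loudspeaker, and an ICTD of 2 ms would require a radius close to 1 m; both figures depend on the assumed speed of sound and geometry.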
Franssen's curves in Fig. 3 indicate that the level difference between the two speakers needed in combination with this time difference to create an auditory event in the direction of one of the speakers is 9.60 dB, and that is what the TILD design aims to achieve. The corresponding level difference of the PSR pattern is 3 dB, which is insufficient to achieve the desired source displacement from the center. On the other hand, the level difference of the ID design for the same direction is dB, which is in excess of what is needed for the desired displacement, and is a consequence of the fact that the ID pattern is designed so as to provide the desired source auralization without inter-channel time differences. Notice from Fig. 4 that HOA-in-phase provides higher level differences than the PSR pattern. Even so, due to the absence of time differences, it renders auditory images closer to the center. Finally, from Fig. 3 it can be observed that with an ICLD of 3 dB, the ICTD needed to create an auditory image in the direction of one of the loudspeakers is approximately 2 ms, which corresponds to m. This explains the observation made in [40] that PSR arrays with larger radii resulted in higher subjective ratings. The same observation was also shared by J. D. Johnston in a personal communication. The results shown in Fig. 7 demonstrate that fine tuning of time and level differences does matter for rendering an accurate auditory perspective, but the graceful degradation of the performance as the directivity departs from the TILD towards the ID and PSR patterns suggests that the technology is not very sensitive to deviations of its parameters from the ideal ones. In the next subsection we present the results of our main listening test, which includes HOA and the coincident version of the ID array, both of which achieve high accuracy for a listener in the center of the loudspeaker array.

B. Main Test

1) Subjects: Sixteen naïve [41] subjects (13 male and 3 female) with no reported hearing impairments participated in the main test. All subjects but one were students aged years old.

2) Multichannel Systems Considered: Four multichannel methodologies were included in this localization test. One of them is the second-order TID approximation presented in Section IV. Two versions of second-order ambisonics, i.e., HOA-in-phase and a state-of-the-art HOA implementation, were also considered. Although a HOA standard is yet to be defined, the literature suggests that there is a general understanding of what its state-of-the-art implementation should involve [8], [9], [42], [43]. First, mode-matching decoding is applied at low frequencies [8]. At high frequencies, physical reconstruction becomes infeasible, and therefore different criteria should be employed. Poletti suggests weighting circular harmonic components using a Kaiser window [8]. Daniel et al. propose alternative formulas which maximize the so-called energy vector [9]. This latter approach is preferred here for its compliance with the first-order version of Ambisonics originally proposed by Gerzon [44]. According to [9], the cross-fading frequency between mode-matching and maximum energy decoding is set to 1.2 kHz. The cross-fading filters are designed as phase-matched second-order IIR filters as described in [42]. To compensate for the finite distance of the loudspeakers, the near-field correction described in [43] is also implemented. The above three multichannel methodologies all use second-order directivity patterns. The ID directivity is too selective to be well approximated by low-order patterns. Hence, it is implemented in its exact formulation (14), and serves as an additional benchmark illustrating the performance of intensity-based systems, i.e., coincident arrays, and the equivalent tangent panning law.
Note, however, that high-order microphones are more challenging to design and build [19], [20], [22], and the results should be interpreted accordingly.

3) Methodology: The test methodology used in the preliminary test was amended in two ways to elicit more information from subjects. Firstly, subjects could also report whether an event was perceived to the left or to the right of the pointer array. Specifically, they were instructed to choose the leftward button when the event was beyond the midline between the last pointer on their left and speaker (see Fig. 6), and the angle associated with these responses was the direction of speaker. The case of auditory events perceived to the right of the pointer array was treated analogously. Secondly, subjects were asked to make open comments at the end of each test block.

Fig. 8. Results of the main localization test: mean responses with 95% confidence intervals as a function of stimulus angle, for (a) the front-seating direction, (b) side-seating direction, (c) back-seating direction. The bar plots indicate the percentage of times subjects chose the leftward and rightward buttons, as described in Section V-B-3.

4) Results and Discussion: Altogether, 64 responses were collected for each investigated system and for each stimulus direction. The mean angular responses are shown in Fig. 8(a), along with the percentage of leftward and rightward decisions. HOA-in-phase has a high localization error. This is consistent with the findings of the preliminary test. All other systems give very low errors, with HOA being the smallest. Considering that the localization blur of the auditory system is in the 1° to 4° range [4], the differences between TID, ID and HOA are not perceptually significant in the frontal seating position. All subjects reported that whenever they used the extreme buttons the auditory event was perceived at the directions of speakers or, or in their close proximity. When asked for open comments, one reported some front-back confusions [4], while two reported it was harder to judge events in directions close to the midline between and, which was also observed in [39]. Observe that the TID system exhibited a slight bias toward the midline.
This is likely due to the insufficiency of the employed level differences, since in the absence of psychoacoustic laws governing source auralization over the 72° range, we designed all considered time-intensity systems using Franssen's curves, which are obtained in measurements using the 60° set-up. On the other hand, the ID system tends to pull auditory events closer to the loudspeakers (the so-called detent effect [29], [45]), and yields the highest number of extreme decisions. This effect is in agreement with the fact that the tangent law in (14) uses extreme level differences, and it can probably be corrected either by using an intensity law which exhibits a faster change at end points and, or by amending it according to (20) with selected so that smaller or minimal level differences are used at the end points. In the side and back seating directions, only TID and HOA were tested in order to maintain a moderate test duration. TID and HOA were chosen because they yield low angular error in the frontal direction with second-order designs. Results for the side seating direction show higher error and variability of responses, reflecting the lower localization accuracy of the auditory system in this region [4]. Fig. 8(b) shows that responses are biased towards the loudspeaker. Most subjects reported using the leftward button when the auditory event was located at or in its close vicinity, while two subjects perceived some events somewhere between and. In the seating position the localization accuracy of both HOA and TID improves as compared with, as shown in Fig. 8(c). Three subjects reported some front/back confusions. HOA achieves lower error than TID for both and orientations, the difference being comparable to the localization blur in the corresponding sectors [4].
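The remark above that the tangent law uses extreme level differences at the end points can be illustrated numerically. Equation (14) is not reproduced in this part of the text, so the sketch below assumes the standard stereophonic tangent panning law, tan(θ)/tan(θ0) = (g1 − g2)/(g1 + g2), with θ measured from the midline between two loudspeakers at ±θ0 and θ0 = 36° for the five-channel layout. The function names and the unit-power normalization are illustrative assumptions.

```python
import math


def tangent_law_gains(source_deg: float,
                      half_aperture_deg: float = 36.0) -> tuple[float, float]:
    """Gains (g1, g2) of two adjacent loudspeakers under the standard
    tangent panning law, normalized to unit total power.

    source_deg is measured from the midline, positive towards speaker 1.
    """
    phi0 = math.radians(half_aperture_deg)
    theta = math.radians(source_deg)
    g1 = math.tan(phi0) + math.tan(theta)
    g2 = math.tan(phi0) - math.tan(theta)
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


def icld_db(g1: float, g2: float) -> float:
    """Inter-channel level difference in dB; infinite if one gain is zero."""
    if g2 == 0.0:
        return math.inf
    if g1 == 0.0:
        return -math.inf
    return 20.0 * math.log10(abs(g1) / abs(g2))


if __name__ == "__main__":
    for angle in (0.0, 12.0, 24.0, 30.0, 36.0):
        g1, g2 = tangent_law_gains(angle)
        print(f"{angle:5.1f} deg: ICLD = {icld_db(g1, g2):.2f} dB")
```

In this sketch the ICLD grows without bound as the source approaches either loudspeaker direction, because one gain goes to zero there; this is the sense in which the level differences at the end points are extreme.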
The TID directivity is designed based on frontal time-intensity trading curves; hence, as could be expected, the system has good performance for frontal sources, while being inferior to HOA for sound sources coming from the sides and the back. In applications with no preferred orientation, localization accuracy in the frontal direction will be more important, given also that the spatial resolution of the auditory system is highest in the frontal sector. If, on the other hand, the system is designed for reproduction in a situation where the listener's orientation is fixed and known, the localization accuracy of time-intensity systems on the sides and in the back can be improved by designing directivity patterns of side and back microphones according to the corresponding time-intensity curves [39].

Fig. 9. Results of the main localization test: mean responses with 95% confidence intervals as a function of stimulus angle, for the position.

Fig. 9 shows results obtained for the off-center seating position. A strong shift towards the direction of the closest active loudspeaker is present, which could be expected [46]. Most of the responses for stimulus angles between and were at or beyond the last pointer. HOA-in-phase is more accurate in this region, possibly due to its lower inter-channel level differences. As the stimulus angle shifts to the right, the other methodologies outperform HOA-in-phase, particularly the ID system. Subjects reported a significant amount of leftward decisions for HOA, even for stimulus directions close to the midline. Most subjects reported using the leftward button when the auditory event was perceived in directions between the loudspeaker and the midline between and. Four subjects reported that auditory events perceived in directions near the midline between and were blurry or hard to localize. This is in agreement with the results of the locatedness test presented in Section VI.

VI. PERCEPTUAL EVALUATION: LOCATEDNESS

An important perceptual attribute of multichannel reproduction is the locatedness of phantom sources, defined as the degree of certainty about the location of auditory events [39]. In this section we present the results of a listening test which evaluates the considered systems in terms of the locatedness of the auditory events they produce.

1) Subjects: Twelve of the sixteen participants of the test described in Section V-B also took the locatedness test around a month after the first test. An additional seven naïve [41] subjects were recruited for this test. The subject group thus consisted of nineteen listeners (17 male and 2 female).
2) Methodology and Stimuli: Subjects responded to the questions "How well can you assign a particular direction to the perceived source?" [47] and "How certain are you of the direction of the source?" [39] with a score on a continuous scale from 0 to 100. The scale was divided into five equal intervals labeled as "I am certain", "I have a slight doubt", "I have a doubt", "I am really not sure" and "I have no idea", as suggested in [39]. Subjects were instructed to ignore any other audible attribute, such as pitch, tonal coloration and, more importantly, the specific perceived source direction. The studied surround systems were compared directly, in a manner similar to MUSHRA tests [48]. Two additional sounds with known characteristics were also included among the systems to grade. The first was the unprocessed signal played by a MACKIE HR824 loudspeaker positioned in the direction of the emulated plane wave, which will be referred to as real. The second, which will be referred to as diffuse anchor, was an approximation of a diffuse sound field obtained by playing over all 5 channels the unprocessed signal convolved with uncorrelated 10 ms long sequences [49]. The subjects did not know which system they were grading at any time, and the presentation order was randomized at each iteration. The same experiment was repeated under different conditions by varying the seating position, plane wave incident direction, and sound excerpt. The central seating position and off-center position were used. The surround systems were simulated to reproduce virtual sources in the far field at three different directions , as depicted in Fig. 6. In the central position, only the frontal and right directions were investigated. Three anechoic excerpts from the Bang & Olufsen Music for Archimedes CD were used as representatives of common program material: female speech, African bongo and cello [39]. The excerpts were faded off after around 5 s.
To avoid bias due to persistence effects [4], listeners could not switch between presentations before the whole excerpt was played. Each subject ran a training session before the actual test, which allowed them to listen to all possible sound excerpts and to familiarize themselves with the grading system.

3) Results and Discussion: A pilot experiment showed that a higher uncertainty was associated with responses in the off-center position in the case of right and frontal incident directions. Therefore, a higher number of repetitions was used for these conditions. Altogether, each system was graded 36 times for the right direction and 30 times for the frontal direction in the off-center position. In the remaining three cases, i.e., the left direction in the off-center position and the two directions in the central seating position, each system was graded 27 times in total. Results of the locatedness test are shown in Fig. 10. As expected, the real and diffuse anchors have the highest and lowest mean scores, respectively, under all conditions. In the central position all surround methodologies have high mean scores in both the frontal and right directions. In the off-center position, the following observations can be made. For subjects localize most auditory events at or near (see Section V), and locatedness scores are still high. Paired Student t-tests [50] reveal that HOA-in-phase has a significantly lower mean locatedness score than both TID and ID. For the middle and right directions in the off-center position, HOA has significantly lower mean scores than TID ( and , respectively) and the other two systems at the 5% significance level. Interestingly, for, TID has a significantly higher mean score than ID ( ).

Fig. 10. Subjective assessment of locatedness showing mean response scores and 95% confidence intervals of experiments for seating positions or (refer to Fig. 6) and three incident directions.

Note that high locatedness scores are not necessarily preferable over low ones. In fact, depending on the intended application, a sound engineer could favor systems which blur the acoustic perspective [39]. However, the variability of the mean scores of HOA indicates that the acoustic perspective changes significantly when moving 30 cm from the center of the loudspeaker array. For instance, when reproducing a plane wave incident from the right direction, the mean locatedness score drops from 83.7 to 51.6 (unpaired t-test is significant with ). A similar difference in mean locatedness scores is observed for a listener in the off-center position between left and middle incident directions, and right and middle incident directions. Results in Fig. 10 show that TID is the system with the most uniform mean scores across incidence directions and seating positions. Note that the results of our tests disagree with general observations made in earlier studies by Lipshitz and Linkwitz [51], [52], which recommend against non-coincident methods (especially those with very large inter-microphone distances) because they are considered to produce a higher phasiness and imaging blur, or, equivalently, inferior locatedness. This disagreement could be a result of differences between the time-intensity systems considered in this work and in these previous studies. Reconciling these differences requires further research.

VII. CONCLUSIONS

We presented an analysis of circular microphone arrays in the context of panoramic audio recording and reproduction.
The analysis was first based on the concept of active intensity and was focused on the performance of arrays in recording and reproduction of monochromatic plane waves. The analysis showed that cross-channel terms have a detrimental effect on the direction and the uniformity of reproduced sound fields, leading to the conclusion that using not more than two active channels for rendition of plane waves reduces spatial fluctuations and error of the reproduced sound fields. This analysis was then refined to include psychoacoustic phenomena, leading to a methodology for the design of circular microphone arrays for panoramic recording of acoustic events. As a result, an intensity-based design named ID was first proposed. This approach was then generalized to a broader class of arrays named TID that capture the inter-channel time and level differences needed for accurate directional reproduction of all sources and reflections of a recorded event. Design examples were given for the case of pentagonal microphone and loudspeaker arrays. Subjective listening tests were carried out in order to evaluate the performance of the proposed designs in comparison with several other multichannel systems. These systems included the PSR array proposed in [15], and two second-order ambisonics versions, HOA and HOA-in-phase. Results of a localization experiment showed that angular error in front of the listener is high for PSR and HOA-in-phase, while HOA, ID and TID all achieve high accuracy. HOA is more accurate than TID for sources at the side and back of the listener, the difference being comparable to the localization blurs in these directions [4]. The localization accuracy of the ID and TID methods degrades more gracefully in an off-center position than that of HOA, with the ID system degrading the least. However, as opposed to the other considered systems, ID cannot be approximated effectively with second-order patterns, and therefore requires more sophisticated microphones.
Results of a locatedness experiment have shown that HOA degrades the most at a position 30 cm off-center, while TID yields the most consistent responses across seating positions and source incidence angles. The present study did not address the issue of optimal array radius, but rather proposed a methodology for shaping microphone directivity, given the array radius and the number of channels, to render a satisfactory auditory perspective. This leaves the array radius as a free parameter that can be used to optimize other perceptual criteria or possibly to meet some practical implementation requirements. For playback systems with an a priori known, fixed listener orientation, the localization performance of time-intensity systems on the sides and to the back can probably be improved by making use of psychoacoustic curves corresponding to those directions. However, such further optimization of the time-intensity approach, along with optimization of the array radius, is a matter of future research.

ACKNOWLEDGMENT

The authors would like to thank Yaqub Alwan, James Hall, James Johnston, Francis Rumsey, and Peter Sollich for insightful discussions on the topic and all the volunteers for participating in the listening tests.

REFERENCES

[1] H. Hacıhabiboğlu and Z. Cvetković, Panoramic recording and reproduction of multichannel audio using a circular microphone array, in Proc. IEEE Workshop Appl. Signal Process. Audio and Acoust. (WASPAA '09), New Paltz, NY, USA, Oct. 2009, pp
[2] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, Perceptual evaluation of a circularly symmetric microphone array for panoramic recording of audio, in Proc. 2nd Int. Symp. Ambison., Spher. Acoust., Paris, France,
[3] H. Hacıhabiboğlu, E. De Sena, and Z. Cvetković, Design of a circular microphone array for panoramic audio recording and reproduction: Microphone directivity, in AES 128th Conv., London, U.K., May 2010, Preprint #8063.
[4] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA, USA: MIT Press,
[5] W. G. Gardner, 3-D Audio Using Loudspeakers. Norwell, MA, USA: Kluwer,
[6] M. M. Boone, U. Horbach, and W. P. J. Bruijn, Spatial sound-field reproduction by wave-field synthesis, J. Audio Eng. Soc., vol. 43, no. 12, pp , Dec
[7] J. Daniel, Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Ph.D. dissertation, Univ. of Paris VI, Paris, France,
[8] M. A. Poletti, A unified theory of horizontal holographic sound systems, J. Audio Eng. Soc., vol. 48, no. 12, pp , Dec
[9] J. Daniel, J. Rault, and J. Polack, Ambisonics encoding of other audio formats for multiple listening conditions, in AES 105th Conv., San Francisco, CA, USA, Sep. 1998, Preprint #4795.
[10] ITU-R, Rec. BS
[11] F. Rumsey, Spatial Audio. Oxford, U.K.: Focal Press,
[12] J. Eargle, The Microphone Book. Oxford, U.K.: Focal Press,
[13] M. Williams and G. Le Du, Microphone array analysis for multichannel sound recording, in AES 107th Conv., New York, NY, USA, Sep. 1999, Preprint #4997.
[14] J. D. Johnston and Y. H. Lam, Perceptual soundfield reconstruction, in AES 109th Conv., Los Angeles, CA, USA, Sep. 2000, Preprint #2399.
[15] J. D. Johnston and E. R. Wagner, Microphone array for preserving soundfield perceptual cues, U.S. patent, US 6,845,163 B1, Jan
[16] G. L. Rosen and J. D. Johnston, Automatic speaker directivity control for sound field, in Proc. AES 19th Int. Conf., Schloss Elmau, Germany, Jun
[17] G. Theile, Multichannel natural music recording based on psychoacoustic principles, in Proc. AES 19th Int. Conf., Schloss Elmau, Germany, Jun
[18] A. Laborie, R. Bruno, and S. Montoya, High spatial resolution multichannel recording, in Proc. AES 116th Conv., Berlin, Germany, 2004, Preprint #6166.
[19] J. Meyer and G. W. Elko, A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process. (ICASSP '93), Minneapolis, MN, USA, April
[20] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, On the design and implementation of higher order differential microphones, IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp , Jan
[21] B. Rafaely, Design of a second-order soundfield microphone, in Proc. AES 118th Conv., Barcelona, Spain, May 2005, Preprint #6405.
[22] S. Doclo and M. Moonen, Design of broadband beamformers robust against gain and phase errors in the microphone array characteristics, IEEE Trans. Signal Process., vol. 51, no. 10, pp , Oct
[23] R. D. Heyser, Instantaneous intensity, in Proc. AES 81st Conv., Los Angeles, CA, USA, Nov. 1986, Preprint #2399.
[24] J. Nelder and R. Mead, A simplex method for function minimization, Comput. J., vol. 7, no. 4, pp , Jan
[25] H.-K. Lee and F. Rumsey, Investigation into the effect of interchannel crosstalk in multichannel microphone technique, in Proc. AES 118th Conv., Barcelona, Spain, May 2005, Preprint #6405.
[26] J. Borenius, Moving sound image in the theaters, J. Audio Eng. Soc., vol. 25, no. 4, pp ,
[27] V. Pulkki, Virtual sound source positioning using vector-base amplitude panning, J. Audio Eng. Soc., vol. 45, no. 6, pp , Jun
[28] H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition. New York, NY, USA: Springer,
[29] S.-L. Lee, K.-Y. Han, S.-R. Lee, and K.-M. Sung, Reduction of sound localization error for surround sound system using enhanced constant power panning law, IEEE Trans. Consum. Electron., vol. 50, no. 3, pp , Aug
[30] B. Bernfeld, Attempts for better understanding of the directional stereophonic listening mechanism, in Proc. AES 44th Conv., Rotterdam, The Netherlands, Mar. 1973, Preprint #C-4.
[31] N. V. Franssen, Stereophony. Eindhoven, The Netherlands: Philips Research Laboratories,
[32] H. Fletcher, Speech and Hearing in Communication. New York, NY, USA: van Nostrand,
[33] P. A. Ratliff, Properties of hearing related to quadraphonic reproduction, Research Dept., BBC, 1974, Tech. Rep.
[34] Y. Ando and K. Kurihara, Nonlinear response in evaluating the subjective diffuseness of sound fields, J. Acoust. Soc. Amer., vol. 80, no. 3, pp , Sep
[35] W. Gaik, Combined evaluation of interaural time and intensity differences: Psychoacoustic results and computer modeling, J. Acoust. Soc. Amer., vol. 94, no. 1, pp , Jul
[36] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, Design of a circular microphone array for panoramic audio recording and reproduction: Array radius, in Proc. AES 128th Conv., London, U.K., May 2010, Preprint #8064.
[37] S. Bech and N. Zacharov, Perceptual Audio Evaluation: Theory, Method and Application. New York, NY, USA: Wiley,
[38] W. Hartmann, B. Rakerd, and J. Gaalaas, On the source-identification method, J. Acoust. Soc. Amer., vol. 104, pp , Dec
[39] L. S. R. Simon and R. Mason, Time and level localization curves for a regularly-spaced octagon loudspeaker array, in Proc. AES 128th Conv., London, U.K., May 2010, Preprint #8079.
[40] J. Hall and Z. Cvetković, Coherent multichannel emulation of acoustic spaces, in Proc. AES 28th Int. Conf., Piteå, Sweden, Jun
[41] ISO, , Sensory Analysis General Guidance for the Selection, Training and Monitoring of Assessors Part 2: Experts,
[42] A. Heller, R. Lee, and E. Benjamin, Is my decoder ambisonic, in Proc. AES 125th Conv., London, U.K., Oct. 2008, Preprint #7553.
[43] J. Daniel, Spatial sound encoding including near field effect: Introducing distance coding filters and a viable, new ambisonic format, in Proc. AES 23rd Int. Conf., Copenhagen, Denmark, May
[44] M. Gerzon, General metatheory of auditory localisation, in Proc. AES 92nd Conv., Vienna, Austria, Mar. 1992, Preprint #3306.
[45] S. Jeon, Y. Park, S. Lee, and D. Youn, Virtual source panning using multiple-wise vector base in the multispeaker stereo format, in Proc. 19th Eur. Signal Process. Conf. (EUSIPCO '11), Barcelona, Spain, Sep. 2011, pp
[46] J. Ródenas, R. Aarts, and A. Janssen, Derivation of an optimal directivity pattern for sweet spot widening in stereo sound reproduction, J. Acoust. Soc. Amer., vol. 113, pp ,
[47] H. Wittek, F. Rumsey, and G. Theile, Perceptual enhancement of wavefield synthesis by stereophonic means, J. Audio Eng. Soc., vol. 55, no. 9, pp , Sep
[48] ITU-R, Rec. BS ,
[49] G. Kendall, The decorrelation of audio signals and its impact on spatial imagery, Comput. Music J., vol. 19, no. 4, pp ,
[50] T. Sporer, J. Liebetrau, and S. Schneider, Statistics of MUSHRA revisited, in Proc. AES 127th Conv., New York, NY, USA, 2009, Preprint #7825.
[51] S. Lipshitz, Stereo microphone techniques: Are the purists wrong?, J. Audio Eng. Soc., vol. 34, no. 9, pp ,
[52] S. Linkwitz, A model for rendering stereo signals in the ITD-range of hearing, in Proc. AES 133rd Conv., San Francisco, CA, USA, 2012, Preprint #8713.

Enzo De Sena (S'11) was born in Napoli, Italy, in 1984. He received the B.Sc. degree in 2007 and the M.Sc. degree (cum laude) in 2009, both from the Università degli Studi di Napoli Federico II, Napoli, Italy, in telecommunications engineering. He is currently pursuing the Ph.D. degree in electronic engineering at King's College London, London, U.K., and is also serving as a Teaching Fellow at the same university. His research interests include multichannel audio, spatial hearing, room acoustics simulation, and microphone array processing.

Hüseyin Hacıhabiboğlu (S'96-M'00-SM'12) received the B.Sc. (honors) degree from the Middle East Technical University (METU), Ankara, Turkey, in 2000, the M.Sc. degree from the University of Bristol, Bristol, U.K., in 2001, both in electrical and electronic engineering, and the Ph.D. degree in computer science from Queen's University Belfast, Belfast, U.K., in . He held research positions at University of Surrey, Guildford, U.K. ( ) and King's College London, London, U.K. ( ). Currently, he is an Assistant Professor and Head of Department of Modelling and Simulation at the Graduate School of Informatics, Middle East Technical University, Ankara, Turkey. His research interests include audio signal processing, room acoustics, multichannel audio systems, psychoacoustics of spatial hearing, microphone arrays, and game audio. Dr. Hacıhabiboğlu is a member of the IEEE Signal Processing Society, Audio Engineering Society (AES), Turkish Acoustics Society (TAD), and the European Acoustics Association (EAA).

Zoran Cvetković (SM'04) received the Dipl.Ing.El. and Mag.El. degrees from the University of Belgrade, Belgrade, Yugoslavia, in 1989 and 1992, respectively, the M.Phil. degree from Columbia University, New York, in 1993, and the Ph.D. degree in electrical engineering from the University of California, Berkeley, in . He held research positions at EPFL, Lausanne, Switzerland (1996), and at Harvard University, Cambridge, MA ( ). From 1997 to 2002, he was a Member of Technical Staff at AT&T Shannon Laboratory. He is now a Professor in Signal Processing at King's College London, London, U.K. His research interests are in the broad area of signal processing, ranging from theoretical aspects of signal analysis to applications in telecommunications, audio and speech technologies, and biomedical engineering.
