A Comparative Study of the Performance of Spatialization Techniques for a Distributed Audience in a Concert Hall Environment


Gavin Kearney, Enda Bates, Frank Boland and Dermot Furlong

Department of Electronic & Electrical Engineering, Trinity College Dublin, Ireland

Correspondence should be addressed to Gavin Kearney (gpkearney@ee.tcd.ie)

ABSTRACT

The performance of various spatialization techniques is evaluated for a distributed audience using the non-ideal speaker arrangements found in small concert halls. The methods under evaluation are Second Order Ambisonics, Vector Base Amplitude Panning, Delta Stereophony and Spat with B-format Ambisonics encoding. Each method is assessed in terms of its localization accuracy. The data is presented by comparison of empirical high resolution binaural measurements and perceptual listening tests to simulations of the speaker arrangements in an equivalent acoustically modelled environment.

1. INTRODUCTION

The presentation of spatialized audio to a distributed audience presents significant challenges to contemporary composers and audio engineers. While numerous schemes exist for high resolution multi-channel spatialization, there is a distinct lack of research on the effect of a real listening environment on these techniques. In such environments, the presence of reverberation and reflections can impact negatively on localization accuracy [1, 2]. Furthermore, spatialization schemes which are optimised for central listening positions can result in a compromised perception of the source for off-centre listeners, the extent of which has yet to be fully determined. Finally, the symmetrical loudspeaker arrangements recommended for most spatialization techniques can be difficult to implement in a standard rectangular concert hall without greatly reducing the size of the audience area. These issues create significant challenges when transferring spatialized audio composed in a studio to a performance setting in a concert hall.
In this paper we present the results of a comparative study on the localization performance of various spatialization schemes, for a distributed audience, in a small concert hall. The spatialization schemes are first evaluated in terms of subjective localization performance using perceptual listening tests conducted in a real hall. These tests are then compared to empirical binaural measurements at each listener position. The results of the subjective listening tests and objective measurements are then investigated through the use of an equivalent acoustically modelled environment. We approach this analysis with a brief review of human auditory localization in reverberant environments, followed by a succinct summary of the assessed spatialization schemes.

1.1. Auditory Localization

Human auditory localization can be broken down into three main categories, namely directional hearing in the horizontal plane, directional hearing in the vertical plane, and distance hearing [3]. Here we will limit our discussion to the horizontal plane, as this is particularly relevant for most spatial music presentations. Two main cues dominate localization in the horizontal plane, namely interaural level differences (ILD), which arise from the shadowing effect of the head, and interaural time differences (ITD), which arise from the spatial separation of the two ears [4]. Variations in localization accuracy can occur for different source stimuli in the free field, as has been well documented in [3, 5, 6, 7]. Numerous studies have demonstrated that the region of most precise spatial hearing lies in the forward direction, with frontal hearing having an accuracy of between 4.4° and 10° for different signal types [3]. Localization ability decreases as the source azimuth moves to the sides, with the localization blur at ±90° being between three and ten times its value for the forward direction. For sources to the rear of the listener, localization blur improves somewhat but is still

approximately twice that for frontal sources. It is expected therefore that the ideal localization performance of spatialization systems would follow a similar trend.

1.2. Spatialization Techniques

In this paper, we assess the localization performance of Second Order Ambisonics, Vector Base Amplitude Panning (VBAP), the Delta Stereophony System (DSS), and Spat with B-format Ambisonics encoding.

Ambisonics is a complete set of techniques for recording, manipulating and synthesizing artificial sound fields [8] which has been regularly used in spatial music and theatre for the past three decades. The theoretical background to this technique has been widely discussed and excellent overviews can be found in [8, 9]. Although Ambisonics was originally developed as a point source solution for a single listener, its recent extension to higher orders of spherical harmonics has suggested that the listening area can be extended significantly. For the tests presented in this paper, Second Order Ambisonics encoding and decoding was carried out using the set of externals for Max/MSP developed by the Institute for Computer Music and Sound Technology (ICST) [10]. These externals are based on a Csound implementation of Ambisonics created by David Malham of York University, who also published one of the few papers on large area Ambisonics systems [11].

When implementing Ambisonics, the directional response pattern of the generated soundfield can be narrowed or widened according to the needs of the room acoustics, the position of the speakers and the required listening area. This is achieved by adjusting the weighting of the different order components during the decoding process. Prior to the formal listening tests presented in Section 4, the authors experimented with different weightings, using the three schemes listed below in Table 1 as a starting point.
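The effect of order weighting on the directional response can be illustrated with a minimal sketch. It assumes an idealised regular circular array and horizontal-only (2D) panning, which is not the irregular array used in these tests, and the weights shown (unit "basic" weights and the textbook 2D in-phase weights) are illustrative assumptions rather than the values of Table 1. The function names `inphase_weights` and `panning_gains` are hypothetical.

```python
import numpy as np
from math import factorial

def inphase_weights(order):
    """Textbook 2D in-phase weights g_m = (M!)^2 / ((M+m)! (M-m)!) (assumed scheme)."""
    return np.array([factorial(order) ** 2
                     / (factorial(order + m) * factorial(order - m))
                     for m in range(order + 1)])

def panning_gains(source_deg, speaker_deg, weights):
    """Equivalent loudspeaker gains for 2D Ambisonic panning on a regular circle.

    weights[m] scales the order-m component at the decoder; weights[0] is order 0.
    """
    d = np.radians(source_deg - np.asarray(speaker_deg, dtype=float))
    g = np.full(d.shape, float(weights[0]))
    for m in range(1, len(weights)):
        g += 2.0 * weights[m] * np.cos(m * d)
    return g / len(d)

speakers = np.arange(8) * 45.0                      # hypothetical regular octagon
basic = panning_gains(30.0, speakers, np.ones(3))   # second order, unit weights
inphase = panning_gains(30.0, speakers, inphase_weights(2))
print(np.round(basic, 3))    # contains negative (out-of-phase) gains
print(np.round(inphase, 3))  # all gains non-negative, broader main lobe
```

With unit weights some loudspeakers opposite the source are driven out of phase, whereas the in-phase weights remove these negative gains at the cost of a wider, more diffuse response, which is the trade-off described in the text.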
The Furse-Malham matched weighting increases the directional response, which is optimal for a single listener but produced poor results in the test environment. Controlled opposites, or in-phase decoding, reduces the directional response and produces a more diffuse soundfield. Although this is often recommended for large listening areas, this weighting produced very poorly localized sources. Overall, the best localization was achieved using a ratio that lies between matched and controlled opposites decoding (entry three in Table 1).

Table 1: Ambisonic decoder weighting schemes (Furse-Malham Set, In-Phase and Malham), giving the weighting applied to the zeroth, first and second order components.

VBAP is a generic method for virtual source positioning developed by Ville Pulkki [12]. This vector-based reformulation of the amplitude panning method can be used to extend the basic stereophonic principle to an arbitrary number of loudspeakers. Once the available loudspeaker layout is defined, virtual sources are positioned by simply specifying the source azimuth. If a virtual source is panned to the same direction as any of the loudspeakers, then only that loudspeaker will be used. If a source is panned to a point between two loudspeakers then only those two loudspeakers will be used to produce the virtual source using the tangent panning law. The flexibility of VBAP gives it a distinct advantage over other amplitude-panning based spatialization schemes and its straightforward implementation and scalability make it an attractive solution for spatialization using amplitude panning.

Spat is a real-time modular spatial-sound-processing system developed by IRCAM and Espaces Nouveaux for the Max/MSP environment [13]. The system allows for the positioning and reverberation of audio sources in three dimensions using a high level control interface based on a number of perceptual parameters.
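The pairwise behaviour of VBAP described above can be sketched as follows. This is a minimal horizontal-only illustration, not the implementation used in the tests: the function name `vbap_2d_gains` and the loudspeaker angles are hypothetical, adjacent loudspeakers are assumed to be less than 180° apart, and the gains are normalised to constant power.

```python
import numpy as np

def vbap_2d_gains(source_deg, speaker_deg):
    """Gains for a horizontal VBAP pan; at most two loudspeakers are active."""
    az = np.radians(np.mod(np.asarray(speaker_deg, dtype=float), 360.0))
    order = np.argsort(az)                       # walk around the circle
    src = np.radians(source_deg)
    p = np.array([np.cos(src), np.sin(src)])     # unit vector towards the source
    gains = np.zeros(len(az))
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        base = np.array([[np.cos(az[i]), np.sin(az[i])],
                         [np.cos(az[j]), np.sin(az[j])]])
        try:
            g = np.linalg.solve(base.T, p)       # p = g_i * l_i + g_j * l_j
        except np.linalg.LinAlgError:
            continue                             # degenerate (collinear) pair
        if np.all(g >= -1e-9):                   # source lies inside this pair
            g = np.clip(g, 0.0, None)
            gains[i], gains[j] = g / np.linalg.norm(g)
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")

speakers = np.arange(8) * 45.0                   # hypothetical regular layout
print(np.round(vbap_2d_gains(45.0, speakers), 3))   # one active loudspeaker
print(np.round(vbap_2d_gains(10.0, speakers), 3))   # two active loudspeakers
```

Solving the two-by-two vector base for the gain pair is equivalent to the tangent panning law for the enclosing loudspeaker pair, which is why a source panned exactly onto a loudspeaker direction yields a single non-zero gain.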
The design of Spat is largely based on the spatial processing algorithms developed by Chowning and Moore in the seventies and eighties [14, 15]. The supplied output module can be configured for reproduction over loudspeakers using standard stereophony, discrete intensity panning over various multichannel loudspeaker configurations, Ambisonics B-format encoding or binaural encoding for reproduction over headphones. In this case, the Ambisonics B-format scheme was chosen to position the direct sound at the required locations around the test audience. The level of artificial reverberation generated by Spat was minimised due to the pre-existing reverb in the test environment. It should be noted that IRCAM have not released any details of the precise Ambisonics encoding and decoding schemes implemented in Spat.

The Delta Stereophony System is a sound reinforcement solution intended to provide correct localization of sound

sources for a distributed audience. DSS is largely based on the precedence effect and is an approach which ensures that each listener in an auditorium receives the direct sound from the original sound direction first, before that of reinforcement speakers placed around the audience area [16]. It is the opinion of the authors that this approach is highly applicable for spatial audio presentations. In addition, the DSS system aims to provide uniform sound reinforcement levels about the audience area. The advantage of this scheme over other sound reinforcement systems is that DSS ensures that the delayed loudspeakers do not exceed the upper limits of the precedence effect, which would cause suppression of the direct sound. It is important to note that DSS generally uses monophonic source simulation radiators to reinforce the direct sound [16], ensuring good localization. However, as was shown by Ahnert [17], it has on occasion been used for the reinforcement of moving sources. In order to compare DSS as a spatialization scheme, we assess here its ability to form phantom images between source radiators.

2. ASSESSMENT PROCEDURE

In order to investigate fully the localization performance of spatialization schemes for a distributed audience in a reverberant environment, both subjective and objective assessment procedures are necessary. Specifically, subjective listening tests are required to assess the localization accuracy for different source and listener positions. In addition, binaural recordings are used to assess the non-perceptual effects of the spatialization systems, and a virtual model of the acoustic environment is created to investigate the effect of room acoustics on system performance. The tests presented in this paper were conducted in a small concert hall in Trinity College Dublin. The hall has a reverberation time (RT60) of 0.93 seconds at 1 kHz.
A loudspeaker array consisting of 16 Genelec 1029A loudspeakers was arranged around a 9 listener audience area as shown in Figure 1. A PC utilising MOTU 896 audio interfaces was used to route the audio to the loudspeakers. In these tests, monophonic sources were presented using the 8 black loudspeakers shown in Figure 1 while the 8 gray loudspeakers were used by the various spatialization techniques to generate virtual sources at the same positions. This method allows for a direct comparison between the localization accuracy for a real source and a virtual source positioned at the same location. 1 second unfiltered recordings of male speech, female speech, Gaussian white noise and music with fast transients were used throughout in order to assess the effect of different spectral and temporal stimuli on localization accuracy.

In this analysis, we assess the localization performance of the various schemes through the use of a non-symmetrical array. Since VBAP and Ambisonics are optimised for symmetrical loudspeaker arrays where each loudspeaker is equidistant from the centre listening position, appropriate gain and delay adjustments are required. This is a general requirement for lateral loudspeakers (speakers 7 and 15 in this case) which are positioned closer to the centre position, an inevitable consequence of attempting to place a circular array in a rectangular room. The appropriate delays were applied to each of the two lateral loudspeakers when encoding the test signals. Each loudspeaker in the array was then calibrated to 70 dBA at the centre listening position. This approach is preferable to using the inverse square law when operating in a reverberant acoustic environment, because the superposition of the direct and reverberant sound fields affects the total SPL.

Fig. 1: Geometry of loudspeaker array and audience area.

3. TOWARDS OBJECTIVE ASSESSMENT

Comparison of listening test results to measured non-perceptual binaural data is essential in assessing localization accuracy. While both ITD and ILD cues can be calculated from binaural recordings, the computation of the ILD in a reverberant environment does not give highly accurate estimates of azimuthal angle. This is due to

the fact that the superposition of the room reflections at the ear causes ILD measurements to differ significantly from measurements taken in a free-field environment for the same window lengths. The ITD, however, is a more reliable estimate in this regard and is calculated using the normalized interaural cross correlation function (IACF) of the left and right ear signals, $x_1(t)$ and $x_2(t)$, given by

$$\mathrm{IACF}(\tau) = \frac{\int_{t_1}^{t_2} x_1(t)\,x_2(t+\tau)\,dt}{\sqrt{\int_{t_1}^{t_2} x_1^2(t)\,dt \int_{t_1}^{t_2} x_2^2(t)\,dt}} \qquad (1)$$

The IACF is a function in the range [-1, 1] which gives a measure of the correlation between the received signals within the integration limits $t_1$ to $t_2$ as a function of the time delay $\tau$. The function therefore yields its maximum where $\tau$ equals the true delay between $x_1(t)$ and $x_2(t)$, i.e.,

$$T = \arg\max_{\tau}\,[\mathrm{IACF}(\tau)] \qquad (2)$$

Prior to the normalization of (1), pre-filters may be applied in the frequency domain so as to enhance the observed peak at the true time delay. We investigated the Phase Transform (PHAT) processor as presented by Knapp and Carter [18] in this regard. The results of these tests, which will be presented at a later date, show that the processor improves the accuracy of the ITDs for monophonic sources in a concert hall. It also leads to the derivation of an accurate theoretical head model obtained from empirical binaural measurements for angular mapping of ITD. A sampling rate of 96 kHz was utilized to maintain high resolution in the angle estimation.

4. SUBJECTIVE ASSESSMENT

A series of listening tests was undertaken using groups of nine test subjects. Each group was presented with virtual and monophonic sources from pseudo-random positions located about the speaker array and was then asked to identify the location of the sources via a questionnaire running concurrently with the tests. This randomized method was used to negate any order effects during the tests.
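The ITD estimate of Section 3 (equations (1) and (2)) can be sketched for sampled signals as follows. This is an illustrative discrete-time version, not the authors' processing chain: the function name `estimate_itd`, the 1 ms search range and the optional PHAT whitening step are assumptions, and the mapping from ITD to angle via a head model is not shown.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3, phat=False):
    """Return (ITD in seconds, correlation over the searched lags).

    The ITD is positive when the right-ear signal lags the left-ear signal.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left) + len(right) - 1
    nfft = 1 << (n - 1).bit_length()             # zero-pad: linear correlation
    spec_l = np.fft.rfft(left, nfft)
    spec_r = np.fft.rfft(right, nfft)
    cross = np.conj(spec_l) * spec_r             # spectrum of sum x1(t) x2(t+tau)
    if phat:
        # PHAT pre-filter: keep only the phase of the cross-spectrum
        cross = cross / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, nfft)
    max_lag = max(1, int(round(max_itd_s * fs)))
    # Reorder so that index 0 corresponds to lag -max_lag
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    if not phat:
        # Normalisation of equation (1): divide by sqrt of the two energies
        cc = cc / np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags[np.argmax(cc)] / fs, cc          # equation (2)

# Synthetic check: the right channel is the left delayed by 20 samples
rng = np.random.default_rng(0)
fs = 96000
x = rng.standard_normal(4800)
y = np.concatenate((np.zeros(20), x[:-20]))
itd, iacf = estimate_itd(x, y, fs)
print(itd * 1e6, "microseconds")
```

At a sampling rate of 96 kHz one sample of lag corresponds to roughly 10.4 microseconds, which gives a sense of the timing resolution available before any interpolation.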
In order to assess the effect of various stimuli, users were presented with 1 second unfiltered recordings of male speech, female speech, Gaussian white noise and music with fast transients. Each sample was presented twice, followed by a short interval before the next presentation. Listeners were asked to keep their heads in the forward direction throughout the test, but this was not strictly enforced. Upon completion of one iteration of the test each listener was asked to move to the next seat for another randomized iteration.

Each listener's answers were weighted according to the listener's confidence in their choice, with weightings of $1/n$, where $n$ is the number (or range) of speakers that the listener felt the sound originated from. From this, the histogram $\{h(\theta_i)\}_{i \in [1:16]}$ of all the listeners' answers is computed for each seat. The angular mean $\bar{\theta}$ and the unbiased standard deviation $\sigma_\theta$ at each listener position are computed using:

$$\bar{\theta} = \frac{\sum_{i=1}^{16} h(\theta_i)\,\theta_i}{\sum_{i=1}^{16} h(\theta_i)} \qquad (3)$$

$$\sigma_\theta = \sqrt{\frac{\sum_{i=1}^{16} h(\theta_i)\,(\theta_i - \bar{\theta})^2}{\left(\sum_{i=1}^{16} h(\theta_i)\right) - 1}} \qquad (4)$$

4.1. Subjective Localization of Monophonic Sources

The results for monophonic source localization indicate that real sources can be reasonably well localized by a distributed audience under reverberant conditions. A detailed study of these results can be found in [19]. The results indicate that localization accuracy is greatest for frontal sources, which demonstrate highly correlating mean angle results with little deviation, irrespective of the nature of the source stimulus. Lateral sources are similarly well localized, whereas the least accurate localization was recorded for rear sources, which displayed some deviations towards neighboring loudspeakers, most likely attributable to room reflections. It was also shown that even though the results display some variation for different source stimuli, with the best results being achieved for male speech, these differences are negligible.
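The confidence weighting and the statistics of equations (3) and (4) can be sketched as below. The function names, the response format (a list of 1-based loudspeaker indices per answer) and the loudspeaker angles are hypothetical choices made for illustration; the angles seen from a real seat depend on the array geometry of Figure 1.

```python
import numpy as np

def weighted_histogram(responses, n_speakers=16):
    """Build h(theta_i): each answer naming n loudspeakers adds 1/n to each."""
    h = np.zeros(n_speakers)
    for chosen in responses:
        for speaker in chosen:                   # 1-based loudspeaker index
            h[speaker - 1] += 1.0 / len(chosen)
    return h

def angular_mean_std(h, theta_deg):
    """Weighted mean (eq. 3) and unbiased standard deviation (eq. 4)."""
    theta = np.asarray(theta_deg, dtype=float)
    total = h.sum()
    mean = np.sum(h * theta) / total
    var = np.sum(h * (theta - mean) ** 2) / (total - 1.0)
    return mean, np.sqrt(var)

# Hypothetical seat: two listeners pick loudspeaker 2, one is unsure between 1 and 3
theta = np.arange(16) * 22.5                     # assumed angles, degrees
h = weighted_histogram([[2], [2], [1, 3]])
mean, std = angular_mean_std(h, theta)
print(mean, std)
```

Note that this is a plain arithmetic mean of angles, as in equation (3), so the angles should be expressed relative to a reference that keeps the responses away from the ±180° wrap-around.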
In general, accurate localization can be achieved for all source positions in all seats with the exception of some extreme source-listener angles. These results provide the best case scenario one could expect from the multichannel spatialization systems assessed in the next section.

4.2. Subjective Localization of Virtual Sources

With Second Order Ambisonics the perceived image was quite often biased towards one of its nearest contributing loudspeakers. This can be clearly seen in the results for a presentation of male speech from loudspeaker 14, shown in Figure 2(a). The results for a frontal source display a similar trend, as listeners at seats 1 and 2 consistently

localized the source to loudspeakers 1 and 3 respectively. Listeners at seats 3 and 4 localized to loudspeakers 5 and 15 respectively. For frontal sources, the best results were achieved at the rear-centre listening position, indicating that localization accuracy increases with distance from the source position.

Fig. 2: Localization angles for a source at speaker 14 with male speech: (a) Ambisonics, (b) Spat, (c) VBAP, (d) Delta Stereophony. The dark blue arrow represents the angular mean $\bar{\theta}$ computed at each listener position while the light orange arrow indicates the true angle $\theta_T$ to the presented source position.

A similar bias is also evident for the other source positions, with the worst localization being experienced at the listening positions closest to the source. In general, for lateral sources, the best localization was achieved at the listener position in the middle row and at the opposite side to the source. These results imply that localization accuracy is degraded for non-central listeners seated close to the array. This near-speaker effect was predicted by Gerzon long before the development of Higher Order systems, and recent research by Daniel [20] has also confirmed the extent of these effects on localization accuracy. Reasonable results were achieved at the centre listening position, with the perceived source angle generally staying within 10 degrees of the actual source position. However, the deviations from $\bar{\theta}$ indicate that consistent localization was not achieved, even at the centre position where such effects are presumably not a factor.

The localization results for Spat with B-format Ambisonics encoding show a similar trend to the higher order system, with localization being consistently biased away from the intended source position. This can be clearly seen in Figure 2(b). Again no significant variations were found for different source stimuli.
Interestingly, the localization achieved for rear sources was at least comparable to that for frontal sources with both Ambisonics systems. Reasonable results were again achieved at the centre listening position, with the perceived source angle staying within 10 degrees of the actual source position. The deviations from the mean and the number of correlating results at the centre position improved slightly with this first order system. For lateral sources and opposite seating positions, the higher order system displayed better results. This would seem to suggest that the near-speaker effect is not only a problem with higher order Ambisonics systems but is also a significant factor in first order, B-format presentations. However, this problem seems to be countered somewhat by the better performance offered to non-central listening positions with higher order systems.

The localization results for a presentation of male speech from loudspeaker 14 using VBAP are presented in Figure 2(c). Again, no significant variations were found for different source stimuli. The results demonstrate similar biases to those reported for both Ambisonic systems. However, due to the smaller number of contributing loudspeakers with VBAP (a maximum of two), smaller deviations from $\theta_T$ were found. The localization of rear sources was again comparable to that for frontal sources, with the worst localization occurring for a rear lateral source at loudspeaker 14 as shown in Figure 2(c).

The localization results for male speech presented from loudspeaker 14 using DSS are shown in Figure 2(d). Once again, no significant variations were found for different source stimuli. The results are very similar to those of VBAP, with localization being consistently biased towards the nearest speaker. However, the results for DSS show significantly greater variance about the mean than VBAP, especially at the seating positions furthest away from the source. This is not surprising considering that there is a greater number of contributing loudspeakers with DSS than with VBAP. It should be noted, however, that each $\bar{\theta}$ compares favourably with VBAP for these listener positions. The results for lateral sources correlate well with those for VBAP, albeit with greater deviations about $\bar{\theta}$.

5. OBJECTIVE EVALUATION

Binaural recordings were taken at each listener point in the hall for the same source stimuli and angular presentations as the subjective experiments. From these recordings the angular ITDs were inferred for each source location. The results show similar trends across all source material. A comparison of all systems for localization of male speech from loudspeaker 14 is shown in Figure 3. It is important to note that in this set of plots the error bar does not correspond to the angular deviation $\pm\sigma_\theta$, but rather to the accepted tolerance of localization in the direction of a particular speaker.
This tolerance is set by the angles corresponding to the halfway points between the speakers on either side of the target location.

Fig. 3: Objective localization for a source at speaker 14 with male speech, plotted as direction of localisation (degrees) against listener position for Ambisonics, Spat, VBAP and Delta Stereophony. Each panel shows the ITD-derived angle $\theta_{ITD}$, the true angle $\theta_T$ and the tolerance.

For Second Order Ambisonics an image shift is exhibited at every position in the array. In support of the perceptual results we note that the best localization occurs at seat 5, followed by seats 6 and 9. It is interesting to note that biases are also exhibited at seat 5. For sources presented at loudspeaker 2, there are consistent ITDs indicating that the source is at loudspeaker 3. For sources at loudspeaker 6, there is a consistent bias towards loudspeaker 5, and for sources at loudspeaker 10,

there is consistent bias towards loudspeaker 11. These results confirm the findings of the perceptual tests that consistent accurate localization is not achieved, even at the centre position.

VBAP exhibits similar ITD biases to Ambisonics for frontal localization, and again, the same trends are seen with image shift away from the intended position. The system performs adequately in terms of rear localization and the best localization is achieved at position 5, albeit with biases due to incorrect ITD information, attributed to the non-frontal stereophonic imaging.

The ITDs for Spat again showed no significant variation with source material. The common trend of localization to the nearest contributing loudspeaker relative to the intended source position is repeated here. However, B-format Ambisonics does marginally improve the localization cue for frontal sources at the centre listening position. Lateral source cues are comparable to Second Order Ambisonics, whereas the rear localization cues are mainly compromised.

Delta Stereophony exhibits better frontal localization for the rear listener positions, but compromised rear localization for back sources. Its lateral localization performance is comparable to the other systems. It is important to note that since all loudspeakers in DSS are contributing significant early reinforcement levels across the entire audience area it is likely that the Apparent Source Width (ASW) is affected. In light of this, the ASW was measured at seat position 5 for male speech presentations. The results shown in Table 2 (where a value of 0 represents a point source and 1 an unlocalizable source) show that frontal source width is greatest with DSS and smallest for B-format Ambisonics, whereas Higher Order Ambisonics exhibits the largest source width for rear presentations. An examination of Figure 3 also shows a large deviation from the predicted angle at seat 9 for each system.
Since this trend is present for all systems it is possible that the room affects the correlation at this position.

6. ROOM INVESTIGATIONS

An equivalent acoustically modelled environment was implemented using the EASE developer package in order to look at the influence of room effects on the systems. The constructed model closely represents the absorption and reverberation characteristics of the real hall.

Table 2: Apparent Source Width Measurements for the Mono, Delta, HOA, B-format and VBAP systems with sources at speakers 2, 6, 10 and 14.

Impulse response measurements were taken using maximum length sequence (MLS) noise at particular points in the hall to verify the accuracy of the model. The measured pressure levels and arrival times of the direct sound and early reflections were found to be comparable to the simulations in EASE. The direct SPL contribution from a virtual source at loudspeaker 2 was calculated at 1 kHz for each system and the results compared to the monophonic presentation shown in Figure 4. The contour plots of the listener area shown in Figures 4 through 7 show (from top left in a clockwise direction) the clarity ratio (C80), the ratio of the direct to reverberant sound in terms of distance (Critical Distance), the Direct Sound Pressure Level and the Rapid Speech Transmission Index (RaSTI) (see footnote 1). These graphs have the same orientation as the plots in Figure 2. We can see in terms of the direct sound levels that the region in the central listening position receives the highest SPL, as expected. Good coverage for a sound reinforcement system should typically be in the region of ±3 dB, which is not achieved with the current configuration of these systems. Furthermore, the existence of the large pressure null around the area of seat 9 in the Ambisonics and VBAP SPL plots shows how rear reinforcement may be compromised. Such a null could partly explain the large deviations in localization accuracy found in the binaural measurements at listener seat 9 for each system.
We also see that the differential between the maximum SPL seat and the surrounding seats is greatest for B-format Ambisonics. All systems also show small areas where the SPL drops significantly (more than 10 dB).

Footnote 1: Note here that the Alcons formulae modified with the Farrel Becker equations were used to produce RaSTI.


This can be attributed to the superposition of loudspeakers presenting the source material whilst significantly exciting the room modes. The RaSTI for each system is in general good, with values typically ranging between 0.66 and 0.76. The C80 values give an estimate of the musical performance of the systems in the room. For a good musical performance to be presented the C80 figure should not exceed an 8 dB limit over the audience area. Here, the level of C80 ranges mostly between 5 and 7 over the audience area for each system, indicating system/room combinations best suited for light jazz or light contemporary. Furthermore each system exhibits large areas where the level of the reverberation is greater than that of the direct sound, the extent of which could adversely affect localization. We can conclude from these simulations that the large SPL fluctuations about the audience area must contribute to the inaccurate localization of sources at off-centre positions. Although intelligibility can be maintained, it is clear that none of the given systems is immune to room effects.

Fig. 4: Monophonic audience area plots. Top Left: C80, Top Right: Critical Distance, Bottom Left: RaSTI, Bottom Right: Direct SPL.

Fig. 5: Ambisonic audience area plots. Top Left: C80, Top Right: Critical Distance, Bottom Left: RaSTI, Bottom Right: Direct SPL.

7. OVERALL PERFORMANCE

In the preceding sections we analysed the localization performance of each spatialization technique for each source position. However, it is also important to find a measure of which spatialization technique provides the best overall performance. For this purpose, each system was analyzed in terms of its subjective hit rate, calculated by correlating the ideal localization histogram with the observed results. This measure, displayed in Figure 9, expresses the percentage localization accuracy over all source positions and stimuli.
As we can see, the localization performance of each system does not achieve that of monophonic sources. Overall VBAP and DSS perform better for front and back sources, with VBAP providing 12.7% higher localization accuracy than DSS for rear sources. Higher Order Ambisonics performs consistently better than B-format at all source positions and its performance is comparable to VBAP for lateral rear sources. Interestingly, we see that even though lateral rear monophonic localization falls by over 24%, the localization accuracy in VBAP and DSS does not fall by the same degree, and in fact both Ambisonics systems exhibit a marginally higher degree of accuracy for lateral sources than for frontal sources. It is also important to note that although VBAP shows better localization performance than the other systems, its SPL coverage over the audience area is comparable to that of a monophonic source, as seen from the direct SPL simulations.

8. CONCLUSION

In this paper we have assessed the subjective and objective performance of various spatialization techniques, in terms of their localization accuracy for virtual sources for a distributed audience in a reverberant environment.


The results of a series of listening tests were presented, and these findings were supported by ITDs calculated from high resolution binaural measurements recorded in the test environment. An equivalent acoustic model was implemented and used to investigate specific aspects of the perceptual findings. The results indicate that neither intensity panning nor Ambisonics techniques can create consistently localized virtual sources for a distributed audience in a reverberant environment. Source localization for non-central listener positions is consistently biased away from the intended image position, irrespective of the spatialization technique or the nature of the source stimulus.

Both B-format and Second Order Ambisonics suffered from near-speaker effects which resulted in a strong localization bias towards the near loudspeakers. Due to the number of contributing loudspeakers in these systems, this bias resulted in a significant range of perceived source angles at different listener positions. In both cases, the localization accuracy increases with distance from the source, which again illustrates this problem. The results for both systems at the centre listening position were also relatively poor, and the simulations suggest that the room acoustics also impact on localization accuracy. B-format Ambisonics performed slightly better at the centre listening position, but the higher order system produced slightly better results at distant listener positions. These results suggest that higher order systems can create a slightly larger accurate centre listening area. Overall, near-speaker effects in Ambisonics appear to be a problem in both higher order and B-format presentations.

Fig. 7: Delta Stereophony audience area plots. Top Left: C80, Top Right: Critical Distance, Bottom left: RaSTI, Bottom Right: Direct SPL.
The results for VBAP and DSS illustrate the well-known limitations of the stereophonic principle for off-centre listening positions. As with Ambisonics, both of these systems displayed biases towards the nearest loudspeaker. However, due to the smaller number of contributing loudspeakers in VBAP, this did not affect the overall localization performance to the same degree. The results for DSS were largely comparable to those of VBAP, albeit with increased deviation for distant listener positions due to the greater number of contributing loudspeakers. However, this effect is countered somewhat by the reinforcement created across the listening area with this system.

The comparison of the results for monophonic and virtual source localization suggests that if consistent and accurate localization is required for a distributed audience, then virtual sources created by these schemes cannot be relied upon. However, systems which attempt to synthesize the accurate wavefronts required for localization for a distributed audience present one possible solution to this problem. The Wave Field Synthesis (WFS) approach, proposed by Berkhout et al. [21], is one such solution for large arrays, and warrants investigation. In addition, the perception of dynamically moving sources needs to be investigated, as the capability of these systems in this regard is also of significant importance.
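The ITD analysis referred to in the conclusion can be illustrated with a minimal sketch that estimates an interaural time difference from a pair of ear signals. It uses a plain, unweighted time-domain cross-correlation, which is the simplest member of the generalized correlation family; it is not claimed to be the procedure used for the measurements in this paper. The function name, the 1 ms search window and the sign convention are assumptions made for illustration.

```python
def estimate_itd(left, right, fs, max_itd=1e-3):
    """Estimate the interaural time difference in seconds.

    left, right: equal-length sequences of samples from the two ears
    fs:          sampling rate in Hz
    max_itd:     half-width of the lag search window in seconds (assumed 1 ms)

    The result is positive when the right-ear signal lags the left-ear
    signal, i.e. when the sound reaches the left ear first.
    """
    if len(left) != len(right):
        raise ValueError("ear signals must have the same length")

    n = len(left)
    max_lag = int(round(max_itd * fs))
    best_lag = 0
    best_val = float("-inf")

    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum over left[i] * right[i + lag],
        # restricted to indices where both samples exist.
        acc = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        if acc > best_val:
            best_val = acc
            best_lag = lag

    return best_lag / fs


if __name__ == "__main__":
    fs = 48000
    left = [0.0] * 512
    right = [0.0] * 512
    left[100] = 1.0   # impulse arrives at the left ear first
    right[124] = 1.0  # and 24 samples later at the right ear
    print(estimate_itd(left, right, fs))
```

In a reverberant room the correlation peak is smeared by reflections, so a practical analysis would typically window the direct sound or apply a frequency weighting before picking the peak; both are left out of this sketch.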

Fig. 8: VBAP audience area plots. Top Left: C80, Top Right: Critical Distance, Bottom left: RaSTI, Bottom Right: Direct SPL.

Fig. 9: Overall subjective localization performance. (Four panels, one per source position; vertical axis: hit rate %; systems compared: Mono, VBAP, Delta, Ambi, Spat.)

9. REFERENCES

[1] W. M. Hartmann, Localization of sound in rooms, Journal of the Acoustical Society of America, vol. 74, Nov.

[2] C. Giguere and S. M. Abel, Sound localization: Effects of reverberation time, speaker array, stimulus frequency, and stimulus rise/decay, Journal of the Acoustical Society of America, vol. 94.

[3] J. Blauert, Spatial Hearing, MIT Press, Cambridge, MA, 2003.

[4] J. W. S. Rayleigh, Theory of Sound, Dover, N.Y.

[5] E. A. MacPherson and J. C. Middlebrooks, Listener weighting of cues for lateral angle: The Duplex Theory of sound localization revisited, Journal of the Acoustical Society of America, vol. 111, May 2002.

[6] S. G. Weinrich, Horizontal plane localization ability and response time as a function of signal bandwidth, in Audio Engineering Society Preprint 47; AES Convention 98, February 1995.

[7] T. T. Sandel, D. C. Teas, W. E. Feddersen, and L. A. Jeffress, Localization of sound from single and paired sources, Journal of the Acoustical Society of America, vol. 27, July.

[8] D. G. Malham and A. Myatt, 3-D sound spatialisation using ambisonic techniques, Computer Music Journal, vol. 19, pp. 58-70, 1995.

[9] M. A. Gerzon, Criteria for evaluating surround-sound systems, Journal of the Audio Engineering Society, vol. 25, pp. 400-408.

[10] J. Schacher and P. Kocher, Ambisonics spatialization tools for max/msp, www.icst.net, 2006.

[11] D. G. Malham, Experience with large area 3-D ambisonic sound systems, Institute of Acoustics, vol. 8.

[12] V. Pulkki, Virtual sound source positioning using vector base amplitude panning, Journal of the Audio Engineering Society, vol. 45.

[13] J. Jot, Real-time spatial processing of sounds for music, multimedia and human-computer interfaces,

[14] J. Chowning, The simulation of moving sound sources, Journal of the Audio Engineering Society, vol. 19, pp. 2-6.

[15] F. R. Moore, A general model for spatial processing of sounds, Computer Music Journal, vol. 7, pp. 6-15.

[16] W. Ahnert, Complex simulation of soundfields by the Delta Stereophony System (DSS), Journal of the Audio Engineering Society, vol. 35.

[17] W. Ahnert, Problems of near-field sound reinforcement and of mobile sources in the operation of the Delta Stereophony System (DSS) and computer processing of the same, in 82nd Convention of the Audio Engineering Society.

[18] C. H. Knapp and G. C. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24.

[19] E. Bates, G. Kearney, D. Furlong, and F. Boland, Localization accuracy of advanced spatialisation techniques in small-sized concert halls, in 153rd Meeting of the Acoustical Society of America, June 2007.

[20] J. Daniel, Spatial sound encoding including near field effect: Introducing distance coding filters and a viable, new ambisonic format, in 23rd International Conference of the Audio Engineering Society, 2003.

[21] A. J. Berkhout, A holographic approach to acoustic control, Journal of the Audio Engineering Society, vol. 36.

Localization Accuracy of Advanced Spatialization Techniques in Small Concert Halls

Localization Accuracy of Advanced Spatialization Techniques in Small Concert Halls Enda Bates, a) Gavin Kearney, b) Frank Boland, c) and Dermot Furlong d) Department of Electronic & Electrical Engineering,