On distance dependence of pinna spectral patterns in head-related transfer functions

Simone Spagnol a)
Department of Information Engineering, University of Padova, Padova 35131, Italy
spagnols@dei.unipd.it

Abstract: The aim of this letter is to address a little-understood question in sound source localization: Can the distance of a near sound source affect our perception of its elevation? The issue is studied by means of an objective analysis of a database of distance-dependent head-related transfer functions (HRTFs) of a KEMAR (Knowles Electronic Manikin for Acoustic Research) mannequin with different pinnae on a dense spatial grid. Iso-directional HRTFs are compared through spectral error metrics; results indicate that significant distance-dependent HRTF modifications due to the pinna occur when the source is close to the interaural axis.
© 2014 Acoustical Society of America
[BLM]
Date Received: October 9, 2014 Date Accepted: November 30, 2014

1. Introduction

Vertical sound source localization relies on the pinna, which introduces peaks and notches in the high-frequency spectrum of the head-related transfer function (HRTF) whose center frequency, amplitude, and bandwidth depend strongly on the elevation angle of the sound source. A recent study¹ showed that a parametric HRTF recomposed using only the first spectral peak and the first two notches yields almost the same localization accuracy as the corresponding measured HRTF. Additional evidence is given by Moore et al.,² who state that the threshold for perceiving a notch frequency shift is consistent with the localization blur on the median plane. Furthermore, increasing elevation in the front hemisphere is known to be cued by the increasing central frequency of a notch.³

Elevation cues may be thought of as distance-independent when the source is in the so-called far-field (approximately more than 1.5 m from the center of the head), where sound waves reaching the listener can be assumed to be plane. On the other hand, when the source is in the near-field, the central frequency, amplitude, and bandwidth of peaks and especially notches were seen to vary with distance.⁴,⁵ However, Brungart and Rabinowitz⁴ conclude that elevation cues are not correlated to distance-dependent patterns in the near-field. This conclusion is supported only by graphical evidence that HRTFs at three distinct elevations and three distances are considerably more consistent across distance than across elevation. The authors argue that if this result generalizes to all elevations, it would imply that elevation cues are roughly independent of distance and that the same mechanisms that mediate elevation perception in the distal region (i.e., the far-field) may also be used in the proximal region (i.e., the near-field), but this hypothesis has never been thoroughly verified on measured HRTF sets in the subsequent literature.

The aim of this letter is thus to investigate, through the objective analysis of a complete distance-dependent database of HRTF measurements (Ref. 6), whether the rough independence between elevation cues and distance in the near-field is a well-grounded claim.

a) Author to whom correspondence should be addressed.
EL58 J. Acoust. Soc. Am. 137 (1), January 2015

2. Distance-dependent HRTF analysis

Typically, HRTFs are measured by presenting a sound stimulus at several spatial locations lying on the surface of a sphere centered on the subject's head, hence at a single distance (1 m or farther). Since a common loudspeaker cannot simulate an acoustic point source in the near-field, measuring HRTFs at closer distances becomes an issue. Recently, Qu et al.⁶ collected a spatially dense, distance-dependent HRTF database with the aid of a specialized spark gap as acoustic point source. The database includes the left-ear (large KEMAR pinna) and right-ear (small KEMAR pinna) responses of a complete KEMAR mannequin for 72 azimuth angles, 14 elevation angles, and 8 distances (20, 30, 40, 50, 75, 100, 130, and 160 cm) from the center of the mannequin's interaural axis, totaling 12 688 HRTFs. Taking the vertical polar coordinate system as reference, elevation goes from −40° to 90° in 10° steps and azimuth goes from 0° to 355° in 5° steps, except for elevation 60° (10° steps), 70° (15° steps), 80° (30° steps), and 90° ($\theta = 0^\circ$ only). The (0°, 0°) direction is right in front of the listener, (90°, 0°) is at the right ear, and (270°, 0°) is at the left ear. These measurements were verified to be comparable to the KEMAR HRTFs included in the CIPIC database. Qu et al. also replicated the same measurements with the left pinna removed and its slot filled with plasticine.⁷ The following analysis is based on all of the small-ear, large-ear, and pinnaless responses.

2.1 Spectral distortion

In order to quantitatively measure the difference between magnitude responses for various distances at fixed azimuth ($\theta$) and elevation ($\phi$) angles, let us define spectral distortion (SD) as

$$\mathrm{SD}(H,\tilde{H}) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(20\log_{10}\frac{|H(f_i)|}{|\tilde{H}(f_i)|}\right)^{2}}\quad[\mathrm{dB}], \tag{1}$$

where $H$ and $\tilde{H}$ are the frequency responses to be compared and $N$ is the number of available frequency bins in the $R_1 = [3, 16]$ kHz range, chosen in order to include all of the known relevant pinna cues. In this case the reference transfer function, $\tilde{H}$, is the response for the farthest distance ($d_{ff} = 160$ cm), assumed to lie in the far-field, while $H$ is the response for one of the closer distances sharing the same angular coordinates $(\theta_k, \phi_k)$.

If we reasonably assume the effects of distinct body parts (head, pinna, shoulders, and torso) on the HRTF to be additive (in the log-magnitude domain), the magnitude response of the pinna-related component of the HRTF (which we refer to as Pinna-Related Transfer Function, PRTF) for one spatial position $(\theta_k, \phi_k, d_k)$ can be obtained by simply dividing the full-KEMAR HRTF magnitude by the corresponding pinnaless response (Head-and-Torso Transfer Function, HTTF):

$$|\mathrm{PRTF}(f,\theta_k,\phi_k,d_k)| = \frac{|\mathrm{HRTF}(f,\theta_k,\phi_k,d_k)|}{|\mathrm{HTTF}(f,\theta_k,\phi_k,d_k)|}. \tag{2}$$

This operation, which in principle guarantees the elimination of head, torso, and near-field effects from the starting HRTF, was performed on both the small- and large-ear HRTF sets. Since HTTFs were available for the left channel only, each $\mathrm{HRTF}(f, \theta_k, \phi_k, d_k)$ for the right (small ear) channel was divided by $\mathrm{HTTF}(f, -\theta_k, \phi_k, d_k)$, i.e., the left-channel response at the mirrored azimuth.

Admittedly, two more preprocessing steps were needed because of a couple of artifacts emerging from the HTTF database measurements. First, unexpected spectral notches were seen to appear in the high-frequency range of HTTFs for incidence angles directly ahead of the left channel. Despite the KEMAR pinna slot having been filled with plasticine,⁷ these artifacts are hypothesized to be due to reflections on its edges. In order to avoid destruction of pinna notches in HRTFs, 1/3-octave smoothing was preliminarily applied to all HTTFs. Second, it was observed that, although HTTFs (and HRTFs) in the database are not normalized over any free-field response, the power of HTTF spectra $|\mathrm{HTTF}(f, \theta_k, \phi_k, d_k)|$ does not always decrease with increasing distance $d_k$ as any distance-attenuation law would predict. In order to allow SD computation, magnitude normalization of PRTFs was performed by simply dividing each PRTF by its mean magnitude in $R_1$,

$$\widehat{\mathrm{PRTF}}(f,\theta_k,\phi_k,d_k) = \frac{\mathrm{PRTF}(f,\theta_k,\phi_k,d_k)}{\frac{1}{N}\sum_{i=1}^{N}|\mathrm{PRTF}(f_i,\theta_k,\phi_k,d_k)|}. \tag{3}$$

Spectral distortion $\mathrm{SD}(\widehat{\mathrm{PRTF}}(f, \theta_k, \phi_k, d), \widehat{\mathrm{PRTF}}(f, \theta_k, \phi_k, d_{ff}))$ is now computed for both pinnae at each available angular coordinate and each distance $d \in \{20, 30, 40, 50, 75, 100, 130\}$ cm. Furthermore, in order to have a measure of comparison between SDs along distance and elevation, the SD between adjacent far-field PRTFs in the frontal half of the median plane, $\mathrm{SD}(\widehat{\mathrm{PRTF}}(f, 0, \phi_k, d_{ff}), \widehat{\mathrm{PRTF}}(f, 0, \phi_{k+1}, d_{ff}))$, is calculated.

Results, reported in Fig. 1 and Table 1 respectively, reveal that the initial hypothesis is verified for the majority of the considered spatial coordinates. In Fig. 1, 67% of the nonzero entries for the large pinna and 58% for the small pinna are less than 3 dB, i.e., the mean SD calculated from Table 1 corresponding to a shift of the elevation angle by 10°, which is consistent with the localization blur in the median plane.² Nevertheless, two major critical areas occur in all of the seven plots, one centered around the contralateral ear ($\theta = 90^\circ$ for large pinna and $\theta = 270^\circ$ for small pinna) and one broader area in the ipsilateral side ($\theta \in [250^\circ, 340^\circ]$ for large pinna and $\theta \in [20^\circ, 110^\circ]$ for small pinna), both extending from low to middle elevation angles. For some of the directions included in these two areas SD increases up to 10–13 dB for the closest distances, suggesting a perceptually salient effect of distance on PRTF spectral features. These results can be related to the acoustic simulations of the Brüel & Kjær 4128C dummy head that Otani et al.⁵ performed on the horizontal plane for source distances ranging from 0.1 to 3 m, which showed increasing differences between far- and near-field HRTFs at both ipsilateral and contralateral locations.

Fig. 1. Spectral distortion between PRTFs at distance $d_{ff} = 160$ cm and distance (top to bottom): d = 20 cm, d = 30 cm, d = 40 cm, d = 50 cm, d = 75 cm, d = 100 cm, d = 130 cm. Coordinate $(\theta, \phi) = (0^\circ, 90^\circ)$, showing an SD never greater than 3.4 dB for the left pinna and 4.5 dB for the right pinna, is omitted to reduce clutter.

Table 1. Spectral distortion [dB] between adjacent frontal median-plane far-field PRTFs.

| Elevation (°) | −40 | −30 | −20 | −10 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Large pinna | 3.3 | 2.7 | 3.2 | 2.4 | 2.1 | 2.5 | 1.8 | 1.5 | 4.0 | 2.8 | 1.8 | 2.5 | 2.8 |
| Small pinna | 5.1 | 3.9 | 5.7 | 3.1 | 3.0 | 3.0 | 3.6 | 4.4 | 4.2 | 2.6 | 2.3 | 1.3 | 1.4 |

2.2 Pinna notches

Following an informal inspection of different HRTF/PRTF shapes, the reason for such a systematic SD rise, which is almost limited to low and medium elevations, is hypothesized to lie in the variation of both the position and shape of pinna notches along distance. In order to check such a hypothesis, center frequencies, gains, and bandwidths of the three main pinna notches appearing in all HRTFs between $\phi = -40^\circ$ and $\phi = 50^\circ$ are extracted and processed according to the following procedure.

(1) Center frequencies, $f_C$, of all available pinna notches in the HRIR (head-related impulse response) set are computed by application of the ad hoc frequency notch extraction algorithm by Raykar et al.⁸ Briefly, the algorithm computes the autocorrelation function of the linear prediction HRIR residual and extracts notch frequencies as the local minima of its group-delay function falling beyond a fixed threshold (heuristically set to 0.5 samples).
(2) For each available angular coordinate $(\theta, \phi)$, the extracted notches are grouped in frequency tracks along consecutive distances through the McAulay-Quatieri partial tracking algorithm,⁹ originally used to group sinusoidal partials along consecutive temporal windows according to their spectral location.
In this context, if we conceptually replace temporal evolution with distance dependence and sinusoidal partials with spectral notches, the algorithm can be exploited to track the most marked notch patterns along distance. The matching interval for the tracking procedure is set to $\Delta = 1$ kHz. Only tracks with at least four notches are considered in the following steps; if more than three tracks satisfying such a requirement are available, only the three longest tracks are considered.
(3) The gain, $G$, of a notch belonging to any track is derived by simply checking the magnitude response in the corresponding PRTF at frequency $f_C$.
(4) The bandwidth, $B$, of a notch belonging to any track is computed as the 3-dB bandwidth, i.e., $B = f_r - f_l$, where $f_l$ and $f_r$ are the left and right +3 dB level points relative to $f_C$, respectively.
(5) For each angular coordinate, standard deviations of center frequencies, gains, and bandwidths are calculated within each notch track and averaged among the three (or fewer) available tracks.

Figure 2 reports standard deviations as functions of the angular coordinate for the large- and small-pinna HRTF sets. Both the center-frequency and gain plots reflect an evident correlation with the SD plots, especially those related to the nearest distances. Table 2 reports the correlation coefficients between SD and standard deviation data. Notice how the frequency and gain data correlate highly with SD for the nearest distances, as expected, while the standard deviation of bandwidths shows much lower coefficients.
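Steps (2) and (5) of the procedure above can be illustrated with a minimal Python sketch. It is not the McAulay-Quatieri algorithm itself but a simplified greedy variant: each track is extended with the closest unassigned notch at the next distance, provided it lies within the 1-kHz matching interval, and no conflict resolution between competing tracks is attempted. The function names (`track_notches`, `select_tracks`, `mean_track_std`), the input layout, and the use of the population standard deviation are assumptions made for illustration only.

```python
import numpy as np


def track_notches(notches_per_distance, delta_hz=1000.0):
    """Greedily link notch frequencies across consecutive distances.

    notches_per_distance: one sequence of notch center frequencies (Hz)
    per source distance, ordered by distance.
    Returns a list of tracks, each a list of (distance_index, frequency).
    """
    finished, active = [], []
    for d_idx, freqs in enumerate(notches_per_distance):
        unused = [float(f) for f in freqs]
        survivors = []
        for track in active:
            last_f = track[-1][1]
            if unused:
                # Closest not-yet-assigned notch at this distance.
                j = min(range(len(unused)), key=lambda k: abs(unused[k] - last_f))
                if abs(unused[j] - last_f) <= delta_hz:
                    track.append((d_idx, unused.pop(j)))
                    survivors.append(track)
                    continue
            finished.append(track)  # no match within delta_hz: the track ends
        # Every unmatched notch starts a new track.
        survivors.extend([(d_idx, f)] for f in unused)
        active = survivors
    return finished + active


def select_tracks(tracks, min_len=4, max_tracks=3):
    """Keep tracks with at least min_len notches, then the max_tracks longest."""
    long_enough = [t for t in tracks if len(t) >= min_len]
    return sorted(long_enough, key=len, reverse=True)[:max_tracks]


def mean_track_std(tracks):
    """Average over tracks of the per-track standard deviation of frequency.

    Assumption: population standard deviation (ddof=0); the text does not
    specify which estimator was used.
    """
    if not tracks:
        return float("nan")
    return float(np.mean([np.std([f for _, f in t]) for t in tracks]))


# Synthetic notch frequencies (Hz) at five consecutive distances.
notches = [
    [6000.0, 9000.0],
    [6200.0, 9100.0],
    [6400.0, 9050.0],
    [6500.0, 9000.0, 12000.0],
    [6700.0],
]
tracks = select_tracks(track_notches(notches))
print(len(tracks), mean_track_std(tracks))
```

The same averaging could be applied to gains and bandwidths by storing those values alongside each frequency. Because the matching is greedy, the result can depend on the order in which tracks are visited, which a full implementation of the original tracking algorithm would handle more carefully.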

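The spectral distortion of Eq. (1), the PRTF computation and normalization of Eqs. (2) and (3), and a correlation coefficient of the kind reported in Table 2 can be sketched in Python as follows. This is a sketch under stated assumptions, not the code used for the analysis: the helper names, the synthetic flat spectra, and the choice of the Pearson coefficient (the text does not say which coefficient was used) are all illustrative.

```python
import numpy as np


def band_mask(freqs_hz, lo_hz=3000.0, hi_hz=16000.0):
    """Boolean mask selecting the bins inside R1 = [3, 16] kHz."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return (freqs_hz >= lo_hz) & (freqs_hz <= hi_hz)


def prtf_magnitude(hrtf_mag, httf_mag):
    """Eq. (2): HRTF magnitude divided by the pinnaless (HTTF) magnitude."""
    return np.asarray(hrtf_mag, dtype=float) / np.asarray(httf_mag, dtype=float)


def normalize_prtf(prtf_mag, mask):
    """Eq. (3): divide the PRTF magnitude by its mean magnitude inside the band."""
    prtf_mag = np.asarray(prtf_mag, dtype=float)
    return prtf_mag / np.mean(prtf_mag[mask])


def spectral_distortion(h_mag, h_ref_mag, mask):
    """Eq. (1): RMS over the band of the magnitude ratio expressed in dB."""
    h = np.asarray(h_mag, dtype=float)[mask]
    h_ref = np.asarray(h_ref_mag, dtype=float)[mask]
    ratio_db = 20.0 * np.log10(h / h_ref)
    return float(np.sqrt(np.mean(ratio_db ** 2)))


def pearson(a, b):
    """Assumption: Pearson correlation between two maps, flattened to 1-D."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])


# Synthetic check: a far-field PRTF and a copy offset by a uniform 6 dB.
freqs = np.linspace(0.0, 22050.0, 257)
mask = band_mask(freqs)
far = prtf_magnitude(np.ones_like(freqs), np.ones_like(freqs))
near = far * 10.0 ** (6.0 / 20.0)

sd_raw = spectral_distortion(near, far, mask)
sd_norm = spectral_distortion(normalize_prtf(near, mask), normalize_prtf(far, mask), mask)
print(sd_raw, sd_norm)
```

In this synthetic case the raw SD should be about 6 dB and the normalized SD about 0 dB, which shows why the normalization of Eq. (3) is needed when overall level does not follow a distance-attenuation law: a pure gain offset would otherwise be counted as spectral distortion.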
Fig. 2. Standard deviation of notch frequencies (top row), gains (middle row), and bandwidths (bottom row) across the eight distances.

Table 2. Correlation coefficients between SD and standard deviation.

| Parameter | Pinna | 20 cm | 30 cm | 40 cm | 50 cm | 75 cm | 100 cm | 130 cm |
|---|---|---|---|---|---|---|---|---|
| Frequency | Large | 0.59 | 0.62 | 0.59 | 0.56 | 0.50 | 0.39 | 0.40 |
| Frequency | Small | 0.49 | 0.50 | 0.45 | 0.41 | 0.35 | 0.28 | 0.33 |
| Gain | Large | 0.58 | 0.61 | 0.61 | 0.59 | 0.52 | 0.46 | 0.47 |
| Gain | Small | 0.57 | 0.58 | 0.60 | 0.61 | 0.56 | 0.46 | 0.41 |
| Bandwidth | Large | 0.30 | 0.32 | 0.33 | 0.33 | 0.31 | 0.27 | 0.28 |
| Bandwidth | Small | 0.32 | 0.36 | 0.37 | 0.37 | 0.36 | 0.34 | 0.33 |

While the perceptual relevance of notch depth variations is little understood, the absence or displacement of prominent elevation cues is known to have an impact on localization accuracy. For instance, one could verify that 1-kHz shifts of the lowest-frequency pinna notch usually correspond to an increase or decrease of the elevation angle of 20° or more.³ From previous literature² we also know that two steady notches in the high-frequency range (around 8 kHz) differing just in center frequency are distinguishable on average if such difference is at least around 10% of the lowest center frequency, independent of notch bandwidth. If we extend the above assumption to the range $R_2 = [6, 11]$ kHz, where the first two spectral notches typically crucial in elevation perception (Ref. 1) usually lie, we would be able to detect all those notch tracks lying in $R_2$ for which the difference between the highest- and lowest-frequency point exceeds 10% of the latter. This result is plotted in Fig. 3, showing that 13% and 12% of the considered spatial coordinates for the large and small ear, respectively, exceed such a threshold.

Fig. 3. Spatial locations for which a pinna notch in the range $R_2 = [6, 11]$ kHz deviates more than 10% across distance.

3. Discussion and conclusions

The objective analysis of a recent database of distance-dependent HRTFs reported in this letter indicates that the frequency displacement of spectral notches across near-field distances in iso-directional HRTFs is likely to have an impact on localization. The initial hypothesis on the rough independence between spectral elevation cues and distance cannot be guaranteed for all directions of the sound source, especially those surrounding the interaural axis. Admittedly, modifications of spectral cues for contralateral HRTFs should have little effect on localization, as the far ear does not significantly contribute to localization accuracy in elevation.¹⁰ However, the amount of notch deviations in ipsilateral HRTFs could imply an improvement or degradation of localization performance in elevation in that spatial region when source distance is reduced or increased. Such a hypothesis cannot be verified in detail in the near-field localization data by Brungart et al.:¹¹ even though their results suggest the independence of the overall elevation error with respect to source distance, localization data are pooled across azimuth and subjects.

Previous literature suggests that the dependence of elevation cues on distance is weakly due to the acoustic parallax effect,¹² i.e., the discrepancy between the angles of the source relative to the head and ear. In the present work SD was found to be high within those directions for which the parallax effect is theoretically absent (i.e., the interaural axis), and much lower for directions where the parallax effect is most evident (e.g., along the median plane). Furthermore, the parallax effect is typically prominent at very near distances (below 20 cm). As a consequence, the impact of such an effect on the found notch frequency shifts and spectral modifications can be ruled out. However, if we consider a secondary parallax effect applied to the individual pinna folds, which becomes more and more prominent as the sound source moves toward it with a spherical wave front, then it may be maximal along the interaural axis.

Our analysis was performed on a single subject, a KEMAR mannequin with different pinnae. Similar measurements on a human subject would require a considerable amount of time and fatigue in order to collect several HRTF sets for a number of near-field source distances. The future availability of a distance-dependent HRTF database measured on human subjects would allow the present data analysis to be repeated and the related results to be verified. Finally, considering that notch detectability heavily depends not only on center frequency, depth, and bandwidth but also on both stimulus intensity and intersubject variation,¹³ individual psychoacoustic tests are needed in order to ascertain whether the found spectral modifications correlate to elevation performance in those spatial regions where modifications are prominent.
Acknowledgments

I am grateful to Professor Tianshu Qu for the kind provision of the near-field HTTF database. This work was supported by the research project Personal Auditory Displays for Virtual Acoustics, University of Padova, under Grant No. CPDA135702.

References and links

¹K. Iida, M. Itoh, A. Itagaki, and M. Morimoto, "Median plane localization using a parametric model of the head-related transfer function based on spectral cues," Appl. Acoust. 68, 835–850 (2007).
²B. C. J. Moore, S. R. Oldfield, and G. J. Dooley, "Detection and discrimination of spectral peaks and notches at 1 and 8 kHz," J. Acoust. Soc. Am. 85, 820–836 (1989).
³J. Hebrank and D. Wright, "Spectral cues used in the localization of sound sources on the median plane," J. Acoust. Soc. Am. 56, 1829–1834 (1974).
⁴D. S. Brungart and W. M. Rabinowitz, "Auditory localization of nearby sources. Head-related transfer functions," J. Acoust. Soc. Am. 106, 1465–1479 (1999).
⁵M. Otani, T. Hirahara, and S. Ise, "Numerical study on source-distance dependency of head-related transfer functions," J. Acoust. Soc. Am. 125, 3253–3261 (2009).
⁶T. Qu, Z. Xiao, M. Gong, Y. Huang, X. Li, and X. Wu, "Distance-dependent head-related transfer functions measured with high spatial resolution using a spark gap," IEEE Trans. Audio, Speech, Lang. Process. 17, 1124–1132 (2009).
⁷T. Qu, S. Cao, and X. Wu, "Relationship between distance and binaural cues on sound source localization," Acta. Sci. Nat. Univ. Pekin. 46, 901–906 (2010).
⁸V. C. Raykar, R. Duraiswami, and B. Yegnanarayana, "Extracting the frequencies of the pinna spectral notches in measured head related impulse responses," J. Acoust. Soc. Am. 118, 364–374 (2005).
⁹R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. Acoust., Speech, Signal Process. 34, 744–754 (1986).
¹⁰R. A. Humanski and R. A. Butler, "The contribution of the near and far ear toward localization of sound in the sagittal plane," J. Acoust. Soc. Am. 83, 2300–2310 (1988).
¹¹D. S. Brungart, N. I. Durlach, and W. M. Rabinowitz, "Auditory localization of nearby sources. II. Localization of a broadband source," J. Acoust. Soc. Am. 106, 1956–1968 (1999).
¹²D. S. Brungart, "Auditory parallax effects in the HRTF for nearby sources," in Proceedings IEEE Workshop on Applications of Signal Processing, Audio and Acoustics, New Paltz, New York (October 1999), pp. 171–174.
¹³A. Alves-Pinto and E. A. Lopez-Poveda, "Detection of high-frequency spectral notches as a function of level," J. Acoust. Soc. Am. 118, 2458–2469 (2005).