WAVELET-BASED SPECTRAL SMOOTHING FOR HEAD-RELATED TRANSFER FUNCTION FILTER DESIGN

HUSEYIN HACIHABIBOGLU, BANU GUNEL, AND FIONN MURTAGH

Sonic Arts Research Centre (SARC), Queen's University Belfast, Belfast, BT 1NN, U.K.
{h.hacihabiboglu, b.gunel, f.murtagh}@qub.ac.uk

AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio

Three wavelet-based spectral smoothing techniques are presented in this paper as a pre-processing stage for head-related transfer function (HRTF) filter design. These wavelet-based methods include wavelet denoising, wavelet approximation, and the redundant wavelet transform. These methods are used with time-domain parametric filter design methods to reduce the order of the IIR filters, which is useful for real-time implementation of immersive audio systems. Results of a subjective listening test are then presented in order to justify the perceptual validity of the investigated smoothing methods. Results show that wavelet-based spectral smoothing methods are beneficial in reducing the filter order and increasing the perception of localization without introducing a noticeable effect on timbre.

INTRODUCTION

Spectral shaping of sound on its way from a fixed point in free field to an external ear is defined by a head-related transfer function (HRTF). Data reduction of HRTFs becomes an important issue when dealing with virtual reality applications, spatialization of audio and binaural synthesis in real time. Different methods for data complexity reduction, including downsampling, rectangular windowing [1], and principal component analysis [2], have been proposed. The data-reduced form of an HRTF is then used for filter design. An essential issue in HRTF filter design for perceptual validity is preservation of the binaural hearing cues. A better approach uses auditory criteria for smoothing [3]. Frequency warping [4] has been used as an auditory weighting method [5]. A review of auditory weighting is given in Huopaniemi et al. [6]. The smoothing methods presented in this paper depend on wavelet transforms.

In Section 1, a summary of wavelet denoising, wavelet approximation and the redundant wavelet transform is presented. The application of these smoothing methods to individual HRTFs is discussed together with the filter design issues in Section 2. Results of the subjective listening test are presented in Section 3.

1 REVIEW OF WAVELET SMOOTHING

The smoothing methods employed in this paper utilize the discrete wavelet transform (DWT). A general review of wavelets can be found in Mallat [7]. The DWT can be used for smoothing a given signal in a number of different ways. Three possibilities are wavelet denoising, wavelet approximation and the redundant wavelet transform. In the scope of this paper, the signals to be smoothed are the magnitude responses of the individual HRTFs.

1.1 Wavelet Denoising with SURE

Assuming that the magnitude response of an HRTF is a noisy version of a smoother function, we can use wavelet denoising to recover an approximate, smoother function. Wavelet denoising is defined as a three-step process [8] [9]. If we assume that the magnitude response of the actual HRTF has additive white Gaussian noise such that

$h_i = f(t_i) + z_i$    (1)

where $z_i \sim N(0, \sigma^2)$, then wavelet denoising is done by following the steps below:

1) The DWT is performed on the noisy data.
2) A soft-threshold non-linearity is applied to the noisy wavelet coefficients to find the wavelet coefficient estimates:

$\eta_t(w) = \mathrm{sgn}(w)\,(|w| - t)\,I(|w| > t)$    (2)

The threshold $t$ in this equation is calculated with the SURE (Stein's Unbiased Risk Estimation) algorithm.
3) The denoised function estimate $\hat{f}(t)$ is obtained by inverting the DWT using the wavelet coefficient estimates found in the previous step.

The validity of the assumed noise characteristics for HRTF data is discussed in Section 2.

1.2 Wavelet Approximation

Another assumption regarding HRTF data is that small deviations are not critically important and can accordingly be discarded [2]. After the DWT is performed on a signal of length $2^N$, the approximation coefficients at level $L$ have length $2^{N-L}$. The signal approximation is low-pass interpolated to get a smooth approximation of the actual HRTF magnitude response. Smoothness of the final representation also depends on the selection of the wavelet type and the depth of the tree. A scaling function with a higher attenuation in the first side-lobe captures a better low-frequency approximation.

1.3 Redundant Wavelet Transform

If the sub-sampling steps are discarded from the DWT structure, a better low-pass approximation can be obtained. In the case of the HRTFs, significant coarse details of the frequency notches and the low-frequency parts of the magnitude response are successfully captured. Results of the redundant wavelet transform depend on the selection of the wavelet type and the depth of the wavelet tree, as in the case of wavelet approximation. Selection of a scaling function with a high attenuation in the first side-lobe is also important for this type of smoothing.

2 HRTF SMOOTHING USING WAVELETS

2.1 HRTF Data

The wavelet-based smoothing methods mentioned in the previous section are applied to the magnitude responses of the non-individualized HRTFs sampled at three different locations. The HRTF sets are selected from the free-field equalized HRTF measurements made on the KEMAR mannequin by Martin and Gardner [10]. The Martin-Gardner HRTF set includes the HRTF measurements for two types of pinnae. A property of these data sets is that they are already presented as 12 sample series, after an acceptable amount of data is discarded. Before the smoothing process, magnitude responses of the individual HRTFs were equalized to discard the effect of the loudspeaker used in the HRTF measurements. The smoothing and filtering procedures described in the subsequent sections have been applied to the HRTFs for all the azimuth and elevation values for both types of the pinnae.

2.2 Smoothing with Wavelet Denoising

Wavelet denoising is performed with the SURE algorithm to select the soft threshold. Different wavelet functions show different denoising performances with the same denoising scheme. Therefore, denoising was performed for different wavelets selected from a well-known group consisting of Haar, discrete Meyer, Daubechies 2-9, Coiflet 1- and Symmlet 1-1 wavelets. A 3-level DWT was used in all of the cases. The results of the smoothing for the magnitude response of the HRTF corresponding to the left ear at zero azimuth (ϕ=0°) and zero elevation (δ=0°) are given in Fig. 1.

Figure 1: Original and denoised magnitude responses with db4, sym2, and coif3 from top to bottom, respectively

It may be noted that wavelet denoising with SURE does not provide a sufficiently smooth result. Overfitting of the original by the wavelet-denoised data prevents the reduction of the order of filters in the design phase. This result is a consequence of the false assumption that the irregularities in the magnitude responses may be modeled as additive white Gaussian noise. Therefore wavelet denoising with SURE was not included as a smoothing method to be investigated in the listening test.

2.3 Smoothing with Wavelet Approximation

The equalized magnitude response can be smoothed by obtaining the low-frequency approximation using the DWT. The wavelet approximation of the signal was computed using the Symmlet 1 filter bank. A 3-level DWT was performed on the 12-sample magnitude response.
The wavelet approximation, which is 4 samples long, was low-pass interpolated to get a smoother version of the signal. An alternative to this approach may be sub-sampling the signal by and low-pass interpolating with samples between each obtained result. However, this alternative results in loss of crucial data and gives minor frequency peaks and notches (see the selected area in Fig. 2) which are not significant in the original data. This decreases the success of smoothing in reducing the filter order.

Figure 2: Original (middle), wavelet approximation smoothing (top), and subsampling/interpolation smoothing (bottom) (ϕ=, δ= )

2.4 Smoothing with Redundant Transform

The redundant wavelet transform described in the previous section was applied to the magnitude responses of the HRTFs. The filter bank used was the Symmlet 1 filter bank. As the approximation obtained using the redundant wavelet transform consists of 12 samples, low-pass interpolation is no longer necessary. Thus, the results capture the frequency peaks and notches better than wavelet approximation (see Fig. 3).

2.5 Filter Design

The equalized and smoothed magnitude responses were first converted into the time domain. A minimum-phase version of the signal was then obtained by minimum-phase reconstruction using the real cepstrum [11]. A minimum-phase IIR filter with the same number of poles and zeros was designed for each HRTF with Prony's method. Frequency dependence of the interaural time delay (ITD) was assumed to be perceptually irrelevant [2]. Thus the ITD was supplied by a constant delay line.

3 PERCEPTUAL VALIDATION OF WAVELET-BASED SMOOTHING

Perceptual validation of our results was carried out using a subjective listening test. We used intra-aural, earplug-style Sony headphones, and carried out the tests in a recording studio in the Music Department of Queen's University Belfast.
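Before moving on to the details of the listening test, the smoothing operations of Sections 1 and 2 can be made concrete with a small Python sketch. It contrasts the soft-threshold rule of Eq. (2), a decimated approximation whose length shrinks from 2^N to 2^(N-L), and an undecimated (redundant) approximation that keeps the input length. This is only an illustration under stated assumptions: it uses the Haar filter rather than the Symmlet filter bank used in the paper, the threshold is supplied by hand rather than chosen by SURE, the test signal is made up, and all function names are hypothetical.

```python
import math


def soft_threshold(w, t):
    """Soft-threshold rule: sgn(w) * (|w| - t) when |w| > t, otherwise 0."""
    return math.copysign(abs(w) - t, w) if abs(w) > t else 0.0


def haar_approximation(signal, levels):
    """Decimated Haar approximation coefficients after `levels` stages.

    Each stage low-pass filters and keeps every second sample, so an
    input of length 2**N yields 2**(N - levels) coefficients.
    Assumes the input length is divisible by 2**levels.
    """
    approx = list(signal)
    for _ in range(levels):
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
    return approx


def redundant_haar_approximation(signal, levels):
    """Undecimated Haar approximation with circular extension.

    The sub-sampling step is dropped; instead the two filter taps are
    spread 2**level samples apart, so the output keeps the input length.
    A plain average (gain 1) is used so the result stays on the input scale.
    """
    approx = list(signal)
    n = len(approx)
    for level in range(levels):
        gap = 2 ** level
        approx = [(approx[i] + approx[(i + gap) % n]) / 2 for i in range(n)]
    return approx


# Made-up "magnitude response": a slow trend plus a fast ripple.
n = 2 ** 6
toy = [math.sin(2 * math.pi * k / n) + 0.2 * (-1) ** k for k in range(n)]

coarse = haar_approximation(toy, 3)            # 2**(6 - 3) = 8 coefficients
smooth = redundant_haar_approximation(toy, 3)  # still 64 samples
shrunk = [soft_threshold(w, 1.0) for w in (-3.0, 0.5, 2.0)]
print(len(coarse), len(smooth), shrunk)
```

The decimated result has to be interpolated back to the original grid before it can serve as a smoothed magnitude response, whereas the undecimated result can be used directly, which is the practical difference the text draws between Sections 2.3 and 2.4.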
Figure 3: Original (top) and redundant wavelet transform smoothing (bottom) (ϕ=, δ= )

3.1 Subjects

Eleven subjects (3 female, male) aged 19-3 served as voluntary participants. All had normal hearing and no history of hearing problems. Five of the subjects had previous experience of specialized listening.

3.2 Stimulus

We used a pink noise sample with a length of one second with ms ramps in the onset and the offset portions.

3.3 Procedure

Three HRTF sets representing three spatial positions were used in the listening test. The HRTF pairs we used were selected to represent both types of pinnae for 3 varying positions of azimuth (ϕ) and elevation (δ):

- ϕ= and δ= for small pinnae.
- ϕ=3 and δ= for small pinnae.
- ϕ=24 and δ=-1 for large red pinnae.

This selection was rather arbitrary, but it was made sure that the positions were far enough from each other to prevent bias in the listening test due to the previous location of the sound. This means that the ripples, peaks and notches of the HRTFs are quite different from each other (see Fig. 4 and Fig. 5).

The subjects were seated in the acoustically insulated recording studio. Tests for each of the three HRTF sets were carried out separately for each subject. Test signals were presented in pairs. The first signal was always the reference signal obtained by the convolution of the stimulus signal with the original HRTF set. The second signal was one of the following three signals selected randomly:
- Pink noise filtered with the IIR filter which was directly designed from the HRTF data in the time domain without any smoothing ().
- Pink noise filtered with the IIR filter after smoothing with the wavelet approximation ().
- Pink noise filtered with the IIR filter after smoothing with the redundant wavelet transform ().

All of the tested filters were calculated for the orders of,,,, 9, 1, 12, 14, 1, 1,, 24, 2, 32 and 3 for the HRTFs sampled at three locations. Thus, each subject was presented a total of 13 test pairs, which were equalized for the headphone and the ear canal resonance.

Figure 4: Tested HRTFs for the right ear plotted with dB difference between them.

Figure 5: Tested HRTFs for the left ear plotted with dB difference between them.

The test program, designed with MATLAB, was similar to that used by Huopaniemi et al. [12]. The subjects were verbally instructed on the usage of the test program beforehand. The program selected the sound pairs from the test set randomly. Each sound pair was played with. sec time interval between the first and the second sounds. Subjects evaluated the sound pairs by similarity of timbre and localization on a scale of 1- (very different - no difference) quantized to unit intervals of.1. Subjects were given the opportunity to replay the sound pairs. The test lasted for -3 minutes per subject. The positions at which the subjects actually heard the stimuli were not investigated because the HRTFs used in the test were not individualized HRTFs of the subjects.

3.4 Results

The results were analyzed using the SPSS software package. The independent variables were method, subject, filter order, and angle. The dependent variables were timbre and localization scores obtained from the subjective listening test.

The redundant wavelet transform outperformed the wavelet approximation and the direct design in terms of the localization scores for different filter orders. It may be noted that even for a filter order as low as 1, the localization is satisfactorily high. The wavelet approximation improved the localization compared to the direct design as well (see Fig. 6).

Figure 6: Filter Order vs. Mean Localization graphs for the three methods (,, and )

The timbre scores follow an increasing trend for increasing filter orders as expected. However, it is not possible to comment on the type of method which is better in preserving the timbre (see Fig. 7).
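Section 3.4 also reports a Spearman rank correlation between the timbre and localization scores. As a rough illustration of what that statistic measures, the Python sketch below ranks two score lists (tied values share the mean of their rank positions) and then takes the Pearson correlation of the ranks. The function names and the score values are invented for illustration only; the analysis in the paper was carried out in SPSS, not with this code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example scores for six test pairs (not data from the paper).
timbre = [3.2, 4.1, 2.5, 3.9, 4.4, 3.0]
localization = [3.5, 3.1, 2.9, 4.2, 4.0, 3.3]
print(f"rho = {spearman_rho(timbre, localization):.2f}")
```

A value near zero means that pairs rated as close in timbre are not systematically the ones rated as close in localization, which is how the low correlation in this section is read.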
Figure 7: Filter Order vs. Mean Timbre graphs for the three methods (, and )

The redundant wavelet transform was the preferred choice for all of the subjects in terms of the mean localization scores. The wavelet approximation was found to be superior to the direct design. This result is further evidence that the wavelet-based smoothing methods provide better localization cues (see Fig. 8).

Figure 8: Subject vs. Mean Localization graphs for the three methods (, and )

The correlation coefficient between the timbre and the localization was calculated as.19 using Spearman's rho test. This low correlation can be due to the fact that the HRTFs include both the timbre and the localization information []. Overall mean values for the timbre score and the localization score were found to be 3.4 and 3.3 respectively. Therefore the cases where the timbre score is greater than 3. and the localization score is greater than 4. can be taken as acceptable. These cases were analyzed with respect to the filter orders for the three smoothing methods. The redundant wavelet transform was found to be satisfactory even for very low filter orders (see Fig. 9).

Figure 9: Filter Order vs. Count for the three methods for (Timbre > 3.) and (Localization > 4.)

The localization accuracy depends on the location of the sound source [13]. The redundant wavelet transform smoothing performed well for all of the investigated HRTFs, giving mean localization scores over 4. (see Fig. 10).

Figure 10: Angle vs. Mean Localization graphs for three methods (, and )

4 CONCLUSIONS AND FUTURE WORK

In this work, we analyzed the effects of three wavelet-based methods for spectral smoothing for HRTF filter design.
The results show that the wavelet-based smoothing methods can help in reducing the filter order for IIR filters. The wavelet approximation outperforms the direct design in most of the cases. The redundant wavelet transform increases the localization accuracy
without substantially decreasing the timbre quality. Its robustness for different source locations provides evidence for its usefulness for HRTF filter design for immersive audio applications.

The opportunities for future work include investigation of different thresholding strategies, compatible with the human auditory system, to be used with wavelet denoising. The detail coefficients of the DWT will be used to restore the details of the HRTFs within a real-time system utilizing a head tracker. Restoring the details of the HRTFs can be useful in smoothly and gradually increasing the level of timbral detail when the position of the listener's head is stable.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the assistance of Mr. Chris Corrigan in the organization of the subjective listening test. We would also like to thank the participants of the test.

REFERENCES

[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia (Academic Press, London, 1994), pp. 1-1.
[2] F. L. Wightman and D. J. Kistler, "A Model of Head-Related Transfer Function Based on Principal Components Analysis and Minimum-phase Reconstruction," J. Acoust. Soc. Am., vol. 91, pp. 13-14, (1992)
[3] J. Huopaniemi and M. Karjalainen, "HRTF Filter Design Based on Auditory Criteria," in Proc. Nordic Acoustical Meeting (NAM'9), (Helsinki, Finland, 199, June 12-14), pp. 323-33.
[4] J. Huopaniemi and M. Karjalainen, "Comparison of Digital Filter Design Methods for 3-D Sound," in Proc. IEEE Nordic Signal Processing Symp. (NORSIG'9), (Espoo, Finland, 199), pp. 131-134.
[5] J. Smith and J. Abel, "Bark and ERB Bilinear Transforms," IEEE Transactions on Speech and Audio Processing, vol., iss., pp. 9, (1999)
[6] J. Huopaniemi and M. Karjalainen, "Review of Digital Filter Design and Implementation Methods for 3-D Sound," in Proc. of the 12 nd Convention of the Audio Engineering Society, Preprint 441, (Munich, Germany, 199)
[7] S. Mallat, A Wavelet Tour of Signal Processing, (Academic Press, London, 199)
[8] D. L. Donoho and I. M. Johnstone, "Adapting to Unknown Smoothness via Wavelet Shrinkage," J. Am. Statist. Ass., vol. 9, pp. 1-1224, (199)
[9] D. L. Donoho, "Denoising by Soft-thresholding," IEEE Trans. Inf. Theory, vol. 41, pp. 13-2, (199)
[10] W. G. Gardner and K. D. Martin, "HRTF Measurements of a KEMAR," J. Acoust. Soc. Am., vol. 9, pp. 39-39, (199)
[11] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing (Prentice-Hall, N.J., 199) pp. 3-
[12] J. Huopaniemi, N. Zacharov, and M. Karjalainen, "Objective and Subjective Evaluation of Head-Related Transfer Function Filter Design," J. Audio Eng. Soc., vol. 4, no. 4., pp. 21-239, (1999)
[13] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization (M.I.T. Press, Cambridge, MA, 193), pp. 3-49