WAVELET-BASED SPECTRAL SMOOTHING FOR HEAD-RELATED TRANSFER FUNCTION FILTER DESIGN

HUSEYIN HACIHABIBOGLU, BANU GUNEL, AND FIONN MURTAGH

Sonic Arts Research Centre (SARC), Queen's University Belfast, Belfast, BT7 1NN, U.K.
{h.hacihabiboglu, b.gunel, f.murtagh}@qub.ac.uk

Three wavelet-based spectral smoothing techniques are presented in this paper as a pre-processing stage for head-related transfer function (HRTF) filter design. These wavelet-based methods include wavelet denoising, wavelet approximation, and the redundant wavelet transform. The methods are used with time-domain parametric filter design methods to reduce the order of the IIR filters, which is useful for real-time implementation of immersive audio systems. Results of a subjective listening test are then presented in order to assess the perceptual validity of the investigated smoothing methods. The results show that wavelet-based spectral smoothing is beneficial in reducing the filter order and improving perceived localization without introducing a noticeable effect on timbre.

INTRODUCTION

The spectral shaping of sound on its way from a fixed point in the free field to the external ear is described by a head-related transfer function (HRTF). Data reduction of HRTFs becomes an important issue when dealing with virtual reality applications, spatialization of audio, and binaural synthesis in real time. Different methods for data complexity reduction, including downsampling, rectangular windowing [1], and principal component analysis [2], have been proposed. The data-reduced form of an HRTF is then used for filter design. An essential issue in HRTF filter design for perceptual validity is the preservation of the binaural hearing cues. A better approach therefore uses auditory criteria for smoothing [3]. Frequency warping [4] has been used as an auditory weighting method [5], and a review of auditory weighting is given in Huopaniemi et al. [6].

The smoothing methods presented in this paper depend on wavelet transforms. Section 1 summarizes wavelet denoising, wavelet approximation and the redundant wavelet transform. The application of these smoothing methods to individual HRTFs is discussed together with the filter design issues in Section 2. The results of the subjective listening test are presented in Section 3.

1 REVIEW OF WAVELET SMOOTHING

The smoothing methods employed in this paper utilize the discrete wavelet transform (DWT). A general review of wavelets can be found in Mallat [7]. The DWT can be used for smoothing a given signal in a number of different ways; three possibilities are wavelet denoising, wavelet approximation and the redundant wavelet transform. In the scope of this paper, the signals to be smoothed are the magnitude responses of the individual HRTFs.

1.1 Wavelet Denoising with SURE

Assuming that the magnitude response of an HRTF is a noisy version of a smoother function, we can use wavelet denoising to recover an approximate, smoother function. Wavelet denoising is defined as a three-step process [8][9]. If we assume that the magnitude response of the actual HRTF contains additive white Gaussian noise such that

    h_i = f(t_i) + z_i,    (1)

where z_i ~ N(0, σ²), then the wavelet denoising is done by following the steps below:

1) The DWT is performed on the noisy data.

2) A soft-threshold non-linearity is applied to the noisy wavelet coefficients to obtain the wavelet coefficient estimates:

    η_t(w) = sgn(w) (|w| - t) I(|w| > t).    (2)

The threshold t in this equation is calculated with the SURE (Stein's Unbiased Risk Estimation) algorithm.

3) The denoised function estimate f̂(t) is obtained by inverting the DWT using the wavelet coefficient estimates found in the previous step.

The validity of this assumption about the noise characteristics of HRTF data is discussed in Section 2.
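
As a concrete illustration of these three steps, the Python sketch below applies SURE-thresholded soft shrinkage to an HRTF magnitude response using PyWavelets. It is only a sketch, not the authors' implementation: the per-level thresholding, the MAD-based noise estimate, and the db4 wavelet with a 3-level decomposition (settings reported in Section 2.2) are assumptions made for this example.

```python
# Minimal sketch of SURE-based wavelet denoising of an HRTF magnitude
# response (in dB). Not the authors' code; see the note above.
import numpy as np
import pywt

def sure_threshold(x):
    """Soft threshold minimising Stein's Unbiased Risk Estimate (unit noise)."""
    n = x.size
    sq = np.sort(x ** 2)
    # SURE risk evaluated at every candidate threshold |x|_(k)
    risks = (n - 2 * np.arange(1, n + 1)
             + np.cumsum(sq)
             + np.arange(n - 1, -1, -1) * sq) / n
    return np.sqrt(sq[np.argmin(risks)])

def denoise_magnitude(mag_db, wavelet='db4', level=3):
    """Three-step wavelet denoising of an HRTF magnitude response."""
    # 1) Forward DWT of the 'noisy' magnitude response
    coeffs = pywt.wavedec(mag_db, wavelet, level=level)
    # Noise scale estimated from the finest detail coefficients (MAD estimator)
    sigma = max(np.median(np.abs(coeffs[-1])) / 0.6745, 1e-12)
    # 2) Soft-threshold the detail coefficients, threshold chosen by SURE
    shrunk = [coeffs[0]]
    for d in coeffs[1:]:
        t = sigma * sure_threshold(d / sigma)
        shrunk.append(pywt.threshold(d, t, mode='soft'))
    # 3) Inverse DWT gives the smoothed estimate
    return pywt.waverec(shrunk, wavelet)[:mag_db.size]
```

The same routine can be rerun with other wavelets (e.g. sym2 or coif3, as in Fig. 1) simply by changing the wavelet argument.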

1.2 Wavelet Approximation

Another assumption regarding HRTF data is that small deviations are not critically important and can accordingly be discarded [2]. After the DWT is performed on a signal of length 2^N, the approximation coefficients at level L have length 2^(N-L). This signal approximation is low-pass interpolated to obtain a smooth approximation of the actual HRTF magnitude response. The smoothness of the final representation also depends on the selection of the wavelet type and the depth of the tree. A scaling function with higher attenuation in its first side-lobe captures a better low-frequency approximation.

1.3 Redundant Wavelet Transform

If the sub-sampling steps are discarded from the DWT structure, a better low-pass approximation can be obtained. In the case of the HRTFs, the significant coarse details of the frequency notches and the low-frequency parts of the magnitude response are successfully captured. The results of the redundant wavelet transform depend on the selection of the wavelet type and the depth of the wavelet tree, as in the case of wavelet approximation. Selecting a scaling function with high attenuation in its first side-lobe is also important for this type of smoothing.

2 HRTF SMOOTHING USING WAVELETS

2.1 HRTF Data

The wavelet-based smoothing methods mentioned in the previous section are applied to the magnitude responses of non-individualized HRTFs sampled at three different locations. The HRTF sets are selected from the free-field equalized HRTF measurements made on the KEMAR mannequin by Martin and Gardner [10]. The Martin-Gardner HRTF set includes measurements for two types of pinnae. A property of these data sets is that they are already presented as 512-sample series, after an acceptable amount of data has been discarded. Before the smoothing process, the magnitude responses of the individual HRTFs were equalized to remove the effect of the loudspeaker used in the HRTF measurements. The smoothing and filtering procedures described in the subsequent sections have been applied to the HRTFs for all azimuth and elevation values and for both types of pinnae.

2.2 Smoothing with Wavelet Denoising

Wavelet denoising was performed with the SURE algorithm to select the soft threshold. Different wavelet functions show different denoising performance under the same denoising scheme. Therefore, denoising was carried out for different wavelets selected from a well-known group consisting of the Haar, discrete Meyer, Daubechies 2-9, Coiflet 1-5 and Symmlet wavelets. A 3-level DWT was used in all cases. The results of the smoothing of the magnitude response of the HRTF corresponding to the left ear at zero azimuth (ϕ=0) and zero elevation (δ=0) are shown in Fig. 1.

Figure 1: Original and denoised magnitude responses with db4, sym2, and coif3, from top to bottom, respectively.

It may be noted that wavelet denoising with SURE does not provide a sufficiently smooth result. Overfitting of the original by the wavelet-denoised data prevents the reduction of the filter order in the design phase. This result is a consequence of the false assumption that the irregularities in the magnitude responses may be modeled as additive white Gaussian noise. Therefore, wavelet denoising with SURE was not included as a smoothing method to be investigated in the listening test.

2.3 Smoothing with Wavelet Approximation

The equalized magnitude response can be smoothed by obtaining its low-frequency approximation using the DWT. The wavelet approximation of the signal was computed using a Symmlet filter bank. A 3-level DWT was performed on the 512-sample magnitude response, and the resulting 64-sample wavelet approximation was low-pass interpolated to obtain a smoother version of the signal.
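
A minimal sketch of this approximation-and-interpolation smoothing is given below, under the parameters stated above (a 3-level decomposition of a 512-sample magnitude response). It is not the authors' code: the sym8 wavelet is used as a stand-in for the Symmlet filter bank of the paper, periodization boundary handling is assumed so that exactly 64 approximation coefficients are produced, and SciPy's polyphase resampler plays the role of the low-pass interpolator.

```python
# Minimal sketch: keep only the level-3 DWT approximation of the 512-sample
# magnitude response and low-pass interpolate it back to full length.
import numpy as np
import pywt
from scipy.signal import resample_poly

def wavelet_approx_smooth(mag_db, wavelet='sym8', level=3):
    """Smooth an HRTF magnitude response (in dB) via its DWT approximation."""
    # Level-3 approximation coefficients: 512 / 2**3 = 64 samples with
    # periodization boundary handling.
    approx = pywt.downcoef('a', mag_db, wavelet, mode='periodization', level=level)
    # Orthonormal approximation coefficients carry a gain of roughly
    # 2**(level/2); rescale so the smoothed curve sits at the original dB level.
    approx = approx / 2 ** (level / 2)
    # Low-pass interpolation back to the original length (factor 2**level).
    return resample_poly(approx, 2 ** level, 1)[:mag_db.size]
```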

An alternative to this approach would be to sub-sample the signal by a factor of 8 and then low-pass interpolate between the retained samples. However, this alternative results in the loss of crucial data and produces minor frequency peaks and notches (see the selected area in Fig. 2) that are not significant in the original data. This decreases the success of the smoothing in reducing the filter order.

Figure 2: Original (middle), wavelet approximation smoothing (top), and subsampling/interpolation smoothing (bottom) (ϕ=0, δ=0).

2.4 Smoothing with the Redundant Transform

The redundant wavelet transform described in the previous section was applied to the magnitude responses of the HRTFs, again using a Symmlet filter bank. As the approximation obtained using the redundant wavelet transform consists of 512 samples, low-pass interpolation is no longer necessary. Thus, the results capture the frequency peaks and notches better than the wavelet approximation (see Fig. 3).

Figure 3: Original (top) and redundant wavelet transform smoothing (bottom) (ϕ=0, δ=0).
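
The sketch below illustrates this redundant-transform smoothing using the stationary (undecimated) wavelet transform from PyWavelets, which is one realization of the transform described in Section 1.3. It is an illustrative reconstruction rather than the authors' implementation; the sym8 wavelet and the explicit gain correction are choices made for the example.

```python
# Minimal sketch of smoothing with the redundant (undecimated / stationary)
# wavelet transform. Not the authors' code; see the note above.
import numpy as np
import pywt

def redundant_smooth(mag_db, wavelet='sym8', level=3):
    """Redundant-transform smoothing of a 512-sample HRTF magnitude response."""
    # No sub-sampling: every level keeps the full 512-sample length
    # (pywt.swt requires the input length to be divisible by 2**level).
    coeffs = pywt.swt(mag_db, wavelet, level=level)
    approx = coeffs[0][0]      # deepest-level approximation, still 512 samples
    # With orthonormal filters the undecimated approximation carries a gain of
    # roughly 2**(level/2) (exact at DC); rescale it back to the original level.
    return approx / 2 ** (level / 2)
```

Because the approximation already has the full 512-sample length, it can be used directly as the smoothed magnitude response that feeds the filter design stage described in the next section.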

2.5 Filter Design

The equalized and smoothed magnitude responses were first converted into the time domain. A minimum-phase version of the signal was then obtained by minimum-phase reconstruction using the real cepstrum [11]. A minimum-phase IIR filter with the same number of poles and zeros was designed for each HRTF with Prony's method. The frequency dependence of the interaural time delay (ITD) was assumed to be perceptually irrelevant [2]; thus the ITD was supplied by a constant delay line.
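
These two steps, minimum-phase reconstruction via the real cepstrum and a Prony-style IIR fit with equal numbers of poles and zeros, can be sketched as follows. This is an illustrative Python reconstruction, not the authors' processing chain: the cepstral folding follows the standard textbook procedure [11], and the prony function below is a plain least-squares variant of Prony's method written for this example.

```python
# Minimal sketch of the filter-design stage: minimum-phase reconstruction via
# the real cepstrum, then a least-squares Prony IIR fit. Not the authors' code.
import numpy as np
from scipy.linalg import lstsq, toeplitz
from scipy.signal import lfilter

def minimum_phase_ir(mag_db):
    """Minimum-phase impulse response for a full-length (symmetric) magnitude response in dB."""
    mag = 10.0 ** (mag_db / 20.0)                          # dB -> linear
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-8))).real  # real cepstrum
    n = mag.size
    fold = np.zeros(n)                                     # fold the cepstrum:
    fold[0] = cep[0]
    fold[1:n // 2] = 2.0 * cep[1:n // 2]                   # double positive quefrencies
    fold[n // 2] = cep[n // 2]                             # keep the Nyquist term
    return np.fft.ifft(np.exp(np.fft.fft(fold))).real      # minimum-phase IR

def prony(h, nb, na):
    """Least-squares Prony fit: nb+1 numerator and na+1 denominator coefficients."""
    N = h.size
    # Denominator: fit the recursion h[n] = -sum_k a[k] h[n-k] over the tail
    col = h[nb:N - 1]
    row = np.concatenate([h[nb::-1][:na], np.zeros(max(0, na - nb - 1))])
    a_tail, *_ = lstsq(toeplitz(col, row), -h[nb + 1:N])
    a = np.concatenate(([1.0], a_tail))
    # Numerator reproduces the first nb+1 samples exactly: b = (a * h)[:nb+1]
    b = lfilter(a, [1.0], h)[:nb + 1]
    return b, a

# Hypothetical usage with a smoothed response `smoothed_db` and a tested order:
# h = minimum_phase_ir(smoothed_db)
# b, a = prony(h, order, order)   # same number of poles and zeros
```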

3 PERCEPTUAL VALIDATION OF WAVELET-BASED SMOOTHING

Perceptual validation of our results was carried out using a subjective listening test. We used intra-aural, earplug-style Sony headphones, and carried out the tests in a recording studio in the Music Department of Queen's University Belfast.

3.1 Subjects

Eleven subjects (3 female, 8 male) served as voluntary participants. All had normal hearing and no history of hearing problems. Five of the subjects had previous experience of specialized listening.

3.2 Stimulus

We used a one-second pink noise sample with ramps applied to the onset and offset portions.

3.3 Procedure

Three HRTF sets representing three spatial positions were used in the listening test. The HRTF pairs were selected to represent both types of pinnae for three positions of azimuth (ϕ) and elevation (δ):

- ϕ=0 and δ=0 for the small pinna,
- ϕ=30 and δ=0 for the small pinna,
- ϕ=240 and δ=-10 for the large red pinna.

This selection was rather arbitrary, but it was made sure that the positions were far enough from each other to prevent bias in the listening test due to the previous location of the sound. This means that the ripples, peaks and notches of the HRTFs are quite different from each other (see Fig. 4 and Fig. 5).

Figure 4: Tested HRTFs for the right ear, plotted with a constant dB offset between them.

Figure 5: Tested HRTFs for the left ear, plotted with a constant dB offset between them.

The subjects were seated in the acoustically insulated recording studio. Tests for each of the three HRTF sets were carried out separately for each subject. Test signals were presented in pairs. The first signal was always the reference signal, obtained by convolving the stimulus with the original HRTF set. The second signal was one of the following three, selected randomly:

- pink noise filtered with the IIR filter designed directly from the HRTF data in the time domain without any smoothing (the direct design),
- pink noise filtered with the IIR filter designed after smoothing with the wavelet approximation,
- pink noise filtered with the IIR filter designed after smoothing with the redundant wavelet transform.

The test program, designed in MATLAB, was similar to that used by Huopaniemi et al. [12]. The subjects were verbally instructed on the usage of the test program beforehand. The program selected the sound pairs from the test set randomly, and each sound pair was played with a short interval between the first and the second sounds. Subjects rated the sound pairs for similarity of timbre and of localization on a scale of 1 to 5 (very different - no difference), quantized to intervals of 0.1, and were given the opportunity to replay the sound pairs. The test lasted up to about half an hour per subject. The positions at which the subjects actually heard the stimuli were not investigated, since the HRTFs used in the test were not individualized HRTFs of the subjects.

All of the tested filters were calculated for the orders 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 28, 32 and 36 for the HRTFs sampled at the three locations. Thus, each subject was presented with a total of 135 test pairs, which were equalized for the headphone and the ear canal resonance.

3.4 Results

The results were analyzed using the SPSS software package. The independent variables were method, subject, filter order, and angle. The dependent variables were the timbre and localization scores obtained from the subjective listening test.

The redundant wavelet transform outperformed the wavelet approximation and the direct design in terms of the localization scores for the different filter orders. It may be noted that even at the low end of the tested filter orders the localization scores are satisfactorily high. The wavelet approximation also improved localization compared to the direct design (see Fig. 6).

Figure 6: Filter order vs. mean localization score (95% confidence intervals) for the three methods (direct design, wavelet approximation, and redundant wavelet transform).

The timbre scores follow an increasing trend with increasing filter order, as expected. However, it is not possible to say which method is better at preserving the timbre (see Fig. 7).

Figure 7: Filter order vs. mean timbre score (95% confidence intervals) for the three methods.

The redundant wavelet transform was the preferred choice for all of the subjects in terms of the mean localization scores, and the wavelet approximation was found to be superior to the direct design. This result is further evidence that the wavelet-based smoothing methods provide better localization cues (see Fig. 8).

Figure 8: Subject vs. mean localization score (95% confidence intervals) for the three methods.

The correlation coefficient between the timbre and the localization scores was calculated as 0.19 using Spearman's rho test. This low correlation can be due to the fact that the HRTFs include both the timbre and the localization information. Overall mean values for the timbre score and the localization score were found to be 3.4 and 3.3, respectively. Therefore, the cases where the timbre score is greater than 3.5 and the localization score is greater than 4.0 can be taken as acceptable. These cases were analyzed with respect to the filter orders for the three smoothing methods. The redundant wavelet transform was found to be satisfactory even for very low filter orders (see Fig. 9).

Figure 9: Filter order vs. count of acceptable cases (timbre score > 3.5 and localization score > 4.0) for the three methods.

The localization accuracy depends on the location of the sound source [13]. The redundant wavelet transform smoothing performed well for all of the investigated HRTFs, giving mean localization scores over 4 (see Fig. 10).

Figure 10: Angle vs. mean localization score (95% confidence intervals) for the three methods.
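
The rank correlation reported above can be reproduced from the raw score pairs in a few lines; the snippet below is purely illustrative, since the paper's analysis was carried out in SPSS, and the two arrays are hypothetical placeholders for the per-pair ratings.

```python
# Illustrative only: placeholder ratings on the 1-5 similarity scale.
import numpy as np
from scipy.stats import spearmanr

timbre_scores = np.array([3.2, 4.1, 3.8, 2.9, 4.5, 3.6])
localization_scores = np.array([4.0, 3.7, 4.4, 3.1, 4.8, 3.9])

rho, p_value = spearmanr(timbre_scores, localization_scores)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```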

4 CONCLUSIONS AND FUTURE WORK

In this work, we analyzed the effects of three wavelet-based methods for spectral smoothing in HRTF filter design. The results show that the wavelet-based smoothing methods can help in reducing the order of the IIR filters. The wavelet approximation outperforms the direct design in most cases. The redundant wavelet transform increases the localization accuracy without substantially decreasing the timbre quality, and its robustness for different source locations provides evidence of its usefulness for HRTF filter design in immersive audio applications.

Opportunities for future work include the investigation of different thresholding strategies for wavelet denoising that are compatible with the human auditory system. The detail coefficients of the DWT will also be used to restore the details of the HRTFs within a real-time system utilizing a head tracker. Restoring the details of the HRTFs can be useful for smoothly and gradually increasing the level of timbral detail when the position of the listener's head is stable.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the assistance of Mr. Chris Corrigan in the organization of the subjective listening test. We would also like to thank the participants of the test.

REFERENCES

[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia (Academic Press, London, 1994).

[2] F. L. Wightman and D. J. Kistler, "A Model of Head-Related Transfer Functions Based on Principal Components Analysis and Minimum-Phase Reconstruction," J. Acoust. Soc. Am., vol. 91, pp. 1637-1647 (1992).

[3] J. Huopaniemi and M. Karjalainen, "HRTF Filter Design Based on Auditory Criteria," in Proc. Nordic Acoustical Meeting (NAM'96) (Helsinki, Finland, 1996 June 12-14), pp. 323-336.

[4] J. Huopaniemi and M. Karjalainen, "Comparison of Digital Filter Design Methods for 3-D Sound," in Proc. IEEE Nordic Signal Processing Symp. (NORSIG'96) (Espoo, Finland, 1996), pp. 131-134.

[5] J. Smith and J. Abel, "Bark and ERB Bilinear Transforms," IEEE Trans. Speech and Audio Processing, vol. 7, no. 6, pp. 697-708 (1999).

[6] J. Huopaniemi and M. Karjalainen, "Review of Digital Filter Design and Implementation Methods for 3-D Sound," presented at the 102nd Convention of the Audio Engineering Society, Preprint 4461 (Munich, Germany, 1997).

[7] S. Mallat, A Wavelet Tour of Signal Processing (Academic Press, London, 1998).

[8] D. L. Donoho and I. M. Johnstone, "Adapting to Unknown Smoothness via Wavelet Shrinkage," J. Am. Statist. Assoc., vol. 90, pp. 1200-1224 (1995).

[9] D. L. Donoho, "De-noising by Soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613-627 (1995).

[10] W. G. Gardner and K. D. Martin, "HRTF Measurements of a KEMAR," J. Acoust. Soc. Am., vol. 97, pp. 3907-3908 (1995).

[11] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing (Prentice-Hall, Englewood Cliffs, N.J., 1989).

[12] J. Huopaniemi, N. Zacharov, and M. Karjalainen, "Objective and Subjective Evaluation of Head-Related Transfer Function Filter Design," J. Audio Eng. Soc., vol. 47, no. 4, pp. 218-239 (1999).

[13] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, Cambridge, MA, 1983).