IMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS

International Conference on Cyberworlds

Di Liu, Andy W. H. Khong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Email: {LIUDI, andykhong}@ntu.edu.sg
(This work is supported by the Singapore National Research Foundation Interactive Digital Media R&D Program, under research grant NRF8IDM-IDM4-1.)

Abstract—The generalized cross-correlation with the phase transform prefilter remains popular for the estimation of time-differences-of-arrival. However, it is not robust to noise and, as a consequence, the performance of direction-of-arrival algorithms is often degraded under low signal-to-noise ratio conditions. We propose to address this problem through the use of a wavelet-based speech enhancement technique, since the wavelet transform can achieve good denoising performance. The overcomplete rational-dilation wavelet transform is then exploited to process speech signals effectively due to its higher frequency resolution. In addition, we exploit the joint distribution of speech in the wavelet domain and develop a novel local noise variance estimator based on the bivariate shrinkage function. As will be shown, our proposed algorithm achieves good direction-of-arrival performance in the presence of noise.

Keywords—denoising, wavelet, speech source localization, DOA estimation

I. INTRODUCTION

Research into speech source localization has received much attention for cyberworld applications including automatic camera steering, online video surveillance and speaker tracking. One of the widely adopted approaches for speech source localization is the generalized cross-correlation (GCC) based time-differences-of-arrival (TDOA) estimation algorithm [1]. This algorithm computes the interchannel delays by locating the maximum of the weighted cross-correlation between each pair of received signals. While many different prefilters can be applied, the heuristic-based phase transform (PHAT) prefilter has been found to perform very well under practical conditions [2]. As reported in [2], the PHAT prefilter is optimal in the maximum likelihood (ML) sense in the presence of reverberation.

However, this prefilter is not robust to low signal-to-noise ratio (SNR) conditions and, as a result, the performance of direction-of-arrival (DOA) estimation algorithms degrades with reducing SNR. Figure 1 shows an illustrative example of this degradation, where both the mean and the standard deviation of the bearing error increase substantially as the SNR is reduced. As can be seen, the degradation in DOA estimation performance becomes more pronounced at lower SNR.

Figure 1. Variation of the mean and standard deviation of the bearing error against SNR for DOA estimation using the PHAT-GCC algorithm.

A common approach to this problem is to preprocess the noisy signals by denoising. Although speech denoising has been an active area of research, these efforts have mainly focused on improving the subjective quality or intelligibility of the speech. In this work, however, we focus on denoising with the aim of improving the performance of DOA estimation. Wavelet-based methods have become an important tool for addressing the difficult problem of denoising [3], [4]. This is achieved by taking advantage of the sparseness of signals in the wavelet domain.
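For concreteness, the following is a minimal sketch of the PHAT-weighted GCC TDOA estimator referred to above, written in Python with NumPy. The function name, the small regularization constant and the optional lag limit are our own illustrative choices rather than part of the paper.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Estimate the TDOA between two microphone signals using the
    PHAT-weighted generalized cross-correlation (illustrative sketch)."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    # Cross-power spectrum with the PHAT prefilter (unit-magnitude weighting).
    cps = X1 * np.conj(X2)
    cps /= np.abs(cps) + 1e-12
    cc = np.fft.irfft(cps, n=n)
    # Restrict the peak search to physically plausible lags, if requested.
    max_shift = n // 2 if max_tau is None else min(int(round(max_tau * fs)), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(fs)
```

Given the TDOA of each microphone pair and the array geometry, the bearing then follows from standard far-field triangulation.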
In this work, we propose to incorporate such wavelet denoising techniques to improve DOA performance in the presence of noise. The wavelet-based denoising algorithm consists of three steps: 1) computing the wavelet transform (WT) of the noisy signal, 2) modifying the noisy wavelet coefficients, and 3) computing the inverse WT using the modified coefficients. It is therefore important, in this work, to determine the type of wavelet transform and the threshold selection method that achieve good DOA estimation.

We note that the speech and noise signals can be better separated if an appropriate transform is selected. The overcomplete rational-dilation WT [5] is a recent development in which the frequency resolution can be varied. Since the speech spectrum varies significantly across frequency bands, a rational-dilation WT with high frequency resolution can be effective for processing speech in the wavelet domain. In contrast, the poor frequency resolution of the dyadic WT limits its effectiveness for analyzing signals that are quasi-periodic in nature, including speech, electroencephalogram signals and signals arising from mechanical vibrations [6]. In addition, among the variety of nonlinear thresholding rules for wavelet-based denoising, bivariate shrinkage thresholding [7] can improve SNR performance significantly. This is achieved by taking into account the statistical dependencies between wavelet coefficients and their parents using Bayesian estimation theory. As a priori knowledge, we discuss the joint distribution of wavelet coefficients for a typical speech signal. We also show that direct application of existing approaches does not address the noise robustness issue. The thresholding requires a noise variance estimator, which we compute locally for each frequency subband, making it well suited to the spectral distribution of speech.
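To make the three-step procedure above concrete, here is a minimal sketch in Python. PyWavelets only provides dyadic transforms, so a dyadic DWT with simple soft thresholding is used as a stand-in for the overcomplete rational-dilation WT and the bivariate shrinkage developed later in the paper; the wavelet name, decomposition depth and thresholding rule are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets: dyadic DWT used here as a stand-in transform

def wavelet_denoise(noisy, wavelet="db8", level=6):
    """Three-step wavelet denoising: analyze, shrink, reconstruct."""
    # 1) Wavelet transform of the noisy signal.
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    # 2) Modify the noisy wavelet coefficients: here a universal soft
    #    threshold with the finest-scale median noise estimate of [9].
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(noisy)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    # 3) Inverse wavelet transform from the modified coefficients.
    return pywt.waverec(coeffs, wavelet)[: len(noisy)]
```

In the proposed method, step 2 is replaced by the bivariate shrinkage of Section III, and the denoised microphone signals are then passed to the GCC-PHAT DOA estimator.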

Figure 2. Analysis and synthesis filter banks for the implementation of the rational-dilation wavelet transform [after [5]].

II. REVIEW OF OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS

The overcomplete rational-dilation WTs [5] realize a class of WTs with constant quality (Q) factor, where the Q-factor of a band-pass filter is the ratio of its center frequency to its bandwidth. WTs with high Q-factors are desirable for processing quasi-periodic signals such as speech because of their higher frequency resolution compared with the dyadic WT, which has a low Q-factor.

The iterated filter banks shown in Fig. 2 can be used to implement rational-dilation WTs [5], where p is an upsampling factor, q and s are downsampling factors, and q/p is the rational dilation factor. These parameters affect the Q-factor, the redundancy of the WT and the time-bandwidth product; for a given q/p, there is often a trade-off between the Q-factor and the time-bandwidth product. One generally requires higher frequency resolution when analyzing or filtering quasi-periodic signals such as speech. In this work, we set p = 9, q = 10 and s = 5, giving a dilation factor of 10/9 ≈ 1.11 and a redundancy of approximately 2. Figure 3 illustrates the corresponding frequency response of the iterated filter bank and the wavelets at several scales. As can be seen from these figures, good time-frequency localization is achieved, with more band-pass filters covering the same frequency range. In addition, these parameters give rise to a high Q-factor and avoid ringing with a modest redundancy of less than 3. This WT, configured with higher frequency resolution, can better separate the speech and noise signals. Moreover, the noise reduction filter on each subband can be manipulated independently, which in turn determines the amount of noise reduction in each subband.

Figure 3. Frequency response of the iterated filter bank and wavelets at several scales [after [5]].
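As a quick check on these parameter choices, the snippet below computes the dilation factor and the approximate redundancy of the iterated filter bank. The redundancy expression (1/s)·1/(1 − p/q) is the asymptotic value for a deep decomposition quoted in the rational-dilation WT literature; treat it as an assumption rather than a result of this paper.

```python
def rational_dilation_params(p: int, q: int, s: int) -> dict:
    """Dilation factor and approximate redundancy of the iterated
    rational-dilation filter bank (asymptotic, many decomposition levels)."""
    dilation = q / p                        # rate change of the low-pass branch
    redundancy = (1.0 / s) / (1.0 - p / q)  # geometric sum over the levels (assumed formula)
    return {"dilation": dilation, "redundancy": redundancy}

print(rational_dilation_params(p=9, q=10, s=5))
# approximately {'dilation': 1.11, 'redundancy': 2.0} -> matches the values quoted above
```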
III. WAVELET-BASED SPEECH DENOISING FOR DIRECTION-OF-ARRIVAL ESTIMATION

To describe the wavelet-based denoising problem for speech, we define w_k(j) to be the kth wavelet coefficient in the high-pass (H) subband at scale j, where j = 1, ..., J denotes the wavelet scale index and k = 1, ..., K denotes the wavelet coefficient index. Here, J denotes the total number of wavelet scales and K denotes the total number of wavelet coefficients in each scale after resizing.

We next define y_k(j) as the noisy observation of w_k(j) and n_k(j) as the additive noise, giving y_k(j) = w_k(j) + n_k(j). We also note that w_k(j+1) is the wavelet coefficient at the next coarser scale to w_k(j), and we therefore say that w_k(j+1) is the parent of w_k(j). Treating the coefficients as random variables, we define W_k(j), Y_k(j) and N_k(j) as the random variables corresponding to w_k(j), y_k(j) and n_k(j), respectively. Using this notation, we can write

y = w + n,   (1)

where w = [W_k(j), W_k(j+1)]^T, y = [Y_k(j), Y_k(j+1)]^T and n = [N_k(j), N_k(j+1)]^T. Taking into account the statistical dependency between adjacent wavelet scales and employing the maximum a posteriori (MAP) estimator, we can estimate the clean speech coefficients w from the noisy observation y using

\hat{w}(y) = \arg\max_{w} \left[ p_n(y \mid w)\, p_w(w) \right],   (2)

where p_n(y | w) and p_w(w) are the joint probability density functions (pdfs) of n and w, respectively. Hence, to estimate the clean wavelets \hat{w}(y) using (2), both p_w(w) and p_n(n) must be known. Here, the noise is assumed to be i.i.d. white Gaussian, so the noise pdf can be expressed as

p_n(n) = \frac{1}{2\pi\sigma_n^2} \exp\left( -\frac{N_k^2(j) + N_k^2(j+1)}{2\sigma_n^2} \right),   (3)

where \sigma_n^2 is the variance of the additive noise.
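The estimator below operates on child-parent coefficient pairs, so in practice each parent subband must be resized to the length of its child subband, as assumed above. A minimal sketch of this bookkeeping, again using a dyadic DWT from PyWavelets as a stand-in for the rational-dilation WT; the function name and the nearest-neighbour resizing are illustrative.

```python
import numpy as np
import pywt  # dyadic DWT as a stand-in for the rational-dilation WT

def child_parent_pairs(signal, wavelet="db8", level=6):
    """Return a list of (child, parent) detail-coefficient arrays per scale,
    with each parent subband resized to its child's length."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    details = coeffs[1:][::-1]          # reorder so that j = 1 is the finest scale
    pairs = []
    for j in range(len(details) - 1):
        child, parent = details[j], details[j + 1]
        # Nearest-neighbour resize of the coarser (parent) scale to the child's length.
        idx = np.linspace(0, len(parent) - 1, num=len(child)).astype(int)
        pairs.append((child, parent[idx]))
    return pairs
```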

Figure 4. (a) Empirical joint parent-child histogram of wavelet coefficients from the speech signal database. (b) Bivariate pdf (4) proposed for the joint pdf of parent-child wavelet coefficient pairs.

A. Bivariate shrinkage thresholding for speech signals

It is therefore important to determine an analytical expression for the joint pdf that models the wavelet distribution of typical speech. The joint empirical child-parent histogram can then be used to estimate p_w(w). As presented in [7], a possible pdf model is given by

p_w(w) = \frac{3}{2\pi\sigma_\omega^2} \exp\left( -\frac{\sqrt{3}}{\sigma_\omega} \sqrt{W_k^2(j) + W_k^2(j+1)} \right),   (4)

where \sigma_\omega^2 is defined as the variance of the clean speech wavelet coefficients. To evaluate whether this pdf model is suitable for speech signals, we applied the overcomplete rational-dilation WT described in Section II with q/p = 10/9 and s = 5 to a set of 3 speech signals extracted from the NOIZEUS database [8]. The joint histogram between W_k(j) and W_k(j+1) is plotted in Fig. 4(a), while the joint pdf model defined in (4) is plotted in Fig. 4(b). Comparing both plots, we note the close similarity between the analytical expression given by (4) and the empirical distribution of the speech signals. We therefore propose to employ (4) for the estimation of p_w(w). Substituting (3) and (4) into (2), the MAP estimator in (2) can be rewritten as [7]

\hat{W}_k(j) = \frac{\left( \sqrt{Y_k^2(j) + Y_k^2(j+1)} - \frac{\sqrt{3}\,\sigma_n^2}{\sigma_\omega} \right)_+}{\sqrt{Y_k^2(j) + Y_k^2(j+1)}} \, Y_k(j),   (5)

where the function (g)_+ in the numerator is defined as

(g)_+ = 0 if g < 0, and (g)_+ = g otherwise.   (6)

This is the bivariate shrinkage function applied in each wavelet scale for speech denoising.

B. Variance estimation for thresholding

Considering the wavelet shrinkage function in (5), we define T = \sqrt{3}\,\sigma_n^2 / \sigma_\omega as the threshold. It is therefore essential to estimate the noise variance \sigma_n^2 and the wavelet variance \sigma_\omega^2 for each wavelet scale. In our algorithm, the variance \sigma_\omega^2 is estimated as

\hat{\sigma}_\omega^2 = \left( \hat{\sigma}_y^2 - \hat{\sigma}_n^2 \right)_+,   (7)

where \hat{\sigma}_y^2 is the variance of the noisy wavelet coefficients. If one assumes that Y_k(j) has a Gaussian distribution, \hat{\sigma}_y^2 for the kth coefficient in each wavelet scale j can be estimated in the ML sense using coefficients in the neighboring region B(k),

\hat{\sigma}_y^2 = \frac{1}{M} \sum_{y_k(j) \in B(k)} y_k^2(j),   (8)

where M is the size of the neighborhood B(k), and B(k) is defined as all coefficients within a window centered at the kth coefficient.

Although a typical speech signal occupies a wide frequency spectrum, most of its energy is concentrated in the lower frequency range. The wavelets in the finest scale correspond to the highest frequency subband, denoted H_1, and do not contain significant speech content. This assumption is valid since we utilize the high frequency resolution of the given rational-dilation WT. In addition, we assume that the noise is white, with equivalent energy throughout the whole frequency band, and as a result y(H_1) ≈ n(H_1). We can therefore estimate the overall noise variance from the finest-scale wavelet coefficients using the robust median estimator [9],

\hat{\sigma}_n = \frac{\operatorname{median}(|y_k(1)|)}{0.6745}, \quad y_k(1) \in \text{subband } H_1.   (9)

We note, however, that direct application of (9) is not appropriate for our DOA application. Simulations using (9) exhibit a degradation in DOA performance, and the bearing errors are sensitive to the noise variance estimate. This is because the energy of the speech spectrum varies significantly across the scales. A poor noise estimate can therefore result in an inappropriate threshold T. Accordingly, this can lead to additional unwanted high-frequency noise components.
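As an illustration of (5)-(9), here is a sketch of the per-subband bivariate shrinkage with the local variance estimate and the median-based noise estimate. It assumes the child and (resized) parent subbands from the earlier sketch; the window length and the small stabilizing constants are illustrative choices, not values from the paper.

```python
import numpy as np

def median_noise_std(finest_subband, c=0.6745):
    """Noise standard deviation from the finest-scale coefficients, as in
    (9); the paper later replaces the divisor 0.6745 by a factor c, cf. (10)."""
    return np.median(np.abs(finest_subband)) / c

def bivariate_shrink(y_child, y_parent, sigma_n, win=7):
    """Bivariate shrinkage (5)-(6) of one subband given its resized parent,
    using the local variance estimates (7)-(8)."""
    # Local noisy-signal variance (8): moving average of y^2 over B(k).
    kernel = np.ones(win) / win
    sigma_y2 = np.convolve(y_child ** 2, kernel, mode="same")
    # Clean-signal standard deviation from (7).
    sigma_w = np.sqrt(np.maximum(sigma_y2 - sigma_n ** 2, 0.0)) + 1e-12
    # Shrinkage rule (5) with the positive-part operator (6).
    r = np.sqrt(y_child ** 2 + y_parent ** 2) + 1e-12
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n ** 2 / sigma_w, 0.0) / r
    return gain * y_child
```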
In view of the above, we consider the degree of shrinkage applied to the speech wavelet coefficients and propose that the new noise estimator be given by

\hat{\sigma}_n = \frac{\operatorname{median}(|y_k(1)|)}{c}, \quad y_k(1) \in \text{subband } H_1.   (10)

The performance of the DOA estimation algorithm is therefore dependent on the choice of c.

C. Factor c selection

We determine a suitable value of c that gives rise to good DOA performance. This is achieved empirically by studying how c varies across different speech signals under different SNR conditions. We first perform denoising using (10), (8) and (5) for 3 speech signals extracted from the NOIZEUS database [8]. The DOA of the denoised speech is subsequently estimated using GCC-PHAT. Figures 5(a) and 5(b) show the variation of the bearing error with c for SNR = 0 and 5 dB, respectively. As can be seen, the bearing error first reduces with c, after which it increases modestly. Accordingly, a good choice is c = 1, i.e.,

\hat{\sigma}_n = \operatorname{median}(|y_k(1)|), \quad y_k(1) \in \text{subband } H_1.   (11)

Figure 5. Variation of the mean bearing errors with c for (a) SNR = 0 dB and (b) SNR = 5 dB.

Figure 6. Variation of the (a) mean and (b) standard deviation of the bearing error with SNR for different factors c(1) (c(1) = 0.7, 0.3 and 0.5).

Additional simulations show that this variation is similar under different SNR conditions. We propose to further improve the performance of DOA estimation by making the factor level dependent, i.e., c(j). We achieve this by noting that the ratio between the clean and noisy signals in each scale is different, so that each scale may be processed independently when estimating its noise variance. We determine a good choice of c(j) empirically for realistic applications through an iterative procedure, first initializing c(j) = 1 for j = 2, ..., J. The value of c(1) is then set to the value which gives rise to the lowest DOA error using the GCC-PHAT algorithm. The value of c(j+1) is subsequently obtained in a similar manner after finding the c(j) that gives rise to the lowest DOA error. The same process is applied to 3 speech signals from the NOIZEUS database [8] under different SNRs. Experiments conducted in this manner reveal that the performance of GCC-PHAT after denoising is relatively insensitive to c(j), j = 2, ..., J, under different SNR conditions, and that c(j) = 1 is a good choice for j = 2, ..., J.

Figures 6(a) and 6(b) show the variation of the mean and standard deviation of the bearing errors with SNR for different values of c(1). We note that the choice of c(1) affects the DOA performance under different SNR conditions. This occurs because, for the finest wavelet scale, corresponding to the highest frequency subband, noise is expected to dominate the signal component under low SNR. Therefore, compared with other scales, the noise energy in scale 1 is more significant relative to the energy of the clean wavelet coefficients. Hence, one should set a higher threshold for the finest scale. As can be seen, a good choice of c(1) that gives rise to good DOA performance for GCC-PHAT is c(1) = 0.3 across the SNRs considered. In addition, we note that, for c(1) = 0.7, a low mean bearing error can be achieved while its standard deviation is modestly higher compared with the case of c(1) = 0.3. We therefore conclude that c(1) = 0.3 and c(j) = 1, j = 2, ..., J, are good choices for DOA estimation.

Although a good choice of c(1) is 0.3, we further provide a means of estimating the SNR so that c(1) can be selected based on Fig. 6. We first define \gamma_w(j), \gamma_y(j) and \gamma_n(j) as the energies of the clean-signal, received-signal and noise wavelets at scale j, respectively. We next define

r_w(j) = \gamma_w(j) / \sum_{j=1}^{J} \gamma_w(j),   r_y(j) = \gamma_y(j) / \sum_{j=1}^{J} \gamma_y(j),   r_n(j) = \gamma_n(j) / \sum_{j=1}^{J} \gamma_n(j)

as the energy ratios for the wavelets corresponding to the clean, received and noise signals. Since energy in the wavelet domain is equivalent to the time-domain energy, the SNR can be computed as

\mathrm{SNR} = 10 \log_{10} \left( \frac{\sum_{j=1}^{J} \gamma_w(j)}{\sum_{j=1}^{J} \gamma_n(j)} \right).   (12)

The ratio r_y(j) can be written as

r_y(j) = \frac{\gamma_y(j)}{\sum_{j=1}^{J} \gamma_y(j)} = \frac{\gamma_w(j) + \gamma_n(j)}{\sum_{j=1}^{J} \gamma_y(j)} = \frac{r_w(j) \left( \sum_{j=1}^{J} \gamma_y(j) - \sum_{j=1}^{J} \gamma_n(j) \right) + r_n(j) \sum_{j=1}^{J} \gamma_n(j)}{\sum_{j=1}^{J} \gamma_y(j)},   (13)

from which we obtain

r_y(j) = r_w(j) + \beta \left( r_n(j) - r_w(j) \right),   (14)

where

\beta = \frac{\sum_{j=1}^{J} \gamma_n(j)}{\sum_{j=1}^{J} \gamma_y(j)}.   (15)

We note that when the number of decomposition levels J is large, the clean-speech energy in the coarsest scale approximates to zero.
Hence, (14) can be rewritten as r_y(J) = \beta \, r_n(J), and \beta in (15) can then be expressed as

\beta = \frac{r_y(J)}{r_n(J)}.   (16)

Since white Gaussian noise has a fixed energy ratio across the scales of a given WT, r_n(j) can be computed in advance. By using (15), (16) and (12), the SNR can now be written as

\mathrm{SNR} = 10 \log_{10} \left( \frac{1 - \beta}{\beta} \right) \ \mathrm{dB},   (17)

from which we can select a value of c(1) based on Fig. 6.
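A minimal sketch of this blind SNR estimate follows, assuming the noisy subbands are ordered from finest to coarsest and that r_n(J), the noise energy ratio of the coarsest scale, has been precomputed for the chosen transform (for example by transforming a long white-noise realization). The function name and the clipping constants are illustrative.

```python
import numpy as np

def estimate_snr_db(noisy_subbands, r_n_coarsest):
    """Blind SNR estimate via (16)-(17): beta = r_y(J) / r_n(J),
    SNR = 10*log10((1 - beta) / beta). Subbands ordered finest -> coarsest."""
    energies = np.array([np.sum(np.asarray(b) ** 2) for b in noisy_subbands])
    r_y_coarsest = energies[-1] / np.sum(energies)                  # r_y(J)
    beta = np.clip(r_y_coarsest / r_n_coarsest, 1e-6, 1.0 - 1e-6)   # (16)
    return 10.0 * np.log10((1.0 - beta) / beta)                     # (17)
```

The estimated SNR is then used only to pick c(1) from the curves in Fig. 6.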

Figure 7. DOA performance comparison of the proposed method with the approaches of [10], [11] under different SNRs: (a) mean bearing errors and (b) standard deviation of bearing errors.

Using the above, we can therefore apply the MAP estimator in (2), and our proposed algorithm for speech source localization is summarized as follows: 1) select c or c(1) using Fig. 6, or estimate the SNR using (17); 2) compute the noise variance \sigma_n^2 using (10); 3) for each wavelet coefficient k = 1, ..., K in each scale, a) calculate \hat{\sigma}_y^2 using (8) and b) calculate \hat{\sigma}_\omega using (7); 4) estimate each coefficient \hat{W}_k(j) using (5); 5) estimate the DOA using GCC-PHAT.

IV. EXPERIMENT RESULTS

We evaluate the performance of our proposed algorithm and compare it with that of two well-known techniques [10], [11] in the context of DOA estimation. A virtual room of size 1 m × 1 m × 1 m is created using the method of images. A linear array of four microphones with spacing 0.5 m and centroid position (5, 5, 1.6) m is used. We evaluate the performance of the algorithms by varying the source bearing with a constant source-sensor distance of 3.6 m. White noise is introduced at each microphone at different SNRs. The speech signals used are obtained from the NOIZEUS database [8]. Bearing errors of our proposed wavelet-based denoising algorithm and of the spectral-subtraction (SS) techniques of Berouti's approach [10] and Martin's approach [11] are computed for 3 different speech signals, each over independent trials under different SNR conditions. For our method, we use the factors c(j) = 1, j = 2, ..., J, while c(1) is chosen using Fig. 6 according to the SNR estimated via (17). The mean and standard deviation of the bearing errors are illustrated in Figs. 7(a) and 7(b), respectively. As can be seen, the approach of [11] does not give rise to good DOA estimation, although it is well known for offering better speech intelligibility. Using our proposed algorithm, the mean bearing errors are reduced by approximately 4 relative to Berouti's approach in low-SNR environments. In addition, the standard deviation for our proposed algorithm is reduced by approximately 8 relative to Berouti's approach. This improvement is significantly larger than the improvement of the SS methods over the GCC-PHAT processor without denoising. This shows that our wavelet-based approach can improve DOA performance over the existing SS speech denoising methods.

V. CONCLUSION

We presented a novel wavelet-based speech denoising algorithm for achieving high DOA estimation performance for speech signals. We estimate the local noise variance, which further improves DOA performance. Simulation results showed that our proposed method outperforms the spectral subtraction techniques under low SNR, where the original PHAT algorithm is not robust.

REFERENCES

[1] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320-327, Aug. 1976.
[2] C. Zhang, D. Florencio, and Z. Y. Zhang, "Why does PHAT work well in low noise, reverberative environments?" in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., pp. 2565-2568, Mar.-Apr. 2008.
[3] M. Miller and N. Kingsbury, "Image denoising using derotated complex wavelet coefficients," IEEE Trans. Image Process., vol. 17, no. 9, pp. 1500-1511, 2008.
[4] V. Bruni and D. Vitulano, "Wavelet-based signal de-noising via simple singularities approximation," Signal Processing, vol. 86, no. 4, pp. 859-876, Apr. 2006.
[5] I. Bayram and I. W. Selesnick, "Frequency-domain design of overcomplete rational-dilation wavelet transforms," IEEE Trans. Signal Process., vol. 57, no. 8, pp. 2957-2972, Aug. 2009.
[6] C. S. Burrus, R. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms: A Primer. Prentice Hall, 1997.
[7] L. Sendur and I. W. Selesnick, "Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency," IEEE Trans. Signal Process., vol. 50, no. 11, pp. 2744-2756, Nov. 2002.
[8] NOIZEUS speech corpus, http://www.utdallas.edu/~loizou/speech/noizeus/.
[9] D. Donoho and I. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, no. 3, pp. 425-455, 1994.
[10] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., pp. 208-211, 1979.
[11] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001.