Acoustic Source Tracking in Reverberant Environment Using Regional Steered Response Power Measurement

Kai Wu and Andy W. H. Khong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
E-mail: wukai@e.ntu.edu.sg; andykhong@ntu.edu.sg

Abstract

Acoustic source localization and tracking using a microphone array is challenging due to the presence of background noise and room reverberation. Conventional algorithms employ the steered response power (SRP) as the measurement function in a particle-filter-based tracking framework. The particle weight is updated according to a pseudo-likelihood derived from the SRP value at each particle position. The performance of this approach degrades in a noisy and reverberant environment. In this paper, instead of evaluating the SRP value at each discrete particle position, we propose to apply a regional SRP beamformer which takes into account a circular region centered on each particle position, in order to provide a more robust evaluation of the particle likelihood. In addition, a mapping function is proposed to transform the regional SRP value into the likelihood. Simulation results show that the proposed method tracks a speech source robustly in a noisy and reverberant environment.

Index Terms: Acoustic localization and tracking, particle filter, steered response power, microphone array

I. INTRODUCTION

Acoustic source localization and tracking (ASLT) involves estimating the position of an acoustic source using an array of distributed microphones. Recently, ASLT has become an active research area for applications including teleconferencing, automatic camera steering and surveillance. Localizing and tracking a speech source in an enclosed environment, however, is challenging due to the presence of background noise, room reverberation, sound interference and the non-stationarity of the speech signal [].
Therefore, developing a robust localization and tracking algorithm is necessary for real applications in adverse environments.

ASLT algorithms exploit the relative temporal and spatial information of the signals received by the microphones, given the array geometry. In general, localization algorithms can be classified into two categories: single-step and dual-step approaches. The single-step approach estimates the source position directly by scanning a synthetic beamformer across all candidate source locations and taking the location of maximum power as the source position estimate []. The dual-step approach, on the other hand, first estimates the time-difference-of-arrival (TDOA) across all microphone pairs []. In the second step, these TDOAs are mapped to a source location estimate [].

One disadvantage of the above approaches is that localization is performed independently for each time frame. Recently, a Bayesian approach has been proposed which takes into account the temporal consistency of the localization measurements by incorporating a source-dynamic model []. The particle filter (PF), which does not require linearity or Gaussianity assumptions, is one such approach that has been widely used for acoustic source tracking [6]. In the PF, the source position at each time frame is defined by a state vector and propagated according to a source-dynamic model. The posterior probability density function (pdf) of the state vector is then updated using the measurement at the current time frame. It was observed that the steered response power (SRP) beamformer can be used as a measurement function and that it achieves better performance than a TDOA-based measurement [7]. Instead of evaluating the SRP over the whole region, the PF constrains the evaluation to a relatively small number of positions (the particle set).
This technique is often referred to as the pseudo-likelihood approach [7]. Although the pseudo-likelihood approach has been widely adopted in recent literature [8], [9], it still suffers from the effects of background noise and reverberation.

In this paper, we propose a new PF framework which incorporates a regional SRP as its measurement function. Instead of evaluating the SRP at each discrete particle position, the proposed method takes into account a circular region centered on each particle position [] so as to provide a more comprehensive evaluation of the likelihood function. The regional SRP value is used to compute the likelihood via a nonlinear mapping. As opposed to [], the proposed method takes into account the temporal consistency of the source position and incorporates a source-dynamic model in the tracking scenario. Simulation results show that the proposed method is more robust than the methods proposed in [8], [] in a noisy and reverberant environment.

II. REVIEW OF PF BASED TRACKING APPROACH

A. Particle Filter Framework

In ASLT, the state-space model is used to describe the source position estimation problem in an iterative manner. Given a pre-defined Cartesian coordinate system, the source state vector is defined as $\boldsymbol{\alpha}_k = [x_k, y_k, \dot{x}_k, \dot{y}_k]^T$ at time frame index $k$, where the first two elements $x_k$ and $y_k$ define the source position $\mathbf{r}_k = [x_k, y_k]^T$, and $\dot{x}_k$ and $\dot{y}_k$ denote the source velocity in the $x$ and $y$ directions, respectively. We also define the measurement variable $\mathbf{z}_k = [\hat{x}_k, \hat{y}_k]^T$, which contains the prior source position estimate. Alternatively, $\mathbf{z}_k$ may be defined using a TDOA-based approach [7]. The state-space model can therefore be represented as

$$\boldsymbol{\alpha}_k = g(\boldsymbol{\alpha}_{k-1}, \mathbf{u}_k), \tag{1a}$$

$$\mathbf{z}_k = h(\boldsymbol{\alpha}_k, \mathbf{w}_k), \tag{1b}$$

where $g(\cdot)$ denotes the state-transition process, $\mathbf{u}_k$ is the process noise, $h(\cdot)$ denotes the measurement function, and $\mathbf{w}_k$ is the measurement noise. Similar to [7]-[9], we employ the Langevin process, which has been proposed as a source-dynamic model for simulating realistic human motion. Equation (1a) can then be rewritten as

$$\boldsymbol{\alpha}_k = \begin{bmatrix} 1 & 0 & aT & 0 \\ 0 & 1 & 0 & aT \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{bmatrix} \boldsymbol{\alpha}_{k-1} + \begin{bmatrix} bT & 0 \\ 0 & bT \\ b & 0 \\ 0 & b \end{bmatrix} \mathbf{u}_k, \tag{2}$$

where $\mathbf{u}_k \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the noise variable, $T$ is the time interval between consecutive frames, and $\boldsymbol{\mu} = [0, 0]^T$ and $\boldsymbol{\Sigma} = \mathbf{I}$ denote the mean vector and covariance matrix, respectively. The parameters $a$ and $b$ are defined as

$$a = \exp(-\beta T), \tag{3a}$$

$$b = v \sqrt{1 - a^2}, \tag{3b}$$

where $v$ is the steady-state velocity and $\beta$ is the rate constant. As in [8], we use v = .8 m/s and β = Hz.

The bootstrap PF is commonly used in ASLT due to its simplicity [6]. Defining $p$ as the particle index and $N_p$ as the total number of particles, the posterior pdf $\Pr(\boldsymbol{\alpha}_k \mid \mathbf{z}_{1:k})$ is approximated using a set of particles of the state space with associated weights, $\{\boldsymbol{\alpha}_k^{(p)}, w_k^{(p)}\}_{p=1}^{N_p}$. Each particle goes through a propagation step followed by an update step. The bootstrap PF is summarized in Table I and is adopted in this paper. The source position estimate $\hat{\mathbf{r}}_k$ corresponds to the first two elements of the estimated state $\hat{\boldsymbol{\alpha}}_k$.

B. Steered Response Power Measurement

The key step in bootstrap PF-based acoustic source tracking is to determine the measurement likelihood $\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k)$ so that a proper weight can be assigned to each particle. A pseudo-likelihood approach which incorporates an SRP beamformer as the measurement function has been proposed in [7].
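The Langevin source-dynamic model in (2) and (3) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names and the numerical values of `BETA`, `V_BAR` and `T` are assumptions chosen for the example, not the settings used in the paper.

```python
import math
import random


def langevin_coefficients(beta, v_bar, frame_interval):
    """Return (a, b) as in (3a)-(3b): a = exp(-beta*T), b = v*sqrt(1 - a^2)."""
    a = math.exp(-beta * frame_interval)
    b = v_bar * math.sqrt(1.0 - a * a)
    return a, b


def propagate(state, a, b, frame_interval, rng):
    """Apply one step of (2) to a state tuple (x, y, vx, vy)."""
    x, y, vx, vy = state
    # u_k ~ N(0, I): independent unit-variance noise on each axis.
    ux, uy = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Rows 3-4 of (2): the velocity decays by a and is excited by b*u.
    vx_new = a * vx + b * ux
    vy_new = a * vy + b * uy
    # Rows 1-2 of (2): x_k = x_{k-1} + T*(a*vx + b*ux).
    return (x + frame_interval * vx_new, y + frame_interval * vy_new, vx_new, vy_new)


# Illustrative values only (assumptions, not the paper's settings).
BETA, V_BAR, T = 10.0, 0.8, 0.032
a, b = langevin_coefficients(BETA, V_BAR, T)
rng = random.Random(0)
particles = [(2.0, 1.5, 0.0, 0.0)] * 5
particles = [propagate(p, a, b, T, rng) for p in particles]
```

Because $a^2 + (b/v)^2 = 1$, the velocity variance settles at $v^2$ per axis, which is why $v$ acts as the steady-state velocity.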
More specifically, the SRP beamformer defines the energy of an assumed (look) position $\mathbf{r}_k$ as [], []

$$P(\mathbf{r}_k) = \sum_{\omega_l \in \Omega} \left| \sum_{i=1}^{M} W_i(\omega_l) Y_i(\omega_l) e^{j \omega_l \tau_i(\mathbf{r}_k)} \right|^2, \tag{4}$$

where $i$ is the microphone index, $M$ is the number of microphones, $Y_i(\omega_l)$ is the frequency-domain received signal of the $i$th microphone, $\omega_l = 2\pi l / L$ is the angular frequency of the $l$th frequency bin, $L$ is the number of frequency bins, $\Omega$ is the frequency range of interest (Ω = [, 6] Hz is often chosen for a speech source [9]), $\tau_i(\mathbf{r}_k) = \|\mathbf{r}_k - \mathbf{r}_i^m\| / c$ is the time-of-arrival from $\mathbf{r}_k$ to the $i$th microphone located at $\mathbf{r}_i^m$, $c$ is the speed of sound, and $W_i(\omega_l)$ is a weighting function. The phase transform (PHAT) weighting $W_i(\omega_l) = 1 / |Y_i(\omega_l)|$ is commonly used in ASLT due to its robustness to reverberation and noise [8], [].

In general, the SRP beamformer is employed to scan the assumed source position $\mathbf{r}_k$ across the whole surveillance region, and the source position estimate is the position with the maximum power. However, this search process has a high computational complexity for realistic applications. The pseudo-likelihood PF approach mitigates this drawback. In the PF, the likelihood $\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k)$ defines the probability of obtaining the measurement $\mathbf{z}_k$ given the state $\boldsymbol{\alpha}_k$.

TABLE I: Summary of the bootstrap PF.

At time $k-1$, the set of particles $\{\boldsymbol{\alpha}_{k-1}^{(p)}, w_{k-1}^{(p)}\}_{p=1}^{N_p}$ is a discrete representation of $\Pr(\boldsymbol{\alpha}_{k-1} \mid \mathbf{z}_{1:k-1})$. For the $k$th frame:
1) Particle propagation: Propagate each particle through the source-dynamic model described by (2), $\boldsymbol{\alpha}_k^{(p)} = g(\boldsymbol{\alpha}_{k-1}^{(p)}, \mathbf{u}_k)$.
2) Update: Assign each particle a weight according to its likelihood, $w_k^{(p)} = w_{k-1}^{(p)} \Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k^{(p)})$, followed by the normalization step $w_k^{(p)} = w_k^{(p)} \left( \sum_{i=1}^{N_p} w_k^{(i)} \right)^{-1}$.
3) Resampling: Resample the particles if the effective sample size is below a threshold, $N_{\mathrm{eff}} < N_t$, where $N_{\mathrm{eff}} = \left( \sum_{p=1}^{N_p} (w_k^{(p)})^2 \right)^{-1}$.
4) Result: The particle set $\{\boldsymbol{\alpha}_k^{(p)}, w_k^{(p)}\}_{p=1}^{N_p}$ approximates $\Pr(\boldsymbol{\alpha}_k \mid \mathbf{z}_{1:k})$. The state estimate at the $k$th frame is $\hat{\boldsymbol{\alpha}}_k = \sum_{p=1}^{N_p} w_k^{(p)} \boldsymbol{\alpha}_k^{(p)}$.
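Steps 2 to 4 of Table I can be sketched as follows. The text does not state which resampling scheme is used, so the systematic resampling below is an assumption, as are all function names; the likelihood is passed in as a plain callable so that any measurement function can be plugged in.

```python
import random


def normalize(weights):
    """Step 2 normalization: divide every weight by the weight sum."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(weights):
    """N_eff = 1 / sum_p (w^(p))^2 for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(particles, weights, rng):
    """Draw len(particles) particles with probability proportional to weight.

    Systematic resampling is an assumed choice, not taken from the text.
    """
    n = len(particles)
    start = rng.random() / n
    index, cumulative, chosen = 0, weights[0], []
    for m in range(n):
        u = start + m / n
        while u > cumulative and index < n - 1:
            index += 1
            cumulative += weights[index]
        chosen.append(particles[index])
    return chosen, [1.0 / n] * n


def weighted_mean(particles, weights):
    """Step 4: state estimate as the weighted sum of the particles."""
    dim = len(particles[0])
    return tuple(sum(w * p[d] for p, w in zip(particles, weights)) for d in range(dim))


def pf_update(particles, weights, likelihood, n_threshold, rng):
    """Steps 2-4 of Table I for one frame, given already propagated particles."""
    weights = normalize([w * likelihood(p) for p, w in zip(particles, weights)])
    if effective_sample_size(weights) < n_threshold:
        particles, weights = systematic_resample(particles, weights, rng)
    return particles, weights, weighted_mean(particles, weights)
```

For example, with two equally weighted particles and a likelihood three times larger for the first one, the normalized weights become 0.75 and 0.25 and the effective sample size is 1.6, so no resampling occurs for a threshold of 1.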
The SRP value, representing the power at each discrete point, can be used as an approximation of this likelihood during voiced frames, i.e.,

$$\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k) = \begin{cases} P^{\gamma}(\mathbf{r}_k), & \text{for voiced frame}, \\ U_D(\mathbf{r}_k), & \text{for unvoiced frame}, \end{cases} \tag{5}$$

where $\mathbf{r}_k = [x_k, y_k]^T$ represents the first two elements of the state vector $\boldsymbol{\alpha}_k$, γ = is a control parameter that regulates the fusion of the SRP function into the likelihood [8], and $U_D(\cdot)$ is the uniform pdf over the considered enclosure domain $D = \{x, y \mid x_{\min} \le x \le x_{\max},\ y_{\min} \le y \le y_{\max}\}$. With the pseudo-likelihood PF approach, the SRP evaluation $P^{\gamma}(\mathbf{r}_k)$ is thus constrained to a relatively small number of positions (the particle set). However, the performance of this approach still degrades in the presence of background noise and reverberation because the SRP itself lacks robustness [7], [8]: noise and reverberation may flatten the SRP spatial spectrum and cause the location of maximum power to deviate from the true source position.
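To make (4) and (5) concrete, the following sketch evaluates the PHAT-weighted SRP at a look position and converts it into the pseudo-likelihood. It is only an illustration under stated assumptions: the microphone layout, source position, frequency grid, speed of sound and the noise-free, anechoic signal model $Y_i(\omega) = e^{-j\omega\tau_i}$ are all hypothetical, and `gamma` and `room_area` are free parameters of the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def time_of_arrival(r, mic):
    """tau_i(r) = ||r - r_i^m|| / c for 2-D positions."""
    return math.hypot(r[0] - mic[0], r[1] - mic[1]) / SPEED_OF_SOUND


def srp_phat(spectra, omegas, mics, r):
    """Evaluate (4) with PHAT weighting W_i = 1/|Y_i| at look position r."""
    power = 0.0
    for l, omega in enumerate(omegas):
        steered = 0j
        for Y, mic in zip(spectra, mics):
            magnitude = abs(Y[l])
            if magnitude > 0.0:
                steered += Y[l] / magnitude * cmath.exp(1j * omega * time_of_arrival(r, mic))
        power += abs(steered) ** 2
    return power


def pseudo_likelihood(srp_value, voiced, gamma, room_area):
    """Evaluate (5): SRP^gamma for voiced frames, uniform pdf over D otherwise."""
    return srp_value ** gamma if voiced else 1.0 / room_area


# Hypothetical noise-free, anechoic test signal (unit spectrum delayed by tau_i).
mics = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
source = (2.5, 1.0)
omegas = [2.0 * math.pi * f for f in range(300, 3001, 100)]
spectra = [[cmath.exp(-1j * w * time_of_arrival(source, m)) for w in omegas] for m in mics]

p_true = srp_phat(spectra, omegas, mics, source)     # all phases aligned
p_off = srp_phat(spectra, omegas, mics, (0.5, 2.5))  # mismatched look position
```

At the true position every phasor is aligned, so each frequency bin contributes $M^2$; away from the source the phasors partially cancel and the power drops.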

The performance of an ASLT algorithm can be improved if a robust measurement function is adopted in the PF tracking framework.

Fig. 1: Regional steered response power for a circular region.

III. THE PROPOSED METHOD

A. Regional SRP Measurement

We propose to employ a regional SRP beamformer [] as the measurement function in order to mitigate the effects of reverberation and noise. Because it integrates the energy over a square grid centered on an assumed position, the regional SRP beamformer has been shown to be more robust than the conventional SRP [] in a noisy and reverberant environment. Evaluating the regional SRP over a square grid as proposed in [] requires computing the distance from the center to each boundary along a certain direction. We instead consider a circular region centered on each particle, which reduces the computational complexity because the distance from the center to the circumference is constant.

Before defining the regional SRP function, we note that the relationship between the conventional SRP function in (4) and the generalized cross-correlation (GCC) function is given by []

$$P(\mathbf{r}_k) = \sum_{\omega_l \in \Omega} \left| \sum_{i=1}^{M} W_i(\omega_l) Y_i(\omega_l) e^{j \omega_l \tau_i(\mathbf{r}_k)} \right|^2 = 2\pi \sum_{i=1}^{M} \sum_{j=1}^{M} R_{i,j}(\tau_{i,j}(\mathbf{r}_k)), \tag{6}$$

where

$$R_{i,j}(\tau_{i,j}(\mathbf{r}_k)) = \frac{1}{2\pi} \sum_{\omega_l \in \Omega} \Psi_{i,j}(\omega_l) Y_i(\omega_l) Y_j^*(\omega_l) e^{-j \omega_l \tau_{i,j}(\mathbf{r}_k)} \tag{7}$$

is the GCC function between the $i$th and $j$th microphones,

$$\tau_{i,j}(\mathbf{r}_k) = \tau_j(\mathbf{r}_k) - \tau_i(\mathbf{r}_k) = \frac{\|\mathbf{r}_k - \mathbf{r}_j^m\| - \|\mathbf{r}_k - \mathbf{r}_i^m\|}{c} \tag{8}$$

is the TDOA between the $i$th and $j$th microphones, and

$$\Psi_{i,j}(\omega_l) = \frac{1}{|Y_i(\omega_l) Y_j^*(\omega_l)|} \tag{9}$$

is the PHAT weighting. Expanding (6) and removing the fixed energy terms and symmetries [], one can define a modified SRP function for a discrete assumed position $\mathbf{r}_k$ in terms of a summation of GCC functions:

$$P^m(\mathbf{r}_k) = 2\pi \sum_{i=1}^{M} \sum_{j=i+1}^{M} R_{i,j}(\tau_{i,j}(\mathbf{r}_k)), \tag{10}$$

where the superscript $m$ denotes the modified SRP function. Equation (10) indicates that, instead of using (4), the power at $\mathbf{r}_k$ can also be computed from the summation of GCC functions in which the TDOAs are determined by the discrete assumed position.
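The step from (6) to (10) can be checked numerically. Under PHAT weighting each of the $M$ diagonal terms contributes exactly one per frequency bin (the fixed energy term), and the pairs $(i,j)$ and $(j,i)$ are complex conjugates of each other. The sketch below verifies this on random spectra, assuming the sign convention $e^{-j\omega\tau_{i,j}}$ with $\tau_{i,j} = \tau_j - \tau_i$ in the GCC; taking the real part of the pairwise sum is also an assumption of this example, and all names and test values are hypothetical.

```python
import cmath
import math
import random


def gcc_phat(Yi, Yj, omegas, tau):
    """PHAT-weighted GCC between two channels, evaluated at lag tau (seconds)."""
    total = 0j
    for yi, yj, omega in zip(Yi, Yj, omegas):
        cross = yi * yj.conjugate()
        total += cross / abs(cross) * cmath.exp(-1j * omega * tau)
    return total / (2.0 * math.pi)


def srp_direct(spectra, omegas, toas):
    """SRP-PHAT computed directly from the steered sum over microphones."""
    power = 0.0
    for l, omega in enumerate(omegas):
        steered = sum(Y[l] / abs(Y[l]) * cmath.exp(1j * omega * t)
                      for Y, t in zip(spectra, toas))
        power += abs(steered) ** 2
    return power


def srp_modified(spectra, omegas, toas):
    """Modified SRP: 2*pi times the GCC sum over unique pairs i < j."""
    n_mics = len(spectra)
    total = 0j
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            total += gcc_phat(spectra[i], spectra[j], omegas, toas[j] - toas[i])
    return 2.0 * math.pi * total


# Random spectra and arrival times, purely for the numerical check.
rng = random.Random(0)
omegas = [2.0 * math.pi * f for f in (500.0, 1000.0, 1500.0)]
spectra = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in omegas] for _ in range(4)]
toas = [rng.uniform(0.0, 0.01) for _ in range(4)]

full = srp_direct(spectra, omegas, toas)
reduced = srp_modified(spectra, omegas, toas)
fixed_energy = len(spectra) * len(omegas)  # M per frequency bin under PHAT
```

The full SRP equals `fixed_energy + 2 * reduced.real`, so ranking positions by the modified SRP gives the same ordering as ranking them by the full SRP.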
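Now, instead of considering only $\mathbf{r}_k$, we take into account a circular region $C(\mathbf{r}_k)$ centered at $\mathbf{r}_k$, as illustrated in Fig. 1. The regional SRP is defined by accumulating the power within $C(\mathbf{r}_k)$, i.e.,

$$P^c(\mathbf{r}_k) = 2\pi \sum_{i=1}^{M} \sum_{j=i+1}^{M} \sum_{\mathbf{r} \in C(\mathbf{r}_k)} R_{i,j}(\tau_{i,j}(\mathbf{r})), \tag{11}$$

where the superscript $c$ denotes the circular region. It has been shown in [] that, for points within a region, the GCC function is evaluated only at TDOA values in the range $\tau_{i,j}(\mathbf{r}) \in [\tau^l_{i,j}(\mathbf{r}_k), \tau^h_{i,j}(\mathbf{r}_k)]$ for each microphone pair, where the TDOA range limits $\tau^l_{i,j}(\mathbf{r}_k)$ and $\tau^h_{i,j}(\mathbf{r}_k)$ are determined only by the region boundary. In this paper, since we consider a circular region $\mathbf{r} \in C(\mathbf{r}_k)$ in (11), $\tau^l_{i,j}(\mathbf{r}_k)$ and $\tau^h_{i,j}(\mathbf{r}_k)$ are determined by the boundary of the circular region.

In order to compute these TDOA range limits, we first evaluate the TDOA gradient, which points in the direction along which the TDOA exhibits the highest rate of increase. By taking the gradient of (8), the TDOA gradient $\nabla \tau_{i,j}(\mathbf{r}_k)$ at position $\mathbf{r}_k$ can be derived as

$$\nabla \tau_{i,j}(\mathbf{r}_k) = [\nabla_x \tau_{i,j}(\mathbf{r}_k),\ \nabla_y \tau_{i,j}(\mathbf{r}_k)], \tag{12}$$

where $\nabla_x(\cdot) = \partial(\cdot)/\partial x$, such that

$$\nabla_x \tau_{i,j}(\mathbf{r}_k) = \frac{1}{c} \left( \frac{x_k - x_j^m}{\|\mathbf{r}_k - \mathbf{r}_j^m\|} - \frac{x_k - x_i^m}{\|\mathbf{r}_k - \mathbf{r}_i^m\|} \right), \tag{13a}$$

$$\nabla_y \tau_{i,j}(\mathbf{r}_k) = \frac{1}{c} \left( \frac{y_k - y_j^m}{\|\mathbf{r}_k - \mathbf{r}_j^m\|} - \frac{y_k - y_i^m}{\|\mathbf{r}_k - \mathbf{r}_i^m\|} \right). \tag{13b}$$

In (13), $x_k$ and $y_k$ denote the two-dimensional components of $\mathbf{r}_k$, while $x_i^m$ and $y_i^m$ denote the two-dimensional components of the $i$th microphone location. The lower and upper limits of the TDOA can be computed from the product of the gradient magnitude and the distance along the gradient, i.e.,

$$\tau^l_{i,j}(\mathbf{r}_k) = \tau_{i,j}(\mathbf{r}_k) - \|\nabla \tau_{i,j}(\mathbf{r}_k)\| \, \rho, \tag{14a}$$

$$\tau^h_{i,j}(\mathbf{r}_k) = \tau_{i,j}(\mathbf{r}_k) + \|\nabla \tau_{i,j}(\mathbf{r}_k)\| \, \rho, \tag{14b}$$

where $\rho$ is the radius of the circular region. With the TDOA range limits obtained, the regional SRP in (11) can then be evaluated as

$$P^c(\mathbf{r}_k) = 2\pi \sum_{i=1}^{M} \sum_{j=i+1}^{M} \sum_{\tau = \tau^l_{i,j}(\mathbf{r}_k)}^{\tau^h_{i,j}(\mathbf{r}_k)} R_{i,j}(\tau). \tag{15}$$

B. Distribution of Regional SRP Values

The regional SRP value computed from (15) cannot be directly used as a measurement likelihood.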
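The TDOA range limits in (12) to (14) and the inner lag sum of (15) can be sketched as below. The speed of sound, the container `gcc_by_lag` (a dictionary from integer sample lag to GCC value) and the outward rounding of the limits to whole samples are assumptions made for this illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def tdoa(r, mic_i, mic_j):
    """TDOA of (8): (||r - r_j|| - ||r - r_i||) / c."""
    di = math.hypot(r[0] - mic_i[0], r[1] - mic_i[1])
    dj = math.hypot(r[0] - mic_j[0], r[1] - mic_j[1])
    return (dj - di) / SPEED_OF_SOUND


def tdoa_gradient(r, mic_i, mic_j):
    """Gradient components of (13a)-(13b) at position r."""
    di = math.hypot(r[0] - mic_i[0], r[1] - mic_i[1])
    dj = math.hypot(r[0] - mic_j[0], r[1] - mic_j[1])
    gx = ((r[0] - mic_j[0]) / dj - (r[0] - mic_i[0]) / di) / SPEED_OF_SOUND
    gy = ((r[1] - mic_j[1]) / dj - (r[1] - mic_i[1]) / di) / SPEED_OF_SOUND
    return gx, gy


def tdoa_limits(r, mic_i, mic_j, radius):
    """Lower and upper TDOA limits of (14a)-(14b) for a circle of given radius."""
    centre = tdoa(r, mic_i, mic_j)
    spread = math.hypot(*tdoa_gradient(r, mic_i, mic_j)) * radius
    return centre - spread, centre + spread


def regional_gcc_sum(gcc_by_lag, tau_low, tau_high, sample_rate):
    """Inner sum of (15) over the integer sample lags covering [tau_low, tau_high].

    gcc_by_lag maps an integer lag (in samples) to a GCC value; rounding the
    limits outward with floor/ceil is an assumption of this sketch.
    """
    first = math.floor(tau_low * sample_rate)
    last = math.ceil(tau_high * sample_rate)
    return sum(gcc_by_lag.get(lag, 0.0) for lag in range(first, last + 1))


# Worked example: microphones on the x-axis, particle midway between them.
mic_i, mic_j = (-1.0, 0.0), (1.0, 0.0)
low, high = tdoa_limits((0.0, 0.0), mic_i, mic_j, radius=0.1)
```

In this symmetric example the gradient magnitude is $2/c$, so the limits are $\mp 0.2/c$, which coincide with the exact TDOAs of the two points where the circle crosses the microphone axis. Because (14) is a first-order approximation, the match is only approximate for other geometries and larger radii.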
We seek a mapping function $\mathcal{M}(\cdot)$ that maps the regional SRP value to the likelihood $\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k)$ within the range $[0, 1]$:

$$\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k) = \mathcal{M}(P^c(\mathbf{r}_k)). \tag{16}$$

In order to develop a proper mapping function, we first analyze the distribution of regional SRP values. Substituting (7) and (9) into (15), and noting that the PHAT weighting in (9) retains only the phase of the cross-spectrum, we obtain

$$P^c(\mathbf{r}_k) = \sum_{i=1}^{M} \sum_{j=i+1}^{M} \sum_{\tau = \tau^l_{i,j}(\mathbf{r}_k)}^{\tau^h_{i,j}(\mathbf{r}_k)} \sum_{\omega_l \in \Omega} e^{-j \omega_l \tau + j \omega_l \tau_{i,j}(\mathbf{r}^s)}, \tag{17}$$

where $\mathbf{r}^s$ is the true source position. Equation (17) is useful for analyzing the distribution of the regional SRP values. We split the whole surveillance area $D$ into two areas.

1) Distribution of regional SRP values in the neighborhood of the source position: The neighborhood of the source position is defined as the positions whose distance from the true source position is less than a threshold, i.e., $\|\mathbf{r}_k - \mathbf{r}^s\| \le d_t$. In this simulation, d_t = . m was used. For positions in this area, $P^c(\mathbf{r}_k)$ in (17) reaches its maximum because the phase delays of the received signals are compensated.

2) Distribution of regional SRP values in the clutter positions: The clutter positions are defined as the positions that are distant from the source position, such that $\|\mathbf{r}_k - \mathbf{r}^s\| > d_t$. For these clutter positions, because the phase compensation is mismatched, we assume that the phase follows a uniform distribution [], given by

$$O = e^{-j \omega_l \tau_{i,j}(\mathbf{r}_k) + j \omega_l \tau_{i,j}(\mathbf{r}^s)} = e^{j\theta}, \quad \theta \sim U[-\pi, \pi). \tag{18}$$

In addition, since the phases are independent and identically distributed and a sufficiently large number of them are summed, we deduce from the central limit theorem that the regional SRP values for the clutter positions follow a Gaussian distribution, i.e.,

$$P^c(\mathbf{r}_k) \sim \mathcal{N}(0, \sigma^2), \quad \|\mathbf{r}_k - \mathbf{r}^s\| > d_t, \tag{19}$$

where $\sigma^2$ is the variance of the distribution of regional SRP values in the clutter positions.

Fig. 2: Distribution of the regional SRP values (horizontal axis: regional SRP values; vertical axis: probability; curves: clutter positions and neighborhood of the source position).

Figure 2 shows the distributions of the regional SRP values in these two areas. The distribution of regional SRP values in the neighborhood of the source position is indicated by the solid line, while the distribution of regional SRP values in the clutter positions is indicated by the dashed line.
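The Gaussian model for clutter positions can be illustrated with a small Monte Carlo experiment: the real part of a sum of $N$ unit phasors with independent uniform phases has zero mean and variance $N/2$, and is close to Gaussian for large $N$. This is an idealized check, not the paper's experiment. In the actual regional SRP the phases across lags and frequency bins are not strictly independent, and the sizes `N_TERMS` and `N_TRIALS` as well as the use of the real part are assumptions of this example.

```python
import cmath
import math
import random


def clutter_sample(n_terms, rng):
    """Real part of a sum of n_terms unit phasors with phase ~ U[-pi, pi)."""
    total = sum(cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(n_terms))
    return total.real


rng = random.Random(0)
N_TERMS, N_TRIALS = 200, 2000  # hypothetical sizes for the experiment

samples = [clutter_sample(N_TERMS, rng) for _ in range(N_TRIALS)]
mean = sum(samples) / N_TRIALS
variance = sum((s - mean) ** 2 for s in samples) / (N_TRIALS - 1)
```

The sample mean stays close to zero and the sample variance close to `N_TERMS / 2`, which mirrors the dependence of $\sigma^2$ on the number of summed terms, that is, on the TDOA summation range and the number of microphone pairs.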
The figure shows that the distribution of regional SRP values in the clutter positions corresponds approximately to a zero-mean Gaussian distribution, as expected. The variance $\sigma^2$ depends on the TDOA summation boundary and on the number of microphone pairs used in (17). In our simulation, σ² = was observed when M = 8 and ρ = . m were used. On the other hand, the regional SRP values in the neighborhood of the source position are generally higher than those in the clutter positions due to the phase compensation in (17). We therefore choose a threshold to distinguish between these two distributions of regional SRP values. In this work, we set an ad-hoc threshold P_t = in order to eliminate the effect of clutter positions as much as possible. This threshold should be modified accordingly if different $M$ and $\rho$ are used.

A normal cumulative distribution function (cdf) can be applied as the mapping function:

$$\mathcal{M}(P^c(\mathbf{r}_k)) = \Phi(P^c(\mathbf{r}_k);\ P_t, \sigma_P^2), \tag{20}$$

where $\Phi(\cdot)$ is a normal cdf. As discussed, the threshold P_t = is chosen so that the regional SRP values of clutter positions are mapped onto the lower end of $\Phi(\cdot)$, while those corresponding to the neighborhood of the source position are mapped onto the higher end of $\Phi(\cdot)$. The variable $\sigma_P^2$ is the variance of the normal cdf, which determines its steepness. In this work, σ_P² = was chosen and performs well in our simulation. The likelihood $\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k)$ can thus be defined as

$$\Pr(\mathbf{z}_k \mid \boldsymbol{\alpha}_k) = \begin{cases} \mathcal{M}(P^c(\mathbf{r}_k)), & \text{for voiced frame}, \\ U_D(\mathbf{r}_k), & \text{for unvoiced frame}. \end{cases} \tag{21}$$

The remaining procedures follow the standard PF framework in Table I. The position estimate at each iteration, $\hat{\mathbf{r}}_k$, corresponds to the first two elements of the state estimate $\hat{\boldsymbol{\alpha}}_k$.

Fig. 3: Comparison of tracking results with $T_{60}$ = ms and SNR = dB. (a) Conventional PF-SRP tracking method [8], ē = . m. (b) Proposed PF-regional SRP tracking method, ē = . m.

IV. SIMULATION RESULTS

Simulations were conducted in a room of dimension m × m × . m. Eight microphones were distributed . m away from the perimeter of the room (see Fig. 3). A s speech signal sampled at 16 kHz from the TIMIT database [] was used as the source signal. The microphone signals were generated by the method of images []. White Gaussian noise (WGN) at different signal-to-noise ratios (SNRs) was added to the microphone signals. The positions of the speech source were computed using a frame size of samples with N_p = 8 particles. The radius of the circular region centered on each particle was ρ = . m. The effective sample size threshold in the PF was N_t = 7.. The proposed method is compared with the conventional PF-SRP tracking method [8], implemented with a simple binary voiced/unvoiced detector, and with the regional SRP localization method without the PF framework []. We quantify their performance using $e_k = \|\hat{\mathbf{r}}_k - \mathbf{r}_k^s\|$, where $\hat{\mathbf{r}}_k$ is the estimated position at the $k$th frame and $\mathbf{r}_k^s$ is the true source position. The average tracking error

$$\bar{e} = \frac{1}{K} \sum_{k=1}^{K} e_k$$

quantifies the performance across all audio frames, where $K$ is the number of frames.

Fig. 4: Variation of the average tracking error ē (m) with reverberation time $T_{60}$ (s) for (a) SNR = dB and (b) SNR = dB. Curves: conventional PF-SRP tracking [8], conventional regional SRP localization without PF [], and proposed PF-regional SRP tracking.

Figure 3 compares the tracking results of the two PF-based tracking methods when $T_{60}$ = ms. Figure 3(a) shows that the performance of the conventional PF-SRP method [8] is significantly affected by room reverberation. The particles, indicated by the dotted points, are scattered around the surveillance region due to the poor performance of the conventional SRP measurements. The conventional PF-SRP method has an average tracking error of . m. Figure 3(b) shows the performance of the proposed PF-regional SRP method. The regional SRP measurements result in well-propagated particles which are concentrated along the true source trajectory. The proposed method achieves an average tracking error of . m, indicating that it outperforms the conventional PF-SRP method in this reverberant condition.

Figure 4 presents the average tracking error of the conventional PF-SRP method [8], the regional SRP method without PF [] and the proposed PF-regional SRP method for various reverberation times. Two cases of SNR = and dB were examined. As expected, the performance of all three methods degrades as the reverberation time increases. The conventional PF-SRP method and the regional SRP method without PF consistently exhibit higher tracking error than the proposed PF-regional SRP method. The lower SNR condition further degrades the performance of the conventional methods.
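The normal-cdf mapping of (20), the likelihood of (21) and the average tracking error ē used in the evaluation are simple enough to sketch with the standard library. All names are hypothetical, and the threshold, variance and room area passed in the example are placeholders rather than the values used in the simulations.

```python
import math


def normal_cdf(value, mean, variance):
    """Normal cdf with the given mean and variance, via the error function."""
    return 0.5 * (1.0 + math.erf((value - mean) / math.sqrt(2.0 * variance)))


def likelihood(regional_srp, voiced, threshold, variance, room_area):
    """Likelihood of (21): cdf mapping (20) for voiced frames, uniform pdf otherwise."""
    if voiced:
        return normal_cdf(regional_srp, threshold, variance)
    return 1.0 / room_area


def mean_tracking_error(estimates, truths):
    """Average Euclidean distance between estimated and true 2-D positions."""
    errors = [math.hypot(e[0] - t[0], e[1] - t[1]) for e, t in zip(estimates, truths)]
    return sum(errors) / len(errors)


# Placeholder parameters (assumptions): threshold 3.0, variance 1.0, 12 m^2 room.
clutter_like = likelihood(0.0, True, 3.0, 1.0, 12.0)   # far below the threshold
source_like = likelihood(10.0, True, 3.0, 1.0, 12.0)   # far above the threshold
```

A regional SRP value well below the threshold is mapped close to 0 and one well above it close to 1, so clutter particles receive negligible weight, while a value exactly at the threshold maps to 0.5. A smaller variance makes the transition steeper.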
Due to the improved regional SRP evaluation, the regional SRP method without PF performs modestly better than the PF-SRP method, even though it does not exploit the temporal consistency of source positions. By incorporating the PF framework and taking into account the temporal consistency of source positions, the proposed PF-regional SRP method achieves a mean error of less than . m, indicating that it outperforms both conventional methods in the environments examined. The improvement over the conventional methods becomes more significant at lower SNR and in more reverberant conditions.

To further examine the validity of the algorithm for a different microphone array configuration, we consider microphones that are randomly distributed, as illustrated in Fig. 5. The remaining parameters were the same as in the previous simulations. With the conventional PF-SRP method [8], shown in Fig. 5(a), the particles are scattered around the room enclosure and the tracking performance is poor. The proposed PF-regional SRP method, shown in Fig. 5(b), achieves good tracking performance, reducing the tracking error from .9 m to . m. This simulation indicates that the algorithm is not limited to the case where the microphones are placed along the perimeter of the room enclosure.

Fig. 5: Comparison of tracking results with $T_{60}$ = ms and SNR = dB using randomly distributed microphones. (a) Conventional PF-SRP tracking method [8], ē = .9 m. (b) Proposed PF-regional SRP tracking method, ē = . m.

V. CONCLUSION

We propose a PF-based acoustic source tracking framework that uses a regional SRP measurement function. Instead of evaluating the power at discrete particle positions, the proposed method takes into account a circular region centered on each particle, accumulating the power within each region to provide a more comprehensive likelihood evaluation.
Simulation results show that the proposed method achieves lower tracking error than the conventional methods in a noisy and reverberant environment.

REFERENCES

[1] K. Wu, S. T. Goh, and A. W. H. Khong, "Speaker localization and tracking in the presence of sound interference by exploiting speech harmonicity," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ), .
[2] J. P. Dmochowski, J. Benesty, and S. Affes, "A generalized steered response power method for computationally viable source localization," IEEE Trans. Audio, Speech, Lang. Process., vol. , no. 8, pp. 6, Nov. 7.
[3] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. , no. , pp. 7, Aug. 1976.
[4] Y. Huang, J. Benesty, G. W. Elko, and R. M. Mersereau, "Real-time passive source localization: a practical linear-correction least-squares approach," IEEE Trans. Speech, Audio Process., vol. 9, no. 8, pp. 9 96, Nov. .

[5] J. Vermaak and A. Blake, "Nonlinear filtering for speaker tracking in noisy and reverberant environments," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ), , pp. .
[6] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Process., vol. , no. , pp. 7 88, Feb. .
[7] D. B. Ward, E. A. Lehmann, and R. C. Williamson, "Particle filtering algorithms for tracking an acoustic source in a reverberant environment," IEEE Trans. Speech and Audio Process., vol. , no. 6, pp. 86 86, .
[8] E. A. Lehmann and A. M. Johansson, "Particle filter with integrated voice activity detection for acoustic source tracking," EURASIP J. on Adv. Signal Process., vol. 7, 7.
[9] M. F. Fallon and S. Godsill, "Acoustic source localization and tracking using track before detect," IEEE Trans. Audio, Speech, Lang. Process., vol. 8, no. 6, pp. 8, .
[10] M. Cobos, A. Marti, and J. J. Lopez, "A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling," IEEE Signal Process. Letters, vol. 8, no. , pp. 7 7, .
[11] J. DiBiase, H. Silverman, and M. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Signal Processing Techniques and Applications, pp. 7 8, .
[12] C. Zhang, D. Florencio, and Z. Zhang, "Why does PHAT work well in low noise, reverberant environment," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP 8), 8, pp. 6 68.
[13] J. H. DiBiase, "A High Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments using Microphone Arrays," Ph.D. thesis, Brown Univ., .
[14] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, "TIMIT Acoustic-Phonetic Continuous Speech Corpus," Philadelphia, PA, 99.
[15] E. A. Lehmann and A. M. Johansson, "Prediction of energy decay in room impulse responses simulated with an image-source model," J. Acoust. Soc. Amer., vol. , no. , pp. 69 77, July 8.