arxiv: v1 [cs.sd] 4 Dec 2018

LOCALIZATION AND TRACKING OF AN ACOUSTIC SOURCE USING A DIAGONAL UNLOADING BEAMFORMING AND A KALMAN FILTER

Daniele Salvati, Carlo Drioli, Gian Luca Foresti
Department of Mathematics, Computer Science and Physics, University of Udine, via delle Scienze 206, 33100 Udine, Italy

arXiv:1812.01521v1 [cs.SD] 4 Dec 2018

ABSTRACT

We present the signal processing framework and some results for the IEEE AASP challenge on acoustic source localization and tracking (LOCATA). The system is designed for direction of arrival (DOA) estimation in single-source scenarios. The proposed framework consists of four main building blocks: pre-processing, voice activity detection (VAD), localization, and tracking. The signal pre-processing pipeline includes the short-time Fourier transform (STFT) of the multichannel input captured by the array and the estimation of the cross power spectral density (CPSD) matrices. The VAD is computed with a trace-based threshold on the CPSD matrices. The localization is then computed using our recently proposed diagonal unloading (DU) beamforming, which has low complexity and high resolution. The DOA estimate is finally smoothed with a Kalman filter (KF). Experimental results on the LOCATA development dataset are reported in terms of the root mean square error (RMSE) for a 7-microphone linear array, the 12-microphone pseudo-spherical array integrated in a prototype head for a humanoid robot, and the 32-microphone spherical array.

Index Terms: Acoustic source localization, speaker tracking, diagonal unloading beamforming, LOCATA, Kalman filter, microphone array.

1. INTRODUCTION

The aim of an acoustic source localization and tracking system is to estimate the position of sound sources in space by analyzing the sound field with a microphone array, i.e., a set of microphones arranged to capture the spatial information of sound.
Speaker spatial localization/tracking using microphone arrays is of considerable interest in applications such as teleconferencing systems, hands-free acquisition, human-machine interaction, recognition, and audio surveillance. In this paper, we present the signal processing framework for the IEEE AASP challenge on acoustic source localization and tracking (LOCATA) [1]. We also present some performance results on the LOCATA development dataset. The proposed localization and tracking system is designed for direction of arrival (DOA) estimation in single-source scenarios. The localization algorithm is based on diagonal unloading (DU) beamforming, recently introduced in [2]. The broadband DU localization beamformer is computed in the frequency domain [3] by calculating the steered response power (SRP) in each frequency bin and by summing the narrowband components with incoherent frequency fusion [4]. The tracking is performed with a Kalman filter (KF) [5].

2. METHOD

The proposed system consists of four main building blocks: pre-processing, voice activity detection (VAD), localization, and tracking. The organization of the signal processing components is illustrated in Figure 1.

2.1. Pre-Processing

The signal pre-processing pipeline includes the short-time Fourier transform (STFT) of the multichannel input captured by the array, $x_m(t)$ ($m = 1, 2, \ldots, M$, where $M$ is the number of microphones). It can be expressed as

$$X_m(k, f) = \sum_{l=-L/2}^{L/2-1} w(l)\, x_m(l + kR)\, e^{-j 2\pi f l / L}, \quad k = 0, 1, \ldots, \tag{1}$$

where $k$ is the frame time index, $f$ is the frequency bin, $w(l)$ is the analysis window, $L$ is the size of the fast Fourier transform (FFT), and $R$ is the hop size.
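As an illustration of Eq. (1), the following is a minimal NumPy sketch of a centred multichannel STFT. It is not the authors' Matlab implementation. The function name `stft_multichannel`, the zero-padding at the signal edges, the use of `np.hanning` for the Hann window, and the default values of `L` and `R` are assumptions made for the example.

```python
import numpy as np

def stft_multichannel(x, L=2048, R=512):
    """Centred STFT in the spirit of Eq. (1).

    x : real array of shape (M, T), one row per microphone.
    Returns X of shape (M, K, L // 2 + 1): microphone, frame k, bin f.
    Edge handling (zero-padding) is an assumption of this sketch.
    """
    M, T = x.shape
    w = np.hanning(L)                      # analysis window w(l) (assumed Hann variant)
    pad = L // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))   # zeros so that frame k is centred at sample k*R
    K = 1 + T // R
    X = np.empty((M, K, L // 2 + 1), dtype=complex)
    for k in range(K):
        frame = xp[:, k * R : k * R + L] * w   # samples l = -L/2 ... L/2 - 1 around k*R
        # ifftshift moves l = 0 to index 0, so the FFT phase matches exp(-j*2*pi*f*l/L)
        X[:, k, :] = np.fft.rfft(np.fft.ifftshift(frame, axes=-1), axis=-1)
    return X
```

Only the non-negative frequency bins are kept (`rfft`), which is sufficient for real-valued microphone signals.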
After the frequency-domain transformation, the cross power spectral density (CPSD) matrices $\boldsymbol{\Phi}(k, f)$ of the considered frequency range $[f_{\min}, f_{\max}]$ are estimated by averaging the array signal blocks [6]:

$$\boldsymbol{\Phi}(k, f) = \frac{1}{N} \sum_{k_n=0}^{N-1} \mathbf{x}(k - k_n, f)\, \mathbf{x}^H(k - k_n, f), \quad f = f_{\min}, f_{\min} + 1, \ldots, f_{\max}, \tag{2}$$

where $N$ is the number of frames used for the averaging, $H$ denotes the conjugate transpose operator, and

$$\mathbf{x}(k, f) = [X_1(k, f), X_2(k, f), \ldots, X_M(k, f)]^T, \tag{3}$$

where $T$ denotes the transpose operator.

2.2. VAD

The VAD used herein is based on the trace of the CPSD matrices, a quantity that is also used by the DU beamforming. The trace of a CPSD matrix is equal to the sum of the eigenvalues of the matrix, i.e., it represents the overall power of the array. The source detection is hence calculated as

$$\mathrm{VAD}(k) = \begin{cases} 1, & \text{if } \sum_{f=f_{\min}}^{f_{\max}} \mathrm{tr}[\boldsymbol{\Phi}(k, f)] > \eta, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $\mathrm{tr}[\cdot]$ is the operator that computes the trace of a matrix, and $\eta$ is a given threshold. The parameter $\eta$ was set empirically to a value that allows the source activity to be detected effectively.

Figure 1: Schematic diagram of the proposed system. Sound acquisition (microphone array); pre-processing (STFT and CPSD matrices); VAD (trace-based threshold); localization (diagonal unloading beamforming); tracking (Kalman filter); DOA estimation (azimuth and elevation).

2.3. Localization

The acoustic source DOA estimation method is a low-complexity and robust beamformer based on a DU transformation of the covariance matrix involved in the conventional beamformer computation, which exploits the high-resolution subspace orthogonality property. The method is described in detail in [2]. The transformation on which the DU method is based is obtained by subtracting an appropriate diagonal matrix from the CPSD matrix $\boldsymbol{\Phi}(k, f)$ of the array output vector. As a result, the DU beamforming removes as much as possible of the signal subspace from the covariance matrix and provides a high-resolution beampattern. In practice, the design and implementation of the DU transformation are simple and effective, and only require computing the matrix (un)loading factor.
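The CPSD averaging of Eq. (2) and the trace-based detector of Eq. (4) from Sections 2.1 and 2.2 can be sketched as follows. This is only an illustration under stated assumptions: the function names `cpsd_matrices` and `vad`, the `(M, K, F)` array layout, and the requirement that `N` past frames are available are choices of the example, not details given in the paper.

```python
import numpy as np

def cpsd_matrices(X, k, N):
    """Eq. (2): average x x^H over the N most recent frames ending at frame k.

    X : complex STFT of shape (M, K, F), already restricted to the bins
        f_min ... f_max (assumed layout).
    Returns Phi of shape (F, M, M).
    """
    if k < N - 1:
        raise ValueError("frame k must have N - 1 preceding frames")
    blk = X[:, k - N + 1 : k + 1, :]                      # (M, N, F)
    # Phi[f, m, p] = (1/N) * sum_n X_m(n, f) * conj(X_p(n, f))
    return np.einsum("mnf,pnf->fmp", blk, blk.conj()) / N

def vad(Phi, eta):
    """Eq. (4): 1 if the summed traces of the CPSD matrices exceed eta, else 0."""
    total_power = np.real(np.trace(Phi, axis1=1, axis2=2)).sum()
    return int(total_power > eta)
```

Because the trace of each matrix is the average power summed over microphones, the detector reduces to an energy threshold over the selected band, which is why no eigendecomposition is needed.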
The broadband SRP is defined as [2, 4]

$$P(k, \Omega_d) = \sum_{f=f_{\min}}^{f_{\max}} \frac{P_{DU}(k, f, \Omega_d)}{\|\mathbf{g}(k, f)\|_\infty}, \tag{5}$$

where $\Omega_d = [\theta_d, \phi_d]$ is the steering direction ($\theta_d$ and $\phi_d$ are the azimuth and elevation angles), and $\|\cdot\|_\infty$ denotes the uniform norm, i.e., the maximum value of the vector

$$\mathbf{g}(k, f) = [P_{DU}(k, f, \Omega_1), P_{DU}(k, f, \Omega_2), \ldots, P_{DU}(k, f, \Omega_D)], \tag{6}$$

which contains the narrowband SRP values for all $D$ considered search directions. The narrowband DU response power is defined as

$$P_{DU}(k, f, \Omega_d) = \frac{1}{\mathbf{a}^H(f, \Omega_d)\, \big[\mathrm{tr}[\boldsymbol{\Phi}(k, f)]\, \mathbf{I} - \boldsymbol{\Phi}(k, f)\big]\, \mathbf{a}(f, \Omega_d)}, \tag{7}$$

where $\mathbf{a}(f, \Omega_d)$ is the array steering vector for the direction $\Omega_d$, and $\mathbf{I}$ is the identity matrix. Note that the unloading parameter is computed with the trace operation on the CPSD matrices. This choice guarantees that the transformed PSD matrix $\boldsymbol{\Phi}_{DU}(k, f) = \mathrm{tr}[\boldsymbol{\Phi}(k, f)]\, \mathbf{I} - \boldsymbol{\Phi}(k, f)$ attenuates the signal subspace with respect to the noise subspace, so that the high-resolution orthogonality property is exploited, although only partially, since the transformed PSD matrix still contains a certain amount of signal subspace [2]. The array steering vector depends on the array geometry. Note that for the linear array the steering direction is given only by the azimuth angle. Then, the DOA estimate of the source is obtained by

$$\hat{\Omega}_s(k) = \underset{\Omega_d}{\mathrm{argmax}}\, [P(k, \Omega_d)], \quad d = 1, 2, \ldots, D. \tag{8}$$

2.4. Tracking

The KF [5] is an optimal recursive Bayesian filter for linear systems observed in the presence of Gaussian noise. The filter equations can be divided into a prediction step and a correction step. The state of the process is given by

$$\mathbf{y}(k) = [\Omega(k), v_\theta(k), v_\phi(k)]^T, \tag{9}$$

where $v_\theta(k)$ and $v_\phi(k)$ are the velocities of the azimuth and elevation angles. In the prediction step the update equations are

$$\mathbf{y}_p(k) = \mathbf{A}\, \mathbf{y}(k - 1), \tag{10}$$

$$\mathbf{P}_p(k) = \mathbf{A}\, \mathbf{P}(k - 1)\, \mathbf{A}^T + \mathbf{B} \mathbf{Q} \mathbf{B}^T, \tag{11}$$

where

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \tag{12}$$

$$\mathbf{B} = \begin{bmatrix} 0.5\, dt^2 & 0 \\ 0 & 0.5\, dt^2 \\ dt & 0 \\ 0 & dt \end{bmatrix}, \tag{13}$$

$$\mathbf{Q} = \begin{bmatrix} \sigma_q^2 & 0 \\ 0 & \sigma_q^2 \end{bmatrix}, \tag{14}$$

with $\sigma_q^2$ being the variance of the process error, $dt = RN/f_s$ the time elapsed between DOA estimations, and $f_s$ the sampling rate.
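The localization step of Section 2.3 (Eqs. (5) to (8)) can be sketched for a linear array as below. This is a hedged illustration rather than the implementation of [2]. The far-field plane-wave steering model in `steering_linear`, its sign convention, the speed of sound `c = 343.0` m/s, the function names, and the small floor on the denominator are all assumptions of the example.

```python
import numpy as np

def steering_linear(freq_hz, mic_x, theta_deg, c=343.0):
    """Assumed far-field steering vectors for microphones on a line.

    mic_x : microphone positions along the array axis in metres, shape (M,).
    Returns an array of shape (D, M), one steering vector per azimuth.
    """
    theta = np.deg2rad(np.asarray(theta_deg))
    delays = np.outer(np.cos(theta), mic_x) / c          # (D, M) propagation delays in seconds
    return np.exp(-2j * np.pi * freq_hz * delays)

def du_srp(Phi, freqs_hz, mic_x, theta_deg):
    """Broadband DU steered response power, Eqs. (5)-(7).

    Phi : CPSD matrices of shape (F, M, M), one per frequency in freqs_hz.
    """
    M = Phi.shape[-1]
    P = np.zeros(len(theta_deg))
    for Phi_f, f in zip(Phi, freqs_hz):
        A = steering_linear(f, mic_x, theta_deg)                 # (D, M)
        Phi_du = np.real(np.trace(Phi_f)) * np.eye(M) - Phi_f    # tr[Phi] I - Phi
        denom = np.real(np.einsum("dm,mn,dn->d", A.conj(), Phi_du, A))
        p_du = 1.0 / np.maximum(denom, 1e-12)   # Eq. (7); the floor is a numerical guard (assumption)
        P += p_du / p_du.max()                  # Eq. (5): divide by the uniform norm of g(k, f)
    return P

def doa_estimate(P, theta_deg):
    """Eq. (8): grid direction with the largest broadband response."""
    return theta_deg[int(np.argmax(P))]
```

The per-frequency normalization means every bin contributes at most 1 to the sum, so a few high-energy bins cannot dominate the broadband response. For the spherical arrays, the grid would run over azimuth and elevation pairs and the steering vectors would follow the corresponding geometry.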
The filter is initialized with the state covariance matrix $\mathbf{P}(k_i) = \mathbf{B} \mathbf{Q} \mathbf{B}^T$ and the state $\mathbf{y}(k_i) = [\hat{\Omega}_s(k_i), 0, 0]^T$, where $k_i$ is the first time frame in which $\mathrm{VAD}(k_i) = 1$ and $\mathrm{VAD}(k_i - 1) = 0$. After the prediction step, the Kalman gain is calculated as

$$\mathbf{K} = \mathbf{P}_p(k)\, \mathbf{C}^T \big(\mathbf{C}\, \mathbf{P}_p(k)\, \mathbf{C}^T + \mathbf{R}\big)^{-1}, \tag{15}$$

where

$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \tag{16}$$

$$\mathbf{R} = \begin{bmatrix} \sigma_r^2 & 0 \\ 0 & \sigma_r^2 \end{bmatrix}, \tag{17}$$

with $\sigma_r^2$ being the variance of the measurement error. In the correction step the measurement update equations are

$$\mathbf{y}(k) = \mathbf{y}_p(k) + \mathbf{K}\big(\hat{\Omega}_s(k) - \mathbf{C}\, \mathbf{y}_p(k)\big), \tag{18}$$

$$\mathbf{P}(k) = (\mathbf{I} - \mathbf{K} \mathbf{C})\, \mathbf{P}_p(k). \tag{19}$$

Hence, after the correction step the filtered DOA estimation $\hat{\Omega}_s^{EKF}(k) = \Omega(k)$ is obtained.

3. EXPERIMENTAL RESULTS

We present some experimental results on the LOCATA development dataset to show the performance of the proposed framework in the single-source scenario with: static loudspeaker and static array (task 1); moving speaker and static array (task 3); moving speaker and moving array (task 5). We tested the system with the distant talking interfaces for control of interactive TV (DICIT) array, considering a 7-microphone linear subarray ([4 5 6 7 9 10 11]) under the far-field model, with the 12-microphone pseudo-spherical array integrated in a prototype head for a humanoid robot, and with the 32-microphone eigenmike spherical array. The system setup is implemented with the following parameters: sampling rate: 48 kHz; STFT window: Hann function $w(l)$; FFT size: $L = 2048$ samples; hop size: $R = 512$ samples; number of frames for CPSD estimation: $N = 25$; frequency range: $[f_{\min}, f_{\max}]$ = [8,8] Hz; VAD threshold: $\eta$ = 2 (linear array), $\eta$ = 5 (robot head), $\eta$ = 1 (eigenmike); spatial resolution: 1 degree (linear array, $D = 181$), 5 degrees (robot head and eigenmike, $D = 2701$); DOA estimation time period: $dt = 0.2667$ s; KF parameters: $\sigma_q^2$ = 1 3, $\sigma_r^2$ = 1 4. The signal processing framework has been implemented using Matlab R2017a. We used our own implementation for the KF. The performance was assessed in terms of the root mean square error (RMSE). Table 1 shows the DOA estimation results for each task and each recording. The azimuth angle was evaluated for the linear array, while both azimuth and elevation angles were considered for the robot head and eigenmike arrays.
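The tracking filter of Section 2.4 (Eqs. (9) to (19)) can be sketched as follows, using $dt = RN/f_s = 512 \cdot 25 / 48000 \approx 0.2667$ s from the parameter list above. This is an illustrative sketch, not the authors' Matlab code. The function names, the placeholder variances in the usage lines, the absence of VAD gating, and the lack of angle wrap-around handling are assumptions of the example.

```python
import numpy as np

def kf_matrices(dt, sigma_q2, sigma_r2):
    """Matrices A, B, Q, C, R of Eqs. (12)-(14), (16), (17)."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    B = np.array([[0.5 * dt**2, 0],
                  [0, 0.5 * dt**2],
                  [dt, 0],
                  [0, dt]], dtype=float)
    Q = sigma_q2 * np.eye(2)
    C = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Rm = sigma_r2 * np.eye(2)   # measurement covariance (not the hop size R)
    return A, B, Q, C, Rm

def kf_track(doa_meas, dt, sigma_q2, sigma_r2):
    """Smooth a sequence of [azimuth, elevation] estimates of shape (K, 2).

    Assumes every frame is active (no VAD gating) and ignores angle wrap-around.
    """
    A, B, Q, C, Rm = kf_matrices(dt, sigma_q2, sigma_r2)
    y = np.array([doa_meas[0, 0], doa_meas[0, 1], 0.0, 0.0])   # initial state, zero velocity
    P = B @ Q @ B.T                                            # initial state covariance
    out = [y[:2].copy()]
    for z in doa_meas[1:]:
        y_p = A @ y                                            # Eq. (10)
        P_p = A @ P @ A.T + B @ Q @ B.T                        # Eq. (11)
        K = P_p @ C.T @ np.linalg.inv(C @ P_p @ C.T + Rm)      # Eq. (15)
        y = y_p + K @ (z - C @ y_p)                            # Eq. (18)
        P = (np.eye(4) - K @ C) @ P_p                          # Eq. (19)
        out.append(y[:2].copy())
    return np.array(out)

# Usage with placeholder variances (hypothetical values, not those of the paper)
dt = 512 * 25 / 48000
smoothed = kf_track(np.array([[30.0, 90.0], [31.0, 90.5], [32.5, 91.0]]), dt, 1.0, 10.0)
```

For the linear array, where only the azimuth is estimated, the same structure would reduce to a two-dimensional state with one angle and one velocity.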
Three examples of detection, localization and tracking are depicted in Figures 2, 3, and 4. Figure 2 shows the performance of the linear array for task 1 (static loudspeaker, static array) and recording 3. Figure 3 shows the performance of the robot head array for task 3 (moving speaker, static array) and recording 2. Figure 4 shows the performance of the eigenmike array for task 5 (moving speaker, moving array) and recording 1. The top plot shows the waveform of channel 1 with the speaker activity (red line).

4. CONCLUSIONS

The signal processing framework based on DU beamforming and a KF for the IEEE AASP LOCATA challenge has been presented. We described the four main building blocks (pre-processing, VAD, localization, tracking) for the DOA estimation of a single source. We showed some results on the LOCATA development dataset using a linear array, the robot head pseudo-spherical array, and the eigenmike spherical array.

5. REFERENCES

[1] H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss, P. A. Naylor, and W. Kellermann, "The LOCATA challenge data corpus for acoustic source localization and tracking," in Proceedings of the IEEE Sensor Array and Multichannel Signal Processing Workshop, 2018.
[2] D. Salvati, C. Drioli, and G. L. Foresti, "A low-complexity robust beamforming using diagonal unloading for acoustic source localization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 3, pp. 609-622, 2018.
[3] J. Benesty, J. Chen, Y. Huang, and J. Dmochowski, "On microphone-array beamforming from a MIMO acoustic signal processing perspective," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1053-1065, 2007.
[4] D. Salvati, C. Drioli, and G. L. Foresti, "Incoherent frequency fusion for broadband steered response power algorithms in noisy environments," IEEE Signal Processing Letters, vol. 21, no. 5, pp. 581-585, 2014.
[5] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, pp. 35-45, 1960.
[6] L. Zhang, W. Liu, and L. Yu, "Performance analysis for finite sample MVDR beamformer with forward backward processing," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2427-2431, 2011.

Table 1: The RMSE (degree) of the localization performance on the LOCATA development dataset.

                       Linear array   Robot head             Eigenmike
                       Azimuth        Azimuth    Elevation   Azimuth    Elevation
task 1   recording 1   .972           1.649      2.447       5.863      2.444
         recording 2   5.96           .38        1.13        6.676      6.54
         recording 3   1.437          2.998      1.98        7.491      5.23
task 3   recording 1   6.48           3.596      2.326       9.939      3.232
         recording 2   9.638          4.583      3.798       14.244     4.348
         recording 3   4.355          2.88       2.87        9.37       5.84
task 5   recording 1   4.912          2.338      1.818       4.433      3.1
         recording 2   21.196         3.217      11.333      32.942     5.738
         recording 3   3.86           23.1       7.782       1.23       3.473

Figure 2: The performance of the proposed system with the 7-microphone DICIT linear subarray for task 1 (static loudspeaker, static microphone array, recording 3). Panel title: Task1, recording 3, linear array.

Figure 3: The performance of the proposed system with the robot head array for task 3 (moving speaker, static microphone array, recording 2). Panel title: Task3, recording 2, robot head.

Figure 4: The performance of the proposed system with the eigenmike array for task 5 (moving speaker, moving microphone array, recording 1). Panel title: Task5, recording 1, eigenmike.