Multiple Sound Sources Localization Using Energetic Analysis Method

VOL. 3, NO. 4, DECEMBER

Multiple Sound Sources Localization Using Energetic Analysis Method

Hasan Khaddour, Jiří Schimmel
Department of Telecommunications, FEEC, Brno University of Technology
Purkyňova 118, Brno, Czech Republic
Email: xkhadd@stud.feec.vutbr.cz, schimmel@feec.vutbr.cz

Abstract

In this article a method for multiple sound source localization is proposed. The method is based on the energetic analysis of B-format signals. The number of sound sources localized by this method can exceed the number of microphones used. The method was simulated in Matlab and tested in a real environment. Both simulation and experimental results show the efficiency of the method.

1 Introduction

Sound source localization methods have been investigated intensively. Several methods have been designed for the localization of a single sound source; most of them are based on time delay estimation [1] or on the phase difference [2]. Some methods are able to localize a number of sound sources equal to or smaller than the number of sensors (microphones) used, such as MUSIC (Multiple Signal Classification) [3]. MUSIC estimates the directions of arrival (DOAs) from the relation between the noise subspace and the signal subspace [3]. Newer methods have removed this limitation and are able to localize more sound sources. To achieve this, one method uses binary time-frequency masks for the blind separation of speech mixtures [4]; it relies on a property of the Gabor expansions of speech signals called W-disjoint orthogonality [4]. Another method, a blind source separation (BSS) approach, was presented in [5]; it estimates the DOAs by applying the Expectation-Maximization (EM) algorithm to a sparseness-based approach [5].
In this paper, a new method for multiple sound source localization using B-format signals is presented, in which the number of localized sound sources can exceed the number of sensors used (our method uses the horizontal-plane B-format signals only). The method is based on the energetic analysis of the B-format signals. The paper is organized as follows. Section 2 presents the principle of B-format signals. The energetic analysis method is introduced in Section 3. Section 4 presents the simulation results for this method in Matlab. The experimental results are presented in Section 5, and Section 6 concludes the paper.

2 B-Format Signals

B-format signals are able to represent sound sources in three-dimensional space. They consist of four signals, w(t), x(t), y(t) and z(t), which together carry all of the directional information about the sound sources [5]. The signals x(t) and y(t) provide information about the sound sources in the horizontal plane; they are recorded using two figure-of-eight microphones facing front-back (x(t)) and left-right (y(t)). The signal z(t) provides information about the vertical plane and is recorded using a figure-of-eight microphone facing up-down. The signal w(t) is recorded using an omnidirectional microphone, see Figure 1. The encoding equations for the B-format signals are [6]

w(t) = s(t) / √2,
x(t) = s(t) cos(θ) cos(φ),      (1)
y(t) = s(t) sin(θ) cos(φ),
z(t) = s(t) sin(φ),

where θ represents the azimuth angle of the source, φ represents the elevation angle of the source, and s(t) represents the sound signal.

Figure 1: Polar patterns of the B-format components.
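The encoding in (1) can be sketched as follows. This is an illustrative NumPy implementation (the function name and array conventions are ours, not from the original Matlab code):

```python
import numpy as np

def bformat_encode(s, azimuth, elevation):
    """Encode a mono signal s into first-order B-format, following eq. (1).

    Angles are in radians; returns the four signals (w, x, y, z).
    """
    w = s / np.sqrt(2.0)                          # omnidirectional component
    x = s * np.cos(azimuth) * np.cos(elevation)   # front-back figure-of-eight
    y = s * np.sin(azimuth) * np.cos(elevation)   # left-right figure-of-eight
    z = s * np.sin(elevation)                     # up-down figure-of-eight
    return w, x, y, z
```

For example, a source at azimuth 0 and elevation 0 yields x(t) = s(t) and y(t) = z(t) = 0, i.e. a source directly in front of the array.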

3 Energetic Analysis Method

The energetic analysis method is based on the fact that the sound source direction is opposite to the direction of the sound intensity vector. This principle has been used in spatial sound reproduction methods such as directional audio coding (DirAC) [7]; in this paper, however, it is used for multiple sound source localization using other criteria. The sound energy is distributed in time and frequency. The spectral density distribution of one of the signals is presented in Figure 2 using the spectrogram function in Matlab, where the length of the Hamming window was chosen to be 14 samples, the overlap was chosen to be 5 points, the number of sampling points used to calculate the discrete Fourier transform was 5, and the sampling frequency was 44100 Hz. Assuming that there are several sound sources, the energy at some time-frequency points is generated by several sound sources simultaneously. Therefore, it is not possible to determine all sound source positions from a single frequency bin.

Figure 2: Spectral density distribution of a sample of a speech signal recorded by an omnidirectional microphone.

In this method, the sound signals are divided in time and then in frequency using the short-time Fourier transform (STFT), where the window was chosen to be a Hann window with a length of 512 samples and the overlap was chosen to be half of the window size. The input signals for this method are the B-format signals; the intensity vector can be obtained for each time frame using the following equations [8]:

Ix(t, f) = (√2 / Z0) Re{ W*(t, f) X(t, f) },
Iy(t, f) = (√2 / Z0) Re{ W*(t, f) Y(t, f) },      (2)
Iz(t, f) = (√2 / Z0) Re{ W*(t, f) Z(t, f) },

where Z0 is the acoustic impedance of the air, t is time, f is frequency, * denotes the complex conjugate, W(t, f), X(t, f), Y(t, f) and Z(t, f) are the Fourier transforms of the B-format signals w(t), x(t), y(t) and z(t), respectively, and I(t, f) is the instantaneous intensity vector, defined as [8]

I(t, f) = [ Ix(t, f), Iy(t, f), Iz(t, f) ].      (3)

The instantaneous intensity vector points in the direction of the flow of sound energy, while the direction of arrival is assumed to be opposite to this direction. The azimuth of the sound source can be obtained as [8]

θ(t, f) = arctan( −Iy(t, f) / −Ix(t, f) ),      (4)

and the elevation as

φ(t, f) = arctan( −Iz(t, f) / √(Ix²(t, f) + Iy²(t, f)) ).      (5)

After the angles have been calculated for each frequency bin in each time frame, a statistical estimation of the angle distribution has to be performed, see Figure 3.

Figure 3: Diagram of the energetic analysis method: B-format signals → division of the signals in time → division of the signals into frequency bands → azimuth and elevation estimation → statistical calculation of the angles for each time frame → estimated angles.

In each time frame, we assume that only one sound source is dominant in each frequency bin. This assumption can hold since each sound signal differs from the others and the signals have different intensities in time. In this case, each frequency bin carries information about one sound source direction. We consider the direction from which a sound signal comes to be the direction that is most repeated over the frequency bins in each time frame. When several sound signals are emitted simultaneously, the direction of each sound source is repeated several times in each time frame, in different frequency bins. We can obtain a sound source direction as the angle that maximizes the summation of the probability function over the whole frequency interval for each time frame. In the case of a single sound source, the estimated direction can be written
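The per-bin direction estimation of Section 3 can be sketched as follows. This is an illustrative Python/NumPy version (the original implementation is in Matlab); the window length and 50% overlap follow the STFT settings above, and the constant factor √2/Z0 is omitted because it cancels in the angle:

```python
import numpy as np

def bin_azimuths(w, x, y, nwin=512):
    """Azimuth estimate per time-frequency bin from horizontal B-format
    signals, following eqs. (2) and (4): Ix ~ Re{W* X}, Iy ~ Re{W* Y}.

    Hann-windowed FFT frames with 50% overlap; returns azimuths in
    radians, one row per time frame, one column per frequency bin.
    """
    hop = nwin // 2
    win = np.hanning(nwin)
    az = []
    for i in range(0, len(w) - nwin + 1, hop):
        W = np.fft.rfft(win * w[i:i + nwin])
        X = np.fft.rfft(win * x[i:i + nwin])
        Y = np.fft.rfft(win * y[i:i + nwin])
        ix = np.real(np.conj(W) * X)  # horizontal intensity, x component
        iy = np.real(np.conj(W) * Y)  # horizontal intensity, y component
        # With signals encoded via eq. (1), Re{W* X} already points toward
        # the source; for a measured intensity that points along the energy
        # flow, negate both components as in eq. (4).
        az.append(np.arctan2(iy, ix))
    return np.array(az)
```

For a single noiseless source, every energetic bin yields (almost) the same azimuth; with several simultaneous sources, the per-bin azimuths cluster around the true directions and the statistical step of Figure 3 selects them.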

as

α̂(t) = argmax over α of Σ_{f=1}^{K} P(α | S(t, f)),      (6)

where α̂(t) is the estimated sound source direction, K is the number of frequency bins, S is the sound signal, t denotes the time frame index, f is the frequency bin, and P(α | S(t, f)) is the probability that the signal comes from the direction α.

The main difficulties faced by this method come generally from background noise, reverberation and microphone noise. However, the sound intensity coming from the sound source is greater than the noise and reverberation intensity. For some time frames, the detection error is larger when there is no active speaker.

4 Simulation Results

The method was simulated in Matlab. Figure 4 shows the simulation results. In this scenario we assumed four speakers in the horizontal plane speaking simultaneously. B-format signals were generated from their sound signals according to (1). With no additional noise, the method estimated the sound source positions perfectly. The peaks in Figure 4 denote the angles from which the sound is coming. As can be seen from Figure 4, the four sound source positions are estimated correctly. Some frequency bins indicate that the sound signal comes from other directions; this angle detection error arises because more than one signal has a component in the same frequency bin in the same time frame.

Figure 4: Simulation results in the absence of noise.

Two different noise signals were then added to each B-format signal. The first noise signal is a fan noise; its spectral energy distribution is presented in Figure 5 using the same spectrogram parameters as in Figure 2. The second noise signal is pseudo-random noise with a normal distribution with zero mean and unit standard deviation, generated by Matlab. The two noise signals were added to each other and were assumed to be located at different places around the microphones; these places were assumed to be equidistantly separated (i.e. 3 degrees from each other). The signal-to-noise ratio (SNR) between the signal and the additional noise signal is calculated using the following equation:

SNR = 10 log10( Ps / Pn ),      (7)

where Ps and Pn are the average powers of the signal and of the additional noise signal, respectively. The method was also able to detect the positions of the sound sources for all speakers, see Figure 6.

Figure 5: Spectral density distribution of a fan noise sound signal.

As can be seen from the simulation results, adding the noise decreases the ability to localize the sound sources. The noise signal influences the intelligibility of the sound source signal and changes the distribution of the sound intensity. However, since the sound source intensity is greater than the noise intensity, the method is still able to localize the sound sources correctly.

Figure 6: Simulation results in the presence of the pseudo-random noise signal and the fan noise signal.
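The frame-level decision rule of eq. (6), choosing the angle most repeated across the frequency bins, can be sketched as the argmax of a histogram over azimuth. The bin width and function name here are illustrative choices, not taken from the paper:

```python
import numpy as np

def frame_direction(bin_azimuths_deg, bin_width_deg=5.0):
    """Pick the frame-level direction of arrival as the centre of the
    most-populated histogram bin over the per-bin azimuths (in degrees)."""
    edges = np.arange(-180.0, 180.0 + bin_width_deg, bin_width_deg)
    counts, _ = np.histogram(bin_azimuths_deg, bins=edges)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])  # centre of the winning bin
```

For several simultaneous sources, the same histogram can be searched for several local maxima instead of the single argmax.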

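The SNR between the clean signal and the additional noise, defined in Section 4 as a ratio of average powers in decibels, can be computed as follows (a minimal sketch; the variable names are ours):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in dB between a signal and an additive noise signal,
    computed from their average powers: 10*log10(Ps/Pn)."""
    ps = np.mean(np.asarray(signal) ** 2)  # average signal power
    pn = np.mean(np.asarray(noise) ** 2)   # average noise power
    return 10.0 * np.log10(ps / pn)
```

For example, a noise amplitude ten times the signal amplitude gives an SNR of −20 dB.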
5 Experimental Results

The measurements were carried out in the acoustic laboratory of the Department of Telecommunications, FEEC, Brno University of Technology, where the experimental conditions are the same as in sound control rooms, listening rooms, or living rooms with a high-quality listening environment; the laboratory provides a semi-diffuse field with a reverberation time RT < 0.3 s in all octave bands. The measurements were carried out only for sound sources placed in the horizontal plane of the B-format microphone setup.

In the first part of the experiment, three people (two men and one woman) talked simultaneously in forty different positions. The positions were selected arbitrarily for each speaker on a circle around the microphones, see Figure 7. One sentence was chosen to be said by all speakers; the speech in each case lasted several seconds. The positions of the speakers in each case were registered and compared with the results of the method.

Figure 7: Speakers' positions around the microphones.

Two figure-of-eight microphones were used to pick up the signals x(t) and y(t), and one omnidirectional microphone was used to pick up the signal w(t). The directional sensitivity of the figure-of-eight microphones is shown in Figure 8.

Figure 8: The directional sensitivity of the figure-of-eight microphones [9].

The directions of arrival were estimated for each of the speakers' positions, and the results were compared with the real speakers' positions. The results show the ability of the method to estimate the sound source positions correctly. They are illustrated using box plots: the boxes have lines at the lower quartile, median, and upper quartile values; the whiskers show the extent of the rest of the data; outliers are shown as red crosses outside the whiskers.

Figure 9 shows one case of the experiment, where three speakers were at three different positions around the microphones. As can be seen, the positions of the three speakers are well estimated; the peaks denote the positions of the speakers. The speaker who stood at position (+180°) could also be considered to be at position (−180°). Since there are multiple sound sources (speakers), the sound intensity differs in time, and for some frequencies the sound energy comes from more than one speaker. Furthermore, there is background noise as well as noise coming from the microphones. Together, these factors make the localization less accurate. However, the method was able to estimate the sound source positions correctly in the real environment.

Figure 9: Estimated speakers' positions in the real environment.

As can be seen in Figure 10, the method was able to localize the sound source positions; the median error is between three and four degrees for all positions. The biggest error is about twelve degrees for the third speaker; the first and the third speakers are men and the second speaker is a woman.

Figure 10: Absolute angle error for each speaker in the case of three speakers.

In the second part of the experiment, four people (two men and two women) talked simultaneously. The same sentence as in the first part of the experiment was chosen to be said. The results show the ability of the method to localize the sound sources, as can be seen in Figure 11. It should be noted that the first and second speakers are women. The median error in this case was about 4 degrees.

Figure 11: Absolute angle error in the case of four speakers.

6 Conclusion

The energetic analysis method is a good method for multiple sound source localization. It achieved good results in both the simulated and the real environment. The angle detection errors come from the background noise and the reverberation signals. The method is able to localize more sound sources than the number of microphones used. The method can also be used for tracking moving targets when the duration of the time frame is chosen to suit the speed of the target's movement; errors can occur when the target moves too fast.

Acknowledgment

The described research was performed in laboratories supported by the SIX project, registration number CZ.1.05/2.1.00/03.0072, operational program Research and Development for Innovation.

References

[1] Carter, G. C., "Coherence and time delay estimation," Proceedings of the IEEE, vol. 75, no. 2, pp. 236-255, Feb. 1987.
[2] Taff, L. G., "Target localization from bearings-only observations," IEEE Transactions on Aerospace and Electronic Systems, vol. 33, no. 1, Jan. 1997.
[3] Schmidt, R., "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, Mar. 1986.
[4] Yilmaz, O.; Rickard, S., "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830-1847, July 2004.
[5] Izumi, Y.; Ono, N.; Sagayama, S., "Sparseness-based 2ch BSS using the EM algorithm in reverberant environment," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 147-150, Oct. 2007.
[6] Benjamin, E.; Heller, A.; Lee, R., "Localization in horizontal-only ambisonic systems," in Proc. 121st Convention of the Audio Engineering Society, San Francisco, 2006.
[7] Pulkki, V., "Spatial sound reproduction with directional audio coding," Journal of the Audio Engineering Society, vol. 55, no. 6, pp. 503-516, June 2007.
[8] Ahonen, J.; Pulkki, V.; Kuech, F.; Kallinger, M.; Schultz-Amling, R., "Directional analysis of sound field with linear microphone array and applications in sound reproduction," in Proc. 124th Convention of the Audio Engineering Society, Amsterdam, The Netherlands, May 2008.
[9] AKG - 65 years of innovation [online]. Accessible from <http://www.akg.com/site/products/powerslave,id,35,pid,35,nodeid,,_language,EN,view,diagram.html>.
11st Convention of the Audio Engineering Society, San Francisco,. pp.1. [7] Pulkki, V.; Spatial Sound Reproduction with Directional audio coding J.Audio Eng.Soc.,vol.55,pp.53-51,Jun 7. [] E. Ahonen J.; Pulkki V., Kuech F.; Kallinger M.; Schultz- Amling R.; Directional analysis of sound field with linear microphone array and applications in sound reproduction. In Proc. AES 14th Convention, Amsterdam, The Netherlands, May. [9] AKG- 5 years of innovation [online]. [Citied.11.1]. Accessible from < http://www.akg.com/site/products/powerslave,id,35,pid, 35,nodeid,,_language,EN,view,diagram.html>. Conclusion The energetic analysis method is a good method for multiple sound source localization. It achieved good results in both simulated and real environment. The angle detection errors come from the background noise and the reverberation signals. The method is able to localize more sound sources than the number of the used microphones. The method can be used for tracking mobile targets, when the duration of time frame is chosen to be suitable for the speed of 9