A. Czyżewski, J. Kotus Automatic localization and continuous tracking of mobile sound sources using passive acoustic radar

Multimedia Systems Department, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland

Keywords: acoustic radar, sound intensity, source localization

Abstract

The paper presents the concept, practical realization and applications of a passive acoustic radar (PAR) for the localization and continuous tracking of fixed and mobile sound sources such as cars, trucks, aircraft, gunshots and explosions. The device consists of a new kind of multichannel miniature three-dimensional sound intensity sensor invented by the Microflown company and a group of digital signal processing algorithms developed in the Multimedia Systems Department, Gdansk University of Technology. In contrast to active radars, the passive acoustic radar does not emit any scanning beam; it listens to the surrounding sounds and provides information about the directions of the arriving acoustic waves. Monitoring the acoustic field in this way therefore remains unnoticeable. For sound source localization, two independent 3D sound intensity probes and a triangulation technique were used. To increase the accuracy of the localization, an additional algorithm performing resonant narrow-band filtration of the acoustic signals was applied. The sensitivity and accuracy of the developed PAR were examined in an anechoic chamber and in typical reverberant conditions. The functionality and acoustic properties of the passive radar were examined in detail for the given environmental conditions using three types of signals: broadband sounds, pure tones and impulsive sounds. The experimental results showed that even a low signal-to-noise ratio was sufficient to localize the sound source properly.
The measurement results, including samples of the real sounds, can be sent remotely to the control station. The passive radar can operate automatically as a stand-alone unit or in manual mode. The proposed technology can provide the operator with essential data on the activity of objects and targets in a given area. Moreover, automatic and continuous tracking of the movement of a selected sound source is also possible. Additional procedures such as a sound source classification module or automatic control of a digital PTZ (Pan Tilt Zoom) camera can be used to extend the usefulness of the presented device.

1. INTRODUCTION

The concept, practical realization and applications of a passive acoustic radar (PAR) for automatic localization and tracking of sound sources are presented below. In contrast to active radars, the PAR does not emit a scanning beam; after receiving the surrounding sounds it provides information about the directions of the incoming acoustic signals. The device consists of a new kind of multichannel acoustic vector sensor (AVS) invented by the Microflown company [1] and dedicated digital signal processing algorithms developed by the authors. Regarding acoustic properties, classical beamforming arrays have a limited frequency range and a line (or plane) symmetry, with a significant decrease in resolution in directions far from the symmetry axis. In contrast, the AVS approach is broadband, works in 3D, and has better mathematical robustness [2]. The ability of a single AVS to rapidly determine the bearing of a wideband acoustic source is essential for numerous passive monitoring systems.

2. ACOUSTIC PARTICLE VELOCITY SENSORS

A single acoustic vector sensor measures the acoustic particle velocity instead of the acoustic pressure measured by conventional microphones, see e.g. [3]. It measures the velocity of air across two tiny resistive strips of platinum that are heated to about 200 °C, see Fig. 1. It operates in a flow range from 10 nm/s up to about 1 m/s. To a first-order approximation the sensors do not cool down; however, the particle velocity alters the temperature distribution around both wires. The total temperature distribution causes the two wires to differ in temperature.
Because the system is linear, the total temperature distribution is simply the sum of the temperature distributions of the two single wires. Due to convective heat transfer, the upstream sensor is heated less by the downstream sensor, and vice versa. Owing to this operating principle, the Microflown can distinguish between positive and negative velocity directions and is much more sensitive than a single hot-wire anemometer. Because it measures a temperature difference, its sensitivity is (almost) independent of temperature [4].

Fig. 1. (Left) A microscope picture of a standard Microflown. (Right) Dotted line: temperature distribution due to convection for two heaters; both heaters have the same temperature. Solid line: sum of the two single temperature functions; a temperature difference occurs [4]

Each particle velocity sensor is sensitive in only one direction, so three orthogonally placed particle velocity sensors have to be used. In combination with a pressure microphone, the sound field at a single point is fully characterized, and the acoustic intensity vector, which is the product of pressure and particle velocity, can also be determined [5]. This intensity vector indicates the acoustic energy flow. With a compact probe such as the one shown in Fig. 2, the full three-dimensional sound intensity vector can be determined within the full audible frequency range from 20 Hz to 20 kHz.

Fig. 2. A standard three-dimensional sound probe (three orthogonally placed Microflowns and a 1/10-inch sound pressure microphone in the middle). A one-euro coin is shown for size comparison [4]

3. PASSIVE ACOUSTIC RADAR

In the passive acoustic radar, audio processing is used in two modes: audio slave and audio master. The block diagram is presented in Fig. 3. While the user operates the PTZ camera manually, the PAR algorithm is in the audio slave mode and adaptively shapes a 45°-wide sound directivity characteristic that follows the camera direction. This allows only the sounds arriving from the viewing direction to be presented. If the PAR algorithm detects an important sound (for more details see [6][7]), the
system is switched into the audio master mode, the camera is automatically steered towards the sound source, and the operator is informed of the detected sound event. Manual steering is then switched on again so that the user can investigate the scene further. The foundations of the audio processing method and the details of both modes are presented below.

Fig. 3. The block diagram of the PAR algorithm. Block labels recovered from the diagram: acoustic signals acquisition (3D AVS sensor); angle value defined by the operator; mode (master/slave); sound event detection; frequency detection and estimation; beamforming; sound event parameterization; narrow-band recursive filtration; filtered sound stream ready to listen to; SVM-based sound event classification; sound intensity computation; determination of the direction of the incoming sound; PTZ camera control.

3.1. Sound source direction detection: audio master

The PAR algorithm is based on determining the 3D sound intensity components. In the first step, the acoustic signals are captured. In the second step, the dominant frequency of the sound is estimated from the FFT coefficients using Quinn's First Estimator [8]. Next, the estimated frequency is used to design a narrow-band recursive filter [9]. Finally, the output of the recursive filtration is used to compute the individual sound intensity components. Sound intensity is the average rate at which sound energy is transmitted through a unit area perpendicular to the specified direction at the considered point. The intensity in a given direction is the product of the sound pressure (scalar) p(t) and the particle
velocity (vector) component in that direction u(t). The time-averaged intensity I in a single direction is given by Eq. 1 [10]:

$$I = \frac{1}{T}\int_{T} p(t)\,u(t)\,dt \qquad (1)$$

It is important to emphasize that with the presented AVS the individual sound intensity components can be obtained directly from Eq. 1. The sound intensity vector in three dimensions is composed of the acoustic intensities in three orthogonal directions (x, y, z) and is given by Eq. 2 [10]:

$$\vec{I} = I_x\,\vec{e}_x + I_y\,\vec{e}_y + I_z\,\vec{e}_z \qquad (2)$$

In the presented algorithm the averaging time T (Eq. 1) was 4096 samples at a sampling frequency of 48 kHz, i.e. about 85 ms. This means that the direction of the sound source was updated more than 10 times per second.

3.2. Sound source direction filtration: audio slave

If the sources are in the far field, it is possible to create a virtual microphone with a variable directivity pattern. The pressure microphone has an omnidirectional pattern and the Microflown probe has a figure-of-eight pattern. The axis of zero sensitivity varies when the sound pressure signal (p) and the particle velocity signal (u) are summed. The line of zero sensitivity is at 90° for a pure velocity signal u, at 0° for p+u and at 180° for p-u. The line of zero sensitivity is thus steered by the ratio of p and u in the summation (Fig. 4). This line is very sharp, so it is possible to find out whether there are one, two or even more sound sources [5].

Fig. 4. Steering of the line of zero sensitivity by summing the pressure signal p and the velocity signal u [5]

A cardioid directivity is achieved when the velocity signal u measured by the probe oriented in the normal direction is summed with the sound pressure signal p (Fig. 5a). The cardioid response can be shaped into a response more similar to a
unidirectional response by subtracting the squared velocity signal of the laterally oriented probe. This is shown in Fig. 5b.

Fig. 5. Directivity patterns obtained by combining microphone and probe signals: a) an omnidirectional microphone characteristic (left) summed with the figure-of-eight characteristic of the normal probe (middle) creates a cardioid characteristic (right); b) the cardioid response (left) minus the squared response of the lateral velocity probe (middle) results in a response close to that of a unidirectional microphone [5].

4. METHODOLOGY OF THE PRACTICAL EVALUATION OF THE ACOUSTIC RADAR

4.1 Anechoic chamber evaluation

The sensitivity and accuracy of the developed PAR were examined in an anechoic chamber and in typical reverberant conditions. Pure tones, 1/3-octave band noise from 125 Hz to 16000 Hz and impulsive sounds were used. The AVS and the measuring microphone were located in the same place to ensure identical acoustic field conditions. The sound pressure level (SPL) was measured using the Brüel & Kjær PULSE system type 7540 with a type 4189 microphone, calibrated before the measurements with the acoustic calibrator type 4231. The measurement setup is presented in Figs. 6 and 7. The pure tone and noise test signals for each frequency were presented twice. The first time, only the test signal was emitted from one loudspeaker. The test signal had two phases: a starting phase with a constant sound level and a decay phase in which the sound level decreased monotonically at 1 dB/s. Next, the same test signal was presented simultaneously with additional disturbing pink noise. For both sessions the sound pressure level and the angle value were recorded. Additionally, the sound level of the background noise was determined for both sessions. These data were used to
compute the sensitivity of the radar, expressed as the absolute sound pressure level, and its accuracy in disturbing conditions, expressed as the signal-to-noise ratio given by Eq. 3, for the individual frequencies [11],[12]:

$$\mathrm{SNR}_{dB} = \mathrm{SPL}_{Signal,\,dB} - \mathrm{SPL}_{Noise,\,dB} \qquad (3)$$

Two measurement sessions were required to obtain the SNR_dB indicator. During the first session, SPL_Signal,dB was determined. In the next session, the background noise level (SPL_Noise,dB) was obtained. In that session the test signal was presented from one loudspeaker and the noise from another. Under these conditions the values of the sound source angle were determined.

Fig. 6. Block diagram and equipment used during the measurements. Labels recovered from the diagram: control room; anechoic chamber; PULSE type 7540; amplifier type 2716C; generator; recording and processing (channels A and B); microphone type 4189; USP conditioning module; USP probe; marked values 90, 1.3 m, 1.3 m and 0.005 m.

Fig. 7. Details of the measurement setup

4.2 Evaluation in the real acoustic conditions

The localization and tracking of sound sources were evaluated in real acoustic conditions with many reflections and different reverberation times. An example of the system setup for shooter localization using two AVS and the triangulation technique is presented in Fig. 8. Additionally, a screenshot of the PAR application is shown. Here the results of the sound source classification and localization
can be seen. The PAR application automatically rotates the camera towards the dangerous sound source.

Fig. 8. An example of a setup used during the sound source localization measurements and the application window acquired from a digital PTZ camera.

5. MEASUREMENT RESULTS

5.1 Anechoic chamber

The PAR sensitivity measured in the anechoic chamber was about 45 dB SPL. The combined SNR_dB results for all examined frequencies and both kinds of test signals are presented in Fig. 9 (A and B). The disturbing noise source was active and its level was 62 dB SPL. SNR values were determined for the following accuracy levels: ±1°, ±3°, ±5°, ±10°, ±15°, ±30° and ±45°. The results show that the developed PAR algorithm performs very well in continuous localization for all considered signals and frequencies. It is important to emphasize that recursive filtration can significantly improve the correctness of localization of tonal sources.
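
The narrow-band recursive filtration mentioned above and in Section 3.1 can be illustrated with a minimal sketch. It assumes the two-pole narrow-band band-pass design described in [9]; the paper does not state which design or bandwidth was actually used, so the 96 Hz bandwidth, the function names and the test tones are hypothetical. The sketch tunes the filter to a 4000 Hz tone at the 48 kHz sampling frequency and compares the steady-state output with that of an out-of-band 1000 Hz tone.

```python
import math


def narrowband_coefficients(center_hz, bandwidth_hz, fs_hz):
    """Two-pole band-pass coefficients (design from [9], assumed here).

    Frequencies are converted to fractions of the sampling frequency.
    """
    f = center_hz / fs_hz
    bw = bandwidth_hz / fs_hz
    r = 1.0 - 3.0 * bw                      # pole radius
    two_cos = 2.0 * math.cos(2.0 * math.pi * f)
    k = (1.0 - r * two_cos + r * r) / (2.0 - two_cos)
    a = (1.0 - k, (k - r) * two_cos, r * r - k)   # feed-forward part
    b = (r * two_cos, -r * r)                     # feedback part
    return a, b


def recursive_filter(x, a, b):
    """y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] + b1*y[n-1] + b2*y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in x:
        y0 = a[0] * x0 + a[1] * x1 + a[2] * x2 + b[0] * y1 + b[1] * y2
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y


def tone(freq_hz, fs_hz, n):
    """Unit-amplitude sine test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


FS_HZ = 48000
a, b = narrowband_coefficients(4000.0, 96.0, FS_HZ)  # 96 Hz is hypothetical

in_band = recursive_filter(tone(4000.0, FS_HZ, 8192), a, b)
out_band = recursive_filter(tone(1000.0, FS_HZ, 8192), a, b)

# Peak amplitude after the start-up transient has decayed.
peak_in = max(abs(v) for v in in_band[-1024:])
peak_out = max(abs(v) for v in out_band[-1024:])
print(f"4000 Hz peak: {peak_in:.3f}, 1000 Hz peak: {peak_out:.4f}")
```

In this design the gain at the centre frequency is one and the gain at 0 Hz is zero, so a tonal component passes unchanged while broadband noise outside the pass band is attenuated before the intensity components are computed.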

Fig. 9. Angular resolution of sound source localization for various source levels L and various noise conditions (SNR). Triangle widths portray the resolution; triangle heights correspond to the L and SNR values on the radial plot. Averaged results for all examined frequencies: A) absolute sensitivity L [dB SPL], B) averaged SNR_dB results. T: pure tones with recursive filtration applied; N: noise test signals.

Fig. 10 shows an example of the angle error as a function of the SNR level for a 2000 Hz pure tone.

Fig. 10. Example angle error as a function of the SNR level for 2000 Hz (axes: SNR [dB], angle error [°], frame number; series: SNR_dB, angle values, averaged angle values)

Several kinds of impulsive sounds were used in the acoustic radar examinations. During the first session, noise-like bursts of different lengths were used. The impulse lengths were 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 and 32768 samples, which corresponds to durations from 0.0007 s to 0.6827 s. The next signal was a 4000 Hz tone burst with the same lengths in samples. The level of these signals was constant and 30 dB higher than the background noise (45 dB SPL). The results obtained for these tests are presented in Fig. 11. For the tonal burst, recursive filtration was also applied (blue line). Another impulsive test signal used a 4000 Hz tone with a length of 4096 samples; its amplitude was decreased in 3 dB steps over 12 levels, from SNR_dB = 27 to SNR_dB = -6. The results are presented in Fig. 12. Recursive filtration was also applied (blue line).
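
The burst durations and level steps quoted above follow from simple arithmetic, reproduced in the short sketch below. It assumes that the 48 kHz sampling frequency given in Section 3.1 also applies to the impulsive test signals, which the text implies but does not state; the variable names are illustrative only.

```python
# Assumption: the 48 kHz sampling frequency quoted for the intensity
# averaging in Section 3.1 also applies to the impulsive test signals.
FS_HZ = 48_000

# Burst lengths listed in the text: 32, 64, ..., 32768 samples (11 values).
burst_lengths = [2 ** k for k in range(5, 16)]
durations_s = [n / FS_HZ for n in burst_lengths]

# Level series of the 4096-sample signal: 3 dB decrements starting at 27 dB.
snr_levels_db = [27 - 3 * i for i in range(12)]

for n, d in zip(burst_lengths, durations_s):
    print(f"{n:6d} samples -> {d:.4f} s")
print("SNR levels [dB]:", snr_levels_db)
```

The shortest and longest bursts round to the 0.0007 s and 0.6827 s stated in the text, and twelve levels spaced by 3 dB span the range from 27 dB down to -6 dB.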

Fig. 11. Example results obtained for different kinds of impulsive sounds. Noise and tonal bursts of different lengths were used (axes: angle error [°] versus impulse length [s]; series: 4000 Hz with recursive filtration on, 4000 Hz, pink noise)

Fig. 12. Results obtained for impulses with different SPL levels (axes: angle error [°] versus SNR [dB]; series: 4000 Hz with recursive filtration on, 4000 Hz)

5.2 Real acoustic conditions

The sound source localization results are presented in Figs. 13 and 14. An alarm signal revolver was used as the sound source. Three shots were fired at every location. Figs. 13 and 14 show the averaged computed shooter positions. The mean squared error (MSE) was additionally calculated for the angle and for the X and Y coordinates. The MSE of the angle determination was 5.5° and that of the X and Y coordinates was 0.7 m. During the continuous sound source tracking experiment a loudspeaker emitting pink noise was used. The sound level of the signal was 55 dB SPL. The loudspeaker was carried by a person around the room between the reference points P1 to P9. The sound level of the source measured near point P1 was used as the reference value for calculating the distance between the sound source and the AVS. The distance of the sound
source was determined from the decay of the sound level. Additional reflections significantly increased the computed sound level and disturbed the determination of the distance to the sound source. The results are presented in Fig. 15.

Fig. 13. The sound source localization results (the positions of the AVS and the real shooter positions are also shown) and the source angles α [°] (AVS1) and β [°] (AVS2) with their reference values for points P1 to P9. Axes of the position plot: X [m], Y [m]; series: computed shooter position, real shooter position, AVS positions. Experiments were conducted in the Gdansk University of Technology main building.

Fig. 14. The shooter localization results obtained using a single AVS (axes: X [m], Y [m]; points P1 to P12). The real shooter positions are also shown. Experiments were conducted in the seminar room.

Fig. 15. An example of the results of continuous tracking of a mobile sound source in the seminar room (axes: X [m], Y [m]; series: reference point positions P1 to P9, computed sound source position).
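
The two localization schemes used in this section can be sketched as follows: a bearing obtained from the time-averaged intensity components (Eq. 1), triangulation of two such bearings, and a single-sensor range estimate from the level drop relative to a reference point. This is an illustration under stated assumptions, not the authors' implementation: the free-field spherical spreading law (6 dB per doubling of distance) is assumed because the paper only says that the decay of the sound level was used, and the sensor positions, levels, function names and the synthetic plane wave are hypothetical.

```python
import math


def mean_intensity(p, u):
    """Discrete form of Eq. 1: average of p(t) * u(t) over the frame."""
    return sum(pi * ui for pi, ui in zip(p, u)) / len(p)


def source_bearing_deg(p, ux, uy):
    """Horizontal bearing towards the source seen from one AVS.

    The intensity vector points along the energy flow (away from the
    source), so the bearing is taken opposite to it.
    """
    ix, iy = mean_intensity(p, ux), mean_intensity(p, uy)
    return math.degrees(math.atan2(-iy, -ix))


def triangulate(pos1, bearing1_deg, pos2, bearing2_deg):
    """Intersect two bearing lines; returns None for parallel bearings."""
    b1, b2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    d1 = (math.cos(b1), math.sin(b1))
    d2 = (math.cos(b2), math.sin(b2))
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-12:
        return None
    dx, dy = pos2[0] - pos1[0], pos2[1] - pos1[1]
    t = (dx * d2[1] - dy * d2[0]) / cross
    return (pos1[0] + t * d1[0], pos1[1] + t * d1[1])


def distance_from_level(level_db, ref_level_db, ref_distance_m):
    """Range from the level drop, ASSUMING free-field spherical spreading."""
    return ref_distance_m * 10.0 ** ((ref_level_db - level_db) / 20.0)


def plane_wave(bearing_deg, n=4096, fs_hz=48000, freq_hz=1000.0):
    """Synthetic p, ux, uy for a far-field source (unit impedance)."""
    th = math.radians(bearing_deg)
    p = [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]
    ux = [-v * math.cos(th) for v in p]   # wave travels away from source
    uy = [-v * math.sin(th) for v in p]
    return p, ux, uy


# Hypothetical layout: two sensors 4 m apart, source 4 m in front of them.
avs1, avs2, source = (-2.0, 0.0), (2.0, 0.0), (0.0, 4.0)
true_b1 = math.degrees(math.atan2(source[1] - avs1[1], source[0] - avs1[0]))
true_b2 = math.degrees(math.atan2(source[1] - avs2[1], source[0] - avs2[0]))

b1 = source_bearing_deg(*plane_wave(true_b1))
b2 = source_bearing_deg(*plane_wave(true_b2))
estimate = triangulate(avs1, b1, avs2, b2)

# Single-sensor range: 55 dB SPL at a 1 m reference, 49 dB SPL observed.
single_avs_range_m = distance_from_level(49.0, 55.0, 1.0)
print(estimate, round(single_avs_range_m, 2))
```

Under the spreading assumption any reflection that raises the observed level shortens the estimated range, which is consistent with the disturbance of the distance estimate reported above.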

5.3. Sound source classification results

The accuracy of sound event detection and classification depended strongly on the noise level. To detect an alarming sound event, the equivalent sound level of the currently processed audio frame was calculated and compared with a threshold value. The threshold depends on the types of events to be recognized. In this experiment the threshold was set to 70 dB SPL for classes 1 (explosion) and 3 (gunshot) and to 48 dB SPL for the remaining two classes. The examples in the event database were recorded with a low level of noise. To determine the system's ability to work in a noisy environment, the relation between accuracy and SNR_dB was examined. A rate of correctly classified instances above 90% was obtained when SNR_dB was greater than 30 dB for broken glass and screams and greater than 25 dB for gunshots and explosions. As the noise level increases, fewer events are correctly classified and more false positives are observed.

6. CONCLUSIONS

The concept and test results of the passive acoustic radar were presented in the paper. Different types of test signals were used. The experimental results show that even a low signal-to-noise ratio (SNR_dB close to 0 dB) is sufficient to localize a sound source reliably. Recursive filtration can significantly improve the sensitivity and accuracy of the passive acoustic radar (SNR_dB below -10 dB for tonal components). This kind of filtration can also be used to discriminate between multiple sources. Examinations using impulsive sounds indicated that the presented PAR algorithm properly detects and localizes the source in real acoustic conditions. Moreover, automatic and continuous real-time tracking of the movement of a selected sound source is also possible.
Additional procedures such as a sound source classification module or automatic control of a digital PTZ camera can be used to extend the usefulness of the engineered device. The proposed device can significantly improve the functionality of traditional surveillance monitoring systems.

ACKNOWLEDGMENTS

The research is subsidized by the European Commission within the FP7 project "INDECT" (Grant Agreement No. 218086).

BIBLIOGRAPHY

[1] Microflown Technologies, home page: http://www.microflown.com/
[2] M. Hawkes, A. Nehorai, Wideband Source Localization Using a Distributed Acoustic Vector-Sensor Array, IEEE Transactions on Signal Processing, vol. 51, no. 6, June 2003.
[3] H.-E. de Bree, The Microflown: An acoustic particle velocity sensor, Acoustics Australia 31, 91-94, 2003.
[4] H.-E. de Bree, The Microflown, E-book: http://www.microflown.com/r&d_books_ebook_microflown.htm
[5] J. de Vries, H.-E. de Bree, Scan & Listen: a simple and fast method to find sources, SAE Brazil, 2008.
[6] P. Zwan, A. Czyzewski, Automatic sound recognition for security purposes, Proc. 124th Audio Engineering Society Convention, Amsterdam, 2008.
[7] P. Zwan, P. Sobala, P. Szczuko, A. Czyzewski, Audio Content Analysis In the Urban Area Telemonitoring System, in: Multimedia Services In Intelligent Environments, 2009.
[8] B. G. Quinn, Estimating frequency by interpolation using Fourier coefficients, IEEE Transactions on Signal Processing, 42(5):1264-1268, May 1994.
[9] S. W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, California Technical Publishing, 1997.
[10] T. Basten, H.-E. de Bree, E. Tijs, Localization and tracking of aircraft with ground based 3D sound probes, ERF33, Kazan, Russia, 2007.
[11] M. Lagrange, S. Marchand, Estimating the Instantaneous Frequency of Sinusoidal Components Using Phase-Based Methods, J. Audio Eng. Soc., vol. 55, no. 5, May 2007.
[12] Signal-to-noise ratio, Wikipedia, the free encyclopedia: http://en.wikipedia.org/wiki/signal-to-noise_ratio