Sound Source Localization in Median Plane using Artificial Ear

International Conference on Control, Automation and Systems 2008, Oct. 14-17, 2008, COEX, Seoul, Korea

Sound Source Localization in Median Plane using Artificial Ear

Sangmoon Lee 1, Sungmok Hwang 2, Youngjin Park 3, Youn-sik Park 4
1 Department of Mechanical Engineering, KAIST, Daejeon, Korea (Tel: +82-42-869-364; E-mail: smansl@kaist.ac.kr)
2 Department of Mechanical Engineering, KAIST, Daejeon, Korea (Tel: +82-42-869-376; E-mail: tjdahr78@kaist.ac.kr)
3 Department of Mechanical Engineering, KAIST, Daejeon, Korea (Tel: +82-42-869-336; E-mail: yjpark@kaist.ac.kr)
4 Department of Mechanical Engineering, KAIST, Daejeon, Korea (Tel: +82-42-869-32; E-mail: yspark@kaist.ac.kr)

Abstract: Sound source localization denotes the methods in acoustical engineering that estimate the position of a sound source from acoustic signals measured by microphone arrays. The technique is used broadly in 3-D sound technology, humanoid robotics, teleconferencing, and related fields. The ultimate goal of the robot industry is for robots to live alongside humans, so the industry demands a practical robot auditory system in the form of artificial ears resembling the human external ear, i.e., the ear pinna. Equipping humanoid robots with a pinna-based auditory system offers clear benefits for human-robot interaction (HRI). In this paper, we propose a sound source localization method using a pair of artificial ears, each of which consists of a single ear pinna and two microphones. The feasibility and localization performance of the proposed method for speech signals in the median plane are demonstrated. Through an experiment in an office environment, we confirm that a robot with the artificial ears can estimate the elevation angle of a speech source using only two microphone output signals.

Keywords: Sound source localization, Relative Transfer Function (RTF), group delay, artificial ear

1. INTRODUCTION

Sound source localization is a listener's ability to estimate the direction or position of a detected sound; in acoustical engineering it also denotes the methods that estimate this position from the acoustic signals measured by microphone arrays [1]. Compared to vision, which is a well-directed sense, hearing is an undirected, i.e., omni-directional, sense. This ability, unconstrained by the field of view, can supplement vision by identifying the location of events of interest outside the visual field. Conversely, visual information can compensate for localization errors of audition. In humanoid robots, for example, audio-visual integration, the combined use of speech and face recognition, can improve the recognition of speech signals captured by a pair of microphones [2-3]. Auditory information plays an especially significant role when vision is blocked by obstacles or when speakers cannot be recognized in darkness. Sound source localization as an auditory perception is therefore a step toward more natural Human-Robot Interaction [4]. As noted above, the technique is used broadly in 3-D sound technology, humanoid robotics, and teleconferencing, and an auditory system with an ear pinna brings clear benefits to humanoid robots for HRI. Several attempts to apply artificial ears to sound source localization for robots already exist.
For instance, combining vision and audio sensors lets robots learn to improve their initial localization ability through supervised learning or visual information [5-7]. However, such learning can proceed only when speakers enter the field of view; it requires retraining whenever the robot's surroundings change, and it needs an additional imaging system. The humanoid robot SIG uses two pairs of microphones, one pair at the ear positions and the other installed inside the cover to cancel motor-induced noise [8-9]. Keyrouz and Saleh proposed binaural localization using an HRTF database measured with four microphones, two placed inside and two outside the ear canals of a KEMAR (Knowles Electronics Manikin for Acoustic Research) head. They demonstrated localization of wide-band sounds such as finger snaps and percussive noises [10], exploiting direction-dependent spectral features that vary with source location; these spectral features, however, lie outside the voice frequency band. To overcome this problem, Hwang and Park applied large artificial ears to robots and showed that speech signals can be localized with their method [11]. Although all of these systems are binaural, i.e., two-ear systems, they are hardly applicable to humanoid robots for the reasons mentioned above: the ear pinna is too large, or localization is possible only for wide-band sounds. In this paper, we propose a sound source localization method using artificial ears built from ear pinnae and four microphones. The proposed auditory system uses a spherical head and two ears, each composed of a single pinna and a pair of microphones. The feasibility and localization performance of the proposed method for speech signals in the median plane, given limited computational resources, are demonstrated.

Through the experiment in an office environment, we confirm that a robot with the artificial ears can estimate the elevation angle of a speech source using only two microphone output signals.

2. PROPOSED ARTIFICIAL EAR DESIGN AND HEAD SHAPE

The artificial ears and the spherical head model were manufactured as depicted in Fig. 1.

Fig. 1 Proposed artificial ear built in a spherical head.

Both the shape and the size of the ear pinna attached to the ear flange were designed to produce spectral features distributed in the frequency range from 3 to 4 kHz, using the Diffraction and Reflection model (DR model) suggested by Lopez-Poveda and Meddis for accurate reproduction of the spectral notches of elevated sources [12]. However, the DR model is applicable only at positions in the concha aperture. On account of this limited reproduction region, we experimented with several microphone positions, as presented in Fig. 1 (left).

3. FRONT-BACK DISCRIMINATION AND PLACEMENT OF TWO MICROPHONES AND EAR PINNAE

3.1 Front-back confusion

When two microphones in the free field are used to localize sound sources in 2-D space, pairs of points sharing the same ITD (Inter-channel Time Difference) exist; this phenomenon is called front-back confusion. The set of such points in 3-D space is often called the cone of confusion [13], since sounds originating from any point on this cone are indistinguishable by ITD alone.

3.2 Placement of two microphones and ear pinna

If we use just two bare microphones to localize sound sources in the median plane, front-back confusion will occur. As depicted in Fig. 2, when the ear pinna is missing, the cone of confusion arises with respect to the dotted line passing through the two attached microphones. To overcome the cone of confusion, we placed an ear pinna so that it passes between the two microphones, as shown in Fig. 2. As the sound source is elevated from the lower to the upper region, there is a single elevation angle at which the output levels measured by the two microphones are equal. By placing this elevation position on the dotted line, we can perform the front-back discrimination. The relative placement of the microphones and the ear pinna therefore plays an essential part in front-back discrimination and determines the possible localization range.

Fig. 2 Placement of two microphones and ear pinna.

4. ELEVATION ESTIMATION METHOD

4.1 Relative Transfer Function (RTF)

The input sound is unknown in most practical situations. In particular, the characteristics of a voice signal change rapidly from word to word and depend markedly on the individual. Therefore, the Relative Transfer Function (RTF) measured from the two output signals is useful and applicable as long as it is not unduly affected by additive disturbances such as reflections from the physical surroundings. The RTF is computed as [14]

RTF(f_k) = G_xy(f_k) / G_xx(f_k),   (1)

where G_xy is the cross-spectral density of the two microphone signals and G_xx is the auto-spectral density of the reference microphone signal (a numerical sketch follows at the end of Sec. 4.2).

4.2 Cleansing method

The measured RTF is no longer reliable if reflected waves contribute more to the two microphone signals than the direct wave does. A cleansing procedure is therefore necessary to remove this side effect of reflections, which otherwise makes it hard for the auditory system to estimate the true source position accurately. We cleansed the RTF using a Hamming window of length 67,

w[n] = alpha - beta * cos(2*pi*n / M) for 0 <= n <= M, with alpha = 0.54, beta = 0.46; w[n] = 0 otherwise.   (2)

The window length was determined from the smallest distance between a microphone and a dominant reflecting surface [15]. An example of the cleansing process is shown in Fig. 3.
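As a concrete illustration of Sec. 4.1, the following minimal sketch estimates the RTF of Eq. (1) from Welch-averaged spectral densities. The sampling rate, segment length, and synthetic stand-in signals are illustrative assumptions, not values from the paper.

```python
# Sketch of Eq. (1): RTF(f_k) = G_xy(f_k) / G_xx(f_k), with the spectral
# densities estimated by Welch averaging. All parameter values are assumptions.
import numpy as np
from scipy.signal import csd, welch

fs = 16000                               # assumed sampling rate [Hz]
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)              # stand-in for the Mic. U signal
y = 0.9 * np.roll(x, 8)                  # stand-in for Mic. B: delayed, attenuated copy

f, Gxy = csd(x, y, fs=fs, nperseg=1024)  # cross-spectral density G_xy(f_k)
_, Gxx = welch(x, fs=fs, nperseg=1024)   # auto-spectral density G_xx(f_k)

rtf = Gxy / Gxx                          # Eq. (1): the Relative Transfer Function
```

The arrays `f` and `rtf` from this sketch are reused in the sketches that follow.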

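The cleansing step of Sec. 4.2 can be sketched as below, assuming the Hamming window of Eq. (2) is centered on the strongest tap of the Relative Impulse Response; the paper does not spell out the centering, so that detail, like the function and FFT-length names, is an assumption.

```python
# Sketch of Sec. 4.2: window the Relative Impulse Response (RIR) with the
# Hamming window of Eq. (2) to suppress reflected components, then return
# to the frequency domain. Peak-centering of the window is an assumption.
import numpy as np

def cleanse_rtf(rtf_onesided, n_fft=1024, win_len=67):
    rir = np.fft.irfft(rtf_onesided, n=n_fft)        # RTF -> time-domain RIR
    m = win_len - 1
    n = np.arange(win_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / m)      # Eq. (2), alpha=0.54, beta=0.46
    start = max(int(np.argmax(np.abs(rir))) - win_len // 2, 0)
    cleansed = np.zeros_like(rir)
    seg = rir[start:start + win_len]                 # direct-path neighborhood
    cleansed[start:start + len(seg)] = seg * w[:len(seg)]
    return np.fft.rfft(cleansed)                     # cleansed RTF
```

With `rtf` from the sketch above, `cleanse_rtf(rtf)` yields the reflection-suppressed RTF used in the estimation steps below.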
Fig. 3 Original Relative Impulse Response (RIR) (blue line), cleansed RIR (red line), and Hamming window (black line).

As shown above, the Relative Impulse Response (RIR), the time-domain counterpart of the RTF, obtained in a real environment contains components reflected from the objects surrounding the listener. The cleansing process excludes these reflected waves.

4.3 Estimation of Time Delay of Arrival (TDOA)

Group delay is a measure of the transit time of a signal between the input and output ports. From the RTF phase response we obtain the group delay, from which we measure the TDOA between microphones U and B [16]:

Group Delay = -(1/(2*pi)) * d(angle RTF(f_k))/df.   (3)

Under free-field and far-field conditions, the sound source direction then follows directly.

5. EXPERIMENT IN AN OFFICE ENVIRONMENT

5.1 Selected microphone positions

The proposed localization method relies mainly on the RTF phase response, while front-back discrimination is based on the RTF magnitude response in combination with the relative placement of the microphones and the artificial pinna that avoids the cone of confusion. We therefore selected two microphones: Mic. B behind the ear pinna, and Mic. U in the upper part of the ear flange, where the side effect of turntable reflections is smaller, as shown in Fig. 1. The artificial ears fitted with the microphones and the experimental set-up are shown in Fig. 4.

Fig. 4 The experimental set-up in an office environment.

5.2 Verification of the proposed artificial ear and localization method in the median plane

An office environment contains many noise sources that degrade localization performance, and the experiment was carried out in such an environment. The room measures 7 m x 13 m x 2.5 m, the background noise level is 45 dB, and the SNR is 25 dB. Two male speech utterances were used as input: voice 1, "ANG NYEONG HA SE YO" (a Korean greeting, "hello"), and voice 2, "BANG GAP SEUP NI DA" ("nice to meet you"). The distance between the speaker and the center of the artificial head was fixed at 1.2 m. The RTF magnitude response is shown in Fig. 5.

Fig. 5 RTF magnitude response.

As a quantitative measure, the Inter-channel Level Difference (ILD) is used for front-back discrimination [16]:

ILD = [ sum_{n=1}^{N} 20*log10 |RTF_UB(f_n)| df ] / [ sum_{n=1}^{N} df ]  [dB],   (4)

i.e., the average of 20*log10 |RTF_UB| over the N frequency bins. The computed ILD for the tested source positions is shown in Fig. 6.

Fig. 6 ILD profile.

We find that an ILD sign change occurs with respect to the 60 degree position, where the two microphone levels are equal. As shown earlier in Fig. 2, rho equals 60 degrees in this case: if the ILD is less than 0 dB, the sound source is located below 60 degrees, and if the ILD is larger than 0 dB, the source is located above 60 degrees. After front-back discrimination is accomplished, we find the elevation angle of the sound source from the RTF phase response by measuring the group delay, which corresponds to the TDOA.
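For the front-back step just described, a minimal sketch of Eq. (4) and the 60 degree decision rule follows; the small epsilon guarding the logarithm and the function names are assumptions.

```python
# Sketch of Eq. (4): ILD as the frequency average of 20*log10 |RTF_UB(f_n)|,
# followed by the paper's front-back rule about the 60-degree position.
import numpy as np

def ild_db(rtf_ub):
    return float(np.mean(20 * np.log10(np.abs(rtf_ub) + 1e-12)))  # Eq. (4)

def front_back(rtf_ub):
    # ILD < 0 dB: source below 60 deg; ILD > 0 dB: source above 60 deg
    return "below 60 deg" if ild_db(rtf_ub) < 0.0 else "above 60 deg"
```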

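And for the elevation step, a sketch of the group-delay computation of Eq. (3): the unwrapped RTF phase is differentiated with respect to frequency and averaged over a voice band to yield the TDOA between Mic. U and Mic. B. The band limits are assumptions.

```python
# Sketch of Eq. (3): group delay = -(1/(2*pi)) * d(angle RTF)/df. Averaging
# it over a speech band gives the TDOA between Mic. U and Mic. B, which maps
# to the elevation angle under the free-field/far-field assumption.
import numpy as np

def tdoa_from_rtf(rtf, freqs, band=(300.0, 3400.0)):
    phase = np.unwrap(np.angle(rtf))                        # continuous RTF phase
    group_delay = -np.gradient(phase, freqs) / (2 * np.pi)  # Eq. (3), in seconds
    sel = (freqs >= band[0]) & (freqs <= band[1])           # assumed voice band
    return float(np.mean(group_delay[sel]))                 # TDOA estimate [s]
```

Mapping the resulting TDOA to an elevation angle then follows from the free-field and far-field conditions stated in Sec. 4.3.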
Estimation performance and errors for sound sources located on the median plane, at elevation angles from -30 to 210 degrees, are shown in Fig. 7 and Table 1.

Fig. 7 Localization performance for voice 1 (red) and voice 2 (green): estimated versus true elevation angle [degrees].

Table 1 Estimation error (degrees)

Elevation angles (degrees) | -30 ~ 210 | -30 ~ 70 | 80 ~ 110 | 120 ~ 210
Voice 1                    | 4.0       | 1.7      | 15.9     | 3.3
Voice 2                    | 5.1       | 1.9      | 20.06    | 4.1

We found that front-back discrimination can be performed with the RTF magnitude response, because the magnitude level changes from below to above 0 dB across the 60 degree elevation angle, and that the RTF phase responses make it possible to estimate the elevation angles of sound sources on the median plane.

6. CONCLUSIONS AND FUTURE WORK

We proposed a design for artificial ears, each consisting of a single ear pinna and two microphones, together with a sound localization method that uses them. By placing the ear pinna and the two microphones appropriately, the front-back confusion problem is resolved. Although the ear pinna has a characteristic length of only 7 cm, the proposed method is applicable to the localization of speech signals. Through the experiment conducted in an office environment, we showed the feasibility of the proposed localization method for sound sources on the median plane. In the near future, we will determine the optimal microphone positions for sound sources in 3-D space through experiments in an office environment and investigate the resulting localization performance.

7. ACKNOWLEDGEMENT

This work was supported by the BK21 program, the Intelligent Robotics Development Program, and the Korea Science and Engineering Foundation through the National Research Laboratory Program (RA-25--1112-) funded by the Ministry of Education, Science and Technology.

REFERENCES

[1] M. S. Brandstein and H. Silverman, "A practical methodology for speech source localization with microphone arrays," Computer Speech and Language, Vol. 11, No. 2, pp. 91-126, 1997.
[2] Y. Sasaki, S. Kagami, and H. Mizoguchi, "Multiple sound source mapping for a mobile robot by self-motion triangulation," Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 9-15, 2006.
[3] K. Nakadai, D. Matsuura, H. G. Okuno, and H. Tsujino, "Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots," Speech Communication, Vol. 44, pp. 97-112, 2004.
[4] I. J. Hirsh and C. S. Watson, "Auditory psychophysics and perception," Annual Review of Psychology, Vol. 47, pp. 461-484, 1996.
[5] H. Nakashima and T. Mukai, "3D sound source localization system based on learning of binaural hearing," IEEE International Conference on Systems, Man and Cybernetics, 2005.
[6] P. Arabi and S. Zaky, "Integrated vision and sound localization," Proceedings of the Third International Conference on Information Fusion, Vol. 3, pp. 21-26, 2000.
[7] J. Hornstein, M. Lopes, J. Santos-Victor, and F. Lacerda, "Sound localization for humanoid robots: building audio-motor maps based on the HRTF," Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, October 9-15, 2006.
[8] K. Nakadai, H. G. Okuno, and H. Kitano, "Real-time sound source localization and separation for robot audition," Proceedings of the IEEE International Conference on Spoken Language Processing, pp. 193-196, 2002.
[9] H. G. Okuno, K. Nakadai, and H. Kitano, "Social interaction of humanoid robot based on audio-visual tracking," Proceedings of the Eighteenth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-2002), Vol. 2358, pp. 725-735, 2002.
[10] F. Keyrouz and A. Abou Saleh, "Intelligent sound source localization based on head-related transfer functions," IEEE International Conference on Control, Automation and Systems, pp. 97-104, 2007.
[11] S. Hwang, Y. Park, and Y. Park, "Sound direction estimation using artificial ear," Proceedings of the International Conference on Control, Automation and Systems, pp. 1906-1910, October 17-20, 2007.
[12] E. A. Lopez-Poveda and R. Meddis, "A physical model of sound diffraction and reflections in the human concha," Journal of the Acoustical Society of America, Vol. 100, No. 5, pp. 3248-3259, 1996.
[13] C. I. Cheng and G. H. Wakefield, "Introduction to head-related transfer functions (HRTFs): representations of HRTFs in time, frequency, and space," Journal of the Audio Engineering Society, Vol. 49, No. 4, pp. 231-249, 2001.
[14] J. S. Bendat and A. G. Piersol, Random Data: Analysis and Measurement Procedures, Wiley, New York, 1999.
[15] S. Lee, Y. Park, and Y. Park, "Sound direction estimation using artificial ear for human-robot interface," Control, Automation and Systems Symposium, October 14, 2008, Seoul, Korea.
[16] J. Blauert, Spatial Hearing, revised edition, MIT Press, 1997.