Study on Method of Estimating Direction of Arrival Using Monaural Modulation Spectrum. Author(s): Ando, Masaru; Morikawa, Daisuke; Unoki, Masashi

Similar documents
METHOD OF ESTIMATING DIRECTION OF ARRIVAL OF SOUND SOURCE FOR MONAURAL HEARING BASED ON TEMPORAL MODULATION PERCEPTION

Computational Perception. Sound localization 2

Proceedings of Meetings on Acoustics

Towards an intelligent binaural speech enhancement system by integrating me signal extraction. Author(s): Chau, Duc Thanh; Li, Junfeng; Akagi,

Acoustics Research Institute

3D sound image control by individualized parametric head-related transfer functions

Enhancing 3D Audio Using Blind Bandwidth Extension

Sound Source Localization using HRTF database

Method of Blindly Estimating Speech Transmission Index in Noisy Reverberant Environments

Computational Perception /785

Feasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants

Binaural Hearing. Reading: Yost Ch. 12

TDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones and Source Counting

Binaural hearing. Prof. Dan Tollin on the Hearing Throne, Oldenburg Hearing Garden

ORIENTATION IN SIMPLE VIRTUAL AUDITORY SPACE CREATED WITH MEASURED HRTF

IMPROVED COCKTAIL-PARTY PROCESSING

Proceedings of Meetings on Acoustics

Convention Paper 9870 Presented at the 143rd Convention, 2017 October 18-21, New York, NY, USA

A Virtual Audio Environment for Testing Dummy- Head HRTFs modeling Real Life Situations

III. Publication III. c 2005 Toni Hirvonen.

Listening with Headphones

Upper hemisphere sound localization using head-related transfer functions in the median plane and interaural differences

University of Huddersfield Repository

Adaptive Filters Application of Linear Prediction

HRIR Customization in the Median Plane via Principal Components Analysis

A binaural auditory model and applications to spatial sound evaluation

PERSONALIZED HEAD RELATED TRANSFER FUNCTION MEASUREMENT AND VERIFICATION THROUGH SOUND LOCALIZATION RESOLUTION

AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES

EE1.el3 (EEE1023): Electronics III. Acoustics lecture 20 Sound localisation. Dr Philip Jackson.

Intensity Discrimination and Binaural Interaction

Convention Paper Presented at the 139th Convention, 2015 October 29-November 1, New York, USA

The analysis of multi-channel sound reproduction algorithms using HRTF data

Monaural and binaural processing of fluctuating sounds in the auditory system

SOPA version 2. SOPA project.

The relation between perceived apparent source width and interaural cross-correlation in sound reproduction spaces with low reverberation

A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL

I R UNDERGRADUATE REPORT. Stereausis: A Binaural Processing Model. by Samuel Jiawei Ng Advisor: P.S. Krishnaprasad UG

Robotic Spatial Sound Localization and Its 3-D Sound Human Interface

Psychoacoustic Cues in Room Size Perception

Signals & Systems for Speech & Hearing. Week 6. Practical spectral analysis. Bandpass filters & filterbanks. Try this out on an old friend

Exploiting envelope fluctuations to achieve robust extraction and intelligent integration of binaural cues

PAPER Enhanced Vertical Perception through Head-Related Impulse Response Customization Based on Pinna Response Tuning in the Median Plane

Auditory Localization

HRTF adaptation and pattern learning

Acoustics, signals & systems for audiology. Week 9. Basic Psychoacoustic Phenomena: Temporal resolution

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 A MODEL OF THE HEAD-RELATED TRANSFER FUNCTION BASED ON SPECTRAL CUES

Machine recognition of speech trained on data from New Jersey Labs

Spatial Audio & The Vestibular System!

A triangulation method for determining the perceptual center of the head for auditory stimuli

THE DEVELOPMENT OF A DESIGN TOOL FOR 5-SPEAKER SURROUND SOUND DECODERS

On distance dependence of pinna spectral patterns in head-related transfer functions

Sound Processing Technologies for Realistic Sensations in Teleworking

Temporal resolution AUDL Domain of temporal resolution. Fine structure and envelope. Modulating a sinusoid. Fine structure and envelope

Extracting the frequencies of the pinna spectral notches in measured head related impulse responses

Lecture 2: SIGNALS. 1 st semester By: Elham Sunbu

URBANA-CHAMPAIGN. CS 498PS Audio Computing Lab. 3D and Virtual Sound. Paris Smaragdis. paris.cs.illinois.

University of Huddersfield Repository

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA. Why Ambisonics Does Work

THE MATLAB IMPLEMENTATION OF BINAURAL PROCESSING MODEL SIMULATING LATERAL POSITION OF TONES WITH INTERAURAL TIME DIFFERENCES

Subband Analysis of Time Delay Estimation in STFT Domain

Sound Source Localization in Median Plane using Artificial Ear

EE228 Applications of Course Concepts. DePiero

Creating three dimensions in virtual auditory displays *

Virtual Acoustic Space as Assistive Technology

SOPA version 3. SOPA project.

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

Sound localization with multi-loudspeakers by usage of a coincident microphone array

Matching the waveform and the temporal window in the creation of experimental signals

Introduction. 1.1 Surround sound

Acoustic sound source tracking for an object using precise Doppler-shift m Proceedings of the 21st European Signal Processing Conference (EUSIPCO): 1-5

DECORRELATION TECHNIQUES FOR THE RENDERING OF APPARENT SOUND SOURCE WIDTH IN 3D AUDIO DISPLAYS. Guillaume Potard, Ian Burnett

Ivan Tashev Microsoft Research

Interior Noise Characteristics in Japanese, Korean and Chinese Subways

Measuring procedures for the environmental parameters: Acoustic comfort

Recurrent Timing Neural Networks for Joint F0-Localisation Estimation

EECS 216 Winter 2008 Lab 2: FM Detector Part I: Intro & Pre-lab Assignment

Evaluation of a new stereophonic reproduction method with moving sweet spot using a binaural localization model

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

3D Sound Simulation over Headphones

Auditory Distance Perception. Yan-Chen Lu & Martin Cooke

An evaluation on comfortable sound design of unpleasant sounds based on chord-forming with bandlimited sound

Spatial Audio Reproduction: Towards Individualized Binaural Sound

Convention Paper Presented at the 125th Convention 2008 October 2 5 San Francisco, CA, USA

Sound Radiation Characteristic of a Shakuhachi with different Playing Techniques

Dataset of head-related transfer functions measured with a circular loudspeaker array

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution

PERFORMANCE COMPARISON BETWEEN STEREAUSIS AND INCOHERENT WIDEBAND MUSIC FOR LOCALIZATION OF GROUND VEHICLES ABSTRACT

WAVELET-BASED SPECTRAL SMOOTHING FOR HEAD-RELATED TRANSFER FUNCTION FILTER DESIGN

Binaural Speaker Recognition for Humanoid Robots

The role of intrinsic masker fluctuations on the spectral spread of masking

INVESTIGATING BINAURAL LOCALISATION ABILITIES FOR PROPOSING A STANDARDISED TESTING ENVIRONMENT FOR BINAURAL SYSTEMS

Binaural Hearing- Human Ability of Sound Source Localization

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Assessing the contribution of binaural cues for apparent source width perception via a functional model

Experiment 4- Finite Impulse Response Filters

A classification-based cocktail-party processor

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License

A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking

Audio Engineering Society. Convention Paper. Presented at the 131st Convention 2011 October New York, NY, USA

Method of acoustical estimation of an auditorium


JAIST Repository https://dspace.j
Title: Study on Method of Estimating Direction of Arrival Using Monaural Modulation Spectrum
Author(s): Ando, Masaru; Morikawa, Daisuke; Unoki, Masashi
Citation: Journal of Signal Processing, 18(4): 197-
Issue Date: 2014
Type: Journal Article
Text version: publisher
URL: http://hdl.handle.net/10119/12893
Rights: Copyright (C) 2014 Research Institute of Signal Processing, Japan. Masaru Ando, Daisuke Morikawa, and Masashi Unoki, Journal of Signal Processing, 18(4), 2014, 197-. http://dx.doi.org/10.2299/jsp.18.197
Description: Japan Advanced Institute of Science and Technology

Journal of Signal Processing, Vol. 18, No. 4, pp. 197-, July 2014
SELECTED PAPER AT NCSP'14

Study on Method of Estimating Direction of Arrival Using Monaural Modulation Spectrum

Masaru Ando, Daisuke Morikawa and Masashi Unoki
School of Information Science, Japan Advanced Institute of Science and Technology
1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
E-mail: {ma ando, morikawa, unoki}@jaist.ac.jp

Abstract

Human beings can localize a target sound by using binaural cues; we can also localize a target sound by using monaural cues. The monaural modulation spectrum (MMS) can be regarded as an important cue for monaural sound localization. A method of estimating the direction of arrival (DOA) using a machine learning scheme to classify MMS patterns has been proposed. However, that method cannot account for the monaural DOA mechanism underlying the MMS patterns. To further investigate the role the MMS plays in monaural sound localization, we aimed to find cues underlying the human ability for monaural sound localization and to propose a method of estimating DOA using these cues. We investigated how the MMS of observed signals varies with azimuth. The shapes of the MMS traced arcs as the azimuth varied. We then proposed a method of estimating monaural DOA based on these results. Simulations were carried out to verify the effectiveness of the proposed method. We found that the proposed method could estimate DOA using the MMS, except for discriminating front-back confusion.

1. Introduction

Human beings have the ability of sound localization. For example, we can easily localize an on-coming car from its noise. In general, human beings use binaural cues to localize a target sound, but it has also been reported that humans can localize a target sound by using monaural cues [1]. Gaining knowledge of the human ability of sound localization is important for learning more about our hearing mechanism.
A method of estimating the direction of arrival (DOA) of a target sound using monaural cues could be applied to single-channel signal processing if we can transfer this ability of sound localization to engineering problems. The main cues for sound localization with binaural hearing are the interaural time difference (ITD), the interaural level difference (ILD), and spectral information [2]. These are included in the head-related transfer function (HRTF), the transfer function between a sound source and the eardrum position of each ear. Among these cues, the available monaural cues for sound localization are the spectral cues in the HRTF, such as peaks and notches in the monaural spectral envelope. However, it is unclear how the peaks and notches in the monaural spectral envelope vary with the DOA of the sound source; therefore, these cues cannot be used to directly estimate the DOA of a sound source.

[Figure 1: Relationship between sound source signals and observed signals at the eardrum position: (a) time domain and (b) power envelope domain]

On the other hand, there have been studies on binaural modulation cues for sound localization. Thompson and Dau reported that the ILD and ITD in the temporal envelope are also important cues for sound localization [3]. This report suggests that the monaural modulation spectrum (MMS) can be regarded as an important cue for monaural sound localization. Related studies have addressed DOA estimation with an MMS approach. Kliper et al. proposed a DOA estimation method using monaural cues in amplitude modulation patterns based on a machine learning scheme [4]. They used the MMS patterns of signals observed at the eardrum position, but applied the machine learning scheme to classify the MMS patterns and directly estimate the azimuth of the sound source. Therefore, their method cannot account for the monaural DOA mechanism. In particular, with their method, it is still unclear how the MMS patterns can be used for monaural sound localization. We aimed to find important monaural cues for sound localization and propose a method of estimating DOA using these cues. We investigated how the MMS of the observed signals varies with azimuth to find monaural cues for DOA estimation. We propose a method based on the concept of the modulation transfer function (MTF).

2. Model Concept

Figure 1(a) shows the transfer function from the sound source to the observed signal at the eardrum position in the time domain, where y(t, θ), h(t, θ), x(t), and θ are the observed signal, the head-related impulse response (HRIR), the sound source signal, and the arrival direction of the sound source signal, respectively. The observed signal is represented as

    y(t, θ) = h(t, θ) * x(t)    (1)

where * is the convolution operation. The HRIR includes acoustic characteristics such as pinna reflection and head diffraction. Equation (1) can be represented in the frequency domain as

    Y(f, θ) = H(f, θ) X(f)    (2)

where Y(f, θ), H(f, θ), and X(f) are the spectrum of the observed signal, the HRTF, and the spectrum of the sound source signal, respectively.

Figure 1(b) represents a transfer function in the modulation domain, from the power envelope of the original signal to that of the observed signal. This function is in a different domain from Fig. 1(a) and is based on the concept of the MTF [5], [6]. The power envelope of the observed signal, e_y^2(t, θ), can be represented as

    e_y^2(t, θ) = e_h^2(t, θ) * e_x^2(t)    (3)

where e_h^2(t, θ) and e_x^2(t) are the power envelopes of h(t, θ) and x(t).

[Figure 2: MMS characteristics with varying azimuth for the AM signal (measured values and regression curve)]

[Figure 3: MMS characteristics with varying azimuth for the AM noise (measured values and regression curve)]

Equation (3) can be represented in the modulation-frequency domain as

    E_y(f_m, θ) = E_h(f_m, θ) E_x(f_m)    (4)

where E_y(f_m, θ), E_h(f_m, θ), and E_x(f_m) are the MMS of y(t, θ), the head-related MTF (HRMTF), and the MMS of x(t), respectively. The term f_m is the modulation frequency. The HRMTF is defined as

    E_h(f_m, θ) = ∫_0^∞ e_h^2(t, θ) exp(−j 2π f_m t) dt    (5)

In this study, e_y^2(t, θ) was extracted by

    e_y^2(t, θ) = LPF[ |y(t, θ) + j Hilbert[y(t, θ)]|^2 ]    (6)

where LPF[·] is low-pass filtering and Hilbert[·] is the Hilbert transform.
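The envelope extraction in Eq. (6) can be sketched as follows: the analytic signal gives the instantaneous amplitude, its square is the power envelope, and a low-pass filter removes the higher modulation-frequency components. This is a minimal sketch, not the authors' implementation; the 20 Hz cut-off and the fourth-order Butterworth filter are assumed values chosen for illustration, since the paper's exact cut-off is not given here.

```python
# Sketch of Eq. (6): e_y^2(t) = LPF[ |y(t) + j*Hilbert[y(t)]|^2 ]
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def power_envelope(y, fs, cutoff_hz=20.0):
    """Power envelope of y via the analytic signal, then low-pass filtered.

    cutoff_hz and the filter order are assumed values for this sketch.
    """
    analytic = hilbert(y)             # y(t) + j*Hilbert[y(t)]
    env2 = np.abs(analytic) ** 2      # squared instantaneous amplitude
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env2)       # zero-phase LPF as post-processing
```

For a constant-amplitude sinusoid of amplitude A, the result is approximately constant at A^2, as expected of a power envelope.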
Equation (6) is based on the instantaneous amplitude; the low-pass filtering is post-processing that removes the higher modulation-frequency components in the power envelope. We used an LPF with a cut-off frequency of Hz. Finally, e_y^2(t, θ) is transformed to E_y(f_m, θ) using the fast Fourier transform (FFT).

3. Monaural Modulation-Spectrum Analysis

We investigated, through computer simulations, how the MMSs of the observed signals vary with azimuth, from to 355 degrees, where the front of the head was at 0 degrees. In these simulations, the observed signals were generated by convolving the sound source signal with HRIRs. The analysis range of the azimuth corresponded to the left-ear side. We used the HRTF database recorded by the Research Institute of Electrical Communication of Tohoku University, which contains the HRIRs of 114 people (228 ears) recorded at 1225 positions, with an azimuth interval of 5 degrees, an elevation interval of 10 degrees, and a sampling frequency of 48 kHz. In our simulations, HRIRs containing hum noise or large differences between neighboring angles in the lower frequency components were eliminated from all conditions.

Two types of amplitude-modulated (AM) signals were used as the sound source: an AM signal with a sinusoidal carrier of 10 kHz (AM signal) and an AM signal with a white-noise carrier (AM noise). Three modulation frequencies, 2, 20, and Hz, were used in these signals.

[Figure 4: Proposed method: (a) training phase: power envelope (Eq. (6)), FFT, regression coefficients (Eq. (7)), parameter tables; (b) estimating phase: power envelope (Eq. (6)), FFT, direction estimation (Eq. (8))]

Simulations were carried out to investigate the relationships between the MMSs of the observed signals and the azimuth. Figures 2 and 3 show the simulation results for the observed AM signal and AM noise. The horizontal axis indicates the azimuth of the observed signals and the vertical axis indicates the MMS value. We found that the shape of the MMS varied with azimuth and formed an arc as a function of the azimuth. These shapes were not symmetrical about 270 degrees, i.e., about the left-ear side. These characteristics were observed for all modulation frequencies and types of stimuli.
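The analysis above can be sketched as follows: convolve an AM source with an HRIR per azimuth (Eq. (1)), extract the power envelope, and read off the MMS magnitude at the modulation frequency. This is a simplified sketch; the impulse-like HRIR stand-ins are hypothetical (the paper uses the Tohoku University database), and the unfiltered analytic-signal envelope is a crude substitute for Eq. (6).

```python
# Sketch of the Section 3 analysis: MMS value at f_m versus azimuth.
import numpy as np
from scipy.signal import hilbert

def mms_at(env2, fs, fm):
    """Magnitude of the modulation spectrum of a power envelope at f_m."""
    spec = np.fft.rfft(env2 - np.mean(env2))      # remove DC before the FFT
    freqs = np.fft.rfftfreq(len(env2), 1.0 / fs)
    return np.abs(spec[np.argmin(np.abs(freqs - fm))])

def mms_by_azimuth(source, hrirs, fs, fm):
    """hrirs: {azimuth_deg: impulse_response}; returns {azimuth: MMS value}."""
    out = {}
    for az, h in hrirs.items():
        y = np.convolve(source, h)[: len(source)]  # Eq. (1): y = h * x
        env2 = np.abs(hilbert(y)) ** 2             # crude envelope stand-in
        out[az] = mms_at(env2, fs, fm)
    return out
```

With real HRIRs, plotting the returned values over azimuth would reproduce the arc-shaped curves of Figs. 2 and 3.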
The MMS shapes with the AM noise were smoother than those with the AM signal, and this trend varied depending on the individual's ears. Similar trends were also observed for the right ears. The dynamic ranges of the MMS variations were almost the same for all modulation frequencies, as shown in Figs. 2 and 3. Although we omit these results in this paper, similar trends were observed in simulations using the other HRIRs. Therefore, we argue that this effect is caused by the individuality of the HRIR. These results suggest that humans may use cues based on this tendency of variation in the MMS, although a listening experiment is necessary to confirm this.

4. Proposed Method

We propose a method of estimating DOA based on the results of the MMS analysis. The flow of the proposed method is shown in Fig. 4. The MMS values are plotted as open circles in Figs. 2 and 3. These plots are approximated by second-order polynomials as follows:

    Ê_y(f_m, θ) = p_1(f_m) θ^2 + p_2(f_m) θ + p_3(f_m)    (7)

where p_1(f_m), p_2(f_m), and p_3(f_m) are the regression coefficients and Ê_y(f_m, θ) is the approximated value. The solid lines in Figs. 2 and 3 indicate the ideal results from Eq. (7). An inverse function is derived from these regression curves:

    θ̂(E_y) = ( −p_2 ± sqrt( p_2^2 − 4 p_1 (p_3 − E_y) ) ) / (2 p_1)    (8)

where θ̂ is the estimated azimuth. If the HRIR is known, p_1, p_2, and p_3 can be calculated from the MMS of the observed signals y(t, θ). We assume that the input to the proposed method is a signal y(t) with unknown azimuth θ. The power envelope e_y^2(t) is calculated using Eq. (6) and E_y(f_m) is calculated using the FFT. Finally, the unknown azimuth θ is estimated by substituting the MMS value and the regression coefficients into Eq. (8).

5. Evaluations

Simulations were carried out to verify the effectiveness of the proposed method. The AM signal, the AM noise, and the left-ear side were used in these simulations.
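The training and estimation steps above (Eqs. (7) and (8)) amount to a quadratic fit and its inversion by the quadratic formula. A minimal sketch, with hypothetical function names, returning both azimuth candidates that give rise to the front-back ambiguity:

```python
# Sketch of Eq. (7) (training) and Eq. (8) (estimation).
import numpy as np

def train(azimuths, mms_values):
    """Eq. (7): quadratic regression coefficients (p1, p2, p3)."""
    return np.polyfit(azimuths, mms_values, 2)   # highest power first

def estimate_doa(e_y, p1, p2, p3):
    """Eq. (8): the two azimuth candidates for an observed MMS value e_y."""
    disc = p2 ** 2 - 4 * p1 * (p3 - e_y)
    if disc < 0:
        return None                               # value outside the fitted arc
    root = np.sqrt(disc)
    return ((-p2 + root) / (2 * p1), (-p2 - root) / (2 * p1))
```

Because the fitted arc is (roughly) symmetric about the ear position, each observed MMS value maps to two candidate azimuths, one in front of and one behind the ear, which is exactly the front-back confusion discussed in Section 5.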
Figure 5 shows the simulation results. The horizontal axis indicates the azimuth of the input signals and the vertical axis indicates the estimated azimuth. There was no effect of varying the modulation frequency, as shown in Fig. 5. Two azimuths were estimated, corresponding to the positive and negative branches of the inverse function in Eq. (8). With the positive branch, the estimates were correct behind the ear position, while with the negative branch, the estimates were correct in front of the ear position; in each opposite region, the estimates were incorrect. These false estimates were due to front-back confusion. Moreover, there were more false estimates with the AM signal than with the AM noise. These results indicate that the proposed method can correctly estimate DOA using the MMS, except for discriminating front-back confusion.

[Figure 5: Results of monaural DOA estimates for the AM signal and AM noise against the ideal estimation: modulation frequencies of (a) 2, (b) 20, and (c) Hz]

6. Conclusions

We investigated how the MMS of observed signals varies with azimuth. The results showed that the MMS varied with the azimuth, with the peak of the shape around the ear position. We then proposed a method of estimating DOA based on these analysis results. The evaluation results indicated that the proposed method could correctly estimate DOA using the MMS, except for discriminating front-back confusion. For future work, we will investigate how to solve the front-back confusion problem.

Acknowledgments

This work was supported by the Strategic Information and Communications R&D Promotion Programme (SCOPE: 131205001) of the Ministry of Internal Affairs and Communications (MIC), Japan. It was also supported by a Grant-in-Aid for Young Scientists (Start-up, No. 25880011).

References

[1] R. Sato and K. Furuhata: An influence on the auditory system due to a skull fracture, IEICE Technical Report, EA2012-71, Vol. 112, No. 266, pp. 37-42, 2012.
[2] J. Blauert: Spatial Hearing, The MIT Press, Cambridge, 1974.
[3] E. R. Thompson and T. Dau: Binaural processing of modulated interaural level difference, J. Acoust. Soc. Am., Vol. 123, No. 2, pp. 1017-1029, 2008.
[4] R. Kliper, H. Kayser, D. Weinshall, I. Nelken and J. Anemuller: Monaural azimuth localization using spectral dynamics of speech, Proc. Interspeech 2011, pp. 33-36, Florence, Italy, 2011.
[5] T. Houtgast and H. J. M. Steeneken: The modulation transfer function in room acoustics as a predictor of speech intelligibility, Acustica, Vol. 28, No. 1, pp. 66-73, 1973.
[6] M. Unoki: Speech signal processing based on the concept of modulation transfer function (1): Basis of power envelope inverse filtering and its applications, Journal of Signal Processing, Vol. 12, No. 5, pp. 339-348, 2008.