
Modeling of Binaural Discrimination of Multiple Sound Sources: A Contribution to the Development of a Cocktail-Party Processor [4]

H. SLATKY (Lehrstuhl für allgemeine Elektrotechnik und Akustik, Ruhr-Universität Bochum, D-4630 Bochum, Germany)

The human auditory system is able to "focus" on one sound source in the presence of noise, echoes, reverberation and other interfering sources. Such a situation arises, for instance, in a room with more than one speaker (the "cocktail-party effect"). In this study I intend to find algorithms that model these binaural phenomena and that can be used for technical purposes.

Lateralisation of multiple sound sources by the auditory system

To answer the question of how the human auditory system reacts when more than one sound source is presented simultaneously, auditory experiments were conducted in which two sinusoidal or narrow-band noise signals were presented simultaneously in an anechoic room.

Fig. 1: Setup for localization experiments with multiple sound sources

Presented signals:
1. Sine 500 Hz + sine (500 Hz + Δf), Δf = 10..160 Hz
2. Sine 2000 Hz + sine (2000 Hz + Δf), Δf = 10..1200 Hz
3. Narrow-band noise (7% relative bandwidth): noise at 500 Hz + noise at (500 Hz + Δf), Δf = 10..160 Hz
4. Narrow-band noise (7% relative bandwidth): noise at 2000 Hz + noise at (2000 Hz + Δf), Δf = 10..1200 Hz

When two narrow-band sound sources whose spectral difference is substantially smaller than the critical bandwidth are presented simultaneously (e.g. sinusoidal signals at 500 Hz + 530 Hz, or noise with 7% relative bandwidth at 500 Hz + 510 Hz), the auditory system is able to localize these sources correctly and to identify them by their pitch (high-pitched, low-pitched) [3].

Fig. 2: Percentage of correctly localized sound sources. --H--: high-pitched sound source; --T--: low-pitched sound source. Curves are given for one sound source localized correctly and for both sound sources localized correctly; dotted lines: guess probability for one and for both sound sources. At 500 Hz the guess probability is exceeded for Δf > 10 Hz (noise) or Δf > 30 Hz (sine).
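The stimulus conditions listed above are easy to synthesize. The following sketch is my own illustration (the sampling rate, duration and the brick-wall frequency-domain filter are assumptions, not taken from the original experiment); it generates one sine pair and one narrow-band noise pair with 7% relative bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000          # sampling rate in Hz (assumed, not from the paper)
n = fs              # 1 s of signal
t = np.arange(n) / fs

def sine_pair(fc, df):
    """Two simultaneous sinusoids at fc and fc + df (conditions 1 and 2)."""
    return np.sin(2*np.pi*fc*t) + np.sin(2*np.pi*(fc + df)*t)

def narrowband_noise(fc, rel_bw=0.07):
    """Gaussian noise limited to 7% relative bandwidth around fc
    (conditions 3 and 4), realized here by brick-wall filtering
    in the frequency domain."""
    X = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, 1/fs)
    X[(f < fc*(1 - rel_bw/2)) | (f > fc*(1 + rel_bw/2))] = 0.0
    return np.fft.irfft(X, n)

stim = sine_pair(500.0, 30.0)                              # condition 1, Δf = 30 Hz
noise = narrowband_noise(500.0) + narrowband_noise(510.0)  # condition 3, Δf = 10 Hz
```

Each pair can then be presented with a per-source interaural time difference, as in the localization experiments below.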

Fig. 3: Interaural cross-correlation functions [2] of the experimental signals within the critical bands concerned. Presented signals: sine 500 Hz (τ = 0.6 ms) and sine 580 Hz (τ = -0.2 ms). Lower critical band: 490..600 Hz; upper critical band: 600..730 Hz. Dotted lines: interaural time differences of the presented signals.

Within the lower critical band there is no correspondence between the positions of the maxima of the cross-correlation function and the directions of the sound sources. Within the upper critical band the positions of the maxima of the cross-correlation function correspond to the direction of the high-pitched sound source.

Binaural models

When these signals are presented to binaural models that are based on cross-correlation functions within critical bands and that determine the direction of incidence directly from the positions of the maxima of the correlation function (e.g. LINDEMANN [2], GAIK [1]), only one incidence direction can be determined correctly, because the maxima positions of only one of the two critical bands concerned stay constant in time. The cross-correlation pattern in the other critical band varies quickly with time, so a direct evaluation of the directions of incidence is not possible [3].

Assuming that the auditory system analyzes the incidence directions within critical bands, and that its localization process can be described by cross-correlation functions, there must be a method to extract the relevant information on sound directions from these patterns (a "recomputation mechanism" [3]).

Fig. 4: Comparison between cross-correlation models and the results of the auditory experiments:
- Localisation. Upper critical band: localisation of the high-pitched signal. Lower critical band: no localisation. Auditory experiments: both signals localised correctly. Consequence: extension of auditory models necessary.
- Sound. Upper critical band: sound of the high-pitched signal. Lower critical band: mixture of both signals. Auditory experiments: original sound for the high-pitched signal, mixture of the signals at the direction of the low-pitched signal. Consequence: model and experiments match.
- Loudness. Upper critical band: high-pitched signal with reduced loudness. Lower critical band: sum of both signals. Auditory experiments: high-pitched signal at 140% of the loudness of the low-pitched signal. Consequence: extension of auditory models necessary.
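The band-dependent behaviour described for Fig. 3 can be reproduced numerically. The sketch below is my own illustration: the source parameters follow the Fig. 3 caption, but the sampling rate and the Gaussian band weighting (a crude stand-in for the critical-band filters of the original) are assumptions. It computes the lag of the interaural cross-correlation maximum in a band containing both tones and in a band dominated by the 580 Hz tone:

```python
import numpy as np

fs, dur = 48000, 0.2
n = int(fs * dur)
t = np.arange(n) / fs

# Two sources as in Fig. 3: 500 Hz with ITD +0.6 ms, 580 Hz with ITD -0.2 ms.
f1, tau1 = 500.0, 0.6e-3
f2, tau2 = 580.0, -0.2e-3
left  = np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t)
right = np.sin(2*np.pi*f1*(t - tau1)) + np.sin(2*np.pi*f2*(t - tau2))

def band_weight(x, fc, sigma=60.0):
    """Gaussian magnitude weighting (std. dev. sigma Hz) around fc,
    applied in the frequency domain; a rough critical-band stand-in."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, 1/fs)
    return np.fft.irfft(X * np.exp(-((f - fc)/sigma)**2), n)

def icc_peak_lag(l, r, max_lag=1e-3):
    """Lag (in s) of the maximum of the interaural cross-correlation,
    restricted to |lag| <= max_lag."""
    c = np.correlate(r, l, mode="full")
    lags = np.arange(-(n - 1), n) / fs
    m = np.abs(lags) <= max_lag
    return lags[m][np.argmax(c[m])]

# Band centred between the tones: both sources contribute comparably.
lower = icc_peak_lag(band_weight(left, 540.0), band_weight(right, 540.0))
# Band above both tones: the 580 Hz tone dominates through the filter skirt.
upper = icc_peak_lag(band_weight(left, 665.0), band_weight(right, 665.0))
```

With these assumed parameters, `upper` should come out close to the ITD of the high-pitched source (-0.2 ms), while `lower` should land away from both source ITDs, mirroring the observation that only one band yields a correct direction.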

Searching for a suitable mathematical description

Another way to describe binaural interactions within critical bands is the complex cross product of the analytic time functions of the ear signals. Its features are:
- Using analytic time functions within critical bands, the ear signals can be processed at a reduced data rate, so processing becomes faster.
- The dependence of the binaural interaction patterns on the ear signals can be evaluated in mathematically exact form.
- In the presence of stationary signals from only one or two directions, the binaural interaction pattern takes a simple geometric form (see below).

Within a critical band, an arbitrary signal can be described as an amplitude- and frequency-modulated sinusoid. Its analytic time function A(t) is (f(t) = frequency, a(t) = magnitude, φ(t) = phase):

    A(t) = a(t) · e^(j(2π f(t) t + φ(t)))

The corresponding ear signals are (τ = interaural time difference; H_l(τ), H_r(τ) = outer-ear transfer functions):

    L(t) = A(t - τ/2) · H_l(τ)
    R(t) = A(t + τ/2) · H_r(τ)

The cross product K(t) of the left and right ear signals is then:

    K(t) = R(t) · L(t)* = a_m(t)² · e^(j 2π f(t) τ),   with   a_m(t)² = a(t)² · H_l(τ) · H_r(τ)

Fig. 5: Interaural cross product K(t) for one sound source with constant amplitude.

For sinusoidal signals (a(t), f(t), φ(t) = const.) the locus curve of the interaural cross product K(t) is a single point in the complex plane. Its magnitude is proportional to the mean energy of the ear signals, and its phase corresponds to the interaural phase. This matches the results of cross-correlation models when their maxima are depicted in polar coordinates.

Fig. 6: Locus curves of the cross product for two sound sources: a) sine 500 Hz, a = 1, τ_a = 0 µs; b) sine 560 Hz, b = 0.5, τ_b = 400 µs. Left figure: interaural level difference 0 dB; right figure: interaural level difference 6 dB. Shown are the locus curve of each sound source alone, the complex mean value, and a circle around the mean value with radius equal to the standard deviation.

When two signals A(t) and B(t) are presented from different directions, the corresponding ear signals add and binaural beats arise: the locus curve of the cross product varies quickly with time. For stationary signals the locus curve has the form of a straight line or an ellipse, depending on the interaural level differences. Introducing the complex mean value µ and the complex standard deviation σ of this time-dependent locus curve yields a system of complex equations from which the interaural phases 2α = 2π f_a(t) τ_a and 2β = 2π f_b(t) τ_b and the mean amplitudes a_m(t), b_m(t) of the sound sources can be estimated:

    µ(t) = 1/(2T) ∫[t-T, t+T] K(t') dt'                µ(t) = a_m(t)² e^(j2α) + b_m(t)² e^(j2β)
    σ²(t) = 1/(2T) ∫[t-T, t+T] (K(t') - µ)² dt'        σ²(t) = 2 a_m(t)² b_m(t)² e^(j2(α+β))

Properties of the presented algorithm

The accuracy of this method depends on the integration time and on how quickly the sound-source attributes vary. For stationary signals (sines, harmonic signals) and a long integration time, the estimation is sufficiently accurate (error < 1 dB) up to level differences of 100 dB between the sound sources. For signals with varying amplitudes (noise, speech) the integration time must be short (10-20 ms); the range of accurate estimation of sound-source magnitudes and directions is then limited to level differences of -20 dB between the desired and the interfering signal. Compared to other methods of directional selection (beam microphones, linear microphone arrays), the algorithm yields rather sharp directional beams for receiver distances substantially shorter than the wavelength (ear distance). In the low-frequency range, directional beams of ±150 µs (about ±15° relative to the frontal direction) can be obtained.
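Since µ is the sum and σ²/2 the product of the two unknown source terms a_m² e^(j2α) and b_m² e^(j2β), both terms can be recovered as the roots of the quadratic z² - µz + σ²/2 = 0. The sketch below is my own numerical illustration of this step (source parameters as in the left panel of Fig. 6; the sampling rate is assumed, ideal analytic signals are used and the outer-ear transfer functions are omitted):

```python
import numpy as np

fs = 48000                        # assumed sampling rate
t = np.arange(fs) / fs            # 1 s of signal

# Two stationary sources, parameters as in Fig. 6 (left panel):
fa, a, tau_a = 500.0, 1.0, 0.0       # source a: 500 Hz, ITD 0 µs
fb, b, tau_b = 560.0, 0.5, 400e-6    # source b: 560 Hz, ITD 400 µs

def ear(sign):
    """Analytic ear signal: each source delayed by ±tau/2."""
    return (a * np.exp(2j*np.pi*fa*(t + sign*tau_a/2)) +
            b * np.exp(2j*np.pi*fb*(t + sign*tau_b/2)))

L, R = ear(-1), ear(+1)

# Interaural cross product and its complex mean / complex variance.
K = R * np.conj(L)
mu = K.mean()
sigma2 = ((K - mu)**2).mean()

# mu = x + y and sigma2/2 = x*y for x = a²e^{j2α}, y = b²e^{j2β},
# so x and y are the roots of z² - mu·z + sigma2/2 = 0.
roots = np.roots([1.0, -mu, sigma2/2])
roots = roots[np.argsort(-np.abs(roots))]                 # stronger source first

amps = np.sqrt(np.abs(roots))                             # estimated a_m, b_m
itds = np.angle(roots) / (2*np.pi*np.array([fa, fb]))     # from 2α = 2π f τ
```

With these parameters both sources are recovered from the statistics of a single critical-band cross product: the estimated amplitudes come out at 1.0 and 0.5 and the ITDs at 0 µs and 400 µs. Note that, as discussed later in the paper, the phase 2α is only known modulo 2π, so the ITD estimate is unambiguous only at low frequencies.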
When more than two sound sources fall within one critical band, the attributes of the two most intense sources can still be estimated from the locus curve of the cross product. For a given direction it is possible to estimate the probability that the estimators of the algorithm correspond to this direction (evaluation of the estimation error). In this way the probable amplitude of a signal coming from a desired direction can be estimated.

Fig. 7: Directional filtering of amplitude-modulated signals. Desired signal: level 0 dB, sine 560 Hz, f_mod = 5 Hz, τ = 400 µs. Interfering signal: level 10 dB, sine 500 Hz, f_mod = 5 Hz, τ = 0 µs. Shown are the signal envelopes of the desired and interfering signals and the estimator for the envelope of the desired signal (x-axis: time in ms; y-axis: level in dB relative to the mean desired signal).
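For non-stationary signals the same root-finding can be applied on short windows. The sketch below is my own simplified variant of the Fig. 7 condition (the interferer is unmodulated here, and the sampling rate, modulation depth and window length are my choices); it tracks the envelope of an amplitude-modulated desired signal in the presence of a 10 dB stronger interferer:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs                       # 1 s of signal

# Desired source: 560 Hz, 5 Hz amplitude modulation, ITD 400 µs.
# Interferer: 500 Hz, constant level 10 dB above the mean desired level, ITD 0.
fd, tau_d = 560.0, 400e-6
fi, tau_i = 500.0, 0.0
env_d = 1.0 + 0.5*np.cos(2*np.pi*5.0*t)      # true envelope of the desired signal
amp_i = 10**(10/20)                          # +10 dB interferer amplitude

def ear(sign):
    return (env_d * np.exp(2j*np.pi*fd*(t + sign*tau_d/2)) +
            amp_i * np.exp(2j*np.pi*fi*(t + sign*tau_i/2)))

K = ear(+1) * np.conj(ear(-1))               # interaural cross product

win = 800                                    # 16.7 ms window = one 60 Hz beat period
target = 2*np.pi*fd*tau_d                    # interaural phase of the desired direction
centers, est = [], []
for start in range(0, len(K) - win + 1, win):
    k = K[start:start + win]
    mu = k.mean()
    s2 = ((k - mu)**2).mean()
    roots = np.roots([1.0, -mu, s2/2])       # short-time source terms a_m² e^{j2α}
    r = roots[np.argmin(np.abs(np.angle(roots) - target))]
    centers.append(start + win//2)
    est.append(np.sqrt(abs(r)))              # short-time desired-envelope estimate

centers, est = np.array(centers), np.array(est)
```

Despite the interferer being substantially stronger, the short-time estimates `est` should follow the true envelope `env_d` closely, which is the effect illustrated in Fig. 7.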

Construction of a binaural processing model

Fig. 8: The binaural processing model (inside one critical band).

A binaural processing model based on this algorithm must include the following units:
- Preprocessing: critical-band filtering of the ear signals and computation of the analytic time signals.
- Evaluation of the cross product and of its complex mean value and standard deviation.
- Estimation of the directions and amplitudes of the sound sources from the statistical parameters of the cross product; estimation of the error and validity range of the estimation.
- Choice of the desired direction.
- Estimation of the probable magnitude of the desired signal, considering the estimated values and the estimation errors.
- Evaluation of the signal-to-noise ratio in each ear by comparing the estimated desired signal with the ear-signal magnitudes, yielding weighting factors for the ear signals.
- Generation of the processed broadband signal from these weighted critical-band signals.

Using this process, an enhancement of a desired speaker's signal of up to 20 dB can be obtained when two speakers are presented under free-field conditions with original signal-to-noise ratios down to -30 dB. The intelligibility of the desired speaker grows considerably.

By processing complex analytic time functions instead of real signals, the data rate and computation time can be reduced significantly. Since the magnitudes of the spectral components in the range f_s/2..f_s are zero (f_s = sampling rate), critical-band-filtered signals can be shifted to the low-frequency range and processed at a sampling rate corresponding to the bandwidth of the critical-band filter. Using 24 critical bands, the data rate can be reduced to 10-20% of that of a digital filter bank without down-sampling.

Fig. 9: Preprocessing unit: generation of the analytic time signal combined with the reduction of the sampling rate.

Discussion

The presented algorithm is based on the evaluation of the interaural phase. In the high-frequency range (f > 800 Hz) the relationship between the direction of incidence and the interaural phase becomes ambiguous. When the interaural phases of the desired and interfering directions coincide, directional filtering has no effect. This problem could be solved by an additional directional filtering mechanism based on interaural level differences.

In psychoacoustics this model can be used to interpret multiple-sound-source effects, especially the precedence effect. For this purpose a "directional processor" should be added to the model, which selects the desired directions from the estimators and marks signals from other directions (e.g. echoes) as interfering signals to be suppressed. Exceptions to the precedence effect can be explained as the adoption of a new desired direction. Multiple images, which arise when interaural time and intensity differences do not match (GAIK [1]), can be interpreted by the model as differences between the directional estimates obtained from phase and from level differences.

Technical applications of such a directional filter include directionally selective hearing aids, directionally selective front ends for speech processing systems (speech recognizers, hands-free telephones), and a low-frequency supplement to beam microphones and microphone arrays.
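The preprocessing step of Fig. 9 can be sketched as follows. This is my own minimal illustration, not the original implementation: an FFT-based analytic signal, complex demodulation of an assumed critical band to baseband, then decimation to a sampling rate matching the band's width:

```python
import numpy as np

fs = 48000
n = fs                                        # 1 s of signal
t = np.arange(n) / fs

# Test input: a 520 Hz carrier with 5 Hz amplitude modulation, i.e. a signal
# that fits inside an assumed critical band around 450..600 Hz.
env = 1.0 + 0.5*np.cos(2*np.pi*5.0*t)
x = env * np.cos(2*np.pi*520.0*t)

def analytic(x):
    """Analytic signal via the FFT: zero the negative frequencies."""
    X = np.fft.fft(x)
    h = np.zeros(len(x))
    h[0] = 1.0
    h[1:len(x)//2] = 2.0
    h[len(x)//2] = 1.0
    return np.fft.ifft(X * h)

fc = 520.0                                    # centre of the assumed critical band
M = 32                                        # decimation factor: 48 kHz -> 1.5 kHz
z = analytic(x) * np.exp(-2j*np.pi*fc*t)      # shift the band down to baseband
z_low = z[::M]                                # reduced-rate complex band signal

# The magnitude of the reduced-rate signal still carries the envelope:
env_low = np.abs(z_low)
```

At a 1.5 kHz complex sampling rate, each band carries only a small fraction of the original 48 kHz data rate, in line with the 10-20% figure quoted above for a 24-band filter bank (the exact ratio depends on the per-band bandwidths).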
References

[1] GAIK (1990): Untersuchungen zur binauralen Verarbeitung kopfbezogener Signale. Fortschritt-Berichte VDI, Reihe 17: Biotechnik, Nr. 63. VDI-Verlag, Düsseldorf.
[2] LINDEMANN (1986): Extension of a binaural cross-correlation model by contralateral inhibition. JASA 80, p. 1608.
[3] SLATKY (1990): Lokalisation simultan abstrahlender Schallquellen: Konsequenzen für den Aufbau binauraler Modelle. Fortschritte der Akustik, DAGA '90, Wien. DPG-Verlag, Bad Honnef, p. 751.
[4] Based on: SLATKY (1991): Ein binaurales Modell zur Lokalisation und Signalverarbeitung bei Darbietung mehrerer Schallquellen. Fortschritte der Akustik, DAGA '91, Bochum. DPG-Verlag, Bad Honnef.