State of the Art and Challenges in Improving Speech Intelligibility in Hearing Impaired People
Stefan Launer, Phonak AG, Stäfa, CH
Speech in Noise Workshop, Lyon, January 2011

Content
- Speech intelligibility in complex listening environments for hearing impaired persons
- Noise reduction technologies in hearing instruments:
  - De-reverberation
  - Single-microphone technology
  - Multi-microphone technology
  - FM systems
- Results
- Challenges
Speech Intelligibility in Noise? Speech Intelligibility in Complex Listening Conditions!
- Different types of interfering sources
- Different spatial arrangements of sources and interferers
- Dynamic room acoustics
- Reverberation
- Distance

Speech Intelligibility in Noise? Speech Intelligibility in Complex Listening Conditions!
Test methodology:
- Speech tests: short sentences, words, phonemes; static target from the front
- White noise from the back
- Anechoic environment
- Lab vs. real-life results
- Outcome measures: speech intelligibility, listening effort
Speech Intelligibility in Noise
[Figure: SNR (dB) required for speech understanding versus hearing loss (dB, 3-frequency average), for mild, moderate, and severe hearing loss. Killion 1997]

The Physical Structure of the Interfering Signal Has a Strong Impact on Speech Intelligibility
Introducing...
- spectral dips: SRT benefit 3-4 dB (HI) vs. 9-15 dB (NH)
- temporal dips: 1-2 dB (HI) vs. 6-7 dB (NH)
- a combination of both: 4-5 dB (HI) vs. 15-20 dB (NH)
...improves speech intelligibility a lot for normal-hearing subjects, but much less so for hearing-impaired subjects!
Peters, Moore and Baer 1998, JASA
Speech Intelligibility in a Multi-talker Environment
[Figure: speech intelligibility as a function of the number of interfering talkers. Fig. 2, Bronkhorst and Plomp, JASA 1992]

Spatial Release from Masking: Anechoic Chamber
- NH vs. HI: 10 dB!
Beutelmann & Brand, JASA 2006
Spatial Release from Masking: Office
- Spatial release is reduced by reverberation
- NH vs. HI: 4 dB!
Beutelmann & Brand, JASA 2006

Spatial Release from Masking: Cafeteria
- NH vs. HI: 6-7 dB!
Beutelmann & Brand, JASA 2006
Speech Intelligibility in Reverberant Environments
[Figure: percent correct (0-100%) for normal, mild, and moderate/severe hearing loss, measured in a sound suite and at reverberation times T = 0.54 s and T = 1.55 s. Harris & Swenson, Audiology 1990, p. 314-321]

How to Mix a Speech-in-Noise Cocktail
- Ingredients: noise canceling, directional microphones
- Objectives for a hearing instrument:
  - Speech intelligibility improvement!
  - Ease of listening: listening effort, listening comfort
Noise Reduction Using a Single Microphone
Single-microphone noise cancellers in principle estimate the noise and subtract it from the noisy signal (S + N):
- Adaptive filter: H = 1 - N* / (S + N), so that (S + N) · H = (S + N) - N* ≈ S
- Speech detection and noise estimation: statistical estimation, amplitude modulation, noise detection in speech pauses
- Only a single information source is available to separate the two signals

Reverberation Canceller (EchoBlock)
- Reduces smearing effects by de-blurring the speech signal
- Distinguishes the time span of early reflections from the time span of disturbing late reflections
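The subtraction principle above can be sketched as spectral subtraction in the frequency domain. This is an illustrative Python sketch, not Phonak's implementation; the noise estimate is assumed to come from a noise-only segment (e.g. a detected speech pause), and the spectral floor is a common trick to limit musical-noise artifacts.

```python
import numpy as np

def spectral_subtraction(noisy, noise_estimate, floor=0.05):
    """Single-channel noise reduction sketch: subtract an estimated noise
    magnitude from the noisy spectrum (the H = 1 - N*/(S+N) idea), keeping
    a small spectral floor, and resynthesize with the noisy phase."""
    S = np.fft.rfft(noisy)
    mag, phase = np.abs(S), np.angle(S)
    N = np.abs(np.fft.rfft(noise_estimate))
    clean_mag = np.maximum(mag - N, floor * mag)  # floored magnitude subtraction
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

# Illustrative usage: a tone in stationary noise, with the noise spectrum
# estimated from a separate noise-only frame (as in a speech pause).
rng = np.random.default_rng(0)
n = np.arange(1024)
tone = np.sin(2 * np.pi * 50 * n / 1024)
noisy = tone + 0.5 * rng.standard_normal(1024)
noise_only = 0.5 * rng.standard_normal(1024)
denoised = spectral_subtraction(noisy, noise_only)
```

As the slide notes, this works for stationary noise; a speech-like interferer would share the modulation statistics of the target and be subtracted along with it.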
Single-Microphone Noise Reduction: Summary
- Performs well at eliminating stationary noises (a fan, car noise, etc.) and in very reverberant rooms
- Speech-like noises cannot be suppressed without degrading speech quality at the same time
- Ease of listening: improved listening comfort, reduced perceived noisiness, less annoyance
- Improvement of speech intelligibility? Questionable
- Sound quality is a trade-off

Delay & Sum Technique
- The acoustic signal is picked up at two locations by the front and back microphones (spacing d)
- The signal from the back microphone is delayed by d/c
- The signals from both microphones are then combined
- Depending on the delay, different directions are attenuated
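The delay-and-combine steps above can be sketched numerically. A minimal sketch under illustrative assumptions (12 mm port spacing, 1 kHz, c = 343 m/s); the combination is written as front minus delayed back, the standard first-order differential form that places the null behind the wearer:

```python
import numpy as np

d, c, f = 0.012, 343.0, 1000.0   # mic spacing, speed of sound, frequency
w = 2 * np.pi * f

def response_db(theta_deg):
    """Magnitude response toward angle theta (0 deg = front) for
    y(t) = front(t) - back(t - d/c): a cardioid-like pattern."""
    theta = np.radians(theta_deg)
    internal = d / c                      # electrical delay on the back mic
    acoustic = d * np.cos(theta) / c      # inter-mic travel time of the wave
    h = 1 - np.exp(-1j * w * (acoustic + internal))
    return 20 * np.log10(np.abs(h) + 1e-12)

# At 180 deg the acoustic and electrical delays cancel exactly, so the
# rear response collapses while the front is only mildly attenuated.
print(response_db(0), response_db(180))
```

Changing the internal delay moves the null, which is exactly the "different directions are attenuated" point on the slide.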
Digital Adaptive Directional Microphones
- Front and back microphone signals are A/D converted and combined in a spatial processor with weighting factor α
- Adaptive: minimize the output energy of the two-microphone system
- The spatial weighting factor α is continuously adapted, thereby optimizing the Directivity Index

Digital Adaptive Directional Microphones
- Amplify sounds from the front
- Adaptively attenuate the strongest noise source
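The adaptation of α can be sketched with a toy LMS loop. This is an illustrative model, not the product algorithm: a single rear noise source appears in two combined microphone signals with a fixed ratio, and minimizing output power drives α onto that ratio, i.e. steers the null onto the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rear noise source seen by two directional combinations with an
# (illustrative) pickup ratio of 0.6; the output is y = cf - alpha * cb.
n = 20_000
noise = rng.standard_normal(n)
cf = 0.6 * noise      # noise as seen by the forward-facing combination
cb = 1.0 * noise      # same noise, stronger in the backward-facing one

alpha, mu = 0.0, 1e-3
for t in range(n):
    y = cf[t] - alpha * cb[t]
    alpha += mu * y * cb[t]   # LMS step: reduce output energy

print(round(alpha, 2))        # alpha converges to the 0.6 pickup ratio
```

At convergence y is (near) zero for the noise source, while a frontal target, which does not share this pickup ratio, passes through, matching the "adaptively attenuates strongest noise source" bullet.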
Frequency-Specific Beamforming
- Separate directivity in each frequency band

Directional Microphones: Potential and Limitations
- Significant speech intelligibility benefit compared to omnidirectional systems in complex listening conditions:
  - noise from the side and asymmetric, diffuse, or moving noises
  - reverberant environments and larger distances
- Lab results: 3-6 dB improvement
Directional Microphones: Potential and Limitations
- Positioning on the head
- Microphone mismatch, ageing, etc.
- More than two microphones: noise floor; is a narrow beam pattern acceptable?
- Size constraint: low-frequency roll-off
- Computational complexity

Directivity Index for Different Product Styles and Placements
[Figure: Directivity Index (0-5 dB) for different product styles, e.g. BTE, and placements]
Factors Causing Beamformer Mismatch
Beamformer performance in our current products can be limited by level and phase mismatch caused by the following factors:
- Time invariant:
  - Microphone production mismatch
  - HI assembly
  - Clean W&W variability
  - Customer's individual head/pinna shape
  - Device geometry: ITEs and micro-BTEs have unfavorable microphone positions
- Time variant:
  - Microphone ageing
  - HI repair
  - W&W pollution
  - Non-idealities of the current adaptive level-matching block
  - Variance in how the customer positions the HI

Effects of Microphone/Beamformer Mismatch
- Microphone phase deviation: rotated null direction
- Microphone magnitude deviation: reduced suppression, target blocking
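The "reduced suppression" effect can be quantified with a one-line model. This sketch (illustrative values, not measured product data) shows how little mismatch it takes to fill in the rear null of a subtractive beamformer:

```python
import numpy as np

def rear_null_depth_db(gain_err_db=0.0, phase_err_deg=0.0):
    """Residual rear-direction output of y = m1 - m2 when the second
    microphone deviates from the first by a gain error (dB) and a phase
    error (degrees). 0 dB / 0 deg gives a (numerically) perfect null."""
    g = 10 ** (gain_err_db / 20) * np.exp(1j * np.radians(phase_err_deg))
    return 20 * np.log10(abs(1 - g) + 1e-12)

print(rear_null_depth_db(0.0, 0.0))   # matched mics: extremely deep null
print(rear_null_depth_db(1.0, 0.0))   # 1 dB gain mismatch: null ~ -18 dB
print(rear_null_depth_db(0.0, 5.0))   # 5 deg phase mismatch: null ~ -21 dB
```

This is why the adaptive level-matching block listed above matters: without it, production tolerances alone cap the achievable suppression.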
Binaural Directional Microphones
- Improve directivity by a linear combination of the monaural directional microphone outputs, exchanged over a wireless transmission link
- Output: Σ_i w_i X_i
- Maximum SNR improvement: 3 dB

Test Setup
- Subjects: 20 adults with moderate to moderately-severe hearing loss, fitted with Exélia Art and Ambra microP BTEs
- Algorithms: Exélia Art VoiceZoom, Ambra UltraZoom (monaural), Ambra StereoZoom (binaural)
- Tests: OLSA (speech intelligibility in noise), listening effort scaling, paired comparison
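The 3 dB ceiling follows from the linear combination Σ w_i X_i itself: the target is coherent across the two ear signals while (in the ideal case) the residual noise is uncorrelated, so averaging doubles the SNR. A synthetic sketch of that argument, with made-up signals rather than real beamformer outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Coherent target plus independent noise at each ear; combine with w_i = 1/2.
n = 200_000
target = np.sin(2 * np.pi * 0.01 * np.arange(n))
left = target + rng.standard_normal(n)
right = target + rng.standard_normal(n)
combined = 0.5 * (left + right)

def snr_db(x):
    """SNR estimate: project out the target component, call the rest noise."""
    noise = x - target * np.dot(x, target) / np.dot(target, target)
    return 10 * np.log10(np.mean(target ** 2) / np.mean(noise ** 2))

print(snr_db(combined) - snr_db(left))   # close to the 3 dB maximum
```

In practice the residual noises at the two ears are partially correlated, so the realized benefit sits below this bound.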
Binaural Beamforming: OLSA Results
[Figure: SRT 50% in dB SNR for A) 60° noise angle and B) 45° noise angle, comparing Exélia Art VoiceZoom, Ambra UltraZoom, and Ambra StereoZoom]
Paired Comparison: Subjective Speech Intelligibility
"With which hearing instrument do you understand better?"
[Figure: number of comparisons won by Ambra UltraZoom, Ambra StereoZoom, and Exélia VoiceZoom at 45° and 60° noise angles]

Paired Comparison: Subjective Listening Effort
"With which hearing instrument do you understand more easily?"
[Figure: number of comparisons won by Ambra UltraZoom, Ambra StereoZoom, and Exélia VoiceZoom at 45° and 60° noise angles]
User-Steered Directionality
- Traditional beamforming systems focus only to the front
- Speech signals do not always come from the front, and facing the speaker is not always possible (car, restaurants, small groups)
- ZoomControl, accessible through myPilot, allows Exélia wearers to select the direction in which to focus hearing

Listen to the Side: User-Steered Directionality
- Uses the four-microphone network of two full-bandwidth binaural instruments
- Broadband audio data transfer between the devices
- Focuses hearing in one specific direction while suppressing signals from other directions
User-Steered Directionality
[Figure: SNR (dB SPL) for speech from 0° (front), 90° (left), 180° (back), and 270° (right), comparing no directionality, adaptive multichannel directionality, and steerable directionality on Exélia Art P]
Subjective Evaluation: Listening Effort
Which setting needs the least listening effort to understand well? (First-time and experienced users, n = 9)
[Figure: preference percentages for the settings Without, ZoomControl, VoiceZoom, and Omni, for male and female speech; the most-preferred setting reaches 88% (male speech) and 78% (female speech)]

Binaural Noise Reduction Techniques
Different types of algorithms:
- Beamformer: spatial information, timing differences
- Binaural Wiener filter
- Blind source separation: statistical information, estimating the room transfer function
- Auditory processing schemes
Binaural Wiener Filter (BWF): Speech Intelligibility Weighted Gain
[Figure: speech intelligibility weighted gain as a function of acoustic environment. PhD thesis, Van den Bogaert 2008]

BWF: Speech Intelligibility Weighted Gain
[Figure: speech intelligibility weighted gain as a function of acoustic environment. PhD thesis, Van den Bogaert 2008]
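The multichannel Wiener filter behind these results can be sketched in a few lines. A minimal single-frequency-bin, two-channel (left/right ear) simulation with synthetic data and illustrative transfer values; not the evaluated algorithm, just the w = Ryy⁻¹ rys core:

```python
import numpy as np

rng = np.random.default_rng(2)

# One frequency bin: target component s reaches the two ears through an
# (assumed) acoustic transfer a, plus independent sensor noise.
n = 50_000
s = rng.standard_normal(n)
a = np.array([1.0, 0.8])
y = np.outer(a, s) + 0.5 * rng.standard_normal((2, n))

Ryy = y @ y.T / n            # input covariance estimate
rys = y @ s / n              # cross-correlation with the target
w = np.linalg.solve(Ryy, rys)  # Wiener solution minimising E|s - w^T y|^2
est = w @ y

err_wf = np.mean((est - s) ** 2)
err_single = np.mean((y[0] - s) ** 2)
print(err_wf < err_single)   # True: the MWF beats either ear alone
```

In a hearing instrument rys cannot be measured directly and must be estimated, e.g. from speech-pause statistics, which is where the practical difficulty lies.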
Binaural Beamforming / Noise Reduction
- No stereo output signal, hence loss of spatial sensation / localization
  - Artificially re-introduce it by split directionality or by mixing part of the original signal into the output
- Narrower beam width: how narrow should the beam be (head movement!)?
- Complex environments: dynamic scenes require target tracking and target identification; reverberation and distance
- Expected improvements are situation-specific, with no generic solution: single or few strong interfering sources in the frontal hemisphere, environments with little reverberation

Technical Constraints
- Delay over the wireless link
- Clock jitter
- Noise floor, signal degradation
- Microphone calibration (amplitude and phase)
Ear-Level FM

Modern FM Technology
- Dynamic Speech Extraction, an automatic FM advantage: adjusts the FM gain depending on the environmental noise level
  - Surrounding noise compensation
  - Voice activity detector
- Multi-talker networks: a new team-teaching concept using up to 10 transmitters
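The automatic FM advantage described above can be sketched as a simple gain rule: keep a fixed advantage in quiet, and raise the FM gain once the surrounding noise exceeds a threshold. The threshold and slope here are illustrative assumptions, not Phonak's actual parameters:

```python
# Hypothetical adaptive FM gain rule (values are illustrative only).
def fm_gain_db(noise_level_db, base_gain=10.0, threshold=57.0, slope=1.0):
    """Below the threshold: a fixed FM advantage (traditional behaviour).
    Above it: gain grows with the surrounding noise level so the FM
    signal stays audible over the noise."""
    if noise_level_db <= threshold:
        return base_gain
    return base_gain + slope * (noise_level_db - threshold)

print(fm_gain_db(50))   # quiet room: plain 10 dB FM advantage
print(fm_gain_db(70))   # loud room: gain raised to keep the talker on top
```

This captures the compromise noted on the next slide: a fixed 10 dB advantage preserves environmental awareness in quiet but is insufficient at high noise levels.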
SNR at Ear Level for Different Technologies
[Figure: SNR (dB) versus surrounding noise level (40-85 dB SPL) for no FM, traditional FM with a fixed FM advantage, and adaptive FM advantage]
- A fixed 10 dB FM advantage gives good environmental awareness and audibility of one's own voice, but is a compromise at high noise levels

Field Study with 48 Adults
- HINT sentences from a loudspeaker at 0° (front); correlated HINT noise from 45°, 135°, 225°, and 315°
- Loudspeakers 1 m (39 in) from the center of the head; TX3 transmitter 7.6 cm (3 in) from its loudspeaker
Source: Valente, 2002
Speech Intelligibility Threshold
[Figure: mean reception threshold for sentences (RTS, dB) with standard deviations for the Unaided, Omni, Dual, FM-M, and FM-B conditions under normal listening conditions; the FM conditions reach markedly lower (better) thresholds. Source: Valente, 2002]

Auditory Scene Analysis versus Hearing Instrument Processing

Auditory processing:
- Bottom-up and top-down
- No delay constraint, no real-time processing
- Higher-resolution signal analysis and much higher computational power: stream segregation and source formation work on several different time scales
- No signal reconstruction: perceptual attenuation, focusing of attention, suppression of neuronal activity
- Channel with full information capacity
- A priori and situational knowledge fill in information: other sensory modalities, world knowledge, models of sources
- Attention control: target signal identification and tracking, switching back and forth between objects, overcoming salient sources

Hearing instrument processing:
- Bottom-up only
- Delay constraint, real-time processing
- Computational power constraint: limited signal analysis and spectro-temporal resolution
- Signal reconstruction and modification: amplification, attenuation, filtering, and hence distortions
- Channel with limited information capacity
- Retrospective analysis
- Dynamic aspects: head and source movement
- Target signal assumption: in front
Conclusion
- Hearing instruments offer several algorithms to improve speech intelligibility in complex listening environments
- Algorithms based mainly on ...
- Speech intelligibility in complex listening environments remains a huge challenge:
  - Reverberation and distance
  - Dynamic target selection and tracking
  - Technical limitations
  - Realistic test setups and test procedures

Questions
- Speech intelligibility: how much is top-down versus bottom-up processing?
- Speech intelligibility: how fast is it really? How much information do we infer at the end of a sentence?
- Which cues (pitch, temporal fine structure, location, ...) are the essential ones, and does it depend on the situation? How does the auditory system pick the relevant one?
- How do we achieve perceptual constancy, i.e. that voices in real life always sound the same, (almost) independent of the environment?
Thank you!

Speech Intelligibility in Reverberant Environments
[Figure: speech intelligibility (%) for normal, mild, and moderate/severe hearing loss, measured in a sound suite and at reverberation times T = 0.54 s and T = 1.55 s. Harris & Swenson, Audiology 1990, p. 314-321]
Binaural Processing: Audio Delay
- Group delay is mainly determined by radio bandwidth, ADC, CODEC, and buffering (error correction)
- The delay shall be deterministic and constant
- For binaural audio processing, the link delay adds to the other signal processing delays, i.e. FFT block processing, ADC
- The overall system delay should be less than 10 ms (Stone & Moore 2005)
- Audio signals + control data: somewhat more delay is acceptable for gain control (Hohmann 2009)

Jitter Example: 800 Hz Pure Tone
- The acoustic delay from the head dimensions is typically 500 µs for the ear distance
- The normal-hearing minimum audible angle corresponds to a timing difference of only a few µs
- Jitter should therefore be smaller than 20 µs RMS; this allows binaural beamforming without significant localization errors
[Figure: phase difference (deg) over time for T_jitter = 30 µs; phase difference = 360° · f · T_jitter]
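The jitter numbers above can be checked directly: an RMS timing jitter T_jitter corresponds to a phase error of 360° · f · T_jitter at frequency f. A worked check for the slide's 800 Hz example:

```python
# Phase error produced by link timing jitter at a given audio frequency.
def jitter_phase_deg(freq_hz, t_jitter_s):
    """Phase deviation in degrees: one period of freq_hz spans 360 deg,
    so a timing error t_jitter_s maps to 360 * f * t_jitter degrees."""
    return 360.0 * freq_hz * t_jitter_s

print(jitter_phase_deg(800, 30e-6))   # 8.64 deg for 30 us jitter at 800 Hz
print(jitter_phase_deg(800, 20e-6))   # 5.76 deg at the 20 us RMS limit
```

Since the full interaural delay of ~500 µs spans the whole localization range, keeping jitter-induced phase errors in the single-digit-degree range is what preserves usable binaural cues.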