Auditory System For a Mobile Robot

1 Auditory System For a Mobile Robot. PhD Thesis. Jean-Marc Valin, Department of Electrical Engineering and Computer Engineering, Université de Sherbrooke, Québec, Canada. Jean-Marc.Valin@USherbrooke.ca

2 Motivations: Robots need information about their environment in order to be intelligent. Artificial vision has been popular for a long time, but artificial audition is new. Robust audition is essential for human-robot interaction (cocktail party effect).

3 Approaches To Artificial Audition: Single microphone: human-robot interaction; unreliable. Two microphones (binaural audition): imitate human auditory system; limited localisation and separation. Microphone array audition: more information available; simpler processing.

4 Objectives: Localise and track simultaneous moving sound sources. Separate sound sources. Perform automatic speech recognition. Remain within robotics constraints: complexity, algorithmic delay; robustness to noise and reverberation; weight/space/adaptability; moving sources, moving robot.

5 Experimental Setup: Eight microphones on the Spartacus robot. Two configurations: cube (C1) and shell (C2). Noisy conditions. Two environments, by reverberation time: lab (E1), 350 ms; hall (E2), 1 s.

6 Sound Source Localisation

7 Approaches to Sound Source Localisation: Binaural: interaural phase difference (delay); interaural intensity difference. Microphone array: estimation through TDOAs; subspace methods (MUSIC); direct search (steered beamformer). Post-processing: Kalman filtering; particle filtering.

8 Steered Beamformer: Delay-and-sum beamformer. Maximise output energy. Frequency domain computation.
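The idea on this slide can be sketched in a few lines. The following is a minimal illustration, not the thesis implementation: it assumes a hypothetical three-microphone array, integer sample delays, circular shifts, and a naive DFT. The names `steered_energy`, `best_direction`, and the candidate labels are invented for the example.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def steered_energy(spectra, delays):
    """Output energy of a delay-and-sum beamformer, computed in the frequency domain.

    spectra[m] is the DFT of microphone m; delays[m] is the delay (in samples)
    that the candidate direction predicts for microphone m.
    """
    n = len(spectra[0])
    energy = 0.0
    for k in range(n):
        f = k if k <= n // 2 else k - n  # signed frequency bin index
        # Undo each predicted delay with a phase shift, then sum the channels.
        y = sum(X[k] * cmath.exp(2j * cmath.pi * f * d / n)
                for X, d in zip(spectra, delays))
        energy += abs(y) ** 2
    return energy


def best_direction(spectra, candidates):
    """Return the candidate whose steering delays maximise the output energy."""
    return max(candidates, key=lambda name: steered_energy(spectra, candidates[name]))


# Hypothetical demo: white noise reaching three microphones with delays 0, 2, 4.
rng = random.Random(0)
n = 64
source = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_delays = [0, 2, 4]
mics = [[source[(t - d) % n] for t in range(n)] for d in true_delays]
spectra = [dft(x) for x in mics]
candidates = {"left": [0, -2, -4], "front": [0, 0, 0], "right": [0, 2, 4]}
chosen = best_direction(spectra, candidates)
```

When the steering delays match the true delays, the channels add coherently in every bin, so the output energy peaks for that candidate.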

9 Spectral Weighting: Normal cross-correlation peaks are very wide. The PHAse Transform (PHAT) has narrow peaks. Apply weighting: weight according to noise and reverberation. Models the precedence effect: sensitivity is decreased after a loud sound.
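To illustrate why PHAT gives narrow peaks, here is a minimal sketch of a PHAT-weighted cross-correlation between two microphones. It covers only the plain phase transform; the additional weighting for noise and reverberation mentioned on the slide is not modelled. The circular delay, the naive DFT, and the function names are assumptions made for the example.

```python
import cmath
import random


def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat(x1, x2):
    """Circular cross-correlation of x2 against x1 with PHAT weighting."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # PHAT: discard the magnitude of each bin and keep only its phase.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    return [v.real / n for v in dft(whitened, sign=+1)]


def estimate_delay(x1, x2):
    """Lag (0..n-1, circular) at which the PHAT correlation peaks."""
    corr = gcc_phat(x1, x2)
    return max(range(len(corr)), key=corr.__getitem__)


# Hypothetical demo: the second microphone hears the first one 5 samples later.
rng = random.Random(0)
n = 64
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - 5) % n] for t in range(n)]
delay = estimate_delay(x1, x2)
```

Because every bin contributes with unit magnitude, the correlation collapses to a single sharp peak at the true lag in this idealised setting. Negative delays appear wrapped around the end of the lag axis.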

10 Direction Search: Finding directions with highest energy. Fixed number of sources, Q=4. Lookup-and-sum algorithm: 25 times less complex.
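One way to read "lookup-and-sum" is sketched below: cross-correlations are computed once per microphone pair, and the energy of each candidate direction is obtained by looking up the correlation at the lag that direction predicts for each pair and summing. This is an illustrative sketch under assumptions. The table layout, the zeroing of used correlation entries between successive sources, and all names and numbers are hypothetical and not taken from the slides.

```python
def lookup_and_sum(xcorr, tdoa_table):
    """Energy per direction: sum over pairs of the correlation at the predicted lag.

    xcorr[p] maps a lag (in samples) to the correlation of microphone pair p.
    tdoa_table[d][p] is the lag that direction d predicts for pair p.
    """
    return [sum(xcorr[p][lag] for p, lag in enumerate(lags)) for lags in tdoa_table]


def find_sources(xcorr, tdoa_table, num_sources):
    """Pick the num_sources strongest directions, one at a time."""
    xcorr = [dict(c) for c in xcorr]  # work on a copy
    found = []
    for _ in range(num_sources):
        energies = lookup_and_sum(xcorr, tdoa_table)
        best = max(range(len(energies)), key=energies.__getitem__)
        found.append(best)
        # Assumption: remove the found source's contribution before the next search.
        for p, lag in enumerate(tdoa_table[best]):
            xcorr[p][lag] = 0.0
    return found


# Hypothetical data: 3 microphone pairs, lags -2..2, 4 candidate directions.
xcorr = [
    {-2: 0.1, -1: 0.1, 0: 0.9, 1: 0.5, 2: 0.1},
    {-2: 0.0, -1: 0.1, 0: 0.8, 1: 0.6, 2: 0.0},
    {-2: 0.0, -1: 0.1, 0: 0.7, 1: 0.1, 2: 0.0},
]
tdoa_table = [(-2, -1, 1), (0, 0, 0), (1, 1, 0), (2, 1, -1)]
sources = find_sources(xcorr, tdoa_table, num_sources=2)
```

The search itself then costs only one table lookup and one addition per pair and direction, since the expensive correlations are shared across all directions.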

11 Post-Processing: Particle Filtering. Need to track sources over time. Steered beamformer output is noisy. Representing pdf as particles: one set of (1000) particles per source. State=[position, speed].

12 Particle Filtering Steps: 1) Prediction. 2) Instantaneous probabilities estimation, as a function of steered beamformer energy.

13 Particle Filtering Steps (cont.): 3) Source-observation assignment. Need to know which observation is related to which tracked source. Compute, for each observation q: the probability that q is a false alarm; the probability that q is source j; the probability that q is a new source.

14 Particle Filtering Steps (cont.): 4) Particle weights update: merging past and present information, taking into account source-observation assignment. 5) Addition or removal of sources. 6) Estimation of source positions: weighted mean of the particle positions. 7) Resampling.
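The core of the steps above (prediction, weight update, position estimate, resampling) can be sketched for a single source. This is a deliberately simplified illustration: it assumes a one-dimensional position, a Gaussian observation likelihood, and exactly one observation per frame, so the source-observation assignment and the addition or removal of sources (steps 3 and 5) are left out. All parameter values are hypothetical.

```python
import math
import random


def predict(particles, dt, accel_std, rng):
    """Step 1: propagate each (position, speed) particle with random excitation."""
    moved = []
    for pos, vel in particles:
        vel = vel + rng.gauss(0.0, accel_std) * dt
        moved.append((pos + vel * dt, vel))
    return moved


def update_weights(particles, weights, observation, obs_std):
    """Steps 2 and 4: multiply past weights by a Gaussian likelihood, then normalise."""
    new = [w * math.exp(-0.5 * ((p[0] - observation) / obs_std) ** 2)
           for p, w in zip(particles, weights)]
    total = sum(new)
    if total == 0.0:  # every particle is far from the observation
        return [1.0 / len(new)] * len(new)
    return [w / total for w in new]


def estimate(particles, weights):
    """Step 6: weighted mean of the particle positions."""
    return sum(w * p[0] for p, w in zip(particles, weights))


def resample(particles, weights, rng):
    """Step 7: redraw particles in proportion to their weights."""
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n


# Hypothetical demo: a source moving at 1 unit/s, observed with noise.
rng = random.Random(1)
n = 500
particles = [(rng.uniform(-10.0, 10.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
weights = [1.0 / n] * n
dt, truth = 0.1, 0.0
for _ in range(50):
    truth += 1.0 * dt
    observation = truth + rng.gauss(0.0, 0.2)
    particles = predict(particles, dt, accel_std=1.0, rng=rng)
    weights = update_weights(particles, weights, observation, obs_std=0.5)
    position = estimate(particles, weights)
    particles, weights = resample(particles, weights, rng)
```

Resampling on every frame keeps the sketch short. A fuller tracker would typically resample only when the weights become too uneven.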

15 Localisation Results (E1): Detection accuracy over distance. Localisation accuracy.

16 Tracking Results: Two sources crossing with C2 (video; panels for E1 and E2).

17 Tracking Results (cont.): Four moving sources with C2 (panels for E1 and E2).

18 Sound Source Separation & Speech Recognition

19 Overview of Sound Source Separation: Frequency domain processing: simple, low complexity. Linear source separation. Non-linear post-filter. [Block diagram: the sources S_m(k,l) reach the microphones as X_n(k,l); geometric source separation, using tracking information, yields Y_m(k,l); the post-filter yields the separated sources Ŝ_m(k,l).]

20 Geometric Source Separation: Frequency domain. Constrained optimization: minimize the correlation of the outputs, subject to a geometric constraint. Modifications to the original GSS algorithm: instantaneous computation of correlations; regularisation.

21 Multi-Source Post-Filter

22 Interference Estimation: Source separation leaks: incomplete adaptation; inaccuracy in localization; reverberation/diffraction; imperfect microphones. Estimation from other separated sources.

23 Reverberation Estimation: Exponential decay model. Example: 500 Hz frequency bin.
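An exponential decay model can be sketched as follows. The link between reverberation time and per-frame decay follows from the definition of reverberation time (a 60 dB energy drop). The recursive estimator, the `direct_to_reverb` ratio, and the 10 ms frame hop are assumptions made for illustration, not values from the slides.

```python
def decay_per_frame(t60_s, hop_s):
    """Energy decay factor per frame for a 60 dB drop over t60_s seconds."""
    return 10.0 ** (-6.0 * hop_s / t60_s)


def reverb_energy(frame_energy, gamma, direct_to_reverb):
    """Recursive estimate of reverberant energy in one frequency bin.

    Each frame, the previous estimate decays by gamma and a fraction of the
    previous frame's energy is fed in (assumed form of the recursion).
    """
    estimates = []
    previous_estimate = 0.0
    previous_energy = 0.0
    for energy in frame_energy:
        previous_estimate = (gamma * previous_estimate
                             + (1.0 - gamma) / direct_to_reverb * previous_energy)
        estimates.append(previous_estimate)
        previous_energy = energy
    return estimates


# Hypothetical demo: the lab (E1, 350 ms) with an assumed 10 ms frame hop.
gamma_lab = decay_per_frame(0.35, 0.01)
# Response of the estimator to a single loud frame followed by silence.
tail = reverb_energy([1.0, 0.0, 0.0, 0.0], gamma=0.5, direct_to_reverb=2.0)
```

After the loud frame, the estimate shrinks by the factor `gamma` on each subsequent frame, which is the exponential tail the model assumes.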

24 Results (SNR): Three speakers, C2 (shell), E1 (lab). [Figure: SNR (dB) of sources 1, 2, and 3 for the input, delay-and-sum, GSS, GSS + single-source, and GSS + multi-source.]

25 Speech Recognition Accuracy (Nuance): Proposed post-filter reduces errors by 50%. Reverberation removal helps in E2 only. No significant difference between C1 and C2. Digit recognition: 3 speakers, 83%; 2 speakers, 90%. [Figure: word correct (%) for the right, front, and left speakers in E2, C2, 3 speakers, with GSS only, post-filter (no dereverb.), and the proposed system.]

26 Man vs. Machine: How does a human compare? Is it fair? Yes and no! [Figure: word correct (%) for listeners 1 to 5 and the proposed system.]

27 Real-Time Application Video from AAAI conference

28 Speech Recognition With Missing Feature Theory: Speech is transformed into features (~12). Not all features are reliable. MFT = ignore unreliable features. Compute missing feature mask. Use the mask to compute probabilities.
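"Use the mask to compute probabilities" can be illustrated with a small sketch. It assumes each acoustic model is a diagonal Gaussian and that unreliable features are simply left out of the likelihood. The two models, the feature values, and the function names are hypothetical and much smaller than a real recogniser.

```python
import math


def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood under a diagonal Gaussian, using only reliable features."""
    total = 0.0
    for x, reliable, mu, var in zip(features, mask, means, variances):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def classify(features, mask, models):
    """Return the model name with the highest masked log-likelihood."""
    return max(models, key=lambda name: masked_log_likelihood(
        features, mask, models[name]["means"], models[name]["variances"]))


# Hypothetical demo: the third feature is corrupted by an interfering source.
models = {
    "A": {"means": [0.0, 0.0, 0.0], "variances": [1.0, 1.0, 1.0]},
    "B": {"means": [5.0, 5.0, 5.0], "variances": [1.0, 1.0, 1.0]},
}
features = [0.1, -0.2, 20.0]
without_mask = classify(features, [True, True, True], models)
with_mask = classify(features, [True, True, False], models)
```

With every feature trusted, the corrupted value pulls the decision toward the wrong model. Masking it out lets the two reliable features decide.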

29 Missing Feature Mask: Interference: unreliable. Stationary noise: reliable. [Figure legend: black = reliable, white = unreliable.]

30 Results (MFT): Japanese isolated word recognition (SIG2 robot, CTK). 3 simultaneous sources. 200-word vocabulary. 30, 60, 90 degrees separation. [Figure: word correct (%) for the right, front, and left sources with GSS, GSS+postfilter, and GSS+postfilter+MFT.]

31 Summary of the System

32 Conclusion: What have we achieved? Localisation and tracking of sound sources; separation of multiple sources; robust basis for human-robot interaction. What are the main innovations? Frequency-domain steered beamformer; particle filtering source-observation assignment; separation post-filtering for multiple sources and reverberation; integration with missing feature theory.

33 Where From Here? Future work: complete dialogue system; echo cancellation for the robot's own voice; use human-inspired techniques; environmental sound recognition; embedded implementation. Other applications: video-conference (automatically follow speaker with a camera); automatic transcription.

34 Questions? Comments?
