Microphone Array Design and Beamforming Heinrich Löllmann Multimedia Communications and Signal Processing heinrich.loellmann@fau.de with contributions from Vladi Tourbabin and Hendrik Barfuss EUSIPCO Tutorial on Embodied Audition for Robots August 31, 2015
Overview: Introduction; Microphone Array Design: measure for array performance, array design for Spherical Harmonics (SHs), new robot head design; Beamforming: robust least-squares beamformer design, HRTF-based design for robots, evaluation for Automatic Speech Recognition (ASR); Conclusions
Introduction. Beamforming is used by robots to enhance the performance of automatic speech recognition (ASR). Numerous publications address beamformer design and related approaches, e.g., [Brandstein & Ward, 2001, Van Trees 2002, Benesty et al., 2008]: fixed beamformer designs, adaptive beamforming, blind source separation (BSS), ... Which issues are specific to beamformer design in the context of robot audition?
Design Aspects for Robot Audition. Construction of the robot (head): How many microphones are needed/feasible? What are the optimal microphone positions? Beamforming with head microphones: influence of the head (no free-field), presence of ego-noise (see tutorial talk on signal enhancement), influence of robot movements, especially head rotations. Localization and tracking of the desired speaker: see tutorial talks by C. Evers and R. Horaud.
Optimal Microphone Positions. Many robot systems use 2 microphones to mimic the human auditory system [Argentieri et al., 2013]. Can we do better than with only two ears? How can we determine the optimal microphone placement? A measure for array performance as a function of the sensor positions is presented in [Tourbabin & Rafaely, 2014].
Generalized HRTF Matrix. Model for the generalized head-related transfer function (HRTF), with the quantities: complex amplitude of far-field source; complex pressure amplitudes; transfer function between source j and sensor l.
Generalized HRTF Matrix. Representation for D sources, M sensors, and K frequency points. Compact matrix formulation.
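A minimal sketch of the linear model that the quantities listed above suggest, for a single frequency point. The symbol names p_l, h_lj, s_j and the frequency index k are assumptions chosen for illustration; how the K frequency points are stacked into one GHRTF matrix is not specified here and is left open.

```latex
% Per-sensor form: pressure at sensor l as a superposition of D far-field sources
p_l(k) = \sum_{j=1}^{D} h_{lj}(k)\, s_j(k), \qquad l = 1, \dots, M

% Compact matrix form for one frequency point k
\mathbf{p}(k) = \mathbf{H}(k)\, \mathbf{s}(k), \qquad
\mathbf{H}(k) \in \mathbb{C}^{M \times D}, \quad
[\mathbf{H}(k)]_{lj} = h_{lj}(k)
```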
Generalized HRTF Matrix. Singular Value Decomposition (SVD) of the matrix of generalized head-related transfer functions (GHRTFs). Observation: the information to construct [...] from [...] is mainly contained in the most dominant eigenvectors; a high effective rank of the GHRTF matrix indicates a good sensor placement.
Optimal Sensor Placement. The effective rank of the GHRTF matrix is computed from its singular values. The optimal microphone positions are obtained by maximizing this effective rank over the sensor positions. The solution can be found by genetic algorithm optimization.
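A minimal sketch of how an effective rank can be computed from singular values. It uses one common entropy-based definition (Roy and Vetterli); whether the cited framework uses exactly this normalization is an assumption, and the function name `effective_rank` is hypothetical.

```python
import numpy as np


def effective_rank(H):
    """Entropy-based effective rank of a matrix (assumed definition).

    The singular values are normalized to sum to one and treated as a
    probability distribution; the effective rank is exp(entropy).
    """
    s = np.linalg.svd(np.asarray(H), compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0  # all-zero matrix: no information at all
    p = s / total
    p = p[p > 0.0]  # skip exact zeros so that log() stays finite
    return float(np.exp(-np.sum(p * np.log(p))))


# Equal singular values give the full rank, a rank-one matrix gives about 1.
full = effective_rank(np.eye(4))
rank_one = effective_rank(np.ones((3, 3)))
```

A candidate microphone layout with a higher value spreads the energy over more singular vectors, which is the property the placement criterion rewards.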
Optimal Sensor Placement. Simulation example: effective rank of the GHRTF matrix for single positions on the head surface. The relation between the effective rank and beamformer robustness as well as direction-of-arrival (DOA) estimation accuracy is derived in [Tourbabin & Rafaely, 2014].
Array Design for Spherical Harmonics (SH). A nearly ball-shaped robot head with many microphones motivates array designs in the Spherical Harmonic (SH) domain. An approach related to the previous method was developed for this case: the optimal positions are found by minimizing the aliasing level for different positions. An outline of this concept is provided in the appendix, but a detailed treatment of SHs is beyond the scope of this tutorial.
New Head Array Design for Nao Robot. Possible regions for microphone placement (green) are determined by mechanical constraints of the manufacturer. Positions considered for the optimization: 327 positions on a simulated head.
New Head Array Design for Nao Robot. Simulation results (BG University). Layout for new Nao head (Aldebaran): slight deviations from the optimal positions due to mechanical constraints.
New Head Array Design for Nao Robot. First prototype head.
Beamformer Design for Robot Audition. Review: Filter-and-Sum Beamformer; Beampattern.
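A minimal sketch of a beampattern evaluation at a single frequency, assuming a free-field linear array and far-field plane waves. The array geometry, the phase sign convention, and the function names are assumptions made for illustration; the filter of each microphone is represented by one complex weight at the chosen frequency.

```python
import numpy as np


def steering_vector(freq_hz, angle_rad, mic_x, c=343.0):
    """Free-field, far-field steering vector of a linear array on the x-axis.

    The angle is measured from the array axis; the sign convention of the
    phase term is an assumption.
    """
    return np.exp(2j * np.pi * freq_hz * mic_x * np.cos(angle_rad) / c)


def beampattern(weights, freq_hz, angles_rad, mic_x, c=343.0):
    """Beamformer response w^H d(f, phi) for each angle phi."""
    return np.array([
        np.vdot(weights, steering_vector(freq_hz, a, mic_x, c))  # vdot conjugates weights
        for a in angles_rad
    ])


# Example: 5 microphones, 4 cm spacing, delay-and-sum weights steered to broadside.
mic_x = np.arange(5) * 0.04
freq_hz = 2000.0
look = np.pi / 2
weights = steering_vector(freq_hz, look, mic_x) / len(mic_x)
angles = np.linspace(0.0, np.pi, 181)
pattern = beampattern(weights, freq_hz, angles, mic_x)
```

Plotting `20 * np.log10(np.abs(pattern))` over `angles` gives the usual beampattern view; repeating the evaluation over a frequency grid shows how the main lobe narrows with frequency for these fixed weights.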
Robust Beamformer Design. Robust Least-Squares Frequency-Invariant (RLSFI) Design [Mabande et al., 2009].
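A sketch of the kind of constrained least-squares problem that a robust, frequency-invariant design poses at each frequency. It is written from the general structure of such designs and is not copied from the cited paper; all symbols are assumptions. Here G(ω) stacks the steering vectors for a grid of angles, b is the desired response on that grid (the same for all frequencies, hence frequency-invariant), d(ω, φ_look) is the steering vector of the look direction, and γ is the lower bound on the white noise gain.

```latex
\min_{\mathbf{w}(\omega)} \;
  \bigl\| \mathbf{G}(\omega)\,\mathbf{w}(\omega) - \mathbf{b} \bigr\|_2^2
\quad \text{subject to} \quad
\mathbf{w}^{T}(\omega)\,\mathbf{d}(\omega, \varphi_{\mathrm{look}}) = 1,
\qquad
\frac{\bigl| \mathbf{w}^{T}(\omega)\,\mathbf{d}(\omega, \varphi_{\mathrm{look}}) \bigr|^{2}}
     {\mathbf{w}^{H}(\omega)\,\mathbf{w}(\omega)} \;\ge\; \gamma
```

With the distortionless constraint in place, the second constraint reduces to a bound on the norm of the weight vector, which keeps the problem convex.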
Design Example. RLSFI beamformer design for different white noise gain (WNG) thresholds: beampattern for free-field response; trade-off between spatial selectivity and robustness (WNG).
Design Example. RLSFI beamformer design: the beampattern for the HRTF-based response shows reduced spatial selectivity and distortions in look direction. HRTFs should be considered in the design!
HRTF-Based Beamformer Design. HRTF-based RLSFI Design [Barfuss et al., 2015].
Design Example. HRTF-based RLSFI design for different WNG thresholds (panels: HRTF-based design, free-field design): distortionless response in look direction; similar spatial selectivity as for the free-field design.
Experimental Evaluation. Setup: ASR with PocketSphinx trained on clean speech of the GRID corpus; test corpus contained 200 utterances; signal quality evaluated by frequency-weighted segmental SNR (fwSegSNR) [Hu & Loizou, 2008]; interference at 45° and desired speaker between 0° and 180°; room impulse responses measured for a T60 of 190 ms.
Experimental Evaluation. Results: the scenarios with the speaker at 30° and 60° are the most challenging; better performance for the HRTF-based design in almost all cases.
Conclusions. Beamformer design for robot audition requires tailored algorithms and designs. Optimal microphone positions can be determined by the effective rank of the GHRTF matrix, or can be found for an SH design by minimizing the aliasing level. Beamformer design based on the common free-field assumption leads to inferior results: knowledge about the HRTF of the robot should be incorporated in the design.
Appendix: Array Design for Spherical Harmonics (SHs). How to find the best microphone positions for SHs. Concept [Tourbabin et al., 2015].
Appendix: Array Design for Spherical Harmonics. Example for the matrix. The aliasing level for given sensor positions is the ratio of the highest off-diagonal element to the corresponding diagonal element. Optimal microphone positions are obtained by minimizing this aliasing level.
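A minimal sketch of the ratio described above, under two assumptions that the text does not confirm: the matrix is square, and the "corresponding" diagonal element is the one in the same row as the largest off-diagonal element. The function name `aliasing_level` is hypothetical.

```python
import numpy as np


def aliasing_level(A):
    """Largest off-diagonal magnitude divided by the diagonal magnitude of its row.

    Assumes a square matrix; the row-wise pairing is an interpretation.
    """
    mag = np.abs(np.asarray(A, dtype=complex))  # magnitudes as a float array
    off = mag.copy()
    np.fill_diagonal(off, 0.0)                  # keep only off-diagonal entries
    row, col = np.unravel_index(np.argmax(off), off.shape)
    return float(off[row, col] / mag[row, row])


# Example: the largest off-diagonal entry is 0.5 in row 0, whose diagonal entry is 2.
example_level = aliasing_level([[2.0, 0.5], [0.1, 4.0]])
```

A perfectly diagonal matrix yields a level of zero, so a search over candidate sensor positions would prefer layouts that drive this value down.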
Appendix: White Noise Gain for HRTF-based Design. The maximal possible WNG for the free-field case is attained by the delay-and-sum beamformer; the maximal WNG for the HRTF-based design is lower.
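A minimal sketch of the free-field case, assuming the usual WNG definition |w^H d|^2 / (w^H w) with a unit-magnitude steering vector d. The function name and the example phases are assumptions for illustration. With delay-and-sum weights w = d / M, the WNG equals the number of microphones M, and by the Cauchy-Schwarz inequality no other weight vector exceeds it.

```python
import numpy as np


def white_noise_gain(w, d):
    """White noise gain |w^H d|^2 / (w^H w) for weights w and steering vector d."""
    num = np.abs(np.vdot(w, d)) ** 2  # vdot conjugates its first argument
    den = np.real(np.vdot(w, w))
    return float(num / den)


# Free-field steering vector: unit magnitudes, arbitrary example phases.
M = 8
rng = np.random.default_rng(0)
d = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, M))

wng_dsb = white_noise_gain(d / M, d)  # delay-and-sum weights
wng_random = white_noise_gain(rng.standard_normal(M) + 1j * rng.standard_normal(M), d)
wng_dsb_db = 10.0 * np.log10(wng_dsb)
```

The same function applies when d holds measured HRTFs instead of pure phase terms; the upper bound then becomes the squared norm of d, which need not equal M.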
Appendix: References
[Argentieri et al., 2013] S. Argentieri, A. Portello, M. Bernard, P. Danés, and B. Gas: Binaural Systems in Robotics, in The Technology of Binaural Listening, J. Blauert (Ed.), Modern Acoustics and Signal Processing, pp. 225-253, Springer
[Brandstein & Ward, 2001] M. Brandstein and D. Ward (Eds.): Microphone Arrays, Springer, 2001
[Van Trees 2002] H. L. Van Trees: Optimum Array Processing (Detection, Estimation and Modulation Theory, Part IV), Wiley Interscience
[Benesty et al., 2008] J. Benesty, J. Chen, and Y. Huang: Microphone Array Signal Processing, Springer
[Tourbabin & Rafaely, 2014] V. Tourbabin and B. Rafaely: Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition, IEEE Trans. on Audio, Speech, and Language Processing, vol. 22, no. 12
[Mabande et al., 2009] E. Mabande, A. Schad, and W. Kellermann: Design of Robust Superdirective Beamformers as a Convex Optimization Problem, IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan
[Barfuss et al., 2015] H. Barfuss, C. Huemmer, G. Lamani, A. Schwarz, and W. Kellermann: HRTF-Based Robust Least-Squares Frequency-Invariant Beamforming, Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA
[Hu & Loizou, 2008] Y. Hu and P. C. Loizou: Evaluation of Objective Quality Measures for Speech Enhancement, IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238
Acknowledgment. The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 609465.