Microphone Array Power Ratio for Speech Quality Assessment in Noisy Reverberant Environments


Prof. Israel Cohen, Department of Electrical Engineering, Technion - Israel Institute of Technology, Technion City, Haifa 3200003, Israel. IWAENC 2016. Joint work with Reuven Berkun (Technion) and Baruch Berdugo (Phoenix Audio Technologies).

Outline: 1. Introduction, 2. …, 3. …, 4. …, 5. …

Hands-free communication systems
Enhancement of speech signals is of great interest in many hands-free communication systems:
- Hearing aids.
- Cell phones and hands-free accessories for wireless communication systems.
- Conference and telephone speakerphones.
- And others.

Teleconferencing
Teleconferencing in large rooms:
- More than one microphone is used for audio pickup.
- A major challenge: monitor the perceived quality of each microphone signal and select, at any given point in time, the microphone with the best reception.
[Figure: daisy-chaining for larger rooms]

Teleconferencing (cont.)
- Microphones used in industrial applications are generally not calibrated, and the sensitivities of different microphones may differ considerably. Therefore, signal power is not a reliable basis for comparing signals measured with different microphones (Wolf and Nadeu, 2010).
- The signal-to-noise ratio is also not a reliable measure of the level of reverberation, since in real applications neither the noise nor the late reverberation can be assumed uniform (Obuchi, 2004; Wölfel et al., 2006).

Problem Formulation
A source signal measured at point $p_i = (x_i, y_i, z_i)$, $i = 1, 2, \dots, N$, is given by
$$r_i(t) = s(t) * h_i(t) + n_i(t),$$
where $*$ denotes convolution. The perceived amount of reverberation in a given signal is closely related to the direct-to-reverberation ratio (DRR). To evaluate the DRR, the impulse response $h_i(t)$ is split into early (direct) and late (reverberant) parts:
$$h_i(t) = h_{i,d}(t) + h_{i,r}(t).$$

Problem Formulation (cont.)
The DRR is defined as the ratio between the energy of the direct path (including the early reflections) and the energy of the reverberant paths (containing only the late reflections):
$$\mathrm{DRR} = \frac{E_d}{E_r} = \frac{\int_0^{T_d} h^2(t)\,dt}{\int_{T_d}^{\infty} h^2(t)\,dt}.$$
Our objective is to determine which signal in the set of measured signals $\{r_i(t) \mid i = 1, 2, \dots, N\}$ has the greatest DRR. The measure should support real-time quality monitoring based on short segments of the signals and be robust to differences in microphone sensitivities and environmental conditions.
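The DRR definition above can be evaluated directly on a sampled impulse response. The following is a minimal NumPy sketch, assuming $h$ is sampled at rate `fs` with time zero at the start of the response and that the split time $T_d$ is supplied by the caller (the slides do not fix its value); the function name `drr_db` is hypothetical. The sampling interval multiplies both sums and cancels in the ratio.

```python
import numpy as np


def drr_db(h: np.ndarray, fs: float, t_d: float) -> float:
    """Direct-to-reverberation ratio (dB) of a sampled impulse response h.

    Samples in [0, t_d) form the direct part (direct path plus early
    reflections); samples from t_d onward form the reverberant part.
    """
    n_d = int(round(t_d * fs))      # split index corresponding to T_d
    e_d = np.sum(h[:n_d] ** 2)      # E_d: discrete version of the integral over [0, T_d)
    e_r = np.sum(h[n_d:] ** 2)      # E_r: discrete version of the integral over [T_d, end)
    return float(10.0 * np.log10(e_d / e_r))
```

For example, `drr_db(np.array([1.0, 0.0, 0.5, 0.5]), 1000.0, 0.002)` splits after two samples, giving $E_d = 1$ and $E_r = 0.5$, i.e. about 3.01 dB.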

Related Works
Channel selection measures for multi-microphone speech recognition (Wolf and Nadeu, 2014):
- Microphones are arbitrarily located; the position and orientation of the speaker are unknown.
- Objective: rank the channels as closely as possible to the ranking based on word error rate (WER).
- Envelope-variance measure: the effect of reverberation is observed as a reduction in the dynamic range of the speech intensity envelope (Houtgast and Steeneken, 1985).
- Channel selection provides significant recognition improvements (in some cases up to 46% compared to a randomly selected channel).
- A good calibration of all microphones is still required, which is not a trivial task.
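To illustrate the envelope-variance idea, here is a deliberately simplified, single-band sketch: it computes the variance of the log frame energy, so a reverberant signal, whose dips are filled in by the reverberant tail, should score lower than the dry signal. This is not the Wolf and Nadeu measure itself (which operates on sub-band envelopes); the function name `log_envelope_variance` and the frame parameters are hypothetical.

```python
import numpy as np


def log_envelope_variance(x, frame_len: int, hop: int) -> float:
    """Variance of the log frame-energy envelope of x (single band only)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.array([
        np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)
    ])
    log_env = np.log(energies + 1e-12)  # small floor avoids log(0) in silent frames
    # A compressed dynamic range (e.g. gaps filled by reverberation) gives a low variance
    return float(np.var(log_env))
```

Usage: gating a sinusoid on and off every 100 samples and then smearing it with a decaying exponential (a crude stand-in for a reverberant tail) lowers the value returned for the smeared signal relative to the dry one.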

Related Works (cont.)
Acoustic Characterization of Environments (ACE) Challenge (Eaton, Gaubitch, Moore, and Naylor, 2016):
- The ACE Challenge attracted participation from 9 research teams around the world.
- It focused on non-intrusive estimation of the reverberation time ($T_{60}$) and the DRR.
- Classes of algorithms:
  1. Analytical with or without bias compensation (ABC);
  2. Single feature with mapping (SFM);
  3. Machine learning with multiple features (MLMF).
- Non-intrusive $T_{60}$ estimation is a mature field.
- Non-intrusive DRR estimation, however, is significantly less mature, with large biases and MSEs: the best algorithm estimates the DRR to within an RMS error of about 3 dB and a ρ ≈ 0.6 for typical operating scenarios of 1 to 18 dB SNR.
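The figures of merit quoted above (bias, RMS error, correlation) can be computed from paired estimates and ground-truth values as follows. This is a sketch assuming ρ denotes the Pearson correlation coefficient; `estimator_metrics` is a hypothetical helper, not part of the ACE evaluation code.

```python
import numpy as np


def estimator_metrics(est_db, true_db):
    """Return (bias, RMS error, Pearson correlation) of DRR estimates in dB."""
    est = np.asarray(est_db, dtype=float)
    true = np.asarray(true_db, dtype=float)
    err = est - true
    bias = float(np.mean(err))                  # mean signed error
    rmse = float(np.sqrt(np.mean(err ** 2)))    # root-mean-square error
    rho = float(np.corrcoef(est, true)[0, 1])   # Pearson correlation coefficient
    return bias, rmse, rho
```

A constant 1 dB overestimate, for instance, yields a bias and RMS error of 1 dB but a perfect correlation, which is why both kinds of figures are reported.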

Related Works (cont.)
Signal-based quality measures:
- Signal-to-diffuse ratio estimation:
  - Spatial complex coherence between microphones (Jeub, Nelke, Beaugeant, and Vary, 2011).
  - Segregation of direct and diffuse parts using beamforming (Thiergart, Ascherl, and Habets, 2014; Hioka et al., 2012).
- Modulation spectral analysis: speech-to-reverberation modulation energy ratio (SRMR) (Falk, Zheng, and Chan, 2010).
Generally, the correlation of signal-based measures with subjective listening tests is insufficient (Goetze, Albertin, Kallinger, Mertins, and Kammeyer, 2010).

Configuration
Unidirectional microphone array (directional elements, beamforming).
[Figure: source $s(t)$, omnidirectional microphone with gain $g_{\mathrm{mic}}(\theta)$, measured signal $z(t)$]
$g_{\mathrm{dir/opp}}(\theta)$: the microphone directional gain at angle $\theta$.

Signal Model
The measured signal:
$$z(t) = \int_{-\infty}^{t} s(\tau)\,h(t-\tau)\,d\tau + v(t),$$
where $s(t)$ is the speech signal, $h(t)$ the room impulse response (RIR), and $v(t)$ the ambient noise.
RIR model:
$$h(t) = \begin{cases} h_d(t), & 0 \le t < T_r,\\ h_r(t), & t \ge T_r,\\ 0, & \text{otherwise}.\end{cases}$$
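A discrete-time version of the measurement model, as a minimal sketch: the causal convolution integral becomes `np.convolve` truncated to the source length, and the ambient noise $v(t)$ is assumed, for illustration only, to be white Gaussian noise scaled to a requested SNR. The function name `reverberant_observation` and the SNR parameterization are hypothetical.

```python
import numpy as np


def reverberant_observation(s, h, snr_db=None, rng=None):
    """z[n] = sum_k s[k] h[n - k] + v[n], truncated to len(s).

    v is white Gaussian noise (an illustrative assumption) scaled so that the
    power ratio of reverberant speech to noise equals snr_db; no noise if None.
    """
    s = np.asarray(s, dtype=float)
    h = np.asarray(h, dtype=float)
    z = np.convolve(s, h)[: len(s)]     # causal convolution, same length as s
    if snr_db is None:
        return z
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(len(z))
    # Scale v so that mean(z^2) / mean(v^2) = 10^(snr_db / 10)
    v *= np.sqrt(np.mean(z ** 2) / (np.mean(v ** 2) * 10.0 ** (snr_db / 10.0)))
    return z + v
```

For example, `reverberant_observation([1.0, 2.0, 3.0], [1.0, 0.5])` returns `[1.0, 2.5, 4.0]`.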

Signal Model (cont.)
Statistical room acoustics model (Polack, 1988; Habets, 2007):
$$h_d(t) = \begin{cases} b_d(t)\,e^{-\delta t}, & 0 \le t < T_r,\\ 0, & \text{otherwise},\end{cases} \qquad b_d(t) \sim \mathcal{N}(0, \sigma_d^2),$$
$$h_r(t) = \begin{cases} b_r(t)\,e^{-\delta t}, & t \ge T_r,\\ 0, & \text{otherwise},\end{cases} \qquad b_r(t) \sim \mathcal{N}(0, \sigma_r^2),$$
with $\delta = \dfrac{3\ln 10}{T_{60}}$.
The measured signal energy (noise neglected):
$$E_z\{z^2(t)\} = E_z\{z_d^2(t)\} + E_z\{z_r^2(t)\},$$
where $z_d(t)$ and $z_r(t)$ are the components of $z(t)$ due to $h_d(t)$ and $h_r(t)$, $\lambda_s(t) = E_s\{s^2(t)\}$, $E_z\{z_d^2(t)\} = f(\lambda_s(t), \sigma_d^2, T_r)$, and $E_z\{z_r^2(t)\} = f(\lambda_s(t), \sigma_r^2, T_r)$.
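Under this model the expected energies are $\sigma_d^2 \int_0^{T_r} e^{-2\delta t}\,dt = \sigma_d^2 (1 - e^{-2\delta T_r})/(2\delta)$ and $\sigma_r^2 \int_{T_r}^{\infty} e^{-2\delta t}\,dt = \sigma_r^2 e^{-2\delta T_r}/(2\delta)$, so the model DRR is $(\sigma_d^2/\sigma_r^2)(e^{2\delta T_r} - 1)$. The sketch below draws one RIR realization and evaluates this closed form; the function names `polack_rir` and `model_drr`, the sampling rate, and the finite RIR duration (which should span several $T_{60}$ so that the truncated tail is negligible) are assumptions for illustration.

```python
import numpy as np


def polack_rir(t60, t_r, sigma_d, sigma_r, fs, duration, rng=None):
    """One realization of h(t) = b(t) exp(-delta t), with b ~ N(0, sigma_d^2)
    for 0 <= t < T_r and b ~ N(0, sigma_r^2) for t >= T_r."""
    rng = np.random.default_rng() if rng is None else rng
    delta = 3.0 * np.log(10.0) / t60              # decay constant delta = 3 ln(10) / T60
    t = np.arange(int(round(duration * fs))) / fs
    sigma = np.where(t < t_r, sigma_d, sigma_r)   # early vs. late standard deviation
    return sigma * rng.standard_normal(t.size) * np.exp(-delta * t)


def model_drr(t60, t_r, sigma_d, sigma_r):
    """Expected DRR (linear): (sigma_d^2 / sigma_r^2) * (exp(2 delta T_r) - 1)."""
    delta = 3.0 * np.log(10.0) / t60
    return (sigma_d ** 2 / sigma_r ** 2) * (np.exp(2.0 * delta * t_r) - 1.0)
```

With $T_{60} = 0.3$ s and $T_r = 50$ ms, $2\delta T_r = \ln 10$, so `model_drr(0.3, 0.05, 1.0, 0.5)` evaluates to $4 \times 9 = 36$; the empirical early-to-late energy ratio of sampled RIRs, accumulated over many realizations, should be close to this value.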

Directional Array Response
The direct microphone signal energy:
$$E_z\{[z_{\mathrm{dir}}(t)]^2\} = [g_{\mathrm{dir}}(\theta)]^2\,E_z\{z_d^2(t)\} + \frac{1}{\Omega}\int_{\Omega} [g_{\mathrm{dir}}(\theta')]^2\,d\theta'\;E_z\{z_r^2(t)\}.$$
The opposite microphone signal energy:
$$E_z\{[z_{\mathrm{opp}}(t)]^2\} = \frac{1}{\Omega}\int_{\Omega} [g_{\mathrm{opp}}(\theta')]^2\,d\theta'\;E_z\{z_r^2(t)\}.$$

Directional Power Ratio
Assuming the microphones are calibrated:
$$\bar{g}^2 = \frac{1}{\Omega}\int_{\Omega} [g_{\mathrm{dir}}(\theta')]^2\,d\theta' = \frac{1}{\Omega}\int_{\Omega} [g_{\mathrm{opp}}(\theta')]^2\,d\theta'.$$
The power ratio between the direct and opposite microphones:
$$\frac{E_z\{[z_{\mathrm{dir}}(t)]^2\}}{E_z\{[z_{\mathrm{opp}}(t)]^2\}} = \frac{[g_{\mathrm{dir}}(\theta)]^2}{\bar{g}^2}\left[\frac{\sigma_d^2}{\sigma_r^2}\left(e^{2\delta T_r} - 1\right)\right] + 1,$$
where the bracketed term is the DRR under the statistical model.
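To make $\bar{g}^2$ and the model power ratio concrete, the sketch below assumes hypothetical first-order cardioid elements, $g(\theta) = \tfrac{1}{2}(1 + \cos(\theta - \theta_0))$, and integrates over the horizontal plane only ($\Omega = 2\pi$); neither choice comes from the slides. For two such elements pointing in opposite directions the mean-square gains are equal (both $3/8$), so the calibration assumption holds by construction. All function names are hypothetical.

```python
import numpy as np


def cardioid(theta, look=0.0):
    """Hypothetical first-order cardioid gain pointing at angle `look` (radians)."""
    return 0.5 * (1.0 + np.cos(theta - look))


def mean_square_gain(gain, n=3600):
    """g_bar^2 = (1/Omega) * integral of g(theta')^2 over a full circle (Omega = 2 pi),
    approximated by an average over n uniformly spaced angles."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return float(np.mean(gain(theta) ** 2))


def model_power_ratio(theta_src, drr, gain=cardioid):
    """Model PR = (g_dir(theta)^2 / g_bar^2) * DRR + 1, with DRR on a linear scale."""
    return gain(theta_src) ** 2 / mean_square_gain(gain) * drr + 1.0
```

For a source on the main lobe ($\theta = 0$) and a DRR of 1 (0 dB), the model gives $1/(3/8) + 1 \approx 3.67$; an opposite-facing cardioid, `lambda th: cardioid(th, np.pi)`, has the same mean-square gain.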

Directional Power Ratio (cont.)
Replacing $E_z\{\cdot\}$ by temporal smoothing yields the directional power ratio (PR) quality measure:
$$\mathrm{PR}(t) = \frac{P_{\mathrm{dir}}(t)}{P_{\mathrm{opp}}(t)} = \frac{\int_{t-T}^{t} [z_{\mathrm{dir}}(\tau)]^2\,d\tau}{\int_{t-T}^{t} [z_{\mathrm{opp}}(\tau)]^2\,d\tau} = \frac{[g_{\mathrm{dir}}(\theta)]^2}{\bar{g}^2}\,\mathrm{DRR}(t) + 1.$$
Non-intrusive DRR estimator:
$$\text{PR-DRR}(t) = \frac{\bar{g}^2}{[g_{\mathrm{dir}}(\theta)]^2}\left(\frac{P_{\mathrm{dir}}(t)}{P_{\mathrm{opp}}(t)} - 1\right).$$
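The PR and PR-DRR expressions translate into a few lines of NumPy when the integrals over $[t - T, t]$ are replaced by sums over a sliding window of `win` samples. This is a sketch, assuming the gain ratio $[g_{\mathrm{dir}}(\theta)]^2/\bar{g}^2$ is known (for example, for a source on the main lobe); `sliding_power`, `pr_drr`, and the window length are hypothetical choices, since the slides do not specify $T$.

```python
import numpy as np


def sliding_power(z, win: int) -> np.ndarray:
    """P(t): sum of z^2 over the last `win` samples (discrete form of the
    integral over [t - T, t]); one value per full window, ending at samples
    win - 1, ..., len(z) - 1."""
    c = np.concatenate(([0.0], np.cumsum(np.asarray(z, dtype=float) ** 2)))
    return c[win:] - c[:-win]


def pr_drr(z_dir, z_opp, win: int, gain_ratio: float) -> np.ndarray:
    """PR-DRR(t) = (g_bar^2 / g_dir(theta)^2) * (P_dir(t) / P_opp(t) - 1).

    gain_ratio is g_dir(theta)^2 / g_bar^2, assumed known.
    """
    pr = sliding_power(z_dir, win) / sliding_power(z_opp, win)  # PR(t)
    return (pr - 1.0) / gain_ratio
```

Because PR(t) is a ratio of two powers from the same cluster, a gain common to both microphones cancels. In practice a small floor on $P_{\mathrm{opp}}(t)$ would be needed to avoid division by zero during silence; the sketch omits it.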

Experimental Results
Experiments:
- Variable source-microphone distance with fixed $T_{60}$.
- Variable $T_{60}$ with fixed source-microphone distance.
[Figure: simulation environment]

Experimental Results (cont.)
Reference quality measures:
- Speech-to-reverberation modulation energy ratio (SRMR) (Falk, Zheng, and Chan, 2010).
- Envelope variance (EV) (Wolf and Nadeu, 2014).
Correlation coefficients with: clarity ($C_{50}$) (Kuttruff, 2009), ITU-T P.862 (PESQ), and ITU-T P.563.

Test type                        Algorithm  White noise: C50  Speech: C50  Speech: PESQ  Speech: P.563
T60 = 0.3 s, variable distance   PR         0.999             0.999        0.911         0.712
                                 SRMR       -0.27             0.845        0.973         0.934
                                 EV         -0.66             0.931        0.994         0.875
Distance = 0.5 m, variable T60   PR         0.944             0.951        0.899         0.562
                                 SRMR       0.392             0.640        0.991         0.873
                                 EV         0.235             0.614        0.984         0.912
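For reference, the clarity index $C_{50}$ used as a correlation reference above is the early-to-late energy ratio of the RIR with a fixed 50 ms boundary. A minimal sketch, assuming `h` is aligned so that index 0 is the arrival of the direct sound; the function name `clarity_c50` is hypothetical.

```python
import numpy as np


def clarity_c50(h, fs: float) -> float:
    """C50 in dB: energy in the first 50 ms of h over the energy after 50 ms."""
    h = np.asarray(h, dtype=float)
    n50 = int(round(0.050 * fs))      # sample index of the 50 ms boundary
    early = np.sum(h[:n50] ** 2)
    late = np.sum(h[n50:] ** 2)
    return float(10.0 * np.log10(early / late))
```

For example, at `fs = 1000` a response of fifty samples of 1.0 followed by two hundred samples of 0.5 has equal early and late energies, so $C_{50} = 0$ dB.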

Experimental Results (cont.)
Reference DRR measure: coherent-to-diffuse ratio (CDR)-based DRR (Jeub, Nelke, Beaugeant, and Vary, 2011).
Correlation coefficient with the DRR:

Test type                        Algorithm  White noise: DRR  Speech: DRR
T60 = 1 s, variable distance     PR-DRR     0.999             0.999
                                 CDR        0.964             0.972
Distance = 2 m, variable T60     PR-DRR     0.999             0.999
                                 CDR        0.852             0.913

Experimental Results (cont.)
Performance of the DRR estimate for variable source-microphone distance: PR-DRR [dB] (solid-circled line) and the true DRR [dB] (dashed line) as a function of source-microphone distance, with fixed $T_{60}$ = 0.3 s.
[Figure: DRR [dB] versus source-microphone distance [m], curves DRR and PR-DRR]

Experimental Results (cont.)
Performance of the DRR estimate for variable SNR: absolute difference (AD-DRR) from the true DRR of the proposed estimate PR-DRR [dB] (solid-circled line) and of the CDR-based estimate of Jeub et al. [dB] (dashed-asterisk line), as a function of SNR [dB]. $T_{60}$ = 0.3 s and source-microphone distance = 0.5 m.
[Figure: AD-DRR [dB] versus SNR [dB], curves CDR and PR-DRR]

Experimental Results (cont.)
Performance of the DRR estimate for variable $T_{60}$, off the main lobe: PR-DRR [dB] (solid-circled line) and the true DRR [dB] (dashed line) as a function of $T_{60}$ (source-receiver angle in [−30°, +30°], source-microphone distance = 2 m).
[Figure: DRR [dB] versus reverberation time $T_{60}$ [s], curves DRR and PR-DRR]

Experimental Results (cont.)
Recorded speech, PR measure vs. source location: the measured PR of all microphone arrays (1 to 6) vs. the source position (hall of size 15 × 10 × 6 m, with 3 m spacing between adjacent arrays).
[Figure: PR versus speaker location (in front of array #1 to #6), one curve per array]

System Configuration
Our system is based on clusters of unidirectional microphones, each looking in a different direction (for demonstration, we use four unidirectional microphones looking in directions 90 degrees apart). We compare the signal received by each microphone in a cluster (referred to as local) with the signals of the other local microphones.

System Configuration (cont.)
The PR-DRR measure is based on the assumption that direct signals are received at different levels by the local microphones, while indirect signals (reverberation) are received at much more similar levels by all local microphones. We compare the PR-DRR across all clusters and select the audio source with the least reverberation.

Implementation
The proposed procedure has two stages:
1. The first stage is local: for each point, we compute features of the local signals.
2. The second stage is global: we select the least reverberant signal based on the features of the local signals.
The features include the local power and the local power ratio. The local power is associated with the directional microphone that measures the strongest signal at a given point, compared to the signals measured by the other microphones at that point.

Implementation (cont.)
The local power ratio is defined as the ratio between the local maximum power and the local minimum power.
[Flowchart: find maximum power; max increases? (yes); find the set of relevant points; find maximum power ratio]
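The two-stage procedure and the flowchart leave some details open, so the following is only one possible reading, not the authors' implementation. Stage 1 computes, per cluster, the local power (strongest microphone) and the local power ratio (maximum over minimum power). Stage 2 treats as "relevant points" the clusters whose local power lies within a hypothetical margin `margin_db` of the strongest cluster, and selects the one with the largest local power ratio. The iterative "max increases?" check from the flowchart is not modeled, and the function name `select_cluster` is hypothetical.

```python
import numpy as np


def select_cluster(powers, margin_db: float = 6.0) -> int:
    """Index of the cluster judged least reverberant.

    powers: shape (n_clusters, n_mics_per_cluster), short-term power of each
    directional microphone (e.g. from a sliding window); must be positive.
    margin_db: hypothetical threshold defining the "relevant points".
    """
    powers = np.asarray(powers, dtype=float)
    local_power = powers.max(axis=1)                  # stage 1: strongest local microphone
    local_ratio = local_power / powers.min(axis=1)    # stage 1: local max/min power ratio
    # Stage 2: keep clusters whose local power is within margin_db of the global maximum
    relevant = 10.0 * np.log10(local_power.max() / local_power) <= margin_db
    candidates = np.flatnonzero(relevant)
    return int(candidates[np.argmax(local_ratio[candidates])])
```

A common gain on all microphones of one cluster leaves its local power ratio unchanged, consistent with calibration being needed only within clusters; the power pre-selection in stage 2, by contrast, compares clusters and is therefore sensitive to inter-cluster sensitivity differences, which is why the margin here is kept loose.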

Demonstration
[Figure: demonstration layout with a noise source, two speech sources (Source 1, Source 2), and microphone clusters 1 to 6]

- Instead of using randomly placed omnidirectional microphones, we use directional microphone clusters.
- Calibration is needed only within clusters, not between clusters.
- Short signal segments are sufficient.
- The PR-DRR enables fast-switching, real-time selection of the microphone with the best reception among randomly placed microphone clusters in a conference room.

Future Work
- Directional non-stationary noise.
- Time delay between signals in different clusters.
- Direction-of-arrival estimation.
- Clusters of circular differential microphone arrays.
- Combining the PR-DRR with other measures (e.g., spatial coherence).