High-speed Noise Cancellation with Microphone Array


Noise Cancellation / A Posteriori Probability Maximum Criterion / Independent Component Analysis

We propose the use of a microphone array based on independent component analysis as a method for high-speed elimination of noise in speech input to mobile terminals. The effectiveness of the proposed method is confirmed by evaluation experiments that reproduce an actual mobile environment.

Zhipeng Zhang and Minoru Etoh

1. Introduction

Noise cancellation is an important technology for improving call quality on mobile terminals and for speech user interfaces such as speech recognition and speech translation. Techniques for enhancing speech in signals that include background noise are being researched widely.

Single-microphone spectral subtraction [1] is a widely known technique for suppressing background noise. The technique distinguishes between intervals with speech and intervals without speech. The input signal in the speech-free intervals is treated as background noise and is used to estimate a noise spectrum. That noise spectrum is then subtracted from the input signal in the intervals where speech and noise are mixed, thereby suppressing the noise. Good noise cancellation performance is achieved when the noise signal is stationary, and the technique is easy to implement, so it is currently in wide use. When the background noise is non-stationary, however, such as in a noisy restaurant or in busy traffic with vehicles coming and going, good noise cancellation performance cannot be obtained. For this reason, methods that use multiple microphones (microphone arrays) are being investigated. Because a microphone array suppresses noise using spatial information, such as the phase differences among the signals arriving from a sound source at each microphone, it offers better noise suppression performance than a single microphone that assumes a stationary noise signal.
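As a rough illustration of the single-microphone baseline, the sketch below applies magnitude-domain spectral subtraction frame by frame. It is a minimal sketch, not the exact method of [1]: the frame length, Hann window, number of noise frames, and flooring constant are illustrative assumptions.

```python
import numpy as np

def spectral_subtraction(x, noise_frames=10, frame_len=256, floor=0.01):
    """Frame-wise magnitude spectral subtraction (illustrative sketch).

    The first `noise_frames` frames are assumed to be speech-free; their
    average magnitude spectrum is the noise estimate, which is subtracted
    from every frame. Negative magnitudes are floored, and the signal is
    rebuilt by windowed overlap-add using the noisy phase.
    """
    hop = frame_len // 2
    win = np.hanning(frame_len)
    n_frames = (len(x) - frame_len) // hop + 1
    spec = np.stack([np.fft.rfft(win * x[i * hop: i * hop + frame_len])
                     for i in range(n_frames)])
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)      # noise spectrum estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    out = np.zeros_like(x, dtype=float)
    for i, frame_spec in enumerate(clean_mag * np.exp(1j * phase)):
        out[i * hop: i * hop + frame_len] += win * np.fft.irfft(frame_spec, frame_len)
    return out
```

Because the noise estimate is fixed, quality degrades as soon as the noise statistics change, which is exactly the non-stationary case that motivates the microphone-array approach below.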
Microphone array methods include the beam forming method [2] and the Blind Source Separation (BSS) method based on Independent Component Analysis (ICA) [3]. The two methods are compared in Table 1. Because of limited installation space and computing power, it is currently very difficult to mount many microphones on a mobile terminal. Nevertheless, practical use of a small microphone array with a limited number of microphones is now possible. The beam forming method has a long history, and PDA terminals that combine adaptive beam forming and nonlinear signal processing for hands-free communication are already on the market. However, beam forming assumes in principle that the desired sound source is in a different position from the other sound sources, so when the position of the sound source changes, or when the noise and target sound sources lie in the same direction, the noise canceling performance may be reduced.

Table 1  Comparison of methods based on a microphone array
  Beam forming method
    Advantages: Already proven in commercial use
    Disadvantages: Poor performance in automatic tracking of a moving sound source
  ICA-based BSS method
    Advantages: Can automatically follow a moving sound source
    Disadvantages: Estimation of the separation matrix takes computation; hardware and microphone costs are barriers to commercialization

In contrast to the beam forming method, which separates sound sources located at different spatial positions, the ICA-based BSS method uses statistical independence to separate sound sources with independent statistical properties, and so in principle requires no location information. That is, even if the noise and the target sound source lie in the same direction, the target sound alone can be extracted, which gives the method a wider range of application. On the other hand, ICA requires successive learning of the statistical properties of the signals (to be exact, maximization of non-Gaussianity*1). Solving this optimization problem requires nonlinear iterative computation, which makes ICA unsuitable for real-time processing. Although a module that achieves greatly improved separation performance in real time in an actual environment has been developed by implementing high-speed computation, its implementation cost is high because it requires dedicated hardware and a directional microphone, which is an obstacle to commercialization as a product.

In this article, we propose a new ICA-based BSS method for obtaining a high-quality speech signal with a small number of microphones. The proposed method uses the fact that the parameters of the transfer function*2 from the user's mouth to the microphone of a mobile terminal settle within a prescribed range, and adopts Maximum a Posteriori (MAP) estimation of these parameters: the parameters are estimated so as to maximize their a posteriori probability given the speech data. Furthermore, since the ICA of this method converges faster than conventional ICA, speech of higher quality can be extracted.
Below, we describe ICA-based noise cancellation, the ICA that makes use of the transfer function, and evaluation experiments.

*1 Non-Gaussian distribution: The property of a random variable that does not follow a Gaussian probability distribution.
*2 Transfer function: The ratio of the Laplace transform of the output signal to the Laplace transform of the input signal in a transmission system.

2. Noise Cancellation Using ICA

In the previous chapter, ICA-based BSS was introduced as a method for separating out the target signal. The method estimates multiple linearly mixed signals without using any knowledge about the original signals or the mixing process. There are two types of BSS method: time-domain ICA and frequency-domain ICA. The proposed method uses frequency-domain ICA for simplicity in handling the transfer function. Furthermore, the environment is assumed to contain two sound sources, speech (the target sound source) and noise (the interference sound source), to simplify the target system. This makes it possible to use only two microphones, reducing both the computational complexity and the implementation cost.

2.1 Model for Mixed Signals (Measured Signals) in an Actual Environment

The model for mixed-signal separation in a two-microphone ICA system for a mobile terminal is shown in Figure 1, where s represents the sound source signals: s_1 is the user speech from the target sound source and s_2 is the noise from the interference sound source. The signals y_1 and y_2 detected at the two microphones are recorded via the transmission paths from the sound sources. Denoting the transfer function from the sound sources to the microphones as A, the relation between the sound sources, the transfer function, and the measured data is linear:

    y = A s

Here A is a mixing matrix, and each element of A represents the transfer characteristics from the target and interference sound sources, respectively, to the two microphones. The interference sound source and the target sound source are assumed to be statistically independent.

[Figure 1  Mixed signal separation model for a two-microphone ICA system for mobile terminals. Mixing process: s_1 (user speech from the target sound source) and s_2 (noise from the interference sound source) reach microphones 1 and 2 through the mixing coefficients a_11, a_12, a_21, a_22, yielding the mixed recordings y_1 and y_2. Separation process: the coefficients w_11, w_12, w_21, w_22 produce the separated outputs z_1 (separated noise) and z_2 (separated user speech).]

2.2 Separation Signal Model

The BSS method uses signal independence to reconstruct the original signals from the mixed signal y. The separation matrix W is estimated, and the separation signal z is obtained from equation (1):

    z = W y    (1)

If the entire transfer function from the sound sources to the microphones were known, z could be restored to the source signals by setting W = A^{-1}, but the details of the transfer function A are not known in advance. Moreover, the transfer function changes if a sound source moves, so some means of updating the separation matrix W to track a moving sound source is required. Therefore, on the basis of the assumed independence of the sources, we adopt the BSS approach of reconstructing the original sources by estimating the separation matrix W that maximizes the independence of the outputs.

2.3 Estimation of the Separation Matrix

Here we explain a method based on the maximum likelihood criterion*3, which is often used for estimating the separation matrix [4][5]. If p(y) is the probability distribution function of the measured signal y, then the likelihood with W as a variable is expressed as p{y(t)|W}. In practice, the log of the likelihood is used for convenience in computation. Estimation with the log maximum likelihood is expressed by the following equation (2).
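Before turning to that estimation, the mixing and separation model of Sections 2.1 and 2.2 can be checked numerically. A minimal sketch, assuming the 2x2 mixing matrix that appears later in the evaluation experiments and using the oracle separation matrix W = A^{-1} (which, via equation (1), recovers the sources exactly; in practice W must be estimated):

```python
import numpy as np

# Mixing process: y = A s, with s1 = target speech, s2 = interference noise.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # toy super-Gaussian sources
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])               # mixing matrix (used later in Sec. 4.1)
y = A @ s                                # two-microphone observations

# Separation process: z = W y (equation (1)); with the oracle W = A^{-1}
# the sources are recovered exactly.
W = np.linalg.inv(A)
z = W @ y
print(np.allclose(z, s))                 # True
```

In a real environment A is unknown and time-varying, which is why W must instead be estimated from the statistical independence of the sources.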
    \hat{W} = \arg\max_{W} \sum_{t=1}^{T} \log p\{y(t) \mid W\}    (2)

Generally, W cannot be obtained analytically from this equation, so an iterative computation such as equation (3) must be used:

    W_{i+1} = W_i + \Delta W_i    (3)

From the gradient method*4, the update direction is

    \Delta W \propto \frac{\partial}{\partial W} \sum_{t=1}^{T} \log p\{y(t) \mid W\} = (I - E[\varphi(y) y^{T}]) W    (4)

Here, I is the unit matrix, E[X] is the expected value of X, and φ(y) is the function derived from the probability density function of y. Finally, the update equation for W is as follows:

    W_{i+1} = W_i + \eta \{I - E[\varphi(y) y^{T}]\} W_i    (5)

where η is the step size. This W is used as the basis for independent component separation in the frequency domain, and the frequency-domain separation signals are then converted back into the time domain to obtain the independent signals in the time domain.

*3 Maximum likelihood criterion: A criterion that maximizes the probability of obtaining the observed data, assuming a particular probability model.
*4 Gradient method: A method of numerical optimization that uses the slope of a function (the derivative with respect to a vector).

3. ICA Based on a Transfer Function

The conventional estimation method based on the maximum likelihood criterion is a general nonlinear optimization method that involves multiple local optima and requires iteration. Unless some countermeasure is taken, the result may depend on the initial values, and the number of iterations needed may be large and indeterminate. These issues are particularly troublesome in the case of non-stationary noise, for which tracking is a strikingly difficult problem. To solve this, we propose using the a priori knowledge that the parameters of the transfer function for the space between the user's mouth and the microphone are confined to a certain range, and thereby achieve an optimization method that is both fast and stable. The flow of the proposed method is shown in Figure 2. A major feature of the proposed method is that the computation is done in two stages: initialization and iterative estimation. In the first stage, the initial values of the transfer function parameters are obtained from the relation between the user's speaking position and the microphone position. The obtained initial values are then used to estimate the separation matrix.

[Figure 2  Processing procedure. Initialization: set the initial values of w_12 and w_22; estimate w_11 by the energy minimization criterion over noise-only intervals; estimate w_21 by the uncorrelated-signal criterion. The resulting initial separation matrix W feeds the iterative estimation of W, followed by sound source separation and output of the results.]

In the mobile phone environment, the position of the user's mouth relative to the microphone can be considered constrained to a certain range. Therefore, we use the parameters of the transfer function for the space between the user's mouth and the microphone. For the background noise, on the other hand, the parameters are unknown, but they can be estimated using data from the noise intervals at the beginning of the utterance. If the parameter distribution is even partially known in advance, parameter estimation can be formulated as MAP estimation rather than maximum likelihood estimation, allowing accurate and stable estimation of the separation matrix with fewer iterations. Because the position of the user's mouth relative to the microphone is confined to a certain range, we can assume that part of the mixing matrix (a_11 and a_12) is nearly constant, and that the corresponding part of the separation matrix (w_12 and w_22) is likewise nearly constant.

3.1 Estimation of the Initial Values

We first measure the frequency response between the user's mouth and the microphone.
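The initialization that these measurements support rests on two criteria: energy minimization over a speech-free interval, then decorrelation of the two outputs. A rough numerical sketch follows; the two criteria are solved directly from sample correlations rather than with the article's closed-form expressions, the choice of which interval each correlation is computed over is an assumption of this sketch, and the values w_12 = 3.0, w_22 = 2.0 and the mixing matrix are taken from Section 4.1.

```python
import numpy as np

R = lambda a, b: float(np.mean(a * b))        # sample correlation R[a, b]

def init_w(noise_seg, mixed_seg, w12, w22):
    """Initial w11 and w21 (illustrative sketch).

    w11: over a speech-free segment, minimize the energy of
    z1 = w11*y1 + w12*y2  ->  w11 = -w12 * R[y1,y2] / R[y1,y1],
    so z1 cancels the interference path (criterion of equation (6)).
    w21: over a segment containing speech and noise, choose
    z2 = w21*y1 + w22*y2 uncorrelated with z1 (equation (8)):
    R[z1,z2] = w21*R[z1,y1] + w22*R[z1,y2] = 0.
    """
    n1, n2 = noise_seg
    w11 = -w12 * R(n1, n2) / R(n1, n1)
    y1, y2 = mixed_seg
    z1 = w11 * y1 + w12 * y2
    w21 = -w22 * R(z1, y2) / R(z1, y1)
    return w11, w21

# Toy data: A = [[4, 1], [2, 3]] (the matrix of Section 4.1), with a
# leading noise-only interval before the speech starts.
rng = np.random.default_rng(1)
s1, s2 = rng.laplace(size=(2, 50_000))            # speech, noise
A = np.array([[4.0, 1.0], [2.0, 3.0]])
noise_seg = np.outer(A[:, 1], s2[:25_000])        # only the noise source active
mixed_seg = A @ np.stack([s1[25_000:], s2[25_000:]])
w11, w21 = init_w(noise_seg, mixed_seg, w12=3.0, w22=2.0)
```

With these values as the prior mean of the separation matrix, the MAP iteration of Section 3.2 starts close to a separating solution.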
Those measurements are used in the following procedure to estimate the initial values of the separation matrix elements that relate to the transmission of the target sound source.

1) Set the Initial Value of w_11

Detect the intervals during which the user is not speaking. Using the measured data from those speech-free intervals (y_1 and y_2) and equation (6), obtain w_11 such that the energy of output z_1 is minimized:

    w_{11} = \arg\min_{w_{11}} R[z_1, z_1]    (6)

Here, R[x, y] denotes the correlation of x and y [5]. From equations (1) and (6), w_11 is estimated as

    w_{11} = -\frac{R_{22} - R_{21}}{R_{11} - R_{12}}    (7)

where R_{ij} = R[y_i, y_j].

2) Set the Initial Value of w_21

This initial value is based on the uncorrelated-signal criterion between z_1 and z_2:

    R[z_1, z_2] = 0    (8)

Solving this criterion for w_21 gives its initial value:

    w_{21} = -w_{22} \frac{w_{11} R_{12} + w_{12} R_{22}}{w_{11} R_{11} + w_{12} R_{21}}    (9)

3.2 Estimation of the Separation Matrix

Generally, when a priori knowledge about the parameters is available, estimation using the MAP criterion is considered effective. The a posteriori probability of the separation matrix, p(W|y), is expressed as the product of the a priori probability of W, p(W), and the likelihood, p(y|W):

    p(W \mid y) \propto p(W)\, p(y \mid W)    (10)

As this equation shows, if p(W) is assumed to be a uniform distribution, reflecting no prior knowledge of W, the MAP criterion and the maximum likelihood criterion coincide. Given an a priori probability p(W) for W, more accurate estimation is possible. The equation for estimating W on the basis of the MAP

criterion is as follows:

    \hat{W} = \arg\max_{W} \log \left[ p(W) \prod_{t=1}^{T} p\{y(t) \mid W\} \right]    (11)

Here, the a priori probability p(W) of W is assumed to be a normal distribution, with the density function shown below. The mean μ is taken to be the initial value of W obtained in Section 3.1, and the variance σ² represents the uncertainty of the prior estimate of the separation matrix.

    p(W) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left\{-\frac{(W-\mu)^{2}}{2\sigma^{2}}\right\}    (12)

When estimating W with the gradient method based on the MAP criterion,

    \frac{\partial}{\partial W} \log[\, p(W)\, p\{y(t) \mid W\}\,] = \frac{\partial}{\partial W} \log p(W) + \frac{\partial}{\partial W} \log p\{y(t) \mid W\}    (13)

The second term in equation (13) is the same as in maximum likelihood estimation, and the first term is as follows:

    \frac{\partial}{\partial W} \log p(W) = -\frac{W - \mu}{\sigma^{2}}    (14)

Thus,

    \Delta W \propto \{I - E[\varphi(y) y^{T}]\} W - \frac{W - \mu}{\sigma^{2}}    (15)

The update formula is

    W_{i+1} = W_i + \eta \left[ \{I - E[\varphi(y) y^{T}]\} W_i - \frac{W_i - \mu}{\sigma^{2}} \right]    (16)

The separation matrix estimated in this way is used to perform the separation and extract the target signal.

4. Evaluation Experiments

4.1 Evaluation Data

We evaluated the proposed method on connected-digit recognition, using 30 digits uttered continuously by one female speaker. The sampling rate was 16 kHz. Noisy speech data were created by mixing airport noise with noise-free speech (in the frequency domain) using the mixing matrix

    A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}

The experiments described below assume that part of the separation matrix (w_12 = 3.0, w_22 = 2.0) is already known.

4.2 Overview of the Speech Recognition Experiments

For speech recognition, we used the Hidden Markov Model (HMM)*5 speech recognition software [6] published openly by the University of Cambridge. The software uses a 12-dimensional feature vector comprising Mel-Frequency Cepstrum Coefficient (MFCC)*6 frequency characteristics and the normalized power*7. The HMM parameters include the number of states and the probability distribution function for the output of each state. In the speech recognition, the output probability function of each state is represented by a mixture of multiple Gaussian distributions.
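Before turning to the results, the estimator of Section 3.2 can be sketched numerically. The following is a minimal real-valued version of the MAP-regularized natural-gradient update of equation (16); the article works per frequency bin on complex spectra, and the tanh score function φ (a common choice for super-Gaussian sources such as speech), the step size η, and the prior variance σ² are illustrative assumptions.

```python
import numpy as np

def ica_map(y, mu, sigma2=0.1, eta=0.05, iters=300):
    """MAP-regularized natural-gradient ICA (equation (16)):
    W <- W + eta * [ (I - E[phi(z) z^T]) W - (W - mu) / sigma^2 ],
    where z = W y, phi(z) = tanh(z), and mu is the prior mean of the
    separation matrix (the initial value from Section 3.1)."""
    W = mu.copy()                              # start from the prior mean
    n, T = W.shape[0], y.shape[1]
    for _ in range(iters):
        z = W @ y
        g = np.tanh(z) @ z.T / T               # sample estimate of E[phi(z) z^T]
        W = W + eta * ((np.eye(n) - g) @ W - (W - mu) / sigma2)
    return W

# Toy check with the mixing matrix of Section 4.1 and a prior centered
# near the true inverse (as the Section 3.1 initialization provides).
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20_000))
A = np.array([[4.0, 1.0], [2.0, 3.0]])
mu = np.linalg.inv(A) + 0.02 * rng.standard_normal((2, 2))
W = ica_map(A @ s, mu)
P = W @ A          # close to diagonal (up to scale) => sources separated
```

Dropping the prior term (σ² → ∞) recovers the plain maximum-likelihood update of equation (5); the article's point is that a tight, well-placed prior lets a single estimation round approach the accuracy that the unregularized update needs several rounds to attain.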
In these experiments, the number of HMM states is five, and the number of Gaussian mixture components per state is four.

4.3 Evaluation Results

Samples of the speech signal extracted by the BSS method based on the conventional and the proposed ICA are shown in Figure 3. The proposed method suppresses the noise component in the speech signal more than the conventional method does. To confirm the effectiveness in practical use, we performed evaluation experiments in which the method was used as preprocessing for speech recognition: the target sound source was extracted by sound source separation and then evaluated by speech recognition. The evaluation results (accuracy, %) for the proposed and conventional methods are shown in Figure 4, where the horizontal axis is the number of iterations used for separation matrix estimation. With a single estimation round, the proposed method performed about as well as the conventional method did with multiple rounds of estimation. We confirmed that the proposed method improved the recognition rate over the conventional method from 79% to 84%.

*5 HMM: A statistical method for modeling indeterminate time-series data.
*6 MFCC: A series of speech feature coefficients modeled on human auditory perception.
*7 Normalized power: The normalized value of the speech signal power in the log domain.

5. Conclusion

We have described noise cancellation technology that uses a microphone

array. This technology uses a two-microphone array and introduces an optimization method that applies sound source location information obtained from an actual mobile phone environment to highly general ICA statistical signal processing. The experiments, which reproduced a use scenario in an actual mobile phone environment, confirmed that measuring and using the parameters of the transfer function between the user's mouth and the microphone resulted in better speech recognition performance with less computational complexity than the conventional method. We expect this microphone array noise cancellation technology to broaden the scope of future speech communication and serve as a basic technology for speech recognition and translation services.

[Figure 3  Speech signal samples extracted by the BSS method based on (a) the conventional ICA and (b) the proposed ICA.]

[Figure 4  Evaluation results: accuracy rate (%) versus the number of estimation trials (1, 2, 4, 8) for the proposed and conventional methods.]

References
[1] S. F. Boll: "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on ASSP, No. 2, pp. 113-120, 1979.
[2] Y. Kaneda: "Audio Systems and Digital Processing," 1995.
[3] T. Nishikawa, S. Araki, S. Makino and H. Saruwatari: "Optimization of Band Divisions in Blind Source Separation using Band-division ICA," 2001 Spring Meeting of the Acoustical Society of Japan, 2001.
[4] T.-W. Lee et al.: "Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources," Neural Computation, Vol. 11, pp. 417-441, 1999.
[5] A. Bell and T. Sejnowski: "An Information Maximization Approach to Blind Separation and Blind Deconvolution," Neural Computation, Vol. 7, pp. 1129-1159, 1995.
[6] M. J. F. Gales and P. C. Woodland: "Recent advances in large vocabulary continuous speech recognition: An HTK perspective," ICASSP, 2006.