Speech Enhancement using Wiener filtering

S. Chirtmay and M. Tahernezhadi
Department of Electrical Engineering, Northern Illinois University, DeKalb, IL 60115

ABSTRACT

The problem of reducing the disturbing effects of additive white noise on a speech signal is considered for the case where no noise reference is available. Wiener filtering with all-pole modeling built upon line spectral pair (LSP) frequencies is considered. The filter parameters have been optimized to achieve the highest reduction of noise. The noise is filtered using an iterative, LSP-based estimation of the LPC parameters. The speech model filter uses an accurate, updated estimate of the current noise power spectral density obtained with the aid of a voice activity detector.

I. INTRODUCTION

The problem examined here is the enhancement of speech disturbed by additive noise. The basic assumption is that the enhancement system does not have access to any signal other than the corrupted speech itself. That is, no noise-reference signal is available that would allow one to employ classical adaptive noise canceling [1]. Obtaining higher quality and/or intelligibility of the noisy speech can have a fundamental impact on applications such as speech compression, speech recognition, and speaker verification, by improving the performance of the relevant digital voice processor.

The technique considered in this paper is based on the all-pole model of the vocal tract and uses the estimated coefficients to process the noisy speech with a Wiener filter. It is a new and improved iterative speech enhancement technique based on spectral constraints. The iterative technique, originally formulated by Lim and Oppenheim [2], attempts to solve for the maximum likelihood estimate of a speech waveform in additive white noise using linear predictive coding (LPC). The LPC parameters are thus estimated from the output of the Wiener filter. The poles of this LPC estimate are complex numbers in the z-plane, and complex pole positions are not convenient for inter-frame smoothing, so the LPC poles are represented as line spectral pairs (LSPs). Inter-frame spectral constraints are applied to the LSP parameters across time on a fixed-frame basis. These constraints ensure that the vocal tract characteristics do not vary wildly from frame to frame when speech is present. The method allows constraints to be applied efficiently to speech model pole movements across time so that the formants lie along smooth tracks.

The pole positions of an N-th order LPC model are equivalently represented by a set of N/2 LSP position roots and N/2 difference roots. The position roots (P) and the difference roots (Q) represent lossless models of the vocal tract with the glottis closed and open, respectively, and they lie on the unit circle in the complex z-plane. The formant locations in the signal's LPC model spectrum are highly correlated with the LSP position roots, and the bandwidths of the LPC spectrum at these formants are highly correlated with the LSP difference roots. For a stable LPC model there is a root at z = -1 for P and at z = +1 for Q, and the P and Q roots alternate around the unit circle. At each iteration, each LPC pole of the speech estimate to be filtered is smoothed around that particular pole across neighboring frames using the LSP equivalent roots. A lower bound on the minimum distance between a difference root and its adjacent position root is applied to restrain the sharpness of any formant of the LPC model so that it remains speech-like.
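As an illustration of the P-Q decomposition described above, the following Python sketch (illustrative only, not code from the paper; the predictor sign convention and the use of numpy.roots are assumptions) forms the symmetric and antisymmetric polynomials from an LPC coefficient vector and returns the position and difference root angles.

import numpy as np

def lpc_to_lsp(a):
    """Convert LPC coefficients of A(z) = 1 - sum_k a_k z^-k, a = [a_1..a_N],
    into LSP position-root (P) and difference-root (Q) angles in radians."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    A1 = np.concatenate((A, [0.0]))          # pad to degree N + 1
    P = A1 + A1[::-1]                        # symmetric part: has a root at z = -1
    Q = A1 - A1[::-1]                        # antisymmetric part: has a root at z = +1
    # For a stable model the remaining roots lie on the unit circle and the
    # P and Q roots interlace; keep the angles in the upper half of the circle.
    p_ang = sorted(np.angle(r) for r in np.roots(P) if 1e-6 < np.angle(r) < np.pi - 1e-6)
    q_ang = sorted(np.angle(r) for r in np.roots(Q) if 1e-6 < np.angle(r) < np.pi - 1e-6)
    return np.array(p_ang), np.array(q_ang)

For an N-th order model this yields N/2 position angles and N/2 difference angles, which interlace on the unit circle as stated above.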
Here we have considered one future frame and one past frame for smoothing in a particular iteration. The smoothed LSP roots are then converted back to smoothed LPC parameters. The smoothed LPC model power spectrum and the current noise power estimate are used to form the next iteration of the Wiener filter. The output of the previous Wiener filter iteration is used together with the original input data to obtain a less muffled-sounding speech estimate, at the cost of slightly increased residual noise in the output. When the noisy signal is first applied to the Wiener filter, its fast Fourier transform (FFT) is also supplied to the voice activity detector (VAD).
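As an illustration of this smoothing step, the hypothetical Python sketch below averages each LSP frequency over the previous, current, and next frame and then enforces a lower bound on the distance between each difference root and its paired position root; the triangular weights and the minimum-separation value are assumptions, since the paper does not specify them.

import numpy as np

def smooth_lsps(lsp_prev, lsp_curr, lsp_next, weights=(0.25, 0.5, 0.25)):
    """Weighted average of each LSP frequency over one past frame, the current
    frame, and one future frame (applied only while speech is present)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return w @ np.vstack([lsp_prev, lsp_curr, lsp_next])

def enforce_min_separation(p_roots, q_roots, min_sep=0.01):
    """Keep each difference root at least min_sep radians away from its paired
    position root, restraining the sharpness of the corresponding formant."""
    p = np.sort(np.asarray(p_roots, dtype=float))
    q = np.sort(np.asarray(q_roots, dtype=float))
    gap = q - p
    too_close = np.abs(gap) < min_sep
    q[too_close] = p[too_close] + np.where(gap[too_close] >= 0.0, min_sep, -min_sep)
    return q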

II. ALGORITHM

Over a given frame of speech, the noise-free signal s(n) is modeled as an all-pole (LP) process, s(n) = a_1 s(n-1) + ... + a_M s(n-M) + g u(n), characterized by the LP coefficients a_k and the gain g. The method of Lim and Oppenheim is based on maximum a posteriori (MAP) estimation of the LP coefficients, the gain, and the noise-free speech. The method is an iterative one in which the LP parameters and the speech frame are repeatedly re-estimated. It is assumed that all unknown parameters are random with a priori Gaussian pdfs. The resulting MAP estimator, which maximizes the conditional pdf of the parameters given the observations, corresponds to the solution of a set of nonlinear equations in the additive white Gaussian noise (AWGN) case. In the noisy case, the estimator requires that a_k, g_k, and s_k be chosen to maximize the pdf p(a_k, g_k, s_k | y). Essentially, we wish to perform joint MAP estimation of the LP speech modeling parameters and the noise-free speech by maximizing the joint probability density p(a_k, s_k | y, g_k, σ_n²), where the gain g_k and the noise variance σ_n² are assumed to be known (or estimated). Lim and Oppenheim consider a suboptimal solution using sequential MAP estimation of s_k, followed by MAP estimation of a_k and g_k given s_k. The sequential estimation procedure is linear at each iteration and continues until some criterion is satisfied. With further simplifying assumptions, it can be shown that MAP estimation of s_k is equivalent to non-causal Wiener filtering of the noisy speech y. Lim and Oppenheim showed that this technique, under certain conditions, increases the joint likelihood of a_k and s_k with each iteration.

With this relation, sequential MAP estimation of the LP parameters and the speech frame proceeds in two steps per iteration: the first step is performed via LP parameter estimation and the second through adaptive Wiener filtering. The final implementation of the algorithm is as follows.

Step 1. Estimate a_k from s_k using either (a) the first M values as the initial-condition vector, or (b) a zero initial condition, s_k = 0.

Step 2. Estimate s_k given the present estimate a_k. From the estimated a_k, form the speech model power spectrum P_s(ω) = g_k² / |1 − Σ a_i e^(−jωi)|², where the sum runs over i = 1, ..., M. The resulting equation for estimating the noise-free speech is then simply the optimum non-causal Wiener filter H_k(ω) = P_s(ω) / (P_s(ω) + P_n(ω)), where the extra index k indicates the k-th iteration and P_n(ω) is the noise power spectral density. If the Gaussian assumption holds, this is the optimum processor in the MSE sense; if it does not hold, this filter is still the best linear processor for obtaining the next speech estimate.

In the implementation, the input noisy signal is digitized at a rate of 8 kHz and the time series is processed in frames of 256 samples (32 ms). The noise power spectral density, i.e. the noise variance in the white Gaussian case, must be estimated during non-speech activity.
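The two steps above can be sketched in code. The fragment below is an illustrative Python sketch, not the authors' implementation; the model order, FFT length, number of iterations, and gain normalization are assumptions. It estimates the LP coefficients from the current speech estimate with the Levinson-Durbin recursion and then applies the non-causal Wiener filter H(ω) = P_s(ω) / (P_s(ω) + σ_n²) to the noisy frame in the frequency domain.

import numpy as np

def levinson_lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns the coefficients a_1..a_order of A(z) = 1 - sum_k a_k z^-k
    and the final prediction-error energy."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a, err = np.zeros(order), r[0] + 1e-12
    for i in range(order):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[:i][::-1]
        a, err = a_new, err * (1.0 - k * k)
    return a, err

def wiener_iteration(y_frame, a, gain2, noise_psd, nfft=512):
    """One iteration of the Lim-Oppenheim scheme: form the LPC model power
    spectrum from the current coefficients and apply the non-causal Wiener
    filter to the noisy frame in the frequency domain (windowing and
    overlap-add are omitted for brevity)."""
    w = 2.0 * np.pi * np.arange(nfft) / nfft
    k = np.arange(1, len(a) + 1)
    A = 1.0 - (a[:, None] * np.exp(-1j * np.outer(k, w))).sum(axis=0)
    P_s = gain2 / np.abs(A) ** 2              # LPC model power spectrum
    H = P_s / (P_s + noise_psd)               # non-causal Wiener filter
    Y = np.fft.fft(y_frame, nfft)
    return np.real(np.fft.ifft(H * Y))[:len(y_frame)]

def enhance_frame(y_frame, noise_var, order=10, n_iter=3):
    """A few iterations on one 256-sample frame (32 ms at 8 kHz); noise_var
    would be estimated during frames declared noise-only by the VAD, and the
    inter-frame LSP smoothing step would be applied between iterations."""
    s_hat = np.asarray(y_frame, dtype=float).copy()
    for _ in range(n_iter):
        a, err = levinson_lpc(s_hat, order)
        s_hat = wiener_iteration(y_frame, a, err / len(y_frame), noise_var)
    return s_hat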

III. SIMULATION RESULTS

Voice Activity Detector

The most critical component of the system is the VAD. A VAD operating in a mobile environment must be able to detect speech in the presence of a wide range of very diverse types of acoustic background noise; the greatest difficulty is detecting speech at very low signal-to-noise ratio. Thus, a VAD based on the spectral characteristics of the input signal is used in this paper. The block diagram of this VAD is shown in Fig. 1. It incorporates an inverse filter whose coefficients are derived during noise-only periods. When speech is present, the noise is attenuated by this filter, leaving mainly speech. The energy of the inverse-filtered signal is compared to a threshold that is updated only during noise-only periods; this threshold rides above the energy of the filtered noise signal. If the energy exceeds the threshold, speech is declared.

There are variables that need to be updated only when noise is present, but it is clearly dangerous to use the output of the VAD to decide when to update them, because this output is itself a function of those variables. For this reason a secondary VAD is used. The secondary VAD detects noise-only periods but does not endpoint the speech. It bases its decision on the observation that if successive frames keep a similar spectral shape for a long period of time, the input is either steady speech or noise. Therefore, if the distortion between frames remains below a fixed threshold for a sufficiently long time, noise is assumed to have been detected, unless a steady pitch component is present, in which case the input was probably a vowel sound. A further rule is applied: if the primary VAD declares a noise frame that lies between six speech frames, or within three frames before or after speech, that frame is treated as unvoiced and is finally classified as speech. This serves as a fail-safe whenever it is not possible to distinguish between unvoiced speech and noise.

The purpose of the computer simulation is to test the performance of the above technique. The first step is to test the performance of the VAD, which must decide whether the current frame is pure noise or noisy speech; when the SNR is very low it is hard to detect noise frames and unvoiced frames. We use the sentence "Don't ask me to carry an oily rag like that" with added noise to test the simulation programs, and we also examine the results for different SNRs. For larger SNR the processed speech has better quality. Figure 3 shows the output for input noisy speech at an SNR of 10 dB. We have also performed speech enhancement at an SNR of about 5 dB, but as the SNR becomes considerably low it is impossible to distinguish between unvoiced speech and noise, so the algorithm does not work at very low SNR. Moreover, an overall SNR of 5 dB or lower implies a very low SNR for the unvoiced portions of speech; this aspect needs to be considered in future work. The table below shows the input SNR and output SNR for voiced, unvoiced, and noise frames; the enhancement is about 7-8 dB.
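As a rough illustration of the VAD logic described above, the following Python sketch is illustrative only; the inverse-filter order, the threshold margin, the smoothing constant, and the spectral-distortion measure are assumptions not given in the paper. The primary VAD whitens each frame with an inverse filter derived during noise-only periods and compares the residual energy with a threshold that rides above the tracked noise energy; the secondary VAD flags long runs of spectrally similar frames as noise when no steady pitch is present.

import numpy as np

def primary_vad(frame, inv_filter, noise_energy, margin=2.0, alpha=0.9):
    """Energy-threshold VAD after an inverse (whitening) filter whose
    coefficients were derived during noise-only periods.  Returns the
    speech/noise decision and the updated noise-energy track."""
    residual = np.convolve(frame, inv_filter, mode='same')
    energy = float(np.sum(residual ** 2))
    is_speech = energy > margin * noise_energy        # threshold rides above the noise level
    if not is_speech:
        # update noise statistics only during frames declared noise-only
        noise_energy = alpha * noise_energy + (1.0 - alpha) * energy
    return is_speech, noise_energy

def secondary_vad(frame_spectra, distortion_thresh, min_frames, has_steady_pitch):
    """Declare noise when consecutive frames keep a similar spectral shape for
    long enough and no steady pitch component (e.g. a sustained vowel) is present."""
    mags = np.abs(np.asarray(frame_spectra))
    d = np.mean(np.abs(np.diff(mags, axis=0)), axis=1)   # frame-to-frame spectral distortion
    return len(d) >= min_frames and bool(np.all(d < distortion_thresh)) and not has_steady_pitch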

Fig. 2. Table of speech enhancement results.
Fig. 3. Original speech and the enhanced speech.

REFERENCES

1) J. H. L. Hansen and M. A. Clements, "Constrained Iterative Speech Enhancement with Application to Automatic Speech Recognition," IEEE, 1988.
2) J. S. Lim and A. V. Oppenheim, "Enhancement and Bandwidth Compression of Noisy Speech," Proceedings of the IEEE (invited paper), 1979.
3) J. H. L. Hansen and L. M. Arslan, "Robust Feature-Estimation and Objective Quality Assessment for Noisy Speech Recognition Using the Credit Card Corpus," IEEE Trans. Speech and Audio Processing, vol. 3, no. 3, May 1995.
4) W. Wynn, "Transmitted Noise Reduction in Communications Systems," Patent Cooperation Treaty.