Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Project Proposal

Avner Halevy
Department of Mathematics
University of Maryland, College Park
ahalevy at math.umd.edu

Professor Radu Balan
Department of Mathematics
Center for Scientific Computation and Mathematical Modeling (CSCAMM)
University of Maryland, College Park
rvbalan at math.umd.edu

Abstract

This project will implement a few standard algorithms for reducing additive white noise in the processing of speech signals. Among these are spectral subtraction and iterative Wiener filtering. The performance of the algorithms will be evaluated on a database of IEEE sentences corrupted by several types of real-world noise. Initially only objective measures will be used to evaluate the quality of processed speech, but if time allows, evaluation will be extended to include subjective listening tests as well.

Background

The need to enhance speech signals arises in many situations in which the speech signal originates from a noisy location or is degraded by noise over a communication channel. Speech enhancement algorithms can be used to enhance both the quality and the intelligibility of speech signals, thus making communication more effective and reducing listener fatigue. The precise goals of speech enhancement algorithms depend on the specific application and the specific type of noise involved, as well as its statistical relation to the clean signal. The main challenge in designing effective speech enhancement algorithms is reducing noise without introducing perceptible distortion to the speech signal.

This project will focus on the reduction of additive white Gaussian noise which is statistically uncorrelated with the clean speech signal. We assume that the noisy signal y(n) is composed of the clean speech signal x(n) and the additive noise signal d(n), i.e. y(n) = x(n) + d(n). If time allows, we will also explore the possibility of dealing with colored noise.

Approach

Since speech signals are highly nonstationary, short-time Fourier analysis will be used for the analysis and synthesis of the signal, with frames typically consisting of 15-20 ms of speech, during which the properties of the signal are assumed not to change significantly. The discrete Fourier transform (DFT), computed via the fast Fourier transform (FFT), will be used for this purpose. The overlap and add method (OLA) will be used for reconstructing the enhanced signal. Initially, two main algorithms will be implemented:

Spectral subtraction

This algorithm estimates the noise spectrum when speech is absent from the signal and subtracts it from the spectrum of the noisy signal to recover (an estimate of) the clean signal.
The magnitude of the noise spectrum can be estimated by computing its average value during speech pauses, and the phase of the clean signal can be approximated by the phase of the noisy signal, which has been shown to be good enough for practical purposes. Precise upper bounds for the error thus introduced can be obtained. It has been shown that as long as the spectral SNR is larger than about 8 dB, the error will not be perceptible by the auditory system. Performing the simplest spectral subtraction may result in negative spectral magnitude components, but various methods exist for rectifying this situation.
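
The steps described so far (framing, FFT, noise estimation during a pause, subtraction, noisy phase, overlap-add) can be sketched as follows. This is only an illustration in Python with NumPy, not the project's MATLAB code. The function names, the 8 kHz sampling rate, the 20 ms Hann frames with 50% overlap, and the choice of half-wave rectification (clipping negative magnitudes to zero) are all assumptions made for the example; half-wave rectification is just one of the possible remedies for negative magnitudes.

```python
import numpy as np

def framed(y, frame_len, hop):
    """Zero-pad y and cut it into 50%-overlapping, Hann-windowed frames."""
    assert frame_len == 2 * hop, "this sketch assumes 50% overlap"
    n_frames = int(np.ceil(len(y) / hop)) + 1
    padded = np.zeros(hop * (n_frames + 1))
    padded[hop:hop + len(y)] = y
    # Periodic Hann window: shifted copies sum to 1 at 50% overlap.
    window = np.hanning(frame_len + 1)[:-1]
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx] * window

def estimate_noise_magnitude(noise_only, frame_len=160, hop=80):
    """Average magnitude spectrum over frames taken from a speech pause."""
    spectra = np.fft.rfft(framed(noise_only, frame_len, hop), axis=1)
    # Skip the first and last frames, which are half zero padding.
    return np.abs(spectra[1:-1]).mean(axis=0)

def spectral_subtraction(y, noise_mag, frame_len=160, hop=80):
    """Magnitude spectral subtraction with half-wave rectification."""
    Y = np.fft.rfft(framed(y, frame_len, hop), axis=1)
    mag = np.maximum(np.abs(Y) - noise_mag, 0.0)   # clip negative magnitudes
    X_hat = mag * np.exp(1j * np.angle(Y))         # reuse the noisy phase
    frames_hat = np.fft.irfft(X_hat, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames_hat) + 1))
    for m, frame in enumerate(frames_hat):         # overlap-add synthesis
        out[m * hop:m * hop + frame_len] += frame
    return out[hop:hop + len(y)]                   # drop the zero padding

# Toy example (assumed): a 440 Hz tone in white noise, sampled at 8 kHz.
rng = np.random.default_rng(0)
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
y = x + 0.3 * rng.standard_normal(len(x))
noise_mag = estimate_noise_magnitude(0.3 * rng.standard_normal(1600))
x_hat = spectral_subtraction(y, noise_mag)
```

With `noise_mag` set to zero the function returns its input up to rounding error, which is a convenient sanity check on the analysis and synthesis stages.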

The main challenge in designing such rectification methods is to avoid introducing musical noise, which can be especially prominent in unvoiced segments, where SNR values are low. As always, the simple subtraction comes at a price: if too much is subtracted, too much distortion will be introduced, whereas if too little is subtracted, too much noise will remain. This tradeoff will be explored and dealt with in several ways.

Iterative Wiener filtering

This algorithm constructs a linear time-invariant (LTI) finite impulse response (FIR) filter which is optimal in the sense that it minimizes the mean square of the estimation error. The filter is constructed iteratively. At each iteration, given a previous estimate of the clean signal, linear prediction is used to estimate the parameters of a speech production model assumed to hold for the clean speech signal, and these are in turn used to update the filter and obtain a new estimate of the clean signal. The optimal number of iterations may vary with the characteristics of the clean signal.

Various parameters in these two algorithms may be adjusted in order to optimize performance for specific types of signals. Different values will be experimented with and the effects will be reported.

Additional algorithms

If time allows, two other algorithms will be considered. The first is the MMSE estimator of the spectral magnitude proposed by Ephraim and Malah. The second is an SVD-based algorithm.

Implementation

The algorithms will be implemented on a standard PC using MATLAB. Computational complexity is not expected to require special resources or a need for parallelization.

Validation and Testing

A basic validation of the spectral subtraction implementation will be done by setting the estimate of the noise spectrum magnitude equal to zero, in which case the output signal should be the input signal. Validation of the linear prediction module of the Wiener filtering algorithm will be done by artificially constructing a speech production model with known parameters, in which case the module should predict very nearly the same parameters.
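
The linear prediction check can be illustrated with a short Python/NumPy sketch. It assumes the autocorrelation method solved by the Levinson-Durbin recursion, and it uses a second-order all-pole model with arbitrarily chosen poles as the "known" speech production model; these choices, the function names, and the noise variance are assumptions for the example. The last function shows how the estimated model could yield a Wiener gain for white noise in a single step of the iteration, written here as a frequency-domain gain rather than as the FIR filter itself.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for A(z) = 1 + a1 z^-1 + ..."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc(x, order):
    """Autocorrelation-method linear prediction of the given order."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    return levinson_durbin(r / len(x), order)

def wiener_gain(a, excitation_var, noise_var, n_fft=512):
    """H = Px / (Px + Pd), with Px from the all-pole model and white noise Pd."""
    px = excitation_var / np.abs(np.fft.rfft(a, n_fft)) ** 2
    return px / (px + noise_var)

# Synthetic all-pole model with known parameters (poles at 0.9 * exp(+-j*pi/4)).
rng = np.random.default_rng(0)
true_a = np.array([1.0, -2 * 0.9 * np.cos(np.pi / 4), 0.81])
e = rng.standard_normal(20000)                   # unit-variance excitation
x = np.zeros_like(e)
for n in range(len(e)):
    # x[-1] and x[-2] are still zero during the first two steps,
    # which gives zero initial conditions.
    x[n] = e[n] - true_a[1] * x[n - 1] - true_a[2] * x[n - 2]

est_a, est_var = lpc(x, order=2)
gain = wiener_gain(est_a, est_var, noise_var=0.5)
```

Here `est_a` should land close to `true_a` and `est_var` close to the unit excitation variance; how close depends on the signal length.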

A noisy speech corpus (NOIZEUS), containing 30 sentences from an IEEE database corrupted by eight different real-world noises at different SNRs, will be used for testing the algorithms. This corpus is available to researchers free of charge, to facilitate comparison of speech enhancement algorithms developed by different research groups.

Performance of speech enhancement algorithms is usually judged in terms of quality and intelligibility of the enhanced speech. Quality measures may assess several different dimensions of the enhanced speech, and are concerned with how words were said. Intelligibility measures assess what words were said. Quality can be measured both subjectively and objectively. Subjective quality tests, as well as intelligibility tests, are usually highly time-consuming, and may also require access to trained listeners. For this reason, in this project, evaluation of the algorithms will initially be confined to the use of objective measures.

The measures that will be used are variants of the SNR, in which the energy of the clean signal is compared with the energy of the estimation error. Two versions will be used. The first, performed in the time domain, is called segmental SNR, and is defined as

$$\mathrm{SNR}_{\mathrm{seg}} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10} \frac{\sum_{n=Nm}^{Nm+N-1} x^2(n)}{\sum_{n=Nm}^{Nm+N-1} \left( x(n) - \hat{x}(n) \right)^2}$$

where x(n) is the clean signal, $\hat{x}(n)$ is the enhanced signal, N is the frame length, and M is the number of frames. The second, performed in the frequency domain, is called frequency-weighted segmental SNR, and is defined as

$$\mathrm{fwSNR}_{\mathrm{seg}} = \frac{10}{M} \sum_{m=0}^{M-1} \frac{\sum_{j=1}^{K} w_j \log_{10} \left( E_x(j,m) / E_e(j,m) \right)}{\sum_{j=1}^{K} w_j}$$

where K is the number of frequency bands, $w_j$ is the weight placed on the jth frequency band, $E_x(j,m)$ is the short-term clean signal energy contained in the jth frequency band in the mth frame, and $E_e(j,m)$ is the analogous quantity for the energy of the error. The frequency-weighted segmental SNR has been shown to perform moderately well in predicting subjective overall quality.

If time allows, we will also conduct a Mean Opinion Score test of quality, which is a widely used subjective quality measure. In this test listeners rate the signal on a five-point scale (1 corresponding to unsatisfactory, 5 to excellent) where the numerical value reflects the listener's subjective impression of overall quality.
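
A minimal Python/NumPy sketch of the two objective measures is given below, under several simplifying assumptions that are not part of the proposal: frames are non-overlapping and unwindowed, the frequency bands are formed by splitting the FFT bins into equal groups rather than along an auditory scale, the weights default to uniform, and a small constant guards against division by zero. The function names and defaults are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero in silent frames

def _to_frames(signal, frame_len):
    """Drop the incomplete tail and reshape into non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

def segmental_snr(clean, enhanced, frame_len=160):
    """Average over frames of the per-frame SNR in dB (time domain)."""
    x = _to_frames(clean, frame_len)
    err = x - _to_frames(enhanced, frame_len)
    snr = 10 * np.log10((np.sum(x ** 2, axis=1) + EPS)
                        / (np.sum(err ** 2, axis=1) + EPS))
    return float(np.mean(snr))

def fw_segmental_snr(clean, enhanced, frame_len=160, n_bands=8, weights=None):
    """Frequency-weighted segmental SNR in dB over n_bands groups of FFT bins."""
    X = np.fft.rfft(_to_frames(clean, frame_len), axis=1)
    E = X - np.fft.rfft(_to_frames(enhanced, frame_len), axis=1)
    bands = np.array_split(np.arange(X.shape[1]), n_bands)
    w = np.ones(n_bands) if weights is None else np.asarray(weights, dtype=float)
    # Band energies of the clean signal and of the error, per frame.
    ex = np.stack([np.sum(np.abs(X[:, b]) ** 2, axis=1) for b in bands], axis=1)
    ee = np.stack([np.sum(np.abs(E[:, b]) ** 2, axis=1) for b in bands], axis=1)
    band_snr = 10 * np.log10((ex + EPS) / (ee + EPS))
    return float(np.mean(band_snr @ w / np.sum(w)))

# Toy check: an "enhanced" signal equal to 1.1 times the clean one has an
# error of 0.1 times the clean signal, i.e. an SNR of 20 dB in every frame
# and in every band.
rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)
seg = segmental_snr(clean, 1.1 * clean)
fw = fw_segmental_snr(clean, 1.1 * clean)
```

Both `seg` and `fw` come out at 20 dB for this toy input, since the error is a scaled copy of the clean signal in every frame and band.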

Schedule, Milestones and Deliverables

September: Preliminary background reading and formulation of project scope
October 9: Research proposal and presentation
October 31: Finish background reading
November: Implementation of the spectral subtraction algorithm, including several variations
December: End of semester progress report and presentation
January: Spring break
February: Implementation of iterative Wiener filtering, including several variations
March: Testing, modification, and finalization of code
April: Prepare final report and presentation
May: Deliver final report and presentation
