Modulator Domain Adaptive Gain Equalizer for Speech Enhancement

Ravindra D. Dhage (M.E. Scholar), Prof. Pravinkumar R. Badadapure (Professor)

Abstract

This paper presents a speech enhancement method for personal communication in which the input signal is divided into a number of sub-bands that are individually and adaptively weighted in the time domain according to a short-term signal-to-noise ratio (SNR) estimate in each sub-band. Instead of focusing on suppression of the noise, the method focuses on enhancement of the speech. The method has proven advantageous since it offers low complexity, low delay and low distortion. The paper describes the working of the adaptive gain equalizer (AGE) in the modulation frequency domain using a convex optimization demodulation technique. The performance of the modified AGE is compared with the traditional AGE and with another modulation-frequency-domain AGE based on demodulation using the spectral center of gravity. The performance measure used is the Signal to Noise Ratio Improvement (SNRI).

Keywords: Adaptive gain equalizer, Noise reduction, Modulation and Convex demodulation, Speech enhancement.

I. INTRODUCTION

The adaptive gain equalizer (AGE) is a time-domain speech enhancement algorithm in which the speech signal is amplified based on signal-to-noise ratio (SNR) estimates in sub-bands. The signal is divided into sub-bands, and a gain is calculated independently for each band. The algorithm has shown advantages over contemporary techniques because of its low-complexity implementation, because it requires no voice activity detector, and because it produces no musical noise [1].

Different types of background noise corrupt otherwise clean speech signals in everyday communication. A phone call can be disturbed by a variety of nearby noises, ranging from computer fan noise to factory noise. There is a wide variety of contexts in which it is desirable to enhance speech. The objective of enhancement is usually to improve the overall speech quality, to increase intelligibility and to reduce listener fatigue. In this paper, the specific goal is to increase the output-to-input SNR gain, defined as the ratio of the output SNR to the input SNR.

A very important application of speech enhancement is in conjunction with speech compression systems. Because of the increasing role of digital channels, coupled with the need for encryption of speech and an increased emphasis on integrated voice-data networks, speech compression systems based on a speech production model are destined to play an increasingly important role in speech communication systems.

It is generally agreed that the performance of current speech compression systems based on the speech LPC model degrades rapidly in the presence of additive noise. In this situation, it is desirable to enhance the noisy speech in a preprocessing stage [2]. An enhanced version of a speech signal is useful for speech recognition applications, mobile communication, coding and so on. Kalman-filtering-based speech enhancement has several advantages over other speech enhancement methods; for example, the speech production model using linear prediction (LP) is inherent to the Kalman filtering model [3].

Speech enhancement implementations of today are either digital or analog. Digital solutions are often superior in time to market, price per unit, structured and powerful development tools, flexibility, high degree of reconfiguration, robustness, the ability to use a Digital Signal Processor (DSP) for many tasks, and the possibility of handling high-complexity algorithms [4]. Despite these many advantages, digital solutions might suffer from limited signal bandwidth, a limited number of operations per second and quantization errors. These drawbacks can be minimized by using high-speed DSPs and longer word lengths. However, such measures are likely to increase the total power consumption as well as the total price per unit.

Analog solutions offer high signal bandwidth, continuous-time signal processing, no quantization of data, and lower power consumption than corresponding DSP-based solutions. On the other hand, analog solutions might require expensive simulation and design software and suffer from a long time to market. Moreover, since analog solutions tend to be static, reconfiguring them is a troublesome task.

Many speech enhancement algorithms require so-called Voice Activity Detectors (VADs) to identify speech activity. The speech activity detection in turn controls the activity of the speech enhancement algorithm.
Since speech enhancement algorithms are often applied in hand-held, battery-powered applications, e.g. microphone front-ends, it is of the highest importance to optimize the power consumption for battery lifetime. Speech enhancement algorithms should be flexible, versatile and adjustable to different scenarios. Furthermore, the algorithms should be adaptive, robust and of low complexity, with a high level of speech enhancement quality and performance.

The main difficulty for AGE in the modulation domain is the ambiguity of the demodulation process, which admits an unlimited number of possible modulator-carrier pairs. Convex optimization demodulation avoids this problem of multiple solutions. Moreover, the proven ability of this method to efficiently demodulate a variety of carriers, such as harmonic, stochastic and time-varying ones, further justifies its usage.

II. DEMODULATION

There are a number of approaches to the demodulation problem. A classic method is Hilbert envelope detection, which simply assumes that the modulator is the magnitude of the analytic signal. This method certainly returns a valid decomposition from a purely mathematical standpoint [2]. A spectrogram is also a type of demodulation, because the magnitude coefficient of each channel of the filter bank gives a down-sampled energy estimate over time. This method is familiar and easy to implement, and it allows a great deal of versatility: by intelligently choosing the parameters of the spectrogram (i.e., narrow-band versus wide-band), a wide range of decompositions is possible. However, it is subject to the same time-frequency trade-off that any spectrogram encounters, where increasing the resolution in one dimension decreases the resolution in the other.

A simple way to address the time-varying nature of speech is to view it as a direct concatenation of short time segments, each segment being individually represented by a linear AR model. The excitation sources are periodic impulses for voiced speech and white noise for unvoiced speech. Alternatively, white noise excitation can be used as an approximation for all speech sounds, both voiced and unvoiced [1]. The Kalman filtering method is undoubtedly more complicated computationally. Matrix-vector multiplications are needed at each iteration, resulting in $O(p^2)$ operations [3]. An interesting point is that, for each segment, the error covariance and Kalman gain matrices reach a steady-state value after a few steps. After that point, the steady-state gain can be used for the rest of the segment, so a large saving in computation can be achieved.

Demodulation divides a signal into its modulator $m(t)$ and carrier $c(t)$. In this context, the original signal is the product of the two components. The following is a brief description of one of the methods used for coherent carrier detection, which is also used in this work alongside the convex optimization demodulation process.

Spectral Center of Gravity Carrier Estimation: The demodulation framework works on sub-bands. The filter bank divides the speech signal into sub-bands, and the demodulation process decomposes each sub-band into its carrier and modulator components.

Sub-band Instantaneous Frequency: The first step in calculating the carrier is to detect the instantaneous frequency $\omega_k(n)$ of each sub-band. The short-time spectrum of sub-band $k$ is

$$S_k(\omega, n) = \sum_{p} g(p)\, x_k(n+p)\, e^{-j\omega p} \tag{1}$$

where $g(p)$ is a window function (a Hamming window of length 128 is used for this experiment). The center of gravity (CoG) estimate of $\omega_k(n)$ is given by

$$\omega_k(n) = \frac{\int_{-\pi}^{\pi} \omega\, |S_k(\omega, n)|^2 \, d\omega}{\int_{-\pi}^{\pi} |S_k(\omega, n)|^2 \, d\omega} \tag{2}$$

The phase $\phi_k(n)$ of the carrier is computed as

$$\phi_k(n) = \sum_{p=0}^{n} \omega_k(p) \tag{3}$$

The carrier is $c_k(n) = e^{j\phi_k(n)}$, and the complex-valued modulator $m_k(n)$ is given by

$$m_k(n) = x_k(n)\, c_k^{*}(n) \tag{4}$$

where $^{*}$ denotes complex conjugation, so that $x_k(n) = m_k(n)\, c_k(n)$.

The modulator is typically defined as a lower-frequency signal and the carrier as a higher-frequency signal. Demodulation, originally used only in radio communications, has become a more interesting problem because of its uses in speech analysis and processing. In addition to extracting a valid modulator and carrier from a signal, a demodulation algorithm should meet a few additional criteria. We believe that an acoustic demodulator should distinguish pitch from modulation consistently, based on a transparent and clearly understandable metric; it should act as an identity operator on modulators; and it should satisfy the projection property.

Distinguishing Pitch and Modulation: Several demodulation algorithms do not explicitly define the characteristics that make up a modulator or a carrier. The components are determined on a case-by-case basis instead of following a higher-level definition of the modulator or carrier class. We argue that an effective demodulation algorithm should explicitly define the characteristics of a modulator and a carrier and then obey those characteristics. Generally, we define a modulator as a lower-frequency signal and a carrier as a higher-frequency signal. For the purposes of this paper, we expand this definition to account for perceptual experience: a human listener interprets low-frequency modulation (below approximately 25 Hz) as amplitude variation, while higher-frequency modulation is interpreted as multiple carrier frequencies.

III. A. MODULATION DOMAIN AND AGE

Each sub-band-specific gain function is a quotient of a short-term average and a noise floor level estimate. The noise floor level estimate should track slow changes in the background noise, and the short-term average should track the bursts of speech. The proposed system is used for the enhancement of a noisy speech signal $x(n)$.
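Returning to the center-of-gravity demodulation of Eqs. (1)-(4) in Section II, the steps can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not in the paper: the sub-band is complex-valued, the window index p runs from 0 to the window length minus one, the integrals of Eq. (2) are approximated by a sum over a uniform frequency grid, the signal is zero-padded past its end, and a window of 32 samples is used instead of 128 to keep the pure-Python loops short. The function names `hamming` and `cog_demodulate` are hypothetical.

```python
import cmath
import math


def hamming(length):
    """Hamming window g(p), p = 0 .. length-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * p / (length - 1))
            for p in range(length)]


def cog_demodulate(x_k, win_len=32, n_bins=64):
    """Return (omega, carrier, modulator) for one complex sub-band x_k.

    Assumptions (not from the paper): p = 0 .. win_len-1, a uniform
    frequency grid on [-pi, pi), zero-padding past the end of x_k.
    """
    g = hamming(win_len)
    omegas = [-math.pi + 2 * math.pi * i / n_bins for i in range(n_bins)]
    # kernel[i][p] = exp(-j * omega_i * p), the exponential of Eq. (1)
    kernel = [[cmath.exp(-1j * w * p) for p in range(win_len)] for w in omegas]
    n_samples = len(x_k)
    omega, carrier, modulator = [], [], []
    phase = 0.0
    for n in range(n_samples):
        # Windowed segment g(p) * x_k(n + p)
        seg = [g[p] * x_k[n + p] if n + p < n_samples else 0.0
               for p in range(win_len)]
        num = 0.0
        den = 0.0
        for w, row in zip(omegas, kernel):
            power = abs(sum(v * e for v, e in zip(seg, row))) ** 2
            num += w * power   # numerator of Eq. (2)
            den += power       # denominator of Eq. (2)
        w_n = num / den if den > 0.0 else 0.0
        phase += w_n                       # running sum of Eq. (3)
        c_n = cmath.exp(1j * phase)        # carrier c_k(n)
        omega.append(w_n)
        carrier.append(c_n)
        modulator.append(x_k[n] * c_n.conjugate())  # Eq. (4)
    return omega, carrier, modulator
```

Because the carrier has unit magnitude, the modulator returned by this sketch always has the same magnitude as the sub-band signal; only the phase is moved into the carrier.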
A bank of $K$ band-pass filters is used to divide the input speech signal $x(n)$ into sub-bands according to

$$x_k(n) = h_k(n) * x(n) \tag{5}$$

where $h_k(n)$ is the impulse response of the $k$-th sub-band filter and $*$ denotes convolution. Natural signals such as speech can be represented by their corresponding high-frequency and low-frequency components. The final enhanced signal is obtained by adding all the modified sub-bands $\hat{x}_k(n)$ according to the synthesis equation

$$\hat{x}(n) = \sum_{k=1}^{K} \hat{x}_k(n) \tag{6}$$
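The analysis step of Eq. (5) and the synthesis step of Eq. (6) can be illustrated with a toy filter bank. The sketch below is an assumption-laden illustration, not the filter bank used in the paper: it uses K = 3 centered filters built from moving averages (a low-pass, a band-pass formed as the difference of two moving averages, and a high-pass), chosen so that the three impulse responses sum to a unit impulse and the unmodified sub-bands add back to the input. All function names are hypothetical.

```python
def convolve_same(h, x):
    """Convolve x with an odd-length, centered filter h; output has len(x)."""
    half = len(h) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(h):
            idx = n - (i - half)
            if 0 <= idx < len(x):      # zero-padding outside the signal
                acc += coeff * x[idx]
        out.append(acc)
    return out


def centered(taps, length):
    """Zero-pad an odd-length filter symmetrically to `length` taps."""
    pad = (length - len(taps)) // 2
    return [0.0] * pad + list(taps) + [0.0] * pad


def toy_filter_bank(length=9):
    """Three filters h_1..h_3 whose impulse responses sum to a unit impulse."""
    delta = centered([1.0], length)
    low = [1.0 / length] * length                   # long moving average
    short = centered([1.0 / 3.0] * 3, length)       # short moving average
    band = [a - b for a, b in zip(short, low)]      # band-pass
    high = [a - b for a, b in zip(delta, short)]    # high-pass
    return [low, band, high]


def analyze(bank, x):
    """Eq. (5): x_k(n) = h_k(n) * x(n) for every sub-band k."""
    return [convolve_same(h, x) for h in bank]


def synthesize(sub_bands, gains=None):
    """Eq. (6): sum the (optionally gain-weighted) sub-bands."""
    out = [0.0] * len(sub_bands[0])
    for k, band in enumerate(sub_bands):
        for n, value in enumerate(band):
            g = 1.0 if gains is None else gains[k][n]
            out[n] += g * value
    return out
```

With `gains=None` the synthesis returns the input signal up to rounding error, which is a convenient sanity check before any per-band weighting is applied.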

The observed noisy modulator for sub-band $k$ is given by $S_k(n)$, where $g(p)$ is a short spectral estimation window. The center-of-gravity approach estimates $\omega_k(n)$ as the average frequency of the instantaneous spectrum of $x_k(n)$. The modulator is modelled as an autoregressive process of order $p$ observed in additive noise:

$$m_k(n) = \sum_{j=1}^{p} a_{k,j}\, m_k(n-j) + w_k(n) \tag{7}$$

$$x_k(n) = m_k(n) + v_k(n), \qquad H^{T} = [0, 0, \ldots, 1]$$

At time instant $n$ the estimated sample is given by the following relationship:

$$\hat{m}_k(n|n) = H^{T}\, \hat{\mathbf{m}}_k(n|n) \tag{8}$$

B. Adaptive Gain Equalizer System

The AGE consists of a filter bank in which each sub-band is weighted by a gain function. The gain function amplifies the signal when speech is present and stays at unity in the noisy parts of the signal where no speech is present. A filter bank of $K$ band-pass filters divides the input signal $x(n)$ into $K$ sub-bands, $x_k(n) = h_k(n) * x(n)$ [7]. Here $h_k(n)$ is the impulse response of sub-band $k$ of the filter bank and $*$ denotes convolution. The output signal with the amplified speech is computed as

$$\hat{x}(n) = \sum_{k=1}^{K} G_k(n)\, x_k(n) \tag{9}$$

where $G_k(n)$ is the AGE weighting function, which amplifies the signal when speech is active and is given by

$$G_k(n) = \min\left\{ \left( \frac{A_k(n)}{L_{opt}\, B_k(n)} \right)^{p_k},\; L_k \right\} \tag{10}$$

where $L_{opt}$ is the optimized suppression level of the gain function, $p_k$ is the gain rise exponent constant, and $L_k$ is a threshold that limits the value of the gain function. The fast average $A_k(n)$ and the slow average $B_k(n)$ of sub-band $k$ are calculated according to

$$A_k(n) = \alpha_k A_k(n-1) + (1 - \alpha_k)\, |x_k(n)|$$

where $\alpha_k = \frac{1}{f_s T_a}$ is a forgetting-factor constant and $f_s$ is the sampling frequency, and

$$B_k(n) = \begin{cases} A_k(n) & \text{if } A_k(n) \le B_k(n-1) \\ (1 + \beta_k)\, B_k(n-1) & \text{otherwise} \end{cases} \tag{11}$$

where $\beta_k = \frac{1}{f_s T_b}$ is a positive constant that controls the noise level estimate.

Based on this principle of the AGE, a speech signal modulator can also be enhanced by the equalizer:

$$\hat{m}_k(n) = m_k(n)\, G_k(n)$$

The modulation domain separates each sub-band signal into a carrier and a modulator. Only the modulators are considered here: the AGE is applied to each modulator to enhance the speech. The mathematics of the AGE in the modulation domain is the same as for the AGE in the sub-band domain, except that the long-term average and the short-term average are calculated for each sub-band modulator instead of the sub-band itself. The gain function is multiplied with the modulator of the sub-band to yield a modified modulator, which is then used with the carrier in the reconstruction stage of the modulation system.

COMPARATIVE PERFORMANCE ANALYSIS

A. Mean Opinion Score (MOS)

The Mean Opinion Score (MOS) is calculated by observing a clean speech signal processed by a system, in order to check how much the system degrades the clean speech. Fig. 1 shows the MOS for a speech signal processed by each system. The system with convex demodulation shows less degradation than the CoG demodulation and AGE systems. Speech polluted by wind noise has been enhanced using coherent modulation filtering, as reported, although modulation filtering has mostly been used for the purpose of speech enhancement.
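The gain computation of Eqs. (10)-(11) and its application to a modulator can be sketched as below. This is a minimal sketch, not the authors' implementation. The initial values (the fast average starts at the magnitude of the first sample and the slow average starts equal to it), the small `floor` that keeps the noise floor estimate strictly positive, and all default parameter values are assumptions made for illustration. The forgetting factor `alpha` is passed in directly, as the weight on the previous average, rather than derived from a time constant. The names `age_gain` and `apply_gain` are hypothetical.

```python
def age_gain(x_k, alpha=0.9, beta=0.001, p_k=1.0, l_opt=1.0, l_k=10.0,
             floor=1e-12):
    """Per-sample AGE gain G_k(n) for one sub-band (or modulator) x_k."""
    if not x_k:
        return []
    a = abs(x_k[0])          # assumed initial fast average A_k(-1)
    b = max(a, floor)        # assumed initial slow average B_k(-1)
    gains = []
    for sample in x_k:
        # Fast (short-term) average of the magnitude
        a = alpha * a + (1.0 - alpha) * abs(sample)
        # Slow average, Eq. (11): follow A_k downwards, creep upwards slowly
        if a <= b:
            b = max(a, floor)
        else:
            b = (1.0 + beta) * b
        # Gain, Eq. (10): limited quotient of fast average and noise floor
        gains.append(min((a / (l_opt * b)) ** p_k, l_k))
    return gains


def apply_gain(m_k, gains):
    """Enhanced modulator: each sample of m_k multiplied by its gain."""
    return [g * m for g, m in zip(gains, m_k)]
```

For a stationary low-level input the fast and slow averages coincide and the gain stays near unity; when a burst arrives, the fast average rises much more quickly than the slow one and the gain climbs until it reaches the limit `l_k`.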

Fig. 1 Mean Opinion Score

B. Signal to Noise Ratio Improvement

The adaptive gain equalizer (AGE) is a time-domain speech enhancement algorithm in which the speech signal is amplified based on signal-to-noise ratio (SNR) estimates in sub-bands. The signal is divided into sub-bands, and a gain is calculated independently for each band. The most commonly used method for reducing noise is spectral subtraction, but it has an inherent problem of generating musical noise due to spectral flooring. There have been some efforts to reduce this musical noise, but the improvement tends to produce audible distortion, causing listening discomfort even compared with the unprocessed signal. Fig. 2 shows the Signal to Noise Ratio Improvement (SNRI) of the AGE, CoG demodulation and convex demodulation systems for a noise-distorted speech signal. Convex demodulation has the highest SNRI for all values, around 5 dB and 8 dB above the other AGE methods, although every system shows an improvement.
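As an illustration of the SNRI measure, the sketch below computes the improvement in dB as the output SNR minus the input SNR, which corresponds to the ratio of output SNR to input SNR on a linear scale. It assumes, as is common in evaluation setups but not stated in the paper, that the clean speech and the noise are available separately and that both are passed through the same time-varying gains so their output powers can be measured individually. The function names are hypothetical.

```python
import math


def snr_db(speech, noise):
    """SNR in dB from separately available speech and noise components."""
    p_speech = sum(abs(s) ** 2 for s in speech)
    p_noise = sum(abs(v) ** 2 for v in noise)
    return 10.0 * math.log10(p_speech / p_noise)


def snri_db(speech_in, noise_in, speech_out, noise_out):
    """SNR improvement in dB: output SNR minus input SNR."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)
```

A gain that is large only where speech is active raises the speech power more than the noise power, which is what produces a positive SNRI.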

Fig. 2 Signal to Noise Ratio Improvement

C. Spectrogram Analysis

For a speech signal corrupted by noise at -10 dB SNR, the spectrogram shows less residual noise in the enhanced speech signal. A significant improvement over the noise-corrupted speech signal can be observed. Fig. 3 shows the spectrogram of the original signal together with that of the signal processed with AGE. The improvement is visible in the spectrogram in that the speech formants are not affected.

Fig. 3 Spectrogram

Conclusion: An alternative method of demodulation has been proposed for AGE in the modulation frequency domain. The presented method solves the demodulation process as a convex optimization problem, thereby avoiding the inherent problem of multiple solutions of a demodulation algorithm. The proposed method was tested for various conditions and magnitudes of noise injected into a clean speech signal. Its performance has been validated by mean opinion score, spectral distortion and signal-to-noise ratio improvement in comparison with two other techniques. The results show an improvement in speech enhancement when AGE is used in the modulator domain rather than in its traditional form. The improvement in MOS and in the spectrogram shows the capability of the proposed system to reduce noise in noisy laryngeal speech, and the SNR improvement confirms the system's performance over the previous methods.

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 5, May 2015.

References

[1] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech and Sig. Proc., vol. 27, no. 2, pp. 113-120, 1979.
[2] Z. Goh, K.-C. Tan, and T. Tan, "Postprocessing method for suppressing musical noise generated by spectral subtraction," Speech and Audio Processing, IEEE Transactions on, vol. 6, no. 3, pp. 287-292, May 1998.
[3] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short-time spectral amplitude estimator," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
[4] C. Plapous, C. Marro, and P. Scalart, "Improved signal-to-noise ratio estimation for speech enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 6, pp. 2098-2108, Nov. 2006.
[5] N. Westerlund, M. Dahl, and I. Claesson, "Speech enhancement for personal communication using an adaptive gain equalizer," Elsevier Signal Processing, vol. 85, pp. 1089-1101, 2005.
[6] B. Sällberg, N. Grbic, and I. Claesson, "Implementation aspects of the adaptive gain equalizer," 2006.
[7] M. Shahid, R. Ishaq, B. Sällberg, N. Grbic, B. Lövström, and I. Claesson, "Modulation domain adaptive gain equalizer for speech enhancement," in Signal and Image Processing Application 2011, by IASTED, 2011.
[8] G. Sell and M. Slaney, "Solving demodulation as an optimization problem," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 8, pp. 2051-2066, Nov. 2010.
[9] N. Westerlund, M. Dahl, and I. Claesson, "Real-time implementation of an adaptive gain equalizer for speech enhancement purposes," WSEAS, 2003.
[10] M. Dahl, I. Claesson, B. Sällberg, and H. Åkesson, "A mixed analog-digital hybrid for speech enhancement purposes," ISCAS, 2005.
[11] S. M. Schimmel, K. R. Fitz, and L. Atlas, "Frequency reassignment for coherent modulation filtering," IEEE, Acoustics, Speech and Signal Processing, ICASSP, vol. 5, pp. 261-264, 2006.
[12] K. Paliwal, K. Wójcicki, and B. Schwerin, "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Commun., vol. 52, no. 5, pp. 450-475, May 2010.
[13] M. H. Hayes, Statistical Digital Signal Processing and Modeling, 1st ed. New York, NY, USA: John Wiley & Sons, Inc., 1996.