Multichannel Wiener Filtering for Speech Enhancement in Modulation Domain


Muhammad Awais

This thesis is presented as part of Degree of Master of Sciences in Electrical Engineering with emphasis on Signal Processing.

Blekinge Institute of Technology
School of Engineering
Department of Electrical Engineering
Blekinge Institute of Technology, Sweden

Supervisor: Dr. Benny Sällberg
Examiner: Dr. Benny Sällberg

Contact Information:
Author: Muhammad Awais
Supervisor and Examiner: Dr. Benny Sällberg, Signal Processing Group (SPG), School of Engineering, BTH, Blekinge Institute of Technology, Sweden

Abstract

Speech signals are normally contaminated with noise and interference, which reduce the intelligibility of speech during communication. To make speech signals effective and useful, the speech has to be enhanced, that is, recovered from the noisy signal. Many speech enhancement techniques have been developed in the speech processing field and provide very good results. Multichannel microphone array processing is one such technique, and it provides better results than single channel speech enhancement. Moreover, Wiener filtering is the most commonly used technique for multichannel microphone array speech enhancement.

The main focus of this thesis is to implement multichannel microphone array speech enhancement using Wiener filtering in a modulation domain system and also in a time domain system. Both systems are implemented and validated using different measures, i.e., Signal to Noise Ratio (SNR), Signal to Noise Ratio Improvement (SNRI) and Mean Opinion Score (MOS). Both systems are tested with one female and one male speech signal distorted by different types of noise at -5dB, 0dB, 5dB, 10dB, 15dB and 20dB SNR. The values are calculated for different numbers of subbands, as the main focus is on the modulation domain system, which provided good results in terms of SNRI and MOS. When listening, the output of the modulation domain system is clearer than that of the time domain system. The MOS lies between 3.9 and [...] for both systems.


Acknowledgements

I would like to thank almighty God for blessing me with health, peace of mind and patience to carry out this thesis work. He gave me the ability and opportunity to understand, learn and gain in-depth knowledge at the Blekinge Institute of Technology (BTH).

I sincerely appreciate the way my supervisor Dr. Benny Sällberg helped me during the whole thesis work. He guided me throughout my work in a very kind way. His deep knowledge of this field allowed me to learn and to carry out the thesis work smoothly. His knowledge and guidance are of great value to me.

I would not have been able to do this work without the support and encouragement of my parents and family. They helped me throughout my educational career and motivated me. They supported me both financially and morally. I would like to dedicate my education to date to them.

I would also like to pay my regards to my friends and the faculty at BTH. I was able to gain a great deal of knowledge and wisdom in a pleasant learning environment.

Muhammad Awais, Sweden

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Introduction
  1.2 Contents
2 Modulation Domain
  2.1 Introduction
  2.2 Non-Coherent Envelope Detection
    2.2.1 Hilbert Envelope Detection
    2.2.2 Magnitude Detection
    2.2.3 Limitations of Non-coherent Envelope Detection
  2.3 Motivation for Complex Valued Modulator
  2.4 Coherent Envelope Detection
    2.4.1 Smoothed Hilbert Carrier Estimator
    2.4.2 Instantaneous Frequency Carrier Estimator
    2.4.3 Frequency Reassignment Carrier Estimator
    2.4.4 Spectral Center of Gravity
3 MultiChannel Speech Enhancement
  3.1 Single Channel Speech Enhancement
  3.2 Microphone Array Speech Enhancement
    3.2.1 Multichannel Wiener Filtering in Time Domain
    3.2.2 Multichannel Wiener Filtering in Modulation Domain
4 Evaluation
  4.1 Consideration
  4.2 Results
    4.2.1 Results without Modulation Domain
    4.2.2 Results with Modulation Domain
  4.3 Spectrogram Analysis for Modulation Domain
5 Conclusion
Bibliography

Chapter 1
Introduction

1.1 Introduction

Speech signals are low-frequency modulating signals that modulate high-frequency carriers. Several studies have observed that the modulating signal of speech is very important for speech reception. A speech signal can be represented by

$$x(t) = m(t)\,c(t) \tag{1.1}$$

where $m(t)$ is the modulator signal and $c(t)$ is the carrier signal.

New applications for speech acquisition are emerging with the development of speech processing technologies and the spread of telecommunications. The desire to improve interactivity between individuals motivates systems that offer flexibility, quality and ease of use. Increasingly, personal computers, communication devices, telephones and other interactive devices are operated by voice. Speech processing techniques have proven very effective at improving speech intelligibility in noise. Normally the receiver is at some distance from the talker, so the speech is corrupted by environmental noise, interference and reverberation from walls or ceilings []. Hence, speech enhancement techniques should provide both dereverberation and efficient noise reduction.

Many techniques have been proposed in the speech processing field to handle these issues. Echo cancellation has been used widely in the last decades. Speech enhancement is used in reverberant environments. Background noise reduction methods using one microphone [ ] and using multiple microphones [5-9] have been addressed. Multiple-microphone processing, also called microphone array processing, is derived from classical signal processing. For instance, blind source separation leads to speech separation algorithms. Spectral subtraction, signal subspace methods and the adaptive gain equalizer are other techniques. The spatial correlation of multiple received signals has enabled combined temporal and spatial filtering algorithms known as beamforming techniques [5].

Multichannel array processing using Wiener filtering for speech enhancement has already been implemented in the time-frequency domain. In this thesis, I have implemented the same method, implemented it in the modulation domain as well, and compared the results. The modulation domain system applies the Wiener filter to the modulators of every subband. The modulation domain, which is mainly concerned with the modulator signal, is discussed in detail in Chapter 2. Multichannel microphone array Wiener filtering for speech enhancement, implemented in the time-frequency domain as well as in the modulation domain, is discussed in detail in Chapter 3.

1.2 Contents

This thesis consists of five chapters. Chapter 2 describes the modulation domain. Chapter 3 is about single channel and multichannel algorithms, mainly multichannel Wiener filtering. Chapter 4 evaluates multichannel Wiener filtering in both domains. Finally, Chapter 5 concludes the thesis.

Chapter 2
Modulation Domain

2.1 Introduction

Natural signals are normally represented as low-frequency modulators that modulate high-frequency carriers. In general, a signal can be represented by

$$s(t) = m(t)\,c(t) \tag{2.1}$$

where $m(t)$ and $c(t)$ are the modulator and the carrier respectively. Several studies have observed that the modulator of a speech signal is very important for speech reception. For intelligible speech it is necessary to preserve the modulator of the speech. The modulation domain system splits the signal into its modulator and carrier, and the modulator is then analyzed.

A method commonly used to obtain the modulating signal of a broadband signal is to divide the signal into narrowband frequency subbands using a filterbank and then to decompose each subband into a carrier and a modulating signal. Coherent and non-coherent envelope detection methods are used for this decomposition. In this thesis, coherent envelope detection with carrier estimation by the Center of Gravity (COG) [] method is used to decompose the signal into modulator and carrier.

The generalized model of the modulation domain system is presented in Figure 2.1. The filterbank is the heart of the modulation domain system. A broadband signal $x(t)$ passes through the filterbank, which is a set of LTI bandpass filters $h_k(t)$:

$$x_k(t) = x(t) * h_k(t), \qquad k = 1, \ldots, K.$$

Each resulting subband is decomposed into a modulator and a carrier by one of two types of envelope detection, coherent or non-coherent. The modulator obtained from each subband is then filtered by an LTI filter $g_k(t)$,

$$\tilde{m}_k(t) = m_k(t) * g_k(t) \tag{2.2}$$

After this, the filtered modulators are recombined with the original subband carriers,

$$\tilde{x}_k(t) = \tilde{m}_k(t)\,c_k(t) \tag{2.3}$$

Figure 2.1: Generalized Model of Modulation Domain.

The modulation-filtered broadband signal is then reconstructed by the filterbank summation method,

$$\tilde{x}(t) = \sum_{k=1}^{K} \tilde{x}_k(t) \tag{2.4}$$

The envelope detection method is a very critical part of the modulation domain system. Non-coherent envelope detection uses the magnitude or magnitude-like operators, whereas coherent envelope detection uses carrier estimation methods.

2.2 Non-Coherent Envelope Detection

The two methods used in non-coherent envelope detection are based on the Hilbert envelope (for real-valued subbands) and the magnitude operator (for complex-valued subbands). Each type is discussed below.

2.2.1 Hilbert Envelope Detection

In the generalized framework of the modulation system, the filterbank consists of a set of real-valued LTI bandpass filters $h_k(t)$. For a subband $x_k(t)$, the Hilbert envelope detector is the magnitude of the analytic subband $x_k^{+}(t)$, i.e.,

$$m_k(t) = D^{H}\{x_k(t)\} \tag{2.5}$$
$$\phantom{m_k(t)} = |x_k^{+}(t)| \tag{2.6}$$
$$\phantom{m_k(t)} = |x_k(t) + j\,\mathcal{H}\{x_k(t)\}| \tag{2.7}$$
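As an illustration of the Hilbert envelope detector, the following is a minimal NumPy sketch, not the implementation used in the thesis. The function names `analytic_signal` and `hilbert_envelope` are hypothetical, the analytic signal is formed with an FFT on a finite block (an assumption made for illustration), and the carrier is taken as the cosine of the analytic phase.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H{x} for a real block x, built from a one-sided spectrum."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep DC unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist bin unchanged
        gain[1:n // 2] = 2.0            # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)  # negative frequencies are zeroed

def hilbert_envelope(x_k):
    """Non-coherent detection: envelope |x_k^+| and carrier cos(arg x_k^+)."""
    x_plus = analytic_signal(x_k)
    envelope = np.abs(x_plus)
    carrier = np.cos(np.angle(x_plus))
    return envelope, carrier
```

By construction the product `envelope * carrier` equals the real part of the analytic signal, so the two outputs multiply back to the original real subband.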

The carrier of the subband obtained with the Hilbert transform is defined by

$$c_k(t) = D_c^{H}\{x_k(t)\} \tag{2.8}$$
$$\phantom{c_k(t)} = \cos\{\arg[x_k^{+}(t)]\} \tag{2.9}$$
$$\phantom{c_k(t)} = \cos\{\arg[x_k(t) + j\,\mathcal{H}\{x_k(t)\}]\} \tag{2.10}$$

where $D^{H}$ and $D_c^{H}$ are the envelope and carrier detection operators respectively, $\mathcal{H}$ denotes the Hilbert transform, and $\arg$ gives the argument of a complex number [].

2.2.2 Magnitude Detection

The magnitude detector $D$ for an analytic subband signal $x_k^{+}(t)$ is defined as

$$m_k(t) = D\{x_k^{+}(t)\} \tag{2.11}$$
$$\phantom{m_k(t)} = |x_k^{+}(t)| \tag{2.12}$$

whereas the carrier detector $D_c$ is defined as

$$c_k(t) = D_c\{x_k^{+}(t)\} \tag{2.13}$$
$$\phantom{c_k(t)} = e^{\,j\arg[x_k^{+}(t)]} \tag{2.14}$$

2.2.3 Limitations of Non-coherent Envelope Detection

Non-coherent envelope detection has some limitations, which are stated below []:

- The magnitude and phase spectra of the subbands exceed the bandwidth of the subband signal.
- It assumes a conjugate-symmetric spectrum of the modulator, which is unrealistic for natural signals.
- The modulator domain is not closed under convolution.

2.3 Motivation for Complex Valued Modulator

How can the limitations of the non-coherent envelope detector be overcome? For this we have to revisit the basic signal model,

$$x(t) = m(t)\,c(t) \tag{2.15}$$

which can be rewritten in polar form as

$$a_x(t)\,e^{j\phi_x(t)} = [a_m(t)\,e^{j\phi_m(t)}]\,[a_c(t)\,e^{j\phi_c(t)}] \tag{2.16}$$

The polar form allows the signal to be decomposed into a multiplicative amplitude and an additive phase,

$$a_x(t) = a_m(t)\,a_c(t) \tag{2.17}$$
$$\phi_x(t) = \phi_m(t) + \phi_c(t) \tag{2.18}$$

The main objective is to detect $a_m(t)$ and $\phi_m(t)$ given $a_x(t)$ and $\phi_x(t)$. This decomposition into amplitude and phase is very difficult without additional restrictions. In the non-coherent detection technique, the ambiguity is resolved by assuming that $\phi_m(t) = 0$ and $a_c(t) = 1$ for all $t$, such that $a_m(t) = a_x(t)$ and $\phi_c(t) = \phi_x(t)$. These assumptions, however, cause the problems discussed above. The behavior of the envelope detector can be improved by allowing $\phi_m(t) \neq 0$, which means that the modulator is complex valued. With this assumption the limitations of the non-coherent technique can be overcome.

The carrier phase $\phi_c(t)$ can be estimated from the original signal by low-pass filtering an estimate of the signal's instantaneous frequency [],

$$\alpha_c(t) = \mathrm{IF}\{\phi_x(t)\} * h_{lp}(t) \tag{2.19}$$

where the instantaneous frequency is defined as the derivative of the signal phase,

$$\mathrm{IF}\{\phi_x(t)\} = \frac{d\phi_x(t)}{dt} \tag{2.20}$$

Integrating the smoothed instantaneous frequency $\alpha_c(t)$ over time gives the carrier phase,

$$\phi_c(t) = \int_{0}^{t} \alpha_c(\tau)\,d\tau \tag{2.21}$$

The carrier signal is given by

$$c(t) = e^{j\phi_c(t)} \tag{2.22}$$

so the modulator signal $m(t)$ obtained by coherent demodulation is

$$m(t) = \frac{x(t)}{c(t)} = \frac{x(t)}{e^{j\phi_c(t)}} = x(t)\,e^{-j\phi_c(t)} \tag{2.23}$$

Detectors and carrier estimators of this type are called coherent envelope detectors and coherent carrier estimators.

2.4 Coherent Envelope Detection

2.4.1 Smoothed Hilbert Carrier Estimator

An analytic subband can be written in polar form as $x_k^{+}(t) = \alpha_k(t)\,e^{j\phi_k(t)}$. The phase signal of the subband, $\phi_k(t)$, can be written as []

$$\phi_k(t) = \phi_{k,m}\,t + \phi_{k,0} + \theta_k(t) \tag{2.24}$$

In the subband phase model (2.24), the values of $\phi_{k,m}$ and $\phi_{k,0}$ should be selected so that the residual phase $\theta_k(t)$ has zero mean, where $\phi_{k,m}$ is the average frequency of the subband $x_k(t)$. The smoothed carrier phase is

$$\hat{\phi}_k(t) = \phi_{k,m}\,t + \phi_{k,0} + [\theta_k(t) * h_{lp}(t)] \tag{2.25}$$

From the smoothed subband phase signal $\hat{\phi}_k(t)$, the carrier can be estimated as []

$$c_k = D_c^{sh}\{x_k(t)\} \tag{2.26}$$
$$\phantom{c_k} = e^{j\hat{\phi}_k(t)} \tag{2.27}$$

and the modulator as

$$m_k = D^{sh}\{x_k(t)\} \tag{2.28}$$
$$\phantom{m_k} = x_k^{+}(t)\,e^{-j\hat{\phi}_k(t)} \tag{2.29}$$

2.4.2 Instantaneous Frequency Carrier Estimator

The Instantaneous Frequency (IF) estimator [3] is a modification of the differential FM detector []. The subband of the signal $x(t)$ is decomposed into its real and imaginary parts, and the estimator derives an unnormalized IF estimate,

$$Z_k(m) = Z_{i,k}(m) + j\,Z_{q,k}(m) \tag{2.30}$$

A phase-only IF estimate $\alpha_k(m)$ is then obtained as

$$\alpha_k(m) = \begin{cases} \dfrac{Z_k(m)}{|Z_k(m)|}, & |Z_k(m)| > \epsilon \\[2mm] \alpha_k(m-1), & |Z_k(m)| \le \epsilon \end{cases}$$

where $Z_k(m)$ is the unnormalized IF estimate and $\epsilon$ is very small. The carrier phase is accumulated from $\alpha_k(m)$ with the initial conditions $W_k(-1) = 1$, $W_k(0) = \alpha_k(0)$ and the general recursive equation

$$W_k(m) = W_k(m-1)\,\alpha_k(m) \tag{2.31}$$

From the phase estimate $W_k(m)$ of the subband $X_k(m)$, the estimated carrier is

$$C_k(m) = D_c^{IF}\{X_k(m)\} \tag{2.32}$$
$$\phantom{C_k(m)} = W_k(m) \tag{2.33}$$

and the modulator is found by demodulating the subband signal,

$$M_k(m) = D^{IF}\{X_k(m)\} \tag{2.34}$$
$$\phantom{M_k(m)} = X_k(m)\,C_k^{*}(m) \tag{2.35}$$
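The recursion of the IF carrier estimator can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `if_demodulate` is hypothetical, and the unnormalized IF estimate is assumed here to be the first-difference product `X[m] * conj(X[m-1])`, which the text does not specify. The first sample has no predecessor, so its phase step is set to 1.

```python
import numpy as np

def if_demodulate(X, eps=1e-8):
    """Estimate a unit-modulus carrier W and the modulator M of a complex subband X."""
    X = np.asarray(X, dtype=complex)
    alpha = np.ones(len(X), dtype=complex)       # phase-only IF estimates
    for m in range(1, len(X)):
        Z = X[m] * np.conj(X[m - 1])             # assumed unnormalized IF estimate
        # Normalize when reliable, otherwise hold the previous estimate.
        alpha[m] = Z / abs(Z) if abs(Z) > eps else alpha[m - 1]
    W = np.cumprod(alpha)                        # W[m] = W[m-1] * alpha[m]
    M = X * np.conj(W)                           # coherent demodulation
    return W, M
```

For a subband that is a pure complex exponential of constant amplitude, the estimated carrier follows the exponential and the modulator reduces to the constant amplitude.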

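2.4.3 Frequency Reassignment Carrier Estimator

The IF estimator depends on the finite central phase difference,

$$\alpha_k(m) = \exp\left\{ \frac{j\,[\phi_k(m+1) - \phi_k(m-1)]}{2} \right\} \tag{2.36}$$

The IF estimator is a linear approximation to the true subband instantaneous frequency, and the accuracy of this approximation decreases as the decimation factor increases. This can be avoided by using the frequency reassignment operator from time-frequency reassignment [5], which overcomes the problems associated with the IF method. The method operates on the continuous-time moving window transform, which can be expressed as

$$X(\tau, \Omega) = \int x(t+\tau)\,h(t)\,e^{-j\Omega t}\,dt \tag{2.37}$$
$$\phantom{X(\tau, \Omega)} = A(\tau, \Omega)\,e^{j\psi(\tau,\Omega)} \tag{2.38}$$

where $h(t)$ is the analysis window. The signal $x(t)$ can be reconstructed from

$$x(t) = \iint A(\tau, \Omega)\,h(\tau - t)\,e^{j[\psi(\tau,\Omega) - \Omega\tau + \Omega t]}\,d\Omega\,d\tau \tag{2.39}$$

For the maximum contribution to the reconstruction integral, the phase should satisfy the condition

$$\frac{\partial}{\partial \tau}\,[\psi(\tau, \Omega) - \Omega\tau + \Omega t] = 0 \tag{2.40}$$

or equivalently,

$$\frac{\partial \psi(\tau, \Omega)}{\partial \tau} - \hat{\Omega}(\tau, \Omega) = 0 \tag{2.41}$$

The reassigned frequency can be computed by

$$\hat{\Omega}(\tau, \Omega) = \Omega + \mathrm{Im}\left\{ \frac{X_{Dh}(\tau, \Omega)\,X^{*}(\tau, \Omega)}{|X(\tau, \Omega)|^{2}} \right\} \tag{2.42}$$

where $X(\tau, \Omega)$ is the short-time Fourier transform of the signal $x(t)$ with the analysis window $h(t)$, and $X_{Dh}(\tau, \Omega)$ is the transform with the window $h_D(t)$, the time derivative $h_D(t) = dh(t)/dt$. The coherent carrier and modulator using frequency reassignment are then

$$C_\Omega(\tau) = D_c^{FR}\{X_\Omega(\tau)\} \tag{2.43}$$
$$\phantom{C_\Omega(\tau)} = \exp\left[ j \int_{0}^{\tau} \hat{\Omega}(\zeta, \Omega)\,d\zeta \right] \tag{2.44}$$

and

$$M_\Omega(\tau) = D^{FR}\{X_\Omega(\tau)\} \tag{2.45}$$
$$\phantom{M_\Omega(\tau)} = X_\Omega(\tau)\,C_\Omega^{*}(\tau) \tag{2.46}$$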

2.4.4 Spectral Center of Gravity

The COG method defines a time-varying IF: the IF at time $t$ is the COG of the spectrum of a windowed segment centered at $t$ []. The time-varying IF can be calculated as

$$\omega_k(t) = \frac{\int \omega\,|X_k(\omega, t)|^{2}\,d\omega}{\int |X_k(\omega, t)|^{2}\,d\omega} \tag{2.47}$$

where

$$X_k(\omega, t) = \int f(\tau - t)\,x_k(\tau)\,e^{-j\omega\tau}\,d\tau \tag{2.48}$$

Hence, the time-varying IF is computed from the short-time Fourier transform. The phase of the carrier is calculated by

$$\phi_k(t) = \sum_{i=0}^{t} \omega_k(i) \tag{2.49}$$

The carrier $c_k$ is

$$c_k(t) = e^{j\phi_k(t)} \tag{2.50}$$

After estimating the carrier, the complex modulator $m_k(t)$ is easily computed by

$$m_k(t) = x_k(t)\,c_k^{*}(t) \tag{2.51}$$

The filterbank is the main part of the modulation domain system. It is used in this thesis with different numbers of subbands. For instance, Figure 2.2 shows the frequency and impulse responses of the filterbank used. It has 32 subbands and a downsampling factor of 6. Female speech, shown in Figure 2.3, is used to observe the subband spectrogram. The modulation spectrogram of the speech signal is shown in Figure 2.4. This implementation is done through [6] and is also used in [7].
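The spectral COG demodulation steps (windowed spectrum, centroid frequency, phase accumulation, demodulation) can be sketched as follows. This is a rough NumPy illustration under stated assumptions, not the toolbox implementation referenced in the text: the function name `cog_demodulate`, the Hann window, the window length, the hop of one sample, and the power weighting of the centroid are all choices made here for illustration. The subband is assumed to be complex valued.

```python
import numpy as np

def cog_demodulate(x_k, win_len=64):
    """Spectral-COG carrier and modulator of a complex subband x_k.

    Returns the COG frequency track (rad/sample), the carrier and the
    modulator for the samples whose analysis window fits inside x_k.
    """
    x_k = np.asarray(x_k, dtype=complex)
    window = np.hanning(win_len)
    omega = 2 * np.pi * np.fft.fftfreq(win_len)     # bin frequencies in [-pi, pi)
    n_frames = len(x_k) - win_len + 1
    cog = np.empty(n_frames)
    for t in range(n_frames):
        power = np.abs(np.fft.fft(x_k[t:t + win_len] * window)) ** 2
        cog[t] = np.sum(omega * power) / (np.sum(power) + 1e-12)
    phase = np.cumsum(cog)                          # running sum of the IF track
    carrier = np.exp(1j * phase)
    centre = win_len // 2                           # align samples with window centres
    modulator = x_k[centre:centre + n_frames] * np.conj(carrier)
    return cog, carrier, modulator
```

Because the carrier has unit modulus, multiplying the modulator by the carrier restores the corresponding subband samples exactly, which is what remodulation relies on.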

Figure 2.2: Overall Response of Filter Bank. (Panels: subband filter frequency responses; filterbank impulse response, analysis + synthesis. Axes: dB magnitude against frequency in Hz.)

Figure 2.3: Female Speech Signal. (Female speech, fs = 16kHz.)

Figure 2.4: Modulation Spectrogram of Female Speech. (Spectral COG modulation spectrum; carrier number against modulation frequency in Hz.)

Chapter 3
MultiChannel Speech Enhancement

Researchers have proposed many algorithms for speech enhancement. They can be divided into two groups: single channel speech enhancement and multichannel (or microphone array) speech enhancement.

3.1 Single Channel Speech Enhancement

Single channel speech enhancement processes the data available in a single channel provided by one microphone, i.e., $M = 1$:

$$x_k[n] = \sum_{\tau=0}^{T_k} s_k[n-\tau] + v_k[n] \tag{3.1}$$

Most speech enhancement algorithms use single channel processing. Because only one microphone is used, these algorithms are easily embedded in devices such as telephones, mobile phones and computers, and they have comparatively low computational complexity. Algorithms proposed for single channel speech enhancement are [8]:

- Short time spectrum based algorithms (spectral subtraction, improved spectral subtraction, Wiener filtering, etc.).
- Statistical model based algorithms (maximum likelihood estimator, minimum mean square error estimator, posteriori estimators).
- Hearing model based algorithms.
- Speech generation model based algorithms.
- Subspace algorithms.
- Wavelet algorithms.
- Single channel speech separation algorithms.

3.2 Microphone Array Speech Enhancement

Microphone array techniques are preferable to single channel techniques when the SNR is very low. The methods used with multiple microphones are spatiotemporal filtering or beamforming. Microphone beamforming eliminates the need for a movable microphone or telephone. The frequency domain representation of the input-output relation for the beamforming FIR filters $w_m^{[k]}[n]$ is [9]

$$y^{[k]}[n] = \sum_{m=1}^{M} \sum_{l=0}^{L_k-1} w_m^{[k]}[l]\,x_m^{[k]}[n-l] \tag{3.2}$$

where $x_m^{[k]}[n-l]$ is the received microphone signal for a set of $I$ spatial sources with additive noise in the frequency domain, and $w_m^{[k]}[n]$ is the frequency representation of $w_m[t]$. The frequency domain beamforming representation is shown in the figure below.

The filter coefficients $w_k$ can be designed in many different ways for different algorithms. Algorithms used for microphone arrays are:

- Delay and Sum (DAS).
- Linear Constraint Minimum Variance (LCMV).
- Adaptive Noise Canceling (ANC).
- Post filtering.
- Generalized Sidelobe Canceling (GSC).
- Blind Source Separation (BSS).
- Subband processing.

In a microphone array, noise is suppressed and speech is enhanced through beamforming. The simplest beamforming algorithm is Delay and Sum (DAS) beamforming. Its efficiency is very low, however: even under ideal conditions, an enhancement of [...] dB requires an array of at least [...] microphones. The Linear Constrained Minimum Variance (LCMV) algorithm uses the present signal and also delayed samples to construct the beamformer, which may give better enhanced speech than the DAS algorithm.

Adaptive Noise Canceling (ANC) is used to cancel highly correlated noise. It works for many kinds of noise and has low complexity. The problem with this algorithm is that if the speech signal leaks into the reference channel, the speech is also canceled and the quality may degrade. The post filtering algorithm uses a Wiener filter to suppress noise in speech enhanced by the DAS algorithm, with the Wiener filter coefficients computed from the multichannel noisy speech signal.

Generalized Sidelobe Canceling (GSC) is a form of the LCMV algorithm and a very important algorithm for speech enhancement using microphone arrays. Its fixed beamformer suppresses uncorrelated noise, while its adaptive noise canceler with a blocking matrix may cancel the correlated noise. GSC may therefore suppress both correlated and uncorrelated noise, which makes it more practical. The drawback of GSC is that the blocking matrix cannot completely block the speech signal, which causes partial cancelation of the speech in the enhanced output. Improved GSC algorithms have been introduced to overcome this drawback.

Blind Source Separation (BSS) works on the criterion that speech and noise are independent. In BSS it is very difficult to find the separation matrix when the elements of the mixing matrix are time varying. In subband processing algorithms, the noisy speech signals are divided into a group of subbands, so that every subband falls in a narrow frequency band. Noise cancelation can then be performed with a shorter filter, which reduces the complexity of the algorithm while the speech is enhanced.

3.2.1 Multichannel Wiener Filtering in Time Domain

Wiener filtering is the most widely used technique for speech enhancement. In the multichannel case, as shown in Figure 3.1 [9], the broadband noisy speech signal is divided into multiple subbands by a filterbank; this stage is called analysis. Each subband is then filtered by a Wiener filter for enhancement. The outputs of the Wiener filters are recombined by the inverse filterbank; this stage is called synthesis.

To formulate the Wiener filter, a desired signal $d^{[k]}[n]$ is required. We also need the autocorrelation matrix and the cross-correlation vector, which can be calculated as

$$R_x^{[k]} = E\{x^{[k]}[n]\,x^{[k]H}[n]\} \tag{3.3}$$
$$r_{dx}^{[k]} = E\{x^{[k]}[n]\,d^{[k]*}[n]\} \tag{3.4}$$

In practice, these are calculated according to the array model and the sources. The Wiener-Hopf equations and their solution are []

$$R_x^{[k]}\,w^{[k]} = r_{dx}^{[k]} \tag{3.5}$$
$$w^{[k]} = \left(R_x^{[k]}\right)^{-1} r_{dx}^{[k]} \tag{3.6}$$

The error signal is the difference between the desired signal and the beamformer output signal,

$$e^{[k]}[n] = d^{[k]}[n] - w^{[k]H}\,x^{[k]}[n] \tag{3.7}$$
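The Wiener-Hopf solution for one subband can be sketched numerically as below. This is an illustrative sketch, not the thesis code: the function names `wiener_weights` and `apply_weights` are hypothetical, the expectations are replaced by sample averages over N snapshots, and the desired signal is assumed to be available, as it is in a simulation with a known clean signal.

```python
import numpy as np

def wiener_weights(X, d):
    """Solve R w = r for one subband.

    X : (M, N) array, one row of N samples per microphone.
    d : (N,) array, desired signal.
    """
    n_samples = X.shape[1]
    R = X @ X.conj().T / n_samples      # sample estimate of E{x x^H}
    r = X @ d.conj() / n_samples        # sample estimate of E{x d*}
    return np.linalg.solve(R, r)        # avoids forming the explicit inverse

def apply_weights(w, X):
    """Beamformer output y[n] = w^H x[n] for every sample n."""
    return w.conj() @ X
```

A quick check of such a sketch is the orthogonality principle: with sample statistics, the error `d - apply_weights(w, X)` is uncorrelated with every microphone signal up to numerical precision.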

Figure 3.1: Generalized Multichannel.

3.2.2 Multichannel Wiener Filtering in Modulation Domain

The main focus of this thesis is on implementing the Wiener filter in the modulation domain for speech enhancement. The same methodology as in the previous section is implemented in the modulation domain, with a small difference. As discussed in the previous section, in the analysis part the broadband signal is divided into $K$ subbands by a filterbank of $K$ bandpass filters,

$$x_k(n) = h_k * x(n) \tag{3.8}$$

where $x(n)$ is the broadband noisy speech signal that is divided into the $K$ subbands $x_k(n)$, and $h_k$ is the impulse response of the $k$-th subband filter of the filterbank. In the time domain this can be written as

$$x(n) = \sum_{k=1}^{K} x_k(n) = \sum_{k=1}^{K} \left[ d_k(n) + v_k(n) \right] \tag{3.9}$$

where $d_k(n)$ and $v_k(n)$ are the desired speech and the noise signal of the $k$-th subband. For every subband, the COG method is then used to estimate the carrier and the modulator signal. Only the modulators are considered for noise reduction by Wiener filtering. Wiener filtering is therefore applied to every modulator to enhance the speech, and the result is multiplied with the same estimated carrier,

$$m_k(n) = w_k\,x_k(n) \tag{3.10}$$
$$Y_k(n) = m_k(n)\,c_k(n) \tag{3.11}$$

where $m_k(n)$ is the modulator signal resulting from the $k$-th subband Wiener filter $w_k$, which is multiplied with the carrier $c_k(n)$ to give $Y_k(n)$ for reconstruction. The improved speech signal $y(n)$ is then obtained from the inverse filterbank, the synthesis part. The modulation domain system for the multichannel case is shown in Figure 3.2.

Figure 3.2: Generalized Multichannel in Modulation Domain.

Comparing Figure 3.1 and Figure 3.2, the only difference between the two domains is that in the modulation domain every subband is split into a modulator and a carrier, which is not the case in the existing multichannel technique.

Chapter 4
Evaluation

4.1 Consideration

The model implemented for the modulation domain in this thesis is shown in Figure 4.1. A microphone array of three microphones is placed in a room. The room dimensions and parameters are modeled using the image method []. I used a vertical array of three microphones placed with a small spacing to avoid aliasing.

In Figure 4.1, $d(t)$ is the clean speech signal and $v(t)$ is the noise signal; $d_1, d_2, d_3$ are the speech signals and $v_1, v_2, v_3$ the noise signals at the three microphones respectively. $x_1(t)$ is the sum of speech and noise, with the noise scaled to give the required Signal to Noise Ratio (SNR), using $d_1(t) + \mathrm{SNR}\,v_1(t)$ for microphone 1, and so on. $M_{d_1}, C_{d_1}, M_{x_1}, C_{x_1}, M_{v_1}, C_{v_1}$ are the modulator and carrier signals of $d_1(t)$, $x_1(t)$ and $v_1(t)$ respectively for microphone 1, and so on. The Wiener filter is applied to the modulators, and after remodulation the output signals $y_d$, $y_x$ and $y_v$ are generated through the inverse filterbank.

The same scenario is used to implement multichannel Wiener filtering without the demodulation stage. That technique has been implemented previously by many researchers; it is implemented in this thesis to compare the performance of the method in both domains.

The speech signals used in this thesis are one female and one male speech signal at a sampling frequency of 16kHz. Both are tested with three kinds of noise, scaled to -5dB, 0dB, 5dB, 10dB, 15dB and 20dB SNR. The results are taken by varying the number of subbands; the numbers of subbands used are 2, 4, 8, 16 and 32. The performance is measured through the Signal to Noise Ratio Improvement (SNRI) and the Perceptual Evaluation of Speech Quality (PESQ), and spectrograms are shown to evaluate the results.
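The noise scaling and the SNRI measure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the evaluation code of the thesis: the function names are hypothetical, the SNR is taken as a ratio of total signal energies, and the SNRI is assumed to be computed from the separately processed speech and noise outputs.

```python
import numpy as np

def snr_db(signal, noise):
    """Energy ratio of signal to noise in dB."""
    return 10 * np.log10(np.sum(np.abs(signal) ** 2) / np.sum(np.abs(noise) ** 2))

def mix_at_snr(d, v, target_snr_db):
    """Scale noise v so that d + gain * v has the requested SNR; return mix and gain."""
    gain = np.sqrt(np.sum(np.abs(d) ** 2)
                   / (np.sum(np.abs(v) ** 2) * 10 ** (target_snr_db / 10)))
    return d + gain * v, gain

def snr_improvement_db(d_in, v_in, d_out, v_out):
    """SNRI: output SNR minus input SNR, both in dB."""
    return snr_db(d_out, v_out) - snr_db(d_in, v_in)
```

For example, `mix_at_snr(d, v, -5)` returns a mixture whose speech-to-noise energy ratio is -5dB, and a filter that leaves the speech untouched while halving the noise amplitude would score an SNRI of about 6dB.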

4.2 Results

Figure 4.1: Modulation Domain Model.

4.2.1 Results without Modulation Domain

Without the modulation domain, the SNRI is around 7dB in all cases for both the female and the male speech. The enhanced speech sounds clear when listening, with only a very small background noise. The MOS value calculated through PESQ is estimated at [...] for all cases, which is considered good for speech. The results in this thesis are obtained using the same scenario for both domains.

The plots of SNRI and MOS for both the female and the male speech are shown in Section 4.2. For illustration, results are shown only for -5dB, 0dB and 5dB SNR. Figures 4.2 to 4.4 show the SNRI plots for female speech, while Figures 4.5 to 4.7 show the SNRI plots for male speech. The MOS value plots for female speech are shown in Figures 4.8 to 4.10, and those for male speech in Figures 4.11 to 4.13.

Figure 4.2: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at -5dB SNR.

Figure 4.3: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at 0dB SNR.

Figure 4.4: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at 5dB SNR.

Figure 4.5: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at -5dB SNR.

Figure 4.6: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at 0dB SNR.

Figure 4.7: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at 5dB SNR.

Figure 4.8: MOS Score by PESQ of Female Speech for -5dB SNR.

Figure 4.9: MOS Score by PESQ of Female Speech for 0dB SNR.

Figure 4.10: MOS Score by PESQ of Female Speech for 5dB SNR.

Figure 4.11: MOS Score by PESQ of Male Speech for -5dB SNR.

Figure 4.12: MOS Score by PESQ of Male Speech for 0dB SNR.

Figure 4.13: MOS Score by PESQ of Male Speech for 5dB SNR.

4.2.2 Results with Modulation Domain

The main emphasis of this thesis is on the implementation of multichannel Wiener filtering in the modulation domain. Considering SNRI first, it is observed that the SNRI in the modulation domain is almost the same for different numbers of subbands. The SNRI for male speech is around 5-8dB for almost all of the noises used (engine, factory and [...] noise), as shown in Figures 4.20 to 4.25. It is calculated for -5dB, 0dB, 5dB, 10dB, 15dB and 20dB SNR. The values vary only slightly between the different noises. The SNRI for female speech is shown in Figures 4.14 to 4.19 for SNRs from -5dB to 20dB. It can be seen from the figures that the SNRI for female speech is 6-9dB for all of the engine, factory and [...] noises. The SNRI for both male and female speech is shown in the figures for different numbers of subbands.

PESQ is computed by comparing $d(t)$ and $y_s(t)$ and indicates how the enhanced speech would be graded. The Mean Opinion Score (MOS) calculated by PESQ is around [...] to [...] for most of the tests, for both speech signals and for different numbers of subbands. This MOS is considered a very good score for speech signals. For illustration, MOS value plots are shown only at -5dB, 0dB, 5dB, 10dB and 15dB SNR for both male and female speech. Figures 4.26 to 4.35 show that the MOS varies with the number of subbands.

Figure 4.14: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at -5dB SNR.

[Plots: Signal to Noise Ratio Improvement; vertical axis: SNR Improvement [dB]]

Figure.5: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at dB SNR.
Figure.6: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at 5dB SNR.
Figure.7: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at dB SNR.
Figure.8: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at 5dB SNR.
Figure.9: Signal to Noise Ratio Improvement Using Female Speech Signal with Noise at dB SNR.
Figure.: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at -5dB SNR.
Figure.: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at dB SNR.
Figure.: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at 5dB SNR.
Figure.3: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at dB SNR.
Figure.: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at 5dB SNR.
Figure.5: Signal to Noise Ratio Improvement Using Male Speech Signal with Noise at dB SNR.

[Plots: MOS Score through PESQ; vertical axis: MOS Score]

Figure.6: MOS Score by PESQ of Female Speech for -5dB SNR.
Figure.7: MOS Score by PESQ of Female Speech for dB SNR.
Figure.8: MOS Score by PESQ of Female Speech for 5dB SNR.
Figure.9: MOS Score by PESQ of Female Speech for dB SNR.
Figure.3: MOS Score by PESQ of Female Speech for 5dB SNR.
Figure.3: MOS Score by PESQ of Male Speech for -5dB SNR.
Figure.3: MOS Score by PESQ of Male Speech for dB SNR.
Figure.33: MOS Score by PESQ of Male Speech for 5dB SNR.
Figure.3: MOS Score by PESQ of Male Speech for dB SNR.
Figure.35: MOS Score by PESQ of Male Speech for 5dB SNR.
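The SNRI values discussed in this section can be illustrated with a minimal sketch. It assumes a commonly used definition, SNRI = output SNR minus input SNR in dB, and it assumes that the speech and noise components are available separately before and after filtering; this section does not state the exact procedure used in the thesis. The function names, the toy signals and the stand-in "filter" (which only halves the noise amplitude) are hypothetical.

```python
import numpy as np


def snr_db(speech, noise):
    """Ratio of mean speech power to mean noise power, in dB."""
    p_speech = np.mean(np.square(speech))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_speech / p_noise)


def snr_improvement_db(speech_in, noise_in, speech_out, noise_out):
    """SNRI as output SNR minus input SNR (assumed definition)."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)


# Toy signals: a 200 Hz tone stands in for speech, white noise for the disturbance.
rng = np.random.default_rng(0)
fs = 8000  # hypothetical sampling rate in Hz
t = np.arange(2 * fs) / fs
speech_in = np.sin(2 * np.pi * 200 * t)
noise_in = rng.standard_normal(t.size)

# Hypothetical enhancement: speech passes unchanged, noise amplitude is halved.
speech_out = speech_in.copy()
noise_out = 0.5 * noise_in

snri = snr_improvement_db(speech_in, noise_in, speech_out, noise_out)
print(f"SNRI = {snri:.2f} dB")
```

Halving the noise amplitude while leaving the speech untouched corresponds to 20·log10(2), roughly 6 dB of improvement, which is the order of magnitude reported above.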

.3 Spectrogram Analysis for Modulation Domain

In this section spectrogram analysis is presented. The effect of speech enhancement is easy to observe in the spectrograms. The spectrogram of the clean female speech signal is shown in figure.36. The spectrogram analysis is shown only for 3 subbands at 5dB and -5dB SNR. Figure.37 and figure.38 show the enhanced female speech for -5dB and 5dB SNR. It can be observed from the enhanced spectrograms that noise is removed from the speech while the formants remain. Although noise is spread around the speech, after enhancement the speech can be heard clearly with a very small loss of speech energy.

Figure.39 to figure. show the same female speech for -5dB and 5dB SNR, but mixed with the factory noise. The enhanced speech figures show that the formants of the speech are much clearer than for noise. Figure. to figure. show the female speech mixed with engine noise for -5dB and 5dB SNR. The enhanced speech spectrograms show clear formants at both -5dB SNR and 5dB SNR, which is a very good improvement for engine noise.

The same noises are also tested with male speech. Figure.3 to figure. show the enhanced male speech for noise; a small amount of information is lost, as for the female speech. The factory noise mixture gives a better result than noise: figure.5 to figure.6 show that the formants in the enhanced speech spectrograms are very clear for factory noise. The male speech test with engine noise is shown in figure.7 to figure.8, where the formants are clear and show no loss of speech energy.

Hence, all the tested spectrograms suggest that multichannel Wiener filtering also provides effective results in the modulation domain and is worth implementing in this domain.

[Spectrograms: panel titles "Original Speech Signal" and "Enhanced Speech Through Modulation Domain"; axes: Frequency versus Time]

Figure.36: Clean Female Speech.
Figure.37: Processed Female Speech for -5dB SNR with Noise.
Figure.38: Processed Female Speech for 5dB SNR with Noise.
Figure.39: Processed Female Speech for -5dB SNR with Noise.
Figure.: Processed Female Speech for 5dB SNR with Noise.
Figure.: Processed Female Speech for -5dB SNR with Noise.
Figure.: Processed Female Speech for 5dB SNR with Noise.
Figure.3: Processed Male Speech for -5dB SNR with Noise.
Figure.: Processed Male Speech for 5dB SNR with Noise.
Figure.5: Processed Male Speech for -5dB SNR with Noise.
Figure.6: Processed Male Speech for 5dB SNR with Noise.
Figure.7: Processed Male Speech for -5dB SNR with Noise.
Figure.8: Processed Male Speech for 5dB SNR with Noise.
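For readers who want to reproduce spectrograms like those discussed in this section, the following is a minimal sketch of a short-time Fourier transform magnitude spectrogram in dB. The frame length, hop size, Hann window and sampling rate are assumptions chosen for illustration and are not taken from the thesis; the function name `spectrogram_db` is hypothetical, and a pure tone stands in for speech.

```python
import numpy as np


def spectrogram_db(x, frame_len=256, hop=128):
    """Power spectrogram in dB, shape (frequency bins, time frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame, then power in dB (small offset avoids log(0)).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-12).T


fs = 8000  # hypothetical sampling rate in Hz
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone in place of speech

spec = spectrogram_db(tone)
peak_bin = int(np.argmax(spec.mean(axis=1)))
peak_hz = peak_bin * fs / 256
print(spec.shape, peak_hz)
```

Applying the same function to the clean, noisy and enhanced signals and plotting the three results side by side gives the kind of comparison used above: residual noise shows up as energy between the formant tracks.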

Chapter 5 Conclusion

A multichannel microphone array using Wiener filtering is implemented in both the time domain and the modulation domain for speech enhancement. Multichannel processing already has advantages over single channel speech enhancement algorithms. In this thesis the time domain implementation is compared with the modulation domain implementation, and a complete analysis of the implementation in each domain is given. Three different noises are used to test the method. From listening to the enhanced speech, it is concluded that the multichannel implementation in the modulation domain gives the better result. The MOS and SNRI values, however, are almost the same for both domains. The method is thus successfully implemented and validated. The plots and spectrograms shown in the thesis give a clearer view of the results.

The problem observed in the results is that a small amount of background noise remains. Although this noise does not much affect the intelligibility of the speech, reducing it is a topic for future work. The method can also be compared with other enhancement schemes implemented in the modulation domain.

