Modulation Domain Improved Adaptive Gain Equalizer for Single Channel Speech Enhancement


Master Thesis
Electrical Engineering

Modulation Domain Improved Adaptive Gain Equalizer for Single Channel Speech Enhancement

Adithya Valli Nettem
Shakira Shaheen

This thesis is presented as part of the Degree of Master of Science in Electrical Engineering with emphasis on Signal Processing.

Blekinge Institute of Technology
October 2013
School of Engineering, Department of Signal Processing
Supervisor: Mr. Muhammad Shahid
Co-Supervisor: Mr. Rizwan Ishaq
Examiner: Dr. Benny Lövström


ABSTRACT
Human speech is the main means of personal communication, but during communication speech may be impaired by ubiquitous noise. Enduring interfering noise decreases speech intelligibility and makes speech communication troublesome. Speech enhancement, one of the most active branches of signal processing, aims to improve speech quality and intelligibility. Methods for reducing noise in speech signals are continuously being developed; one of them is the Improved Adaptive Gain Equalizer (IAGE), a single-channel speech enhancement method that focuses on enhancement of speech rather than suppression of noise. The Improved Adaptive Gain Equalizer is an enhanced version of the Adaptive Gain Equalizer (AGE). The noise reduction algorithm used in the IAGE amplifies the speech according to a Signal-to-Noise Ratio (SNR) estimate in sub bands. Modulation decomposition of speech signals gave rise to the modulation system, which is useful for modeling speech and other signals. The purpose of this thesis is to implement the IAGE within the modulation system (IMAGE) for speech enhancement, and this report presents the details of that implementation. The successful implementation of the system has been validated with different performance measurements, i.e., Spectral Distortion (SD), Signal to Noise Ratio Improvement (SNRI), Mean Opinion Score (MOS) and spectrogram analysis. The system was analyzed with male and female speech corrupted by engine noise (EN), factory noise (FN) and Gaussian noise (GN) at 0 dB, 5 dB, 10 dB, -5 dB and -10 dB SNR. The MOS was found to be between 3 and 4 for all test cases. In comparison to the IAGE, the IMAGE produced better results.

Keywords: Speech Enhancement, Adaptive Gain Equalizer, Improved Adaptive Gain Equalizer


ACKNOWLEDGEMENT
Firstly, we would like to express our sincere gratitude to our thesis supervisor Mr. Muhammad Shahid and co-supervisor Mr. Rizwan Ishaq for providing us with an interesting topic in the field of speech processing and for accepting us for supervision. We thank them for their persistent help, guidance and support throughout the thesis work, irrespective of their busy schedules. Their deep knowledge of this field helped us learn new things and complete our master's successfully. Secondly, we would like to thank BTH for providing a good educational environment in which to gain the required knowledge and learn about new technologies that helped us move forward with the thesis work. Finally, we would like to extend our immense gratitude to our parents for their support throughout our educational careers; they have motivated and helped us to complete our thesis work successfully. We would also like to thank our friends for their support during the thesis work.

Adithya Valli Nettem
Shakira Shaheen

TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
Chapter 1 - INTRODUCTION
  1.1 Introduction
  1.2 Literature Survey
  1.3 Thesis Outline
Chapter 2 - MODULATION SYSTEM
  2.1 Filter Banks
  2.2 Modulation System
  2.3 Incoherent Envelope Detection
    2.3.1 Hilbert Envelope
    2.3.2 Magnitude Detection
  2.4 Coherent Carrier Estimation
    2.4.1 Instantaneous Frequency Carrier Estimator
    2.4.2 Frequency Reassignment Carrier Estimation
    2.4.3 Smooth Hilbert Carrier Estimator
    2.4.4 Spectral Center of Gravity Carrier Estimation
Chapter 3 - IMPROVED ADAPTIVE GAIN EQUALIZER
  3.1 Improved Adaptive Gain Equalizer (AGE) in Time Domain
  3.2 Improved Adaptive Gain Equalizer in Modulation Domain
Chapter 4 - EVALUATION
  4.1 Evaluation
  4.2 Results and Comparison
    4.2.1 Perceptual Evaluation of Speech Quality (PESQ)
    4.2.2 Spectral Distortion (SD)
    4.2.3 Signal to Noise Ratio Improvement (SNRI)
  4.3 Spectrogram Analysis
Chapter 5 - CONCLUSION
  5.1 Conclusion
  5.2 Future Work
References

Chapter 1-INTRODUCTION
This chapter gives a brief introduction to speech, various applications of speech processing, the problems faced during speech communication, and the techniques used to overcome these problems. The literature survey summarizes what has been done in the related research areas and points out our contribution.

1.1 Introduction
Speech is the most natural way for humans to communicate. Speech processing is a set of techniques in which speech is processed to improve attributes such as intelligibility and clarity. The main branches of speech processing are speech coding, speech recognition, speaker verification or identification, speech synthesis and speech enhancement. Speech coding is used for data compression of audio signals containing speech. Speech recognition converts spoken utterances into text. Speaker verification or identification confirms or determines the identity of a speaker from the voice signal. The purpose of speech synthesis is to convert a text string into a speech waveform. Speech enhancement improves the performance of speech systems by reducing noise and improving perceptual aspects of speech such as quality and intelligibility.

Speech signals are immersed in acoustic ambient noise, which is inevitable and ubiquitous, so speech can seldom be recorded in pure form. Such noise degrades the performance of digital speech processors and forces the listener to strain to hear the speech, which makes communication difficult. Human speech is inherently sensitive to interfering noise. Speech recorded in an uncontrolled environment may contain degrading components such as background noise and speech from other speakers, and speech signals are also distorted by different types of noise, e.g. additive noise, Gaussian noise, periodic noise and other interferences.

In some cases, corrupted speech is harmful when it is sent over a communication system, as it gives poor performance in automatic speech processing tasks such as speech recognition and speaker identification [7, 13]. This thesis deals with speech enhancement, which refers to the restoration of the clean speech. A speech enhancement implementation should preferably be robust in the environment for which it is intended. Moreover, versatility and flexibility are key features for speech enhancement devices, e.g. the ability to adapt to a changing environment and to fit into a variety of applications. Improved speech quality also reduces listener's fatigue. A variety of methods exist to reduce noise and enhance speech, e.g. spectral subtraction [1] and Wiener filtering [3]. The Adaptive Gain Equalizer (AGE) is a noise reduction method that focuses on enhancing the speech instead of suppressing the noise. The enhancement is performed by weighting sub bands in the time-frequency domain according to an estimate of the Signal-to-Noise Ratio (SNR). The method offers its advantages by

having low complexity, low delay and low distortion, and by not needing a Voice Activity Detector (VAD). However, it has a drawback in situations with intense continuous speech [7, 13]. The Improved Adaptive Gain Equalizer (IAGE) is an amended version of the AGE in which this drawback is averted: the speech is further improved with less distortion, and more noise damping is provided in speech pauses [13, 14]. The details of the IAGE are discussed in chapter 3.

The modulation system assumes that a signal is composed of a modulator and a carrier. The signal is represented by x(t) = m(t)c(t), where m(t) denotes the modulator of the signal, normally its low-frequency part. The modulator modulates a high-frequency carrier c(t). The speech signal can be represented by x(t). Studies have shown that the modulators of speech signals are the most important part for intelligibility, and modulation frequency signal processing is the result of research on modulators. A detailed description of the modulation system and modulation filtering is given in the next chapter.

The IAGE was originally implemented in the time-frequency domain. This thesis implements the IAGE in the modulation frequency domain. Modulation systems based on subband modulators fit the IAGE naturally, since the IAGE operates on the subbands of the signal. The analysis of this system is detailed in chapter 3.

1.2 Literature Survey
Research on noise reduction and speech enhancement dates back more than 40 years. Many algorithms have been proposed for speech enhancement, commonly categorized into three fundamental approaches: filtering techniques, spectral restoration and speech-model-based methods. S. F. Boll used the concept of spectral subtraction for suppression of noise in speech, with the objectives of developing noise suppression, implementing a computationally efficient algorithm and testing it in an actual environment. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis [1]. Among the numerous techniques developed for noise reduction, Wiener filtering is considered a fundamental approach and has been adopted in many applications, although it may cause some detrimental effects to the speech signal [2]. Eric J. Diethorn used a sub-band noise reduction method for speech enhancement; for many applications 12 to 18 dB of noise reduction is achieved in real-world settings [3]. N. Westerlund et al. introduced the concept of the AGE for speech enhancement [4], which has proven advantageous as it offers low complexity and low delay also when implemented in real time [18]. Reference [19] explains the implementation aspects of the AGE in three different domains with advantages and disadvantages. The AGE was implemented in the analog domain [6], which highlights the disadvantages of implementation in the digital domain, and Henrik Akesson et al. implemented a hybrid-domain AGE for speech enhancement which overcomes the problems of the analog and digital domains [5]. The authors in [7] used the AGE with frequency-dependent parameters, focusing on speech enhancement rather than noise reduction. Steven M. Schimmel and Les E. Atlas described a novel coherent modulation filtering technique for single-channel target talker enhancement in the presence of interfering talkers and proved that coherent modulation increases speech intelligibility [8]. Qin Li and Les Atlas proposed a new coherent modulation method, based on conditional mean frequency, which is capable of accurately estimating carriers and modulators of modulated signals [9].

Coherent modulation techniques use frequency reassignment for demodulation of a signal into modulator and carrier [10]. Charles Pascal Clark used coherent modulation filtering to remove or amplify specific articulation rates in speech, which is pertinent to speech perception studies and to the interpolation of long gaps in acoustic signals [11]. Steven Marco Schimmel developed a framework whose core components are a filter bank, which separates broadband signals into narrowband sub bands, and a carrier estimator or envelope detector, which decomposes each sub band into a carrier and a modulator; he also described coherent and non-coherent envelope detectors [12]. Reference [13] describes the AGE used for personal-communication speech enhancement. The authors of reference [14] demonstrated its drawbacks and proposed the Improved AGE, which gives improved noise suppression when speech is absent together with less speech distortion. The authors of reference [20] successfully implemented the AGE in the modulation domain and tested it with different types of noise for enhancing speech with reduction of noise. This thesis combines the Improved AGE and the modulation frequency domain for noise reduction of a speech signal.

1.3 Thesis Outline
This report is organized into five chapters. Chapter 2 briefly introduces the modulation system and coherent and incoherent envelope detection. Chapter 3 introduces the concept of the AGE and its disadvantages, shows how the IAGE overcomes these disadvantages, and presents the implementation of the IAGE in the time domain and then in the modulation domain. Chapter 4 presents the evaluation, results and spectrogram analysis. Finally, the report concludes with an outlook on future work in the last chapter.

Chapter 2-MODULATION SYSTEM
This chapter explains the theory of the modulation system and modulation filtering, together with the most important part of the modulation system, envelope detection, the types of envelope detection and the corresponding mathematical equations.

2.1 Filter Banks
Modern speech processing methods are usually implemented in the time-frequency domain, meaning the signal is represented not only as a function of time but also as a function of frequency. A time-frequency representation can be obtained by filtering the input signal through a bank of band-pass filters; thus, a filter bank is a technique that transforms a signal from the time domain to the time-frequency domain. The transformed signals produced by the filter bank are known as sub-band signals. This method improves efficiency, as the processing can run in parallel over all sub bands [16]. The part of the filter bank that transforms the time signal into a time-frequency representation is referred to as the analysis filter bank, and the part where the inverse transformation or reconstruction takes place is known as the synthesis filter bank. Figure 2.1 shows an analysis-synthesis filter bank, where x[n] is the input time signal and x_0[n], ..., x_{K-1}[n] are the time-frequency (sub-band) signals, which are processed through sub-band processors g_0, ..., g_{K-1} to yield y_0[n], ..., y_{K-1}[n]; these are then recombined by the synthesis filter bank to yield the output time signal y[n] [16].

Figure 2.1 Analysis and Synthesis Filter Bank

2.2 Modulation System
Modulation filtering is the process of modifying an analytic sub band x(t) by filtering its modulator and recombining the result with its carrier [17]. In this view a signal is divided into low-frequency components, the modulators, and high-frequency components, the carriers: the modulators modulate the high-frequency carriers. Speech signals and music

signals can be represented as modulators and carriers. The speech signal can be represented as

x(t) = m(t)c(t)   (1)

where m(t) and c(t) represent the modulator and carrier of the signal. Studies have shown that speech remains intelligible when the modulators are preserved while the carriers are altered; hence the modulators are the important components for speech intelligibility. The modulation domain decomposes speech and other natural signals into modulators and carriers, and the modulators are then analyzed further [15]. Modulation is a process in which the properties of a high-frequency signal are varied according to a low-frequency signal that carries the information.

The modulation domain process is described in Fig 2.2. Filter banks are used to obtain the sub-band signals, as discussed in the previous section, and the modulation system then decomposes each sub-band signal into a modulator and a carrier. Coherent and non-coherent envelope detectors are used for this decomposition. In this thesis, we have used coherent demodulation based on carrier estimation with the Center Of Gravity (COG) method to decompose the sub-band signals into modulators and carriers. Envelope detection is the most important part of a modulation system. Non-coherent detection estimates the modulators using magnitude and magnitude-like operations and detects envelope and carrier independently, whereas coherent detection first estimates the carrier and then uses it to calculate the envelope. The detected modulators are analyzed and recombined with the original carriers to form the modified sub-band signals, which are then synthesized to form the enhanced time signal.
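The non-coherent route mentioned above can be illustrated with a short Python sketch (a minimal example using SciPy's `hilbert`; the function name and the synthetic test signal are our own, not from the thesis):

```python
import numpy as np
from scipy.signal import hilbert

def incoherent_demodulate(subband):
    """Non-coherent (Hilbert-envelope) detection: split a real-valued
    sub band into a modulator and a unit-amplitude carrier."""
    analytic = hilbert(subband)            # x(t) + j*H{x(t)}
    modulator = np.abs(analytic)           # Hilbert envelope m(t)
    carrier = np.cos(np.angle(analytic))   # carrier c(t)
    return modulator, carrier

# A 500 Hz tone amplitude-modulated at 4 Hz: x(t) = m(t) c(t)
fs = 8000
t = np.arange(fs) / fs
m_true = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = m_true * np.cos(2 * np.pi * 500 * t)
m_est, c_est = incoherent_demodulate(x)
```

Multiplying the detected modulator and carrier recovers the sub band exactly, since the real part of the analytic signal equals the input.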
Figure 2.2 Modulation Domain Analysis [15]

2.3 Incoherent Envelope Detection
Incoherent envelope detectors are based on the Hilbert envelope (for real-valued sub bands) or the magnitude operator (for complex-valued sub bands). Each type is discussed in the following sections.

2.3.1 Hilbert Envelope
The Hilbert envelope m_k(t) is given as the magnitude of the analytic sub band \tilde{x}_k(t), and the Hilbert carrier is c_k(t), as shown in equations (2) and (3):

m_k(t) = \mathcal{M}\{x_k(t)\} = |\tilde{x}_k(t)|, \quad \tilde{x}_k(t) = x_k(t) + jH\{x_k(t)\}   (2)

c_k(t) = \mathcal{C}\{x_k(t)\} = \cos(\arg[\tilde{x}_k(t)])   (3)

where \mathcal{M} and \mathcal{C} are the modulation and carrier detection operations, respectively, H denotes the Hilbert transform, j = \sqrt{-1} denotes the imaginary unit, and \arg takes the argument of a complex-valued number [15].

2.3.2 Magnitude Detection
For an analytic sub band \tilde{x}_k(t), the modulator m_k(t) and carrier c_k(t) are represented as in the following equations (4) and (5):

m_k(t) = \mathcal{M}\{\tilde{x}_k(t)\} = |\tilde{x}_k(t)|   (4)

c_k(t) = \mathcal{C}\{\tilde{x}_k(t)\} = \tilde{x}_k(t)/|\tilde{x}_k(t)|   (5)

where \mathcal{M} and \mathcal{C} are the modulator and carrier detectors, respectively. Incoherent envelope detection of modulators has the following limitations:
- The bandwidths of the sub-band magnitude and sub-band phase are generally larger than the bandwidth of the sub-band signal itself.
- Incoherent detectors force a conjugate-symmetric spectrum on the modulator, which is not a realistic assumption for natural signals.
- The modulation domain obtained with incoherent detectors is not closed under convolution.

2.4 Coherent Carrier Estimation
Coherent demodulation first estimates the carrier and then uses the estimated carrier to calculate the modulator. Once the carrier is found, the basic signal model yields the modulator analytically, which avoids the limitations of incoherent demodulation. The modulator is computed as in equation (6):

m_k(t) = x_k(t)\, c_k^*(t)   (6)

where m_k and c_k are the modulator and the carrier of the sub-band signal, respectively, and the superscript * denotes the complex conjugate operator.

2.4.1 Instantaneous Frequency Carrier Estimator
The method proposed in reference [6] demodulates sub-band signals using the concept of instantaneous frequency (IF), following the differential FM detector of Atlas and Janssen [24]. Mathematically, a unimodular phase-only IF estimate \hat{c}(t) is given by

\hat{c}(t) = \frac{\hat{c}_u(t)}{|\hat{c}_u(t)| + \varepsilon}, \quad \hat{c}_u(t) = x_k(t)\, x_k^*(t-1)   (7)

where \hat{c}_u(t) is an un-normalized IF estimate, and \varepsilon is a small threshold that is used to reduce noise in the IF estimate. The smoothed IF estimates give an instantaneous phase estimate through the recursion

\varphi_k(t) = \varphi_k(t-1) + \arg[\hat{c}(t)], \quad \varphi_k(0) = \arg[x_k(0)]   (8)

Once the instantaneous phase estimate \varphi_k(t) is calculated for the sub band x_k(t), the IF carrier is computed as

c_k(t) = e^{j\varphi_k(t)}   (9)

and the coherent envelope m_k(t) is calculated from the carrier as

m_k(t) = x_k(t)\, c_k^*(t)   (10)

In this demodulation process a complex-valued envelope is obtained [15].

2.4.2 Frequency Reassignment Carrier Estimation
The IF estimator used in the above section is in fact a linear approximation of the true sub-band IF. Analysis of the IF estimator gives the following IF estimate

\hat{c}(t) = \exp\{j[\varphi_k(t+1) - \varphi_k(t-1)]/2\}   (11)

while the true IF of the sub band is given as

\omega_k(t) = \frac{d\varphi_k(t)}{dt}   (12)

The problem with this linear approximation is that its accuracy decreases as the decimation factor of the discrete Short-Time Fourier Transform (STFT) increases. This problem can be reduced with the frequency reassignment operator from time-frequency reassignment. The signal x(t) can be reconstructed from its STFT X(\tau,\omega) as

x(t) = \iint X(\tau,\omega)\, h(t-\tau)\, e^{j\omega t}\, d\omega\, d\tau   (13)
     = \iint A(\tau,\omega)\, e^{j\psi(\tau,\omega)}\, h(t-\tau)\, e^{j\omega t}\, d\omega\, d\tau   (14)

where X(\tau,\omega) is written in polar form as A(\tau,\omega)e^{j\psi(\tau,\omega)}. For a point (\tau,\omega) to contribute maximally to the reconstruction integral, it should satisfy the phase stationarity condition [15]

\frac{\partial}{\partial \omega}\left[\psi(\tau,\omega) - \omega(\tau - t)\right] = 0   (15)

which gives the reassigned time

\hat{t}(t,\omega) = t - \Re\left\{\frac{X_{th}(t,\omega)}{X_h(t,\omega)}\right\}   (16)

and the reassigned frequency can be computed as

\hat{\omega}(t,\omega) = \omega + \Im\left\{\frac{X_{\dot{h}}(t,\omega)}{X_h(t,\omega)}\right\}   (17)

where X_h(t,\omega) is the STFT of x(t) with analysis window h(t), X_{th}(t,\omega) is the STFT with the time-weighted window t\,h(t), and X_{\dot{h}}(t,\omega) is the STFT with the time derivative \dot{h}(t) of the analysis window. The coherent carrier c_k(t) and modulator m_k(t) are then given by

c_k(t) = \exp\left\{j \int_0^t \hat{\omega}_k(\tau)\, d\tau\right\}   (18)

m_k(t) = x_k(t)\, c_k^*(t)   (19)
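The phase-increment idea behind equations (7)-(10) can be sketched in Python for a discrete-time complex sub band (a simplified illustration; the function name and the synthetic sub band are our own):

```python
import numpy as np

def if_coherent_demodulate(subband, eps=1e-8):
    """Coherent demodulation with a phase-only instantaneous-frequency
    carrier: accumulate sample-to-sample phase increments into a
    carrier phase, then divide the carrier out of the sub band."""
    x = np.asarray(subband, dtype=complex)
    inc = x[1:] * np.conj(x[:-1])              # un-normalized IF estimate
    inc = inc / (np.abs(inc) + eps)            # unimodular (phase-only)
    phase = np.cumsum(np.concatenate(([np.angle(x[0])], np.angle(inc))))
    carrier = np.exp(1j * phase)               # c(t) = exp(j*phi(t))
    modulator = x * np.conj(carrier)           # m(t) = x(t) c*(t)
    return modulator, carrier

# Complex sub band: a positive real modulator on a 50 Hz complex carrier
fs = 8000
t = np.arange(2000) / fs
m_true = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)
x = m_true * np.exp(2j * np.pi * 50 * t)
m_est, c_est = if_coherent_demodulate(x)
```

Because the carrier estimate has unit magnitude, recombining modulator and carrier returns the original sub band.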

2.4.3 Smooth Hilbert Carrier Estimator
The polar form of an analytic sub-band signal is \tilde{x}_k(t) = a_k(t)\, e^{j\varphi_k(t)}, with the sub-band phase signal defined as

\varphi_k(t) = \omega_k t + \varphi_k(0) + \tilde{\varphi}_k(t)   (20)

where \omega_k t is a linear phase term, \varphi_k(0) is an initial phase and \tilde{\varphi}_k(t) is a phase deviation term. A smooth version of the sub-band phase is computed as

\hat{\varphi}_k(t) = \omega_k t + \varphi_k(0) + h_{lp}(t) * \tilde{\varphi}_k(t)   (21)

where h_{lp}(t) is a low-pass filter. The smooth Hilbert carrier estimate can be obtained from the smoothed sub-band phase as

c_k(t) = e^{j\hat{\varphi}_k(t)}   (22)

and the coherent envelope follows from the carrier as

m_k(t) = \tilde{x}_k(t)\, c_k^*(t)   (23)

2.4.4 Spectral Center of Gravity Carrier Estimation
In the center-of-gravity approach, the instantaneous frequency is defined as the average frequency of the instantaneous spectrum of x_k(t) at time t. An instantaneous spectrum is computed with a Short-Time Fourier Transform as shown in equation (24):

S_k(\omega, t) = \sum_p g(p)\, x_k(t+p)\, e^{-j\omega p}   (24)

where g(p) is a short spectral-estimation window. The instantaneous frequency \omega_k(t) of the sub band x_k(t) is estimated as the spectral center of gravity

\omega_k(t) = \frac{\int \omega\, |S_k(\omega,t)|^2\, d\omega}{\int |S_k(\omega,t)|^2\, d\omega}   (25)

The phase \varphi_k(t) of the carrier is computed by integrating the instantaneous frequency,

\varphi_k(t) = \int_0^t \omega_k(\tau)\, d\tau   (26)

the carrier is

c_k(t) = e^{j\varphi_k(t)}   (27)

and the complex-valued modulator is given by m_k(t) = x_k(t)\, c_k^*(t).
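The COG estimator of equations (24)-(25) can be sketched as follows (a naive sliding-spectrum illustration; the window length, edge padding and function name are our own choices, not the thesis settings):

```python
import numpy as np

def cog_instantaneous_frequency(subband, win_len=64):
    """Instantaneous frequency (cycles/sample) of a complex sub band,
    taken as the centre of gravity of a sliding power spectrum."""
    x = np.asarray(subband, dtype=complex)
    g = np.hanning(win_len)               # short spectral-estimation window
    half = win_len // 2
    padded = np.pad(x, (half, half - 1), mode="edge")
    freqs = np.fft.fftfreq(win_len)       # bin frequencies in [-0.5, 0.5)
    omega = np.empty(len(x))
    for n in range(len(x)):
        power = np.abs(np.fft.fft(g * padded[n:n + win_len])) ** 2
        omega[n] = np.sum(freqs * power) / np.sum(power)  # spectral COG
    return omega

# A complex tone at 0.125 cycles/sample
x = np.exp(2j * np.pi * 0.125 * np.arange(512))
f_est = cog_instantaneous_frequency(x)
```

Integrating the estimated frequency (a cumulative sum in discrete time) would give the carrier phase of equation (26).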

Chapter 3-IMPROVED ADAPTIVE GAIN EQUALIZER
This chapter explains the advantages and disadvantages of the Adaptive Gain Equalizer, shows how the Improved Adaptive Gain Equalizer in the time domain is designed to overcome the disadvantages, and explores the implementation of the Improved Adaptive Gain Equalizer in the modulation domain.

3.1 Improved Adaptive Gain Equalizer (AGE) in Time Domain
In speech enhancement, the Adaptive Gain Equalizer (AGE) is a method whose main focus is to enhance the speech rather than suppress the noise: it boosts the signal only when speech is present. The noisy speech signal is divided into a number of sub bands, and each sub band is weighted individually according to the SNR estimate in that particular sub band [7, 13]. To achieve this speech-boosting effect, a slowly varying noise floor level estimate, designed to adaptively track the background noise level, is calculated in each sub band, together with a short-term average. Using the noise floor level and the short-term average, a gain function is obtained that weights the sub-band signal according to the sub-band SNR at each time instant. The gain function can be defined as the ratio of the short-term average and the noise floor level estimate [13], as shown in equation (1):

G_k(n) = \min\left\{ \left( \frac{A_k(n)}{\hat{A}_k(n)} \right)^{p_k},\; L_k \right\}   (1)

where A_k(n) is the noisy speech signal level in sub band k, \hat{A}_k(n) is the estimate of the noise level, p_k is a positive exponent and L_k is an upper gain limit. The noisy speech level is estimated by taking the short-time average of the input signal according to

A_k(n) = (1 - \alpha_k)\, A_k(n-1) + \alpha_k\, |x_k(n)|   (2)

where \alpha_k is a forgetting-factor constant. The estimate of the noise level, based on the short-term average, can be given as

\hat{A}_k(n) = \min\{ A_k(n),\; (1 + \beta_k)\, \hat{A}_k(n-1) \}   (3)

where \beta_k is a positive constant defining the increase rate of the noise-level estimate.
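The interplay of equations (1)-(3) can be illustrated for a single sub band with a minimal Python sketch (the constants and the noise-floor initialization are illustrative choices of ours, not the thesis's parameter values):

```python
import numpy as np

def age_gain(subband, alpha=0.02, beta=1e-3, p=1.0, g_max=10.0):
    """Single sub-band AGE gain: short-term average A(n) over a slowly
    rising noise-floor estimate, raised to an exponent and limited."""
    A = 0.0
    A_noise = 1e-6          # small non-zero start for the noise floor
    gains = np.empty(len(subband))
    for n, mag in enumerate(np.abs(subband)):
        A = (1.0 - alpha) * A + alpha * mag        # short-term average, eq. (2)
        A_noise = min(A, (1.0 + beta) * A_noise)   # fast down, slow up, eq. (3)
        gains[n] = min((A / max(A_noise, 1e-12)) ** p, g_max)  # eq. (1)
    return gains

# Steady low-level noise followed by a louder speech-like burst
sig = np.concatenate([0.1 * np.ones(20000), 1.0 * np.ones(500)])
g = age_gain(sig)
```

After the noise floor has converged, the gain sits near unity in steady noise and rises when the short-term average jumps above the noise floor, which is the speech-boosting behaviour described above.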
The AGE is a versatile method that has proven advantageous: it offers low complexity, low delay and low distortion, and it does not require a Voice Activity Detector (VAD). Speech enhancement is performed continuously in each sub band. Although the AGE has proven advantageous in enhancing speech, it still suffers from a drawback in the case of intense continuous speech. In such situations the sub-band SNR estimate gradually becomes inaccurate, which results in undesired damping and ultimately reduced speech quality. The Improved AGE is a modification of the AGE that overcomes this drawback; it also produces less distorted speech and provides more noise damping in speech pauses. This section explains how the IAGE overcomes the disadvantages of the AGE. One problem with the AGE is the reduction of the speech-boosting gain during intense continuous speech, as the noise level estimate increases, as can be seen in equation (1). To overcome this problem, an alternative noise estimation method is proposed [14] which uses a modified update controller, as in equation (4):

N_k(n) = min{A_k(n), N_k(n−1)(1 + β_k γ(n))}    (4)

where γ(n) is the noise estimation update controller, defined as

γ(n) = { 0, if ρ(l) > δ ; 1, otherwise }    (5)

where ρ(l) is the ratio of the maximum and minimum accumulated signal amplitudes, defined as

ρ(l) = max{b(l−L+1), …, b(l)} / (min{b(l−L+1), …, b(l)} + ε)    (6)

where L is the number of blocks used for the estimation of ρ(l), ε is a constant included for avoiding a division by zero error, and b(l) is the accumulated signal block defined as

b(l) = Σ_{n=(l−1)M+1}^{lM} |x(n)|    (7)

where M is the number of samples accumulated in every block and l is the block index. The noise estimation update controller is thus based on ρ(l) and the threshold δ. The ratio ρ(l) in equation (6) relates the largest and smallest accumulated signal blocks among the L most recent blocks. A high ratio means the signal is dominated by speech in the considered time frame, while a low ratio means the signal is dominated by noise. Hence, according to equation (5), the noise estimation update controller prevents the noise estimate from being updated during speech and thus eliminates the problem of reduced speech booster gain during intense continuous speech [14].

A second disadvantage of the AGE follows from equation (1): if the maximum sub band gain L_k is set too high, there is a risk of fast pumping of the noise and distortion of the speech. To overcome this, a second gain factor is introduced which provides noise damping in longer speech pauses while still giving significant noise reduction and little speech distortion. This gain factor is the full band gain g_fb(n), which is applied to the input signal as

y(n) = g_fb(n) Σ_k G_k(n) x_k(n)    (8)

The full band gain is based on a gain controller c(n), defined as

c(n) = { 1, if Σ_k G_k(n) > T ; 0, otherwise }    (9)

where T is a threshold. A hold function of N_h samples is introduced for the gain controller, which then becomes

c̄(n) = { 1, if c(m) = 1 for some m ∈ {n − N_h, …, n} ; 0, otherwise }    (10)

Mathematically, the full band gain is expressed as

g_fb(n) = λ(n) g_fb(n−1) + (1 − λ(n)) L(n)    (11)

where L(n) is the target damping value and λ(n) is the forgetting factor, defined as in equations (12) and (13):

L(n) = { 1, if c̄(n) = 1 ; L_a, if g_fb(n−1) > L_a + Δ ; L_b, otherwise }    (12)

λ(n) = { λ_a, if L(n) = 1 or L(n) = L_a ; λ_b, if L(n) = L_b }    (13)

where Δ is a small constant which defines the limit of the transition between the fast and slow damping regions. As can be seen in equations (9)-(12), the full band gain is dependent on the gain controller.
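The block-based speech/noise decision of equations (5)-(7) can be sketched as follows. The function name and defaults are illustrative; M, L and δ correspond to the symbols used above.

```python
import numpy as np

def update_controller(x, M=8, L=64, delta=2.2, eps=1e-9):
    """Sketch of the noise-estimation update controller, eqs. (5)-(7).

    M     : samples accumulated per block, eq. (7)
    L     : number of recent blocks used in the ratio, eq. (6)
    delta : speech/noise decision threshold, eq. (5)
    Returns gamma[l] = 1 (noise dominated, noise update allowed) or
    0 (speech dominated, noise estimate frozen) per complete block l.
    """
    n_blocks = len(x) // M
    # eq. (7): accumulated magnitude per block
    b = np.array([np.sum(np.abs(x[l * M:(l + 1) * M])) for l in range(n_blocks)])
    gamma = np.ones(n_blocks, dtype=int)
    for l in range(n_blocks):
        recent = b[max(0, l - L + 1):l + 1]
        rho = recent.max() / (recent.min() + eps)   # eq. (6)
        gamma[l] = 0 if rho > delta else 1          # eq. (5)
    return gamma
```

On a constant-amplitude "noise" segment the block ratio stays near 1 and updates remain enabled, while a loud burst pushes the ratio above δ and freezes the noise estimate, as intended.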
If sufficient gain is applied to the sub bands during speech, the gain controller will be 1, which means that the full band gain rises according to equations (11) and (12). If little sub band gain is applied, the gain controller will be 0, which means that the full band gain falls according to equations (11) and (13). The speech pause driven gain is designed to adapt quickly to the level L_a with smoothing parameter λ_a and to adapt slowly to the level L_b < L_a with smoothing parameter λ_b [14]. The full band gain is set to operate in three regions. The first region, L(n) = 1, is used when speech is present. The second region, L(n) = L_a, is used after a speech segment in the audio signal; in this region the gain is quickly reduced, which reduces the noise, and since the adaptation to this gain is relatively fast the noise suppression cannot be too large. The third region, L(n) = L_b, is used to adapt to the lowest desired gain; this adaptation is slow in order to make the transition between the noise levels less apparent [14].

Figure 3.1 IAGE System in Time Domain

3.2 Improved Adaptive Gain Equalizer in Modulation Domain

The main contribution of this thesis is to combine the improved AGE with the modulation domain for enhancement of speech. The functionality of the IAGE in the modulation domain is the same as in the time domain, except that in the modulation domain each sub band signal is divided into a modulator m_k(n) and a carrier c_k(n), and only the modulators are processed by the IAGE. The IAGE system in the modulation domain is shown in Figure 3.2. The mathematics of the IAGE in the modulation domain is the same as for the IAGE in the sub band domain, except that the short term and long term averages are calculated for each sub band modulator instead of the sub band itself. The gain function is multiplied with the sub band modulator to yield the modified modulator m̃_k(n), which is then combined with the carrier in the reconstruction stage of the modulation system [15]:

m̃_k(n) = G_k(n) m_k(n)    (14)

x̃_k(n) = m̃_k(n) c_k(n)    (15)

The synthesized output y(n) is finally calculated by adding up all the sub band components:

y(n) = Σ_k x̃_k(n)    (16)

Figure 3.2 IAGE System in Modulation Domain
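As a concrete illustration of equations (14)-(16), the sketch below demodulates one sub band into a modulator (envelope) and a unit-magnitude carrier and applies a gain to the modulator only. It uses a simple FFT-based analytic-signal (Hilbert) envelope as a stand-in for the spectral center of gravity demodulation used in the thesis; the function names are illustrative.

```python
import numpy as np

def analytic(x):
    """Analytic signal via FFT (numpy-only stand-in for a Hilbert transformer)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:N // 2] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0
    return np.fft.ifft(X * h)

def modulation_gain(x_k, gain):
    """Apply a (scalar or per-sample) gain to the modulator of sub band x_k.

    x_k(n) = m_k(n) c_k(n): m_k is the envelope (modulator) and c_k the
    unit-magnitude carrier.  Only m_k is weighted, as in eqs. (14)-(15).
    """
    a = analytic(x_k)
    m = np.abs(a)                    # modulator m_k(n)
    c = a / np.maximum(m, 1e-12)     # carrier c_k(n), |c_k| = 1
    return np.real(gain * m * c)     # modified sub band, eq. (15)
```

For a constant gain the modified sub band is simply the scaled input, which is a quick sanity check of the demodulate/remodulate round trip; in the IAGE the gain would instead be the time-varying G_k(n) computed from the modulator, and the outputs of all sub bands are summed as in equation (16).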

Chapter 4-EVALUATION

This chapter explains how the implemented system is practically evaluated and how the results are produced with different input signals in different noise backgrounds. Table 4.1 shows the system parameters set for the evaluation, and the section below explains why these values were chosen. The input speech signals considered for simulation are male and female speech, and the noise signals are Engine Noise (EN), Factory Noise (FN) and Gaussian Noise (GN). The performance measures considered for evaluation, Perceptual Evaluation of Speech Quality (PESQ), Spectral Distortion (SD) and Signal to Noise Ratio Improvement (SNRI), are explained in the following sections.

4.1 Evaluation

In this section the practical evaluation is described; all experimental evaluations are performed on signals recorded at a 16 kHz sampling frequency. For comparison, all parameters are kept the same, as shown in Table 4.1. The parameters used in the simulation are described as follows. L_k is the threshold determining the maximum allowed gain in a sub band; a suitable choice of the maximum allowed sub band gain lies between 10 and 20. L_a and L_b are the first and last damping limits of the full band gain, whose purpose is to damp noise in longer speech pauses. Δ determines the transition region between fast and slow damping; it is a small constant which defines the limit of the transition between the fast and slow damping regions. α is the forgetting factor constant, chosen between 0 and 1; it is recommended to set α to lower values so that the gain remains stable. λ_a and λ_b are the smoothing parameters: the speech pause driven gain is designed to adapt quickly to the level L_a with smoothing parameter λ_a and slowly to the level L_b < L_a with smoothing parameter λ_b.
p determines the relation between the SNR estimate and the sub band gain: p = 1 gives a linear relationship, for p > 1 an alteration of the SNR estimate has a larger effect on the gain, and for p < 1 a smaller effect; for the simulations a setting of p = 1 was chosen. δ is the threshold that determines the decision point distinguishing between speech and noise, based on the number of blocks L and the number of samples in each block M; the noise estimation update controller is based on these values. N_h is the length of the hold function, which is used to avoid changes in the full band gain during short speech pauses. Table 4.1 shows the parameter values used in the simulation.

Parameter   Value
L_k         20
L_a         0.5
L_b         0.125
Δ           0.5
α           0.984
λ_a         0.9687
λ_b         0.999
p           1
L           64
M           8
N_h         100
δ           2.2

Table 4.1: System parameters used in simulation

4.2 Results and comparison

This section presents the results obtained with the system parameters L_k = 40 dB, T_a = 0.004 ms, T_b = 4 ms and L_opt varied from 1 to 20. The speech signals comprise female and male speech, and the noise signals are scaled so as to obtain SNRs of 10 dB, 5 dB, 0 dB, −5 dB and −10 dB. The noise signals are Engine Noise (EN), Gaussian Noise (GN) and Factory Noise (FN). The performance is evaluated by Signal to Noise Ratio Improvement (SNRI), Spectral Distortion (SD) and Perceptual Evaluation of Speech Quality (PESQ).

4.2.1 Perceptual Evaluation of Speech Quality (PESQ)

PESQ is computed by comparing the original signal with the degraded output signal, which yields PESQ-MOS values measuring how much degradation the system has introduced on the speech signal due to the AGE gain function. PESQ results in mean opinion score (MOS) values on a scale from 1 (bad) to 5 (excellent). Figures 4.1 and 4.2 show the MOS values for the female speech with EN and FN, and Figures 4.3 and 4.4 show the MOS values for the male speech, for both IAGE and IMAGE, with FN and GN at different SNRs. The purpose throughout all the experiments is to find the optimal value of the critical system parameter L_opt for different speaker situations. The MOS as computed by the PESQ is for most of the tests around 2.5 to 3.5, which is considered fair for speech signals. Plots in red indicate IMAGE and plots in blue indicate IAGE.
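The scaling of the noise signals to a prescribed input SNR, mentioned above, can be sketched as follows; the function name and interface are illustrative.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then mix."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

By construction the residual (mixture minus speech) has exactly the requested power ratio relative to the speech, which makes the input SNR of each test condition reproducible.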

Figure 4.1: MOS for Female Speech Signal for Engine Noise
Figure 4.2: MOS for Female Speech Signal for Factory Noise

Figure 4.3: MOS for Male Speech Signal for Factory Noise
Figure 4.4: MOS for Male Speech Signal for Gaussian Noise

4.2.2 Spectral Distortion (SD)

Spectral distortion is defined as the spectral deviation between the power of the clean input speech signal and the power of the processed speech signal at the output. Figures 4.5 to 4.8 show the spectral distortion of female and male speech signals with EN, FN and GN at different SNR values. The SD is very low in all cases for L_opt < 5 and then increases with L_opt. All the SD graphs show that the SD is lower for the IAGE in the modulation domain. The SD for the female speech signal with EN is about 0 to 12 dB, the SD for the male speech signal with FN is about −8 to 3 dB, the SD for the female speech signal with GN is about 2 to 12 dB, and for the male speech signal it is about 2 to 10 dB. Plots in red indicate IMAGE and plots in blue indicate IAGE.

Figure 4.5: Spectral Distortion for Female Speech with Engine Noise
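One plausible reading of this measure, a framewise RMS difference of log power spectra, can be sketched as follows; the frame length, hop and averaging are assumptions, not the exact definition used in the thesis.

```python
import numpy as np

def spectral_distortion(clean, processed, frame=256, hop=128, eps=1e-12):
    """Mean framewise RMS difference (dB) between log power spectra."""
    n = min(len(clean), len(processed))
    w = np.hanning(frame)
    sds = []
    for start in range(0, n - frame + 1, hop):
        C = np.abs(np.fft.rfft(w * clean[start:start + frame])) ** 2
        P = np.abs(np.fft.rfft(w * processed[start:start + frame])) ** 2
        d = 10 * np.log10(C + eps) - 10 * np.log10(P + eps)
        sds.append(np.sqrt(np.mean(d ** 2)))
    return float(np.mean(sds))
```

Identical signals give an SD of 0 dB, and a uniform 2x amplitude change gives roughly 6 dB, which matches the intuition that SD penalizes any spectral deviation of the output from the clean reference.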

Figure 4.6: Spectral Distortion for Female Speech with Gaussian Noise
Figure 4.7: Spectral Distortion for Male Speech with Gaussian Noise

Figure 4.8: Spectral Distortion for Male Speech with Factory Noise

4.2.3 Signal to Noise Ratio Improvement (SNRI)

An improvement in SNR can be calculated by finding the SNR at the input of the enhancement system and the SNR at its output. The SNR improvement can be represented as

SNRI = SNR_output − SNR_input

where SNR_output is the SNR at the output of the enhancement system and SNR_input is the SNR at its input. Figures 4.9 to 4.12 show the SNRI of female and male speech signals for both IAGE and IMAGE with different noises at different SNR values. The female speech signal has an SNRI of 10 to 16 dB for FN and the male speech signal has an SNRI of 2 to 8 dB for FN. The female speech signal has an SNRI of 40 to 44 dB for GN and the male speech signal has an SNRI of 30 to 35 dB for GN. Plots in red indicate IMAGE and plots in blue indicate IAGE.
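Under the assumption that the clean signal is available as a reference (so the noise component can be isolated by subtraction), the SNRI defined above can be sketched as:

```python
import numpy as np

def snr_db(speech, residual_noise):
    """SNR in dB between a speech reference and a noise component."""
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(residual_noise ** 2))

def snr_improvement(clean, noisy_in, enhanced):
    """SNRI = SNR at the output minus SNR at the input (both in dB)."""
    snr_in = snr_db(clean, noisy_in - clean)
    snr_out = snr_db(clean, enhanced - clean)
    return snr_out - snr_in
```

For example, an enhancer that halves the noise amplitude while leaving the speech untouched yields an SNRI of 10·log10(4) ≈ 6 dB, independent of the input SNR.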

Figure 4.9: Signal to Noise Ratio Improvement for Female Speech with Gaussian Noise
Figure 4.10: Signal to Noise Ratio Improvement for Female Speech with Factory Noise

Figure 4.11: Signal to Noise Ratio Improvement for Male Speech with Gaussian Noise
Figure 4.12: Signal to Noise Ratio Improvement for Male Speech with Factory Noise

4.3 Spectrogram Analysis

This section presents the spectrogram analysis of the signals. Spectrograms are generated for the noisy input speech signal and the enhanced output signal. The spectrogram uses a window size of 256 samples, 1024 frequency points, and an overlap of 200 samples, which must be less than the window size. The following figures show the spectrograms of noisy male and female speech signals, together with the signals enhanced by both IAGE and IMAGE, with EN, FN and GN at different SNR values. The spectrogram of a male speech signal mixed with GN at −10 dB SNR is shown in Figure 4.13, and the corresponding spectrogram after enhancement with the modulation domain IAGE is shown in Figure 4.14, where it can be observed that the disturbing noise is reduced while the formants of the speech are maintained. Figure 4.15 shows the spectrogram of the speech signal enhanced with the time domain IAGE, where it can be observed that the enhanced signal clearly shows the formants after processing. Similarly, Figures 4.16-4.24 show the spectrogram analysis of the male speech signal with GN at 5 dB SNR and with EN at −10 dB and 5 dB SNR. Figures 4.25-4.30 show the spectrogram analysis of the female speech signal with EN at −10 dB and 5 dB SNR. Figure 4.31 shows the spectrogram of the female speech signal mixed with GN at −10 dB SNR, and the corresponding spectrogram after enhancement with the modulation domain IAGE is shown in Figure 4.32, where again the disturbing noise is reduced while the formants of the speech are maintained. Figure 4.33 shows the spectrogram of the speech signal enhanced with the time domain IAGE, where the formants are clearly visible after processing. Similarly, Figures 4.34-4.36 show the spectrogram analysis of the female speech signal with GN at 5 dB SNR.

Figure 4.13: Spectrogram of Noisy Male Speech Signal with Gaussian Noise at -10dB SNR
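With the parameters quoted above (window size 256, 1024 frequency points, overlap 200), a magnitude spectrogram can be sketched with plain numpy as follows; the Hamming window is an assumption, since the window type is not stated in the text.

```python
import numpy as np

def spectrogram(x, nperseg=256, noverlap=200, nfft=1024):
    """Magnitude spectrogram: 256-sample window, 200-sample overlap,
    1024 frequency points, as quoted in Section 4.3."""
    hop = nperseg - noverlap                      # 56-sample frame advance
    w = np.hamming(nperseg)
    frames = np.array([x[s:s + nperseg] * w
                       for s in range(0, len(x) - nperseg + 1, hop)])
    # zero-padded real FFT of every windowed frame -> (n_frames, nfft//2 + 1)
    return np.abs(np.fft.rfft(frames, n=nfft, axis=-1))
```

A pure tone at normalized frequency 0.1 cycles/sample, for instance, produces a ridge near frequency bin 0.1 · 1024 ≈ 102 in every frame, which is a quick way to verify the frequency axis.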

Figure 4.14: Spectrogram of Modulation Domain IAGE Processed Male Speech Signal with Gaussian Noise at -10dB SNR
Figure 4.15: Spectrogram of Time Domain IAGE Processed Male Speech Signal with Gaussian Noise at -10dB SNR
Figure 4.16: Spectrogram of Noisy Male Speech Signal with Gaussian Noise at 5dB SNR
Figure 4.17: Spectrogram of Modulation Domain IAGE Processed Male Speech Signal with Gaussian Noise at 5dB SNR
Figure 4.18: Spectrogram of Time Domain IAGE Processed Male Speech Signal with Gaussian Noise at 5dB SNR
Figure 4.19: Spectrogram of Noisy Male Speech Signal with Engine Noise at -10dB SNR
Figure 4.20: Spectrogram of Modulation Domain IAGE Processed Male Speech Signal with Engine Noise at -10dB SNR
Figure 4.21: Spectrogram of Time Domain IAGE Processed Male Speech Signal with Engine Noise at -10dB SNR
Figure 4.22: Spectrogram of Noisy Male Speech Signal with Engine Noise at 5dB SNR
Figure 4.23: Spectrogram of Modulation Domain IAGE Processed Male Speech Signal with Engine Noise at 5dB SNR
Figure 4.24: Spectrogram of Time Domain IAGE Processed Male Speech Signal with Engine Noise at 5dB SNR
Figure 4.25: Spectrogram of Noisy Female Speech Signal with Engine Noise at -10dB SNR
Figure 4.26: Spectrogram of Modulation Domain IAGE Processed Female Speech Signal with Engine Noise at -10dB SNR
Figure 4.27: Spectrogram of Time Domain IAGE Processed Female Speech Signal with Engine Noise at -10dB SNR
Figure 4.28: Spectrogram of Noisy Female Speech Signal with Engine Noise at 5dB SNR
Figure 4.29: Spectrogram of Modulation Domain IAGE Processed Female Speech Signal with Engine Noise at 5dB SNR
Figure 4.30: Spectrogram of Time Domain IAGE Processed Female Speech Signal with Engine Noise at 5dB SNR
Figure 4.31: Spectrogram of Noisy Female Speech Signal with Gaussian Noise at -10dB SNR
Figure 4.32: Spectrogram of Modulation Domain IAGE Processed Female Speech Signal with Gaussian Noise at -10dB SNR
Figure 4.33: Spectrogram of Time Domain IAGE Processed Female Speech Signal with Gaussian Noise at -10dB SNR
Figure 4.34: Spectrogram of Noisy Female Speech Signal with Gaussian Noise at 5dB SNR
Figure 4.35: Spectrogram of Modulation Domain IAGE Processed Female Speech Signal with Gaussian Noise at 5dB SNR
Figure 4.36: Spectrogram of Time Domain IAGE Processed Female Speech Signal with Gaussian Noise at 5dB SNR

Chapter 5-CONCLUSION

This chapter presents the conclusions of the implementation and evaluation of the IAGE in the modulation domain, together with possible future work.

5.1 Conclusion

A modulation frequency system has been presented in which the Improved Adaptive Gain Equalizer (IAGE) is applied to the noise reduction of noisy speech signals. The IAGE is an amended version of the AGE. The noise reduction algorithm used in the IAGE is an improvement of the Speech Booster Algorithm (SBA) presented in [7], which incorporates sub band division of the audio signal with noise damping in each sub band. The sub band damping is proportional to the current SNR estimate in the corresponding sub band, yielding noise reduction with low levels of speech distortion. The proposed algorithm introduces additional noise reduction functionality applied in speech pauses, allowing the noise level to be further reduced without adding speech distortion. Furthermore, the proposed algorithm introduces a noise estimation update controller and a gain controller, which are used to determine whether the audio signal contains speech or only background noise. Owing to this, a more reliable noise level estimate is obtained, so that the gain in each sub band corresponds to the actual SNR, resulting in less speech distortion compared to the original SBA. The IAGE has been tested with different kinds of noise, such as GN, EN and FN, in both the time and modulation domains. The detailed analysis of the system has shed light on its advantages and disadvantages, and the evaluation highlights the low SD of the IAGE in the modulation domain compared to the other implementations. The system provides a good improvement on the female speech signal, with low SD, better SNRI, a fair MOS, and an output speech signal that sounds good. The spectrogram analysis provides another view of these results.
5.2 Future Work

Possible future work includes experimenting with the system parameters L_k, T_a, T_b and L_opt to obtain better results in terms of lower SD and higher SNRI. Furthermore, the IAGE could be implemented in the modulation frequency domain using the convex optimization demodulation technique and compared with the traditional IAGE implemented with the spectral center of gravity demodulation technique.

References

[1] S.F. Boll. "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech and Sig. Proc., vol. 27, no. 2, pp. 113-120, April 1979.
[2] J. Benesty, Y. Huang, J. Chen and S. Doclo. "New insights into the noise reduction Wiener filter," IEEE Trans. Audio, Speech and Lang. Proc., vol. 14, no. 4, pp. 1218-1234, July 2006.
[3] E.J. Diethorn. "A subband noise-reduction method for enhancing speech in telephony and teleconferencing," in Applications of Signal Processing to Audio and Acoustics, Murray Hill, NJ, 1997.
[4] N. Westerlund. "Applied speech enhancement for personal communication." Ph.D. dissertation, Blekinge Institute of Technology, Sweden, 2003.
[5] M. Dahl, I. Claesson, B. Sallberg, and H. Akesson. "A mixed analog-digital hybrid for speech enhancement purposes," IEEE International Symposium on Circuits and Systems, vol. 2, pp. 852-855, 2005.
[6] N. Westerlund, M. Dahl, I. Claesson, B. Sallberg, and H. Akesson. "Analog circuit implementation for speech enhancement purposes," Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 2285-2289, 2004.
[7] N. Westerlund, M. Dahl, and I. Claesson. "Speech enhancement using adaptive gain equalizer with frequency dependent parameter settings," IEEE 60th Conference on Vehicular Technology, vol. 5, pp. 3718-3722, 2004.
[8] L.E. Atlas and S.M. Schimmel. "Target talker enhancement in hearing devices," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4201-4204, 2008.
[9] Q. Li and L.E. Atlas. "Coherent modulation filtering for speech," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4481-4484, 2008.
[10] S.M. Schimmel, K.R. Fitz, and L.E. Atlas. "Frequency reassignment for coherent modulation filtering," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 261-264, 2006.
[11] C.P. Clark. "Effective coherent modulation filtering and interpolation of long gaps in acoustic signals." Master's thesis, University of Washington, Washington, 2008.
[12] S.M. Schimmel. "Theory of modulation frequency analysis and modulation filtering, with application to hearing devices." Master's thesis, University of Washington, Washington, 2007.
[13] N. Westerlund, M. Dahl, and I. Claesson. "Speech enhancement using adaptive gain equalizer," Department of Telecommunications and Signal Processing, Blekinge Institute of Technology, Ronneby, 2003.
[14] M. Borgh, C. Schuldt, F. Lindstrom, M. Berggren, and I. Claesson. "An improved adaptive gain equalizer for noise reduction with low speech distortion," EURASIP Journal on Audio, Speech and Music Processing, vol. 7, pp. 1-7, 2011.
[15] I. Rizwan. "Adaptive gain equalizer and modulation frequency domain for noise reduction." Master's thesis, Blekinge Institute of Technology, Karlskrona, 2010.
[16] B. Sallberg. "Digital signal processors." Department of Electrical Engineering, Blekinge Institute of Technology, Karlskrona, 2010.
[17] P. Clark and L.E. Atlas. "Time-frequency coherent modulation filtering of nonstationary signals," IEEE Transactions on Signal Processing, vol. 57, pp. 4323-4332, 2009.
[18] N. Westerlund, M. Dahl and I. Claesson. "Real time implementation of adaptive gain equalizer for speech enhancement purposes," WSEAS, 2003.
[19] B. Sallberg, I. Claesson, and N. Grbic. "Implementation aspects of the adaptive gain equalizer," Blekinge Institute of Technology, Karlskrona, 2006.
[20] M. Shahid and I. Rizwan. "Modulation domain adaptive gain equalizer for speech enhancement." Presented at The IASTED Int. Conf. Signal and Image Processing and Applications, Greece, 2011.