CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS


4.1 INTRODUCTION

New frontiers of speech technology are demanding increased levels of performance in many areas. With the advent of wireless communications, new speech services are becoming a reality through the development of modern robust speech processing technology. Many researchers have discussed the ill effects of environmental noise on the performance of speech processing systems. Abhijeet Sangwan et al (2002) discussed the desirable aspects of Voice Activity Detection (VAD) algorithms: a good decision rule, adaptability to background noise and low computational complexity in estimating the noise spectrum. Background noise acoustically added to speech can degrade the performance of digital voice processors used for applications such as speech compression, recognition and authentication (Isrel 2003). Digital voice systems are used in a variety of environments, and their performance must be maintained at a level near that measured with noise-free input speech. To ensure continued reliability, the effects of background noise can be reduced either by internal modification of the voice processor algorithms to explicitly compensate for signal contamination, or by preprocessor noise reduction and noise-cancelling microphones.

Khaled et al (1997) observed that high-energy voiced speech segments are always detected by all VADs under very noisy conditions such as car, bus, babble and street noise; low-energy unvoiced speech, however, is commonly missed. The background noise which contaminates the signal results in either noise-only or speech-plus-noise segments. The VAD developed by Javier Ramirez et al (2005) makes it possible to define an effective endpoint detection algorithm employing novel noise reduction techniques and order statistics filters in the formulation of the decision rule. This VAD performs an advanced detection of word beginnings and a delayed detection of word endings, which in part avoids the inclusion of additional hangover schemes. In addition, the VAD provides speech / non-speech discrimination. It has been observed that low-energy portions of speech are the first to be falsely rejected, and a hangover scheme is required to lower the probability of false rejections (Alan Davis et al 2006). Robustness can be achieved by an appropriate extraction of robust features in the front-end and/or by adapting the references to the noise situation. Noise signals are selected to represent the most probable application scenarios for telecommunication terminals. Some noises, such as car noise and a recording in an exhibition hall, are fairly stationary; others, such as recordings on the street and at the airport, are non-stationary. A fast noise estimation algorithm proposed by Sundarrajan Rangachari et al (2004) gave good performance for a single sentence; the noise estimate was obtained by averaging past spectral power values using a smoothing parameter adjusted by the signal presence probability in subbands.

A novel VAD algorithm developed by Dong Kook Kim et al (2007) is based on the Gaussian distribution and the uniformly most powerful (UMP) test to detect speech or nonspeech in the input noisy signal. This method provides the decision rule by comparing the magnitude of the noisy speech signal with an adaptive threshold estimated from the noise statistics. A conditional Maximum A Posteriori (MAP) criterion decides the hypothesis with the maximum conditional probability given both the observation and the voice activity in the previous frame. This criterion leads to two separate thresholds for the Likelihood Ratio Test (LRT), depending on the VAD result of the previous frame, as discussed by Jong Won Shin et al (2008). Several VAD algorithms have been proposed for detecting voiced / unvoiced regions (Boll Steven et al 1980, Dhananjaya et al 2010, Falk Tiago et al 2006, Haitian Xu et al 2007, Jongseo Sohn et al 1999, Juan Manuel Gorriz et al 2008, Matteo Gerosa et al 2007, Plante et al 1998, Qi Li et al 2002, Richard et al 2000, Yutaka Kaneda et al 1986, Zenton Goh et al 1999, Zhong Lin et al 2007). In this chapter, the Voice Activity Detection (VAD) developed by Ramirez et al (2005) is presented along with the noise estimation algorithm discussed in Sundarrajan Rangachari et al (2004) and Abhijeet Sangwan et al (2002). Various VAD algorithms are studied and their performance is compared, using Zero Crossing Detection (ZCD), Weak Fricative Detection (WFD), Pitch Based Detection (PBD), Energy Based Detection (EBD) and the Subband Order Statistics Filter (OSF), in the presence of different types of noise such as suburban train, babble, car, exhibition hall, restaurant, street, airport and train-station noise for Automatic Speech Recognition (ASR).

4.2 VOICE ACTIVITY DETECTION ALGORITHMS

A straightforward approach is Voice Activity Detection (VAD), i.e., the process of discriminating speech from silence or other background noise. VAD algorithms are based on some combination of general speech properties such as temporal energy variations, periodicity and spectrum. The detection task is not as trivial as it appears, since an increasing level of background noise degrades the classifier effectiveness. VAD indicates the presence or absence of speech, as observed by Ramirez et al (2005). Voice is differentiated into speech or silence based on speech characteristics. The signal is sliced into contiguous frames and a real-valued nonnegative parameter is associated with each frame; if this parameter exceeds a certain threshold, the frame is classified as active, otherwise as inactive. The basic principle of a VAD device is that it extracts some measured features or quantities from the input signal and compares these values with thresholds. Voice activity (VAD = 1) is declared if the measured value exceeds the threshold; otherwise no speech activity (VAD = 0) is declared. In general, a VAD algorithm outputs a binary decision on a frame-by-frame basis, where a frame of the input signal is a short unit of time such as 20-40 ms. The following are some of the required features of a good VAD algorithm: (i) Good Decision Rule: a physical property of speech that can be exploited to give consistent and accurate judgment in classifying segments of the signal into silence or otherwise.

(ii) Adaptability to Background Noise: adapting to non-stationary background noise improves the robustness, especially in wireless telephony. (iii) Low Computational Complexity: the complexity of the VAD algorithm must be low to suit real-time applications.

A tree diagram representing the classification of VAD algorithms is shown in Figure 4.1; it groups them into parameter based methods (thresholding: ZCD; linear variance: EBD; segmentation: PBD) and frequency based methods (transform / power spectral density: WFD; subband OSF).

Figure 4.1 Tree diagram for VAD algorithms

The VAD algorithms are thus classified into two types: (i) Parameter Based VAD algorithms and (ii) Frequency Based VAD algorithms.

Parameter Based VAD algorithms are further classified into three types: (i) the Zero Crossing Detector, which is based on thresholding; (ii) Energy Based Detection, which is implemented through the linear variance; and (iii) Pitch Based Detection, through segmentation. The Frequency Based VAD algorithms consist of the Weak Fricative detector and the Subband Order Statistics Filter, which are formed under the transform method.

Zero Crossing Detector (ZCD)

The Zero Crossing Detector (ZCD) counts the number of times in a sound sample that the amplitude of the sound wave changes sign. The zero crossing count of a signal is the number of times it crosses the line of no disturbance, or zero line (Abhijeet Sangwan et al 2002). The number of zero crossings for a voice signal lies in a fixed range: for a duration of 10 ms it lies between 5 and 15. The number of zero crossings for noise, on the other hand, is random and unpredictable. This motivates a decision rule that is independent of energy and hence able to detect some low-energy phonemes:

If $Z_j \in R$, the frame is ACTIVE; else the frame is INACTIVE (4.1)

where $Z_j$ is the number of zero crossings detected in frame $f_j$ and $R = \{5, \ldots, 15\}$ is the range of zero crossing counts expected for speech over a duration of 10 ms.
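To make the rule concrete, the following is a minimal Python/NumPy sketch of frame-wise zero-crossing counting and decision rule (4.1). The function name, the 10 ms framing and the default range {5, ..., 15} mirror the description above; they are illustrative choices, not a reference implementation.

import numpy as np

def zcd_vad(signal, fs, frame_ms=10, zc_range=(5, 15)):
    """Classify frames as ACTIVE/INACTIVE from their zero-crossing count (rule 4.1)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    decisions = np.zeros(n_frames, dtype=bool)
    for j in range(n_frames):
        frame = signal[j * frame_len:(j + 1) * frame_len]
        # Count sign changes within the frame
        zc = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
        # Speech-like frames have a zero-crossing count inside the expected range
        decisions[j] = zc_range[0] <= zc <= zc_range[1]
    return decisions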

Weak Fricatives Detector (WFD)

The main drawback of the ZCD is that noise frames are misclassified as active whenever their zero crossing counts satisfy equation (4.1). The problem of discriminating speech from background noise is not trivial, except in acoustic environments with an extremely high Signal to Noise Ratio (SNR). In such high-SNR environments the energy of the lowest-level speech sounds exceeds the background noise energy, and thus a simple energy measurement suffices; however, such ideal recording conditions are not practical for most applications (Rabiner 2004). Therefore a method is required to classify weak fricatives against noise independently of the SNR or other noise characteristics. This problem can be overcome by using the autocorrelation function, which exploits the high correlation found in speech signals. The unbiased autocorrelation function is

$A[k] = \frac{1}{N-k} \sum_{n=0}^{N-k-1} y[n]\, y[n+k]$ (4.2)

where $A$ is the autocorrelation vector, $y[n]$ is the frame under consideration and $N$ is the frame length. The incoming signal is segmented into frames of duration 20 ms and the autocorrelation vector of each frame is computed and divided into subframes of length $L$. The energy of each subframe is computed as

$E_i = \sum_{m=1}^{L} A[(i-1)L + m]^2$ (4.3)

where the subframe index $i$ runs from 1 to the total number of subframes and $m$ indexes the samples within each subframe. Thus a vector of 20 such energy values is computed for each frame, denoted

$\mathbf{E}_j = (E_1, E_2, \ldots, E_{20})$ (4.4)

where $j$ is the frame under consideration. The classification parameter used is the variance of this vector. The Autocorrelation Vector Variance (AVV) is determined as

$\mathrm{AVV}_j = \mathrm{var}(\mathbf{E}_j)$ (4.5)

A reference value of the AVV for silence is computed by assuming the first 20 frames to be inactive,

$\mathrm{AVV}_{\mathrm{ref}} = \frac{1}{20} \sum_{j=1}^{20} \mathrm{AVV}_j$ (4.6)

The AVV of each subsequent frame is compared with a scalar multiple of this reference value to determine speech activity:

If $\mathrm{AVV}_j > k \cdot \mathrm{AVV}_{\mathrm{ref}}$, the frame is ACTIVE; else the frame is INACTIVE (4.7)

The value of $k$ was set to 7 by trial and error. Only active frames are marked as voiced; inactive frames are treated as unvoiced.
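The following Python/NumPy sketch summarises the AVV computation and decision rule (4.7) as reconstructed above. The subframe layout (20 subframes per 20 ms frame), the reference built from the first 20 frames and k = 7 follow the text; the function names and remaining defaults are illustrative assumptions.

import numpy as np

def unbiased_autocorr(y):
    """Unbiased autocorrelation estimate of one frame (equation 4.2)."""
    n = len(y)
    full = np.correlate(y, y, mode="full")[n - 1:]   # lags 0 .. n-1
    return full / (n - np.arange(n))                 # unbiased scaling

def avv_vad(signal, fs, frame_ms=20, n_subframes=20, k=7.0, n_ref_frames=20):
    """Weak-fricative detection via the Autocorrelation Vector Variance (AVV)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    avv = np.zeros(n_frames)
    for j in range(n_frames):
        frame = signal[j * frame_len:(j + 1) * frame_len]
        ac = unbiased_autocorr(frame)
        # Split the autocorrelation vector into subframes and take their energies (eq. 4.3)
        sub_len = len(ac) // n_subframes
        energies = [np.sum(ac[i * sub_len:(i + 1) * sub_len] ** 2)
                    for i in range(n_subframes)]
        avv[j] = np.var(energies)                    # equation 4.5
    # Reference AVV from the leading frames, assumed to contain no speech (eq. 4.6)
    avv_ref = np.mean(avv[:n_ref_frames])
    return avv > k * avv_ref                         # rule 4.7: True means ACTIVE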

Pitch Period Based Detector (PBD)

Pitch period estimation is one of the most important problems in speech processing; pitch detectors are used in vocoders and in speaker identification and verification systems. Pitch period estimation can be done using the autocorrelation function, which provides a convenient representation and forms the basis for pitch detection. One of the major limitations of the autocorrelation representation is that it retains too much information about the speech signal, and as a result the autocorrelation function has too many peaks. To mitigate this problem it is useful to process the speech signal so as to make the periodicity more prominent while suppressing other distracting features of the signal. Numerous techniques have been proposed; the technique called centre clipping is reported in this thesis. The centre clipped speech signal (Sondhi 1968) is obtained by the nonlinear transformation

$y(n) = C[x(n)]$ (4.8)

where $C[\cdot]$ is shown in Figure 4.2.

Figure 4.2 Centre clipper transformation function

The operation of centre clipping is depicted in Figure 4.3.

Figure 4.3 Effect of centre clipping on a speech waveform

It can be seen that for samples above the clipping level $C_L$ the output of the centre clipper is equal to the input minus the clipping level, while for samples below the clipping level the output is zero. For high clipping levels fewer peaks exceed the clipping level, and thus fewer pulses appear in the output; if the clipping level is decreased, more peaks pass through the clipper and the autocorrelation function becomes more complex (Rabiner 2004). The problem of extraneous peaks in the autocorrelation function can be eliminated by centre clipping prior to computing it. A remaining difficulty with the autocorrelation representation is the large amount of computation required. A simple modification of the centre clipping function leads to a great simplification of the autocorrelation computation: the output of the clipper is +1 if $x(n) > +C_L$

and $-1$ if $x(n) < -C_L$; otherwise the output is zero. The computation of the autocorrelation function for a 3-level centre clipped signal is particularly simple. Most of the extraneous peaks are eliminated, and a clear indication of periodicity is retained. The three-level centre clipping function is shown in Figure 4.4.

Figure 4.4 Three level centre clipping function

A novel algorithm for estimating the pitch period from the short-time autocorrelation function was proposed by Dubnowski et al (1976). The steps in the pitch based VAD algorithm are given below:

i. The speech signal is filtered with a 900 Hz low pass analog filter and sampled at a rate of 10 kHz.
ii. Segments of length 30 ms are selected at 10 ms intervals.
iii. Using the clipping level, the speech signal is processed by a 3-level centre clipper and the autocorrelation function is computed over a range spanning the expected range of pitch periods.
iv. The largest peak of the autocorrelation function is located and the peak value is compared to a fixed threshold. If the peak falls below the threshold, the segment is classified as unvoiced; otherwise the segment is voiced.
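A minimal Python/NumPy sketch of steps (iii) and (iv) is given below for a single frame. The 3-level clipper and the peak test follow the description above; the clipping ratio, the pitch search range (50-400 Hz) and the voicing threshold are illustrative assumptions, and the 900 Hz low-pass prefiltering of step (i) is omitted.

import numpy as np

def three_level_clip(x, clip_level):
    """Three-level centre clipper: +1 above +C_L, -1 below -C_L, 0 otherwise."""
    return np.where(x > clip_level, 1.0, np.where(x < -clip_level, -1.0, 0.0))

def pitch_based_vad(frame, fs, f0_min=50.0, f0_max=400.0, clip_ratio=0.3,
                    voiced_threshold=0.3):
    """Classify one frame as voiced (True) or unvoiced (False) from the largest
    autocorrelation peak of its 3-level centre-clipped version."""
    clip_level = clip_ratio * np.max(np.abs(frame))
    c = three_level_clip(frame, clip_level)
    ac = np.correlate(c, c, mode="full")[len(c) - 1:]   # lags 0 .. N-1
    if ac[0] <= 0:
        return False                                    # silent frame
    ac = ac / ac[0]                                     # normalise by the lag-0 value
    lag_min = int(fs / f0_max)                          # shortest expected pitch period
    lag_max = min(int(fs / f0_min), len(ac) - 1)        # longest expected pitch period
    peak = np.max(ac[lag_min:lag_max + 1])
    # Voiced if the strongest peak in the pitch range exceeds a fixed threshold
    return peak > voiced_threshold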

Energy Based Detector (EBD)

The amplitude of a speech signal varies appreciably with time, and the amplitude of unvoiced segments is generally much lower than that of voiced segments. The energy of a signal gives a convenient representation that reflects the amplitude of the signal. The energy of a frame indicates the possible presence of voice data and is an important parameter used in VAD algorithms. Let $X(i)$ be the $i$-th sample of speech. If the length of a frame is $k$ samples, then the $j$-th frame can be represented in the time domain by the sequence

$f_j = \{X(i)\}, \quad i = (j-1)k+1, \ldots, jk$ (4.9)

and $E_j$, the energy of the $j$-th frame, is

$E_j = \sum_{i=(j-1)k+1}^{jk} X(i)^2$ (4.10)

The VAD algorithm is trained for a short period with a prerecorded sample that contains only background noise, and the initial thresholds of the various parameters are computed from these samples. The initial energy threshold is obtained by taking the mean of the energies $E_m$ of the noise-only frames,

$E = \frac{1}{N} \sum_{m=1}^{N} E_m$ (4.11)

where $E$ is the initial threshold estimate and $N$ is the number of frames in the prerecorded sample; the initial 20 frames are considered INACTIVE.

The classification rule for speech is as follows:

If $E_j > kE$ (with $k > 1$), the frame is ACTIVE; else the frame is INACTIVE (4.12)

Here $E$ represents the noise energy estimate, while $k$ is the threshold factor used in the decision making. Active frames are transmitted while inactive frames are not. Energy based decisions are not good for low-energy phonemes, and weak fricatives are sometimes silenced completely. High-energy voiced speech segments are detected by all VAD algorithms even under noisy conditions, but low-energy unvoiced speech is commonly missed, reducing speech quality.
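A minimal Python/NumPy sketch of the energy-based rule (4.12) is shown below. The 20 ms framing, the number of leading noise-only frames and the factor k = 2 are illustrative assumptions; only the structure (noise-trained threshold, then per-frame comparison) follows the description above.

import numpy as np

def energy_vad(signal, fs, frame_ms=20, n_noise_frames=20, k=2.0):
    """Energy Based Detection: frames whose energy exceeds k times the initial
    noise-energy estimate are declared ACTIVE (equations 4.11 and 4.12)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    energies = np.array([np.sum(signal[j * frame_len:(j + 1) * frame_len] ** 2)
                         for j in range(n_frames)])
    # Initial threshold: mean energy of the leading frames, assumed to be noise only
    e_noise = np.mean(energies[:n_noise_frames])
    return energies > k * e_noise                     # True means ACTIVE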

Subband OSF Based VAD

Javier Ramirez et al (2005) propose determining the speech / nonspeech divergence by means of specialised Order Statistics Filters (OSF) working on the subband log-energies. Filters based on order statistics have been successfully employed in the restoration of signals and images corrupted by additive noise; the most common OSF is the median filter, which is easy to implement and exhibits good performance in removing impulsive noise. Figure 4.5 shows the block diagram of the subband based VAD. The algorithm operates on the subband log-energies: noise reduction is performed first, and the VAD decision is formulated on the de-noised signal. The noisy speech signal is decomposed into 25 ms frames with a 10 ms window shift. Let $X(m, l)$ be the spectrum magnitude for the $m$-th band at frame $l$. The design of the noise reduction block is based on Wiener Filter (WF) theory, whereby the attenuation is a function of the Signal to Noise Ratio (SNR) of the input signal. The VAD decision is formulated in terms of the de-noised signal, whose subband log-energies are processed by means of order statistics filters.

Figure 4.5 Block diagram of the Subband OSF based VAD (FFT, spectral smoothing, noise estimation and update, WF design and frequency domain filtering, followed by the VAD decision)

The noise reduction block consists of four stages.

i) Spectrum smoothing: the power spectrum is averaged over two consecutive frames and two adjacent spectral bands.

ii) Noise estimation: the noise spectrum $N_e(m, l)$ is updated by means of a first order IIR filter on the smoothed spectrum $X_s(m, l)$,

$N_e(m, l) = \lambda\, N_e(m, l-1) + (1 - \lambda)\, X_s(m, l)$ (4.13)

where $\lambda = 0.99$ and $m = 0, 1, \ldots, \mathrm{NFFT}/2$, NFFT being the FFT length.

iii) Wiener Filter (WF) design: first, the clean signal is estimated by combining smoothing and spectral subtraction,

$S_e(m, l) = \mu\, \hat{S}(m, l-1) + (1 - \mu)\, \max\big(X_s(m, l) - N_e(m, l),\, 0\big)$ (4.14)

where $\mu$ is a smoothing constant. Then, the WF $W(m, l)$ is designed as

$W(m, l) = \frac{\eta(m, l)}{1 + \eta(m, l)}$ (4.15)

where

$\eta(m, l) = \max\!\left(\frac{S_e(m, l)}{N_e(m, l)},\, \eta_{\min}\right)$ (4.16)

and $\eta_{\min}$ is selected so that the filter yields a 20 dB maximum attenuation. $\hat{S}(m, l)$, the spectrum of the cleaned speech signal, is assumed to be zero at the beginning of the process and is used for designing the WF through equations (4.13) to (4.15). It is given by

$\hat{S}(m, l) = W(m, l)\, X_s(m, l)$ (4.17)

The filter $W(m, l)$ is smoothed in order to eliminate rapid changes between neighbouring frequencies that may often cause musical noise. Thus the variance of the residual noise is reduced and, consequently, the robustness when detecting nonspeech is enhanced. The smoothing is performed by truncating the impulse response of the corresponding causal FIR filter to 17 taps using a Hanning window. With this operation performed in the time domain, the frequency response of the Wiener Filter is smoothed and the performance of the VAD is improved.

iv) Frequency domain filtering: the smoothed filter $H(m, l)$ is applied in the frequency domain to obtain the de-noised spectrum

$Y(m, l) = H(m, l)\, X(m, l)$ (4.18)

Once the input speech has been de-noised, the log-energies for the $l$-th frame, $E(k, l)$, in $K$ subbands ($k = 0, 1, \ldots, K-1$) are computed by means of

$E(k, l) = \log\!\left(\frac{K}{\mathrm{NFFT}} \sum_{m \in B_k} |Y(m, l)|^2\right), \quad k = 0, 1, \ldots, K-1$ (4.19)

where an equally spaced subband assignment $B_k$ is used. The algorithm uses two OSFs for the multiband quantile (MBQ) SNR estimation. A first OSF estimates the subband signal energy as the $p$ sampling quantile of the ordered subband log-energies in a window of $2N+1$ frames centred on frame $l$,

$Q(k, l) = (1 - f)\, E_{(r)}(k, l) + f\, E_{(r+1)}(k, l)$ (4.20)

where $E_{(r)}(k, l)$ denotes the $r$-th order statistic of the windowed log-energies, $r = \lfloor 2Np \rfloor$ and $f = 2Np - r$. Finally, the SNR in each subband is measured by

$\mathrm{QSNR}(k, l) = Q(k, l) - N(k, l)$ (4.21)

where $N(k, l)$ is the noise level in the $k$-th band, which needs to be estimated. For the initialization of the algorithm, the first $N$ frames are assumed to be nonspeech frames and the noise level in the $k$-th band is estimated as the median of the set $\{E(k, 0), E(k, 1), \ldots, E(k, N-1)\}$.

In order to track nonstationary noisy environments, the noise references are updated during nonspeech periods by means of a second OSF (a median filter),

$N(k, l) = \beta\, N(k, l-1) + (1 - \beta)\, M(k, l), \quad k = 0, 1, \ldots, K-1$ (4.22)

where $M(k, l)$ is the output of the median filter and $\beta = 0.97$ was experimentally selected. On the other hand, the sampling quantile $p = 0.9$ is selected as a good estimate of the subband spectral envelope. The decision rule is then formulated in terms of the average subband SNR,

$\overline{\mathrm{SNR}}(l) = \frac{1}{K} \sum_{k=0}^{K-1} \mathrm{QSNR}(k, l)$ (4.23)

If this SNR is greater than a threshold $\gamma$, the current frame is classified as speech; otherwise it is classified as nonspeech. It is assumed that the system will work under different noisy conditions and that optimal thresholds $\gamma_0$ and $\gamma_1$ can be determined for the system working in the cleanest and noisiest conditions, respectively. Thus, the threshold is made adaptive to the measured fullband noise energy $E$,

$\gamma = \begin{cases} \gamma_0, & E \le E_0 \\ \gamma_0 + \dfrac{\gamma_1 - \gamma_0}{E_1 - E_0}\,(E - E_0), & E_0 < E < E_1 \\ \gamma_1, & E \ge E_1 \end{cases}$ (4.24)

enabling the VAD to select the optimum working point for different SNR conditions. Note that the threshold is linearly decreased as the noise level increases between $E_0$ and $E_1$, the fullband noise energies that define the cleanest and noisiest conditions with their optimum thresholds $\gamma_0$ and $\gamma_1$, respectively.
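The following Python/NumPy sketch illustrates the decision part of this scheme: subband log-energies, a p-quantile OSF as the signal-level estimate, a median-based noise reference updated during nonspeech, and the averaged subband SNR test. It is a simplified, offline illustration only: the noise reduction block and the adaptive threshold of equation (4.24) are omitted, and all parameter values (number of bands, window length, fixed threshold) are illustrative assumptions.

import numpy as np

def subband_osf_vad(x, fs, n_bands=4, frame_ms=25, shift_ms=10,
                    n_init=10, win=8, p=0.9, beta=0.97, snr_threshold=2.0):
    """Simplified subband OSF VAD decision on the (noisy) input signal x."""
    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, shift)]
    # Power spectra of windowed frames
    spec = np.array([np.abs(np.fft.rfft(f * np.hanning(frame_len))) ** 2 for f in frames])
    # Subband log-energies E(k, l) for K equally spaced bands (cf. eq. 4.19)
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    E = np.log(np.array([[np.sum(s[edges[k]:edges[k + 1]]) + 1e-12
                          for k in range(n_bands)] for s in spec]))
    # Noise reference: median of the first frames, assumed to be nonspeech
    noise = np.median(E[:n_init], axis=0)
    decisions = []
    for l in range(len(E)):
        lo, hi = max(0, l - win), min(len(E), l + win + 1)
        # p-quantile OSF over neighbouring frames as the signal-level estimate (cf. eq. 4.20)
        q = np.quantile(E[lo:hi], p, axis=0)
        snr = np.mean(q - noise)          # averaged subband SNR in the log domain (cf. eq. 4.23)
        speech = snr > snr_threshold
        if not speech:
            # Update the noise reference during nonspeech periods (cf. eq. 4.22)
            noise = beta * noise + (1 - beta) * np.median(E[lo:hi], axis=0)
        decisions.append(speech)
    return np.array(decisions)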

4.3 DRAWBACKS OF EXISTING ALGORITHMS

The existing algorithm is based on the assumption that the noise spectrum does not vary significantly within the $N$-frame neighbourhood of the $l$-th frame. However, this does not hold in the case of highly non-stationary noise. The noise estimate obtained from the first frame is used to de-noise the following eight frames, and this estimate is very low for the first frame, so the algorithm fails at the beginning to evaluate the noise spectrum and the detection afterwards can be totally erroneous. The existing algorithm also fails to update the threshold in low noise conditions, which degrades the performance of the VAD.

Proposed Algorithm

The proposed algorithm does not depend on a feedback loop for noise spectrum estimation. Instead it uses a noise estimation algorithm which updates the noise estimate for every frame. This method of noise estimation is well suited to highly non-stationary environments, thus increasing robustness, as discussed in Sundarrajan Rangachari et al (2004).

Figure 4.6 Block diagram of the proposed VAD (FFT, spectral smoothing, the frame-wise noise estimation and update, WF design and frequency domain filtering, followed by the VAD decision)

The noise estimate is updated by averaging the noisy speech power spectrum using a time and frequency dependent smoothing factor, which is adjusted based on the signal presence probability in subbands. This improves the speech / non-speech discriminability and the speech recognition performance in noisy environments. Two problems are addressed by the VAD: its performance in low noise conditions and its performance in noisy environments. The block diagram of the proposed VAD is shown in Figure 4.6. The noise estimation algorithm is as follows. The smoothed power spectrum of the noisy speech signal is estimated using a first-order recursive formula,

$P(\lambda, k) = \eta\, P(\lambda - 1, k) + (1 - \eta)\, |Y(\lambda, k)|^2$ (4.25)

where $|Y(\lambda, k)|^2$ is an estimate of the short-time power spectrum of the noisy speech, $\eta$ is a smoothing constant, $\lambda$ is the frame index and $k$ is the frequency bin index. Since the noisy speech power spectrum in speech-absent frames is equal to the power spectrum of the noise, the estimate of the noise spectrum can be updated by tracking the speech-absent frames. To do so, the ratio of the energy of the noisy speech power spectrum in three frequency bands (low: 0-1 kHz, middle: 1-3 kHz, high: 3 kHz and above) to the energy of the corresponding frequency band in the previous noise estimate is computed. The following three ratios are used:

$\xi_L(\lambda) = \frac{\sum_{k \in \mathrm{low}} P(\lambda, k)}{\sum_{k \in \mathrm{low}} N(\lambda - 1, k)}$ (4.26)

$\xi_M(\lambda) = \frac{\sum_{k \in \mathrm{mid}} P(\lambda, k)}{\sum_{k \in \mathrm{mid}} N(\lambda - 1, k)}$ (4.27)

$\xi_H(\lambda) = \frac{\sum_{k \in \mathrm{high}} P(\lambda, k)}{\sum_{k \in \mathrm{high}} N(\lambda - 1, k)}$ (4.28)

where $N(\lambda, k)$ is the estimate of the noise power spectrum at frame $\lambda$, and the low, middle and high bands extend up to 1 kHz, from 1 kHz to 3 kHz, and from 3 kHz to half the sampling frequency $F_s$, respectively. Each incoming frame is then classified as speech present or speech absent in the following manner. The frame is declared speech absent if

$\xi_L(\lambda) < \delta, \quad \xi_M(\lambda) < \delta, \quad \xi_H(\lambda) < \delta$ (4.29)

where $\delta$ is a threshold. For a speech-absent frame the noise estimate is updated according to

$N(\lambda, k) = \mu\, N(\lambda - 1, k) + (1 - \mu)\, |Y(\lambda, k)|^2$ (4.30)

where $\mu$ is a smoothing constant. If any or all of the three ratios are larger than the threshold, a different rule is used for updating and estimating the noise spectrum. For speech-present frames the noise update is as follows: frequency bins are classified as speech present or absent by tracking the local minimum of the noisy speech spectrum, and speech presence in each frequency bin is then decided separately using the ratio of the noisy speech power to its local minimum. A non-linear rule is used for tracking the minimum of the noisy speech by continuously averaging past spectral values:

if $P_{\min}(\lambda - 1, k) < P(\lambda, k)$ then

$P_{\min}(\lambda, k) = \gamma\, P_{\min}(\lambda - 1, k) + \frac{1 - \gamma}{1 - \beta}\big(P(\lambda, k) - \beta\, P(\lambda - 1, k)\big)$ (4.31)

else $P_{\min}(\lambda, k) = P(\lambda, k)$, where $P_{\min}(\lambda, k)$ is the local minimum of the noisy speech power spectrum and $\gamma$ and $\beta$ are constants whose values are determined experimentally. Let $S_r(\lambda, k) = P(\lambda, k) / P_{\min}(\lambda, k)$ denote the ratio between the energy of the noisy speech and its local minimum. This ratio is compared against a frequency-dependent threshold, and if it is larger than that threshold the corresponding frequency bin is considered to contain speech; the bin-wise decision is denoted $I(\lambda, k)$. Using this ratio, the new frequency-dependent smoothing constant $\alpha_s(\lambda, k)$ is estimated as

$\alpha_s(\lambda, k) = \alpha_d + (1 - \alpha_d)\, p(\lambda, k)$ (4.32)

where $\alpha_d$ and $\alpha_p$ are smoothing constants and $p(\lambda, k) = \alpha_p\, p(\lambda - 1, k) + (1 - \alpha_p)\, I(\lambda, k)$ is the smoothed speech presence decision. The frequency-dependent threshold $\delta(k)$ is given as

$\delta(k) = \begin{cases} \delta_L, & k \le 1\ \mathrm{kHz} \\ \delta_M, & 1\ \mathrm{kHz} < k \le 3\ \mathrm{kHz} \\ \delta_H, & 3\ \mathrm{kHz} < k \le F_s/2 \end{cases}$ (4.33)

with band-wise constants chosen experimentally. Finally, after computing the frequency-dependent smoothing factor $\alpha_s(\lambda, k)$, the noise spectrum estimate is updated according to

$N(\lambda, k) = \alpha_s(\lambda, k)\, N(\lambda - 1, k) + \big(1 - \alpha_s(\lambda, k)\big)\, |Y(\lambda, k)|^2$ (4.34)
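The following Python/NumPy sketch gathers the above steps (smoothed spectrum, minimum tracking, bin-wise speech presence and the smoothing-factor-driven noise update) into one routine. It is a sketch in the spirit of equations (4.25)-(4.34), not a faithful reimplementation: the constants are illustrative, a single ratio threshold replaces the frequency-dependent threshold of (4.33), and the band-level test of (4.26)-(4.29) is not included.

import numpy as np

def estimate_noise(Y_power, eta=0.7, gamma=0.998, beta_c=0.8,
                   alpha_d=0.85, alpha_p=0.2, delta=2.0):
    """Minimum-tracking noise estimation; Y_power is an (n_frames, n_bins)
    noisy-speech power spectrogram. Returns the per-frame noise estimate."""
    n_frames, n_bins = Y_power.shape
    P = Y_power[0].copy()           # smoothed noisy-speech power spectrum (eq. 4.25)
    P_min = Y_power[0].copy()       # tracked local minimum (eq. 4.31)
    p_speech = np.zeros(n_bins)     # smoothed per-bin speech presence
    noise = Y_power[0].copy()       # noise power estimate (eq. 4.34)
    noise_track = np.zeros_like(Y_power)
    for l in range(n_frames):
        P_prev = P.copy()
        P = eta * P + (1 - eta) * Y_power[l]                      # eq. 4.25
        rising = P_min < P
        P_min = np.where(rising,
                         gamma * P_min + ((1 - gamma) / (1 - beta_c))
                         * (P - beta_c * P_prev),                 # cf. eq. 4.31
                         P)
        # Per-bin speech presence: power well above its tracked local minimum
        speech_present = (P / np.maximum(P_min, 1e-12)) > delta
        p_speech = alpha_p * p_speech + (1 - alpha_p) * speech_present
        # Time- and frequency-dependent smoothing factor and noise update
        alpha_s = alpha_d + (1 - alpha_d) * p_speech              # cf. eq. 4.32
        noise = alpha_s * noise + (1 - alpha_s) * Y_power[l]      # cf. eq. 4.34
        noise_track[l] = noise
    return noise_track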

4.4 RESULTS AND DISCUSSIONS

The proposed structure for increasing the recognition accuracy of a robust speech recognition system using VAD algorithms is shown in Figure 4.7. The system consists of two main parts, a preprocessor and the ASR. The preprocessor includes the Voice Activity Detector (VAD), which identifies the presence or absence of speech and extracts the speech from the noise-corrupted input.

Figure 4.7 Structure of the speech recognition system (input speech, noise estimation and VAD, followed by the ASR and the recognition accuracy measurement)

Figure 4.8 shows the original clean speech signal. Figure 4.9 shows the output of the existing algorithm when the original signal corrupted by airport noise at 0 dB SNR is given as input: due to false estimation of the noise spectrum, the algorithm fails at the beginning of the utterance itself, so most of the noise-only frames are classified as speech-present frames. Figure 4.10 shows the output of the proposed algorithm; the speech frames are extracted correctly from the noisy speech signal.

One hundred words were taken for speech recognition (isolated word recognition with statistical modelling using a Hidden Markov Model) after adding various noise environments. The input word utterances were analysed under the most commonly encountered noise environments, namely suburban train, babble, car, exhibition hall, restaurant, street, airport and train-station noise, taken from the AURORA database. In the training phase, the uttered words, 100 samples of each digit 0-9 in both male and female voices (ages 15-25), were recorded using 8-bit Pulse Code Modulation (PCM) at a sampling rate of 8 kHz from a single channel input and saved as wave files using sound recorder software. The proposed framework uses a speech processing module that includes Hidden Markov Model (HMM) based classification and noise language modelling to achieve effective noise knowledge estimation, as discussed in Chapter 2. The performance of the ASR was analysed under noisy conditions with and without the VAD, and the accuracy in percentage is reported below. The Subband Order Statistics Filter (OSF) algorithm performs better than the other VAD algorithms, and the recognition accuracy of all the VAD algorithms can be improved if noise estimation in the non-stationary environment is taken into account. This chapter presents a proposed structure for speech recognition systems in which a Subband Order Statistics Filter (OSF) improves speech detection robustness in noisy environments. The approach is based on an effective endpoint detection algorithm employing noise reduction techniques and order statistic filters for the formulation of the decision rule. Automatic speech recognition systems work reasonably well under clean conditions but become fragile in practical applications involving real-world environments.

Figure 4.8 Original clean speech signal

Figure 4.9 Output of the existing VAD (noisy input signal)

Figure 4.10 Output of the proposed algorithm (noisy input signal)

Tables 4.1 through 4.8 depict the performance of the Subband Order Statistics Filter (OSF) based Voice Activity Detection of Ramirez et al (2005) and of the proposed algorithm under various noise conditions in terms of the improvement in Recognition Accuracy (RA). From Tables 4.1 and 4.2 it was observed that the ASR with VAD in the presence of the babble noise source performed best, with a 20.81% improvement in RA over the existing algorithm at 0 dB SNR. Overall, the speech recognition accuracy of the proposed algorithm shows an improvement of 13.54% in RA compared with the algorithm proposed by Ramirez et al (2005) in the presence of the various noise sources.

From Tables 4.3 and 4.4 it was found that, in the presence of exhibition hall noise at the 5 dB SNR level, the proposed algorithm performed better with an 11.71% improvement in RA; overall it shows an improvement in RA of 8.07% compared with the existing algorithm (Ramirez et al). From Tables 4.5 and 4.6 it was observed that the proposed algorithm at the 10 dB noise level for the train noise source shows an RA improvement of 8.27%; the existing algorithm has an average RA of 80.01%, while the proposed algorithm achieves an average RA of 85.18%. From Tables 4.7 and 4.8 it was inferred that, in the presence of the airport noise source at the 15 dB level, the proposed algorithm performed better with a 5.64% improvement in RA, and overall it has an improvement in RA of 3.67% compared with the existing algorithm.

Table 4.9 shows the performance of the ASR. The proposed method performs best with a maximum improvement of 20.81% in RA for babble noise and a minimum improvement of 2.26% in RA for street noise. The overall performance analysis of the existing VAD algorithm and the proposed algorithm is shown in Table 4.10.

Table 4.9 Overall performance analysis of the proposed VAD algorithm in terms of % improvement in RA

Percentage improvement   0 dB               5 dB                  10 dB             15 dB
Better                   Babble (20.81 %)   Exhibition (11.71 %)  Train (8.27 %)    Airport (5.64 %)
Least                    Airport (6.33 %)   Airport (6.13 %)      Babble (4.21 %)   Street (2.26 %)

Table 4.10 Overall performance analysis of VAD algorithms

VAD Method      0 dB (% accuracy)   5 dB (% accuracy)   10 dB (% accuracy)   15 dB (% accuracy)
EBD
ZCD
WFD
PBD
Ramirez et al
Proposed

The VAD recognition accuracies for different SNR values for the Subband OSF based VAD and the proposed method are shown in Figure 4.11. It was observed that the best recognition occurred for restaurant noise (84.225%) and the least for exhibition hall noise (78.625%).

Figure 4.11 Comparison of the Ramirez et al and proposed VAD methods for various noise environments (overall % of RA for the proposed and existing VAD across the noise sources)

The proposed VAD works well for non-stationary signals. In most speech enhancement schemes the noise is suppressed and the speech signal is enhanced; in the proposed VAD algorithm a new noise estimation algorithm is combined with the OSF, which improves both the quality and the RA of the speech recognition system.

4.5 CONCLUSION

The algorithms based solely on energy did not give acceptable speech recognition accuracy with all the test templates. The other techniques (the autocorrelation function and zero crossing detection) gave better speech recognition accuracy. The ZCD was used to recover some low-energy phonemes that were rejected by the energy based detector; however, it also picked up certain noise frames that matched the zero crossing criterion.

The WFD technique performed better than the ZCD in the detection of weak fricatives. A pitch based detection algorithm is designed to estimate the pitch, or fundamental frequency, of a quasi-periodic or virtually periodic signal; the performance of the PBD differs from that of the other techniques and is comparable to that of the WFD. The proposed method combines the noise estimation algorithm with the VAD algorithms so that improved speech recognition accuracy can be obtained under these noise conditions. This chapter presented a proposed structure for speech recognition systems in which a Subband Order Statistics Filter (OSF) improves speech detection robustness in noisy environments. The approach is based on an effective endpoint detection algorithm employing noise reduction techniques and order statistic filters for the formulation of the decision rule. The proposed algorithm performs better than the existing algorithm in the case of non-stationary noise.


More information

Adaptive Noise Reduction Algorithm for Speech Enhancement

Adaptive Noise Reduction Algorithm for Speech Enhancement Adaptive Noise Reduction Algorithm for Speech Enhancement M. Kalamani, S. Valarmathy, M. Krishnamoorthi Abstract In this paper, Least Mean Square (LMS) adaptive noise reduction algorithm is proposed to

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Kalman Tracking and Bayesian Detection for Radar RFI Blanking

Kalman Tracking and Bayesian Detection for Radar RFI Blanking Kalman Tracking and Bayesian Detection for Radar RFI Blanking Weizhen Dong, Brian D. Jeffs Department of Electrical and Computer Engineering Brigham Young University J. Richard Fisher National Radio Astronomy

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Vol., No. 6, 0 Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Zhixin Chen ILX Lightwave Corporation Bozeman, Montana, USA chen.zhixin.mt@gmail.com Abstract This paper

More information

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003 CG40 Advanced Dr Stuart Lawson Room A330 Tel: 23780 e-mail: ssl@eng.warwick.ac.uk 03 January 2003 Lecture : Overview INTRODUCTION What is a signal? An information-bearing quantity. Examples of -D and 2-D

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department

More information

Enhancement of Speech in Noisy Conditions

Enhancement of Speech in Noisy Conditions Enhancement of Speech in Noisy Conditions Anuprita P Pawar 1, Asst.Prof.Kirtimalini.B.Choudhari 2 PG Student, Dept. of Electronics and Telecommunication, AISSMS C.O.E., Pune University, India 1 Assistant

More information

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1 ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El

More information

A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication

A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication FREDRIC LINDSTRÖM 1, MATTIAS DAHL, INGVAR CLAESSON Department of Signal Processing Blekinge Institute of Technology

More information

Denoising Of Speech Signal By Classification Into Voiced, Unvoiced And Silence Region

Denoising Of Speech Signal By Classification Into Voiced, Unvoiced And Silence Region IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 11, Issue 1, Ver. III (Jan. - Feb.216), PP 26-35 www.iosrjournals.org Denoising Of Speech

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA

Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau

More information

Lab/Project Error Control Coding using LDPC Codes and HARQ

Lab/Project Error Control Coding using LDPC Codes and HARQ Linköping University Campus Norrköping Department of Science and Technology Erik Bergfeldt TNE066 Telecommunications Lab/Project Error Control Coding using LDPC Codes and HARQ Error control coding is an

More information

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Takahiro FUKUMORI ; Makoto HAYAKAWA ; Masato NAKAYAMA 2 ; Takanobu NISHIURA 2 ; Yoichi YAMASHITA 2 Graduate

More information

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative

More information

OFDM Transmission Corrupted by Impulsive Noise

OFDM Transmission Corrupted by Impulsive Noise OFDM Transmission Corrupted by Impulsive Noise Jiirgen Haring, Han Vinck University of Essen Institute for Experimental Mathematics Ellernstr. 29 45326 Essen, Germany,. e-mail: haering@exp-math.uni-essen.de

More information

Modified Kalman Filter-based Approach in Comparison with Traditional Speech Enhancement Algorithms from Adverse Noisy Environments

Modified Kalman Filter-based Approach in Comparison with Traditional Speech Enhancement Algorithms from Adverse Noisy Environments Modified Kalman Filter-based Approach in Comparison with Traditional Speech Enhancement Algorithms from Adverse Noisy Environments G. Ramesh Babu 1 Department of E.C.E, Sri Sivani College of Engg., Chilakapalem,

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS

CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS 44 CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS 3.1 INTRODUCTION A unique feature of the OFDM communication scheme is that, due to the IFFT at the transmitter and the FFT

More information

Keywords Decomposition; Reconstruction; SNR; Speech signal; Super soft Thresholding.

Keywords Decomposition; Reconstruction; SNR; Speech signal; Super soft Thresholding. Volume 5, Issue 2, February 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Speech Enhancement

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information