Dual-Microphone Voice Activity Detection Technique Based on Two-Step Power Level Difference Ratio


1 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE Dual-Microphone Voice Activity Detection Technique Based on Two-Step Power Level Difference Ratio Jae-Hun Choi and Joon-Hyuk Chang, Senior Member, IEEE Abstract In this paper, we propose a novel dual-microphone voice activity detection (VAD) technique based on the two-step power level difference (PLD) ratio. This technique basically exploits the PLD between the primary microphone and the secondary microphone in a mobile device when the distance between the microphones and the sound source is relatively short. Based on the PLD, we propose the use of the PLD ratio (PLDR) instead of the original PLD to take advantage of the relative difference between the PLD of speech and the PLD of noise. Indeed, the PLDR is obtained by estimating the ratio of the PLD between the input signals and the PLD between the two channel noises during periods without speech. The proposed technique offers a two-step algorithm using the PLDRs including long-term PLDR (LT-PLDR), which characterizes long-term evolution and short-term PLDR (ST-PLDR), which characterizes short-time variation during the first step. LT-PLDR-based and ST-PLDR-based VAD decision are performed using the maximum a posteriori (MAP) probability derived from the model-trust algorithm and combined at the second step to reach a superior VAD decision for both long-term and short-term situations. Extensive experimental results show that the proposed dual-microphone VAD technique outperforms the conventional two-channel VAD method as well as most standardized VAD algorithms. Index Terms Dual-microphone, power level difference ratio, two-step, voice activity detection. I. INTRODUCTION VOICE activity detection (VAD) has become an essential component of speech enhancement and speech recognition systems. Many approaches have focused on single microphone-based algorithms using linear predictive coding (LPC) parameters [1], energy levels, formant shape [2], the zero-crossing rate (ZCR) [3], the cepstral feature [4], periodicity measures [5], the spectral difference [6], and a statistical model-based approach [7]. Among these approaches, statistical model-based methods have been widely used due to their impressive performance as well as their efficient implementation [7]. Specifically, the statistical distributions of both Manuscript received June 21, 2013; revised November 12, 2013; accepted March 17, Date of publication April 22, 2014; date of current version May 09, This work was supported by an NRF grant funded by the Korean Government (MEST) ( 2012R1A2A2A ). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Wai-Yip Geoffrey Chan. The authors are with the School of Electronic Engineering, Hanyang University, Seoul , Korea ( jchang@hanyang.ac.kr). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TASLP noisy speech and noise are assumed to follow parametric models, including Gaussian [7], [8], Laplacian [9], generalized Gaussian [9], and generalized gamma [10] as a candidate for a better model of the distribution of speech and noise. Based on the assumed statistical model, the likelihood ratio test (LRT) is established based on a set of hypotheses. 
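To make the statistical model-based decision concrete, the following is a minimal sketch of a Gaussian-model likelihood ratio test of the kind cited above (in the spirit of [7]). The smoothing constant, the decision threshold, and all function and variable names are illustrative assumptions rather than values taken from any particular implementation.

```python
import numpy as np

def gaussian_lrt_vad(noisy_psd, noise_psd, prev_xi=None, alpha=0.98, eta=0.15):
    """Frame-level Gaussian likelihood-ratio VAD sketch (Sohn-style, see [7]).

    noisy_psd : per-bin noisy periodogram |Y(k)|^2 of the current frame
    noise_psd : per-bin estimate of the noise power spectrum
    prev_xi   : a priori SNR from the previous frame (simplified
                decision-directed style smoothing; None on the first frame)
    Returns (speech_flag, xi) so xi can be fed back on the next call.
    """
    gamma = noisy_psd / np.maximum(noise_psd, 1e-12)      # a posteriori SNR
    ml_xi = np.maximum(gamma - 1.0, 0.0)                  # instantaneous estimate
    xi = ml_xi if prev_xi is None else alpha * prev_xi + (1.0 - alpha) * ml_xi

    # Per-bin log likelihood ratio under complex Gaussian models for H1 vs H0
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)

    # Geometric mean of the likelihood ratios over frequency bins
    decision_stat = np.mean(log_lr)
    return decision_stat > eta, xi
```

The per-frame statistic is then compared with a tuned threshold to choose between the speech-absence and speech-presence hypotheses, which is the decision rule the LRT-based detectors above rely on.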
This algorithm has been further improved by incorporating a conditional maximum aposteriori(map) estimator, which is conditioned not only on the data of the current frame, but also the voice activity decision of the previous frame [11]. One of the key issues in the VAD problem is the performance of noise power estimation. Many previous studies have investigated noise power estimation. One simple method is to average the noisy signal over non-speech areas. For example, Kim and Chang [12] incorporated a soft decision scheme into power spectrum estimation. However, soft-decision-based noise power estimation has difficulties in estimating background noise with non-stationary characteristics. A more recent noise estimation technique is minimum statistics (MS), which obtains a noise estimate from the minimum values of a smoothed power estimate of a noisy signal within a finite window [13]. The MS scheme is impaired by sudden rising and falling noise contours that are a result of picking the minimal value within a sliding window. Also, the minima controlled recursive averaging (MCRA) approach is known to be a successful noise power estimation technique due to its robustness to the type and intensity of ambient noise [14]. This approach estimates noise by recursively averaging past spectral power values, which are adjusted by the speech presence probability in subbands. In addition, the relevant noise estimation techniques based on Monte-Carlo method [15], linear dynamical system method [16], particle filtering with switching dynamical system [17], and switching Kalman filtering [18] have been reported to handle the time-varying noises. But, the performance improvement is restricted due to the usage of the single microphone. The use of multiple microphones is beneficial for VAD since it provides relevant spatial characteristics, while single-channel VAD cannot precisely discriminate the target noise from the noisy speech under the highly nonstationary condition. For example, a beamforming technique can be considered relevant since it incorporates both spatial and spectral information efficiently [19] [27]. However, the use of a microphone array requires aprioriknowledge of the direction of arrival (DOA) through several microphones, which is not realistic in mobile device systems such as smart phones. Also, the configuration of two microphones is preferred in the contemporary smart phones IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See for more information.

2 1070 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 because of the realization issues in terms of size, power consumption, and computational complexity while the most existing methods focus on more than four microphones [23] [27]. Accordingly, we consider a dual-microphone system based on a trade-off between complexity and performance. One of the two microphone VAD techniques is based on a decision measure referred to DOA homogeneity, which uses the entropy of the DOA estimates [27]. But, this method is sensitive to the estimation error of DOA. Several dual microphone systems avoiding the problem of DOA estimation have been widely used in speech enhancement as well as VAD [28] [34]. In particular, cross power spectral density (CPSD) has been used for exploiting spatial benefits, since noise components can be considered to be mutually uncorrelated [31], [32]. This algorithm works well if the distance between adjacent microphones is not too short, because the main idea behind this algorithm is that the speech signals of the two channels are significantly correlated, whereas noises are uncorrelated. However, the cross-correlation based method has a drawback in terms of noise estimation since it cannot accurately estimate abruptly changing noises due to the large smoothing coefficients required to compute the power spectral density (PSD) of the cross-correlation term. This observation holds for the two microphone VAD method of the magnitude squared coherence (MSC) [30]. In addition, the methods proposed in [33] [36] utilize the difference in the power of the signals received at the two microphones. These techniques rely on the fact that the speech signals emitted from the source (i.e., mouth) have different power levels at the microphones, while the power levels of the noise signals are almost identical. This assumption is valid if the distances between the source and two microphones are distinct. It should be noted that mobile devices such as contemporary smart phone systems have similar structures. Indeed, the algorithm based on the power level difference (PLD) is useful with highly non-stationary noises. However, the performance of the technique is sensitive to the noise level and noise types if the PLD is used in the criterion of the gain for speech enhancement as in [33]. In particular, Jeub et al. [35] derived the normalized difference of the PSD of the noisy speech to update the noise PSD. However, the assumption that the normalized difference of the PSD will be close to zero as the input power level is almost identical is not valid in practice due to many factors such as reverberation, microphones mismatch, and azimuth angles from the noise source at each microphone. This paper proposes a novel two-microphone VAD technique based on a two-step PLD ratio (denoted by PLDR). Based on the PLD derived between the primary microphone and the secondary microphone in each frame, we offer two kinds of PLDR that can efficiently characterize the evolution of speech over short-term and long-term time frames. Indeed, we consider the PLDR instead of the PLD to achieve robust performance because a relative comparison between the PLD of speech and the PLD of noise estimated during noise periods which is not sensitive to the noise level or type is performed. For this reason, we first propose a long-term PLDR (LT-PLDR) using a large long-term smoothing parameter for calculating the PLDR. While our approach is based on the PLD proposed by Yousefian et al. 
[33], [34], we offer the PLDR that can produce robust and superior performance under various noise environments. Specifically, the PLDR is definedbytheratioofthepldof the input signals and the PLD of noise estimated during speech inactivity. In order to compute the PLD of noise efficiently, we apply the minima controlled recursive averaging (MCRA) approach [14] to the estimation for the PLD of noise. Based on thelt-pldr,wedeviseanefficient framework to derive the a posteriori probability based on a parametric way employing the model-trust minimization algorithm in classifying the speech presence or absence regions. With the aposterioriprobability corresponding to the LT-PLDR, the interim VAD decision at the first step is performed by choosing the hypothesis with the maximum probability according to the maximum a posteriori (MAP) criterion, which provides a rough VAD to minimize cases of missing speech. On the other hand, a short-term PLDR (denoted by ST-PLDR) using a low smoothing parameter is derived from each frame, which establishes an appropriate parameter for detecting short non-speech intervals while having a high false-alarm rate. In addition, the PLD of noise for calculating the ST-PLDR is estimated by utilizing the speech presence probability derived at the first step, which eventually allows for the PLD of noise in estimating the ST-PLDR to be updated quickly. In a similar manner with the probability derived from the LT-PLDR at the first step, the probability for the ST-PLDR is obtained and the VAD decision is separately performed according to the MAP criterion. At the second step, we construct the final VAD decision rule, in which the interim VAD result determined by the LT-PLDR is modified by the VAD result of the ST-PLDR only when the VAD result provided by the LT-PLDR is speech presence. This creates a robust way to track speech evolution while keeping the missing error rate below an acceptable level and minimizing the false-alarm rate error below a tolerable level. Extensive objective evaluation of the proposed VAD technique is performed under various acoustic conditions in terms of noises, azimuths, and distances between the source and the microphones. We show that the proposed VAD parameter derived from the two-microphone PLDR is superior, particularly for non-stationary noises, and is robust with respect to the input SNRs and various acoustical circumstances. This paper is organized as follows. In the next section, we review the PLD, which is a basic parameter in our framework. In Section III, we describe the design of the VAD algorithm based on the proposed two-step technique. Extensive evaluation of the proposed algorithm is discussed in Section IV and conclusions are presented in Section V. II. REVIEW OF PLD In this section, we begin with a theoretical description of the PLD function and review the notion of the function for the VAD task. For this, we first need to define the acoustic experimental environment in brief. Two microphones are installed in a smart-phone mock-up on a dummy head, as shown in Fig. 1. Since we chose the smart phone as the main platform in this research, the distance between the primary microphone (close to the speaker s mouth) and the secondary microphone (distant to the speaker s mouth) was set to 12 cm. To simulate mobile environments, various distances and azimuths between the dummy

3 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1071 head and the noise source are considered. Detailed specification will be given in Section IV. Based on these conditions, we first define the noisy signal received at the microphones by, where denotes the microphone index and is the sample index such that (1) where is the convolution operator, is the main source signal, is the impulse response associated with the microphones, is the noise-free reverberant speech, and is the noise component at the each microphone, respectively [33], [34]. The above equations could be changed frame-by-frame into a frequency domain by taking the discrete Fourier transform (DFT) which length is bigger than the frame size as follows: where is the frequency-bin index of and is the frame index. Letting denote the Fourier transform of the acoustic transfer function between the primary microphone and secondary microphone and is equal to, the above Fourier transform of signal data model can be written as (2) (3) (4) Assuming that the speech and noise are independent, the signal power of each microphone is given by (5) (6) where,,,, and denote the power spectrum of,,,, and respectively. Since the distance between the primary and secondary microphone is distinct but short in a near field, as shown in Fig. 1, the signal power received at the primary microphone close to the mouth shows a stronger signal compared to the signal power of the secondary microphone, while the level of the noise signal at each microphone is almost identical [33], [34]. Based on this, we firstly define the difference of the signal power between the primary microphone and the secondary microphone as so that is derived for the primary and secondary microphone such that where.if can be neglected due to the assumption of a diffuse noise field [33], (6) after taking the absolute operation results in following: (7) (8) Fig. 1. Our acoustical architecture using smart phone with the dual-microphones located at the dummy head. Note that is directly used in the gain computation of the Wiener filter-based noise suppressing algorithm [33], [34] such that (9) where is an over-subtraction parameter and used in controlling the level of noise reduction and can be estimated by using the cross power spectral density (CPSD) of the input and noise signals in the two channels in [33]. The above conventional technique proposed by Yousefian et al. [33], [34] was used to directly apply the PLD value to calculate the spectral weighting gain for speech enhancement based on the assumption that is negligible for diffuse noise. However, in practice, the levels of the noise signals at the primary microphone and the secondary microphone cannot be identical as assumed in the real situation illustrated in Fig. 1. Thus, this assumption cannot be directly used in the VAD task which we focus on. As an analogous example, the premise of the coherence-based technique is that the noise signals in the two channels are uncorrelated, which is not valid in reality. In this regard, works similar to those in [31] have suggested modifying to the coherence filter to address this problem. III. PROPOSED DUAL-MICROPHONE VOICE ACTIVITY DETECTION BASED ON TWO-STEP PLD SCHEME A. Basic idea of PLD As stated in the previous section, the premise we choose in this paper is that the PLD has a larger value during speech activity than the PLD during speech inactivity regardless of the noise type; diffuse noise or coherence noise. 
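As a concrete illustration of the PLD defined above and of the conventional PLD-driven gain of [33], [34], the following sketch computes the recursively smoothed per-bin power spectra of the two channels, takes their absolute difference, and forms a Wiener-type gain with an over-subtraction factor. The function names, the smoothing constant, and the exact form of the gain are assumptions made for illustration only; the original papers estimate the over-subtraction term from the cross power spectral densities of the two channels.

```python
import numpy as np

def power_level_difference(Y1, Y2, prev_psd1, prev_psd2, beta=0.9):
    """Per-bin PLD between the primary (Y1) and secondary (Y2) channel DFTs.

    Y1, Y2      : complex DFT coefficients of the current frame
    prev_psd1/2 : recursively smoothed power spectra from the previous frame
    beta        : smoothing constant (illustrative value)
    The absolute value keeps the PLD non-negative when the two input powers
    are nearly identical, i.e., in noise-only bins.
    """
    psd1 = beta * prev_psd1 + (1.0 - beta) * np.abs(Y1) ** 2
    psd2 = beta * prev_psd2 + (1.0 - beta) * np.abs(Y2) ** 2
    pld = np.abs(psd1 - psd2)
    return pld, psd1, psd2

def pld_wiener_gain(pld, noise_psd1, mu=2.0, g_min=0.05):
    """Spectral gain in the spirit of the conventional PLD method [33], [34].

    mu is an over-subtraction factor controlling the amount of noise reduction;
    noise_psd1 is the noise power estimate at the primary microphone.
    """
    gain = pld / np.maximum(pld + mu * noise_psd1, 1e-12)
    return np.maximum(gain, g_min)
```

The proposed method keeps the PLD itself but, as described next, compares it with the PLD observed during noise-only periods rather than feeding the raw difference into a gain.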
To handle this premise in our algorithm, we first define the PLDR as the ratio between the observed PLD and the PLD estimated during noise periods. For clarity, Fig. 2 shows the overall block diagram of the proposed two-step PLDR-based algorithm. Assuming that a noise is added to a noise-free speech signal

4 1072 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 Fig. 2. Overall block diagram of the proposed two-step PLDR-based technique., the representation of the frequency domain of the noisy signal of (1) is rewritten as: the observed PLD, we adopt the recursive averaging technique given by (10) where,, and are the DFT coefficients at the th frequency bin for the th frame of noisy speech, noise, and clean speech, respectively. Given two binary hypotheses, and, which indicate speech presence and absence, it is derived that (11) (12) By taking into account the independence assumption of speech and noise, we obtain the power spectral density (PSD) for the primary and secondary microphone as follows: (13) (14) We first derive the PLD between two microphones by taking the absolute operator for ensuring robust performances in actual situations such that (15) where the PLD becomes almost zero value if powers of two input signal are almost identical. B. LT-PLDR We then derive the PLDR between the observed PLD of the current frame and the PLD estimated at the noise regions. For (16) where 1 is a smoothing parameter, which thus characterizes efficiently the long-term evolution of the speech signal and is not sensitive to the type and intensity of environmental noise as will be explained in Section IV. Accordingly, is called the long-term PLD (LT-PLD) since it extends over a relatively long time periods. Once the LT-PLD is obtained, the LT-PLDR is computed as shown in Test Phase of Fig. 2 according to (17) where implies the ratio of and. Here, is the PLD of the noise estimated during the noise only periods. The estimation of is then performed using the MCRA approach known as the simple but computationally efficient noise power estimation technique used in the speech enhancement field [14]. In a similar manner as in the MCRA technique for estimating the variance of noise, the estimate of is given as follows: (18) 1 If we choose 0.9 for, more than latest 20 frames can dominantly affect the value of

5 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1073 where means the PLD definedin(15)and is a time-varying parameter. The time-varying smoothing parameter is adjusted by the speech presence probability (SPP) and is estimated by (19) where is a constant value, which is set to 0.95 based on [14]. In the conventional MCRA approach, the SPP in each subband is determined by comparing the ratio between the local energy of noisy speech and the minimum value within a finite window length with a probability threshold. Based on [14], the SPP at the each frequency bin is calculated by (20) where is the smoothing parameter in order to consider the strong correlation of speech presence over successive frames and represents the indicator function of the speech presence or speech absence at the each subband on the current frame. In order to compute, is calculated by (21) where is the local minimum of the observed input PLD and is picked within the finite window of the consecutive frames as proposed in the minimum-search procedure of [14]. Based on this, indication function to classify rough speech presence or absence regions is determined such that if speech presence if speech absence (22) where is the threshold. Subsequently, at the current frame is computed as the mean of the individual over each subband [7], [9] as shown in of Fig. 2 which constructs the interim VAD on the th frame at the firststepbythe following: (23) Using on each frame, we employ a parametric way to derive the probability of the VAD decision at the first step, which corresponds to in Fig. 2. For this, we derive the a posteriori probability as a probabilistic output for the LT-PLDR in each frame using the sigmoid fitting approach as proposed in the former studies [37], [38]. Note that is the hypothesis based on which indicates the speech presence or absence of the first step. Specifically, the LT-PLDR in each frame is transformed into the probability through the logistic regression model using a slope parameter andanoffset as follows: (24) where is the LT-PLDR in the th frame as explained in (23). For the computation of the reliable probability, the principal parameters and are given by using the discriminative training in separate a way to minimize the negative log likelihood of the data as shown in of Fig. 2, which is the cross-entropy error function obtained by (25) where as the target probability for the class ( and ) is given by manual labeling of every frame in the training process. Indeed, isassumed1ifthe th frame is speech presence and is assumed 0, otherwise. Based on this, we adopt the model-trust minimization algorithm in estimation of the parameters as proposed in [38] since the parameters and are estimated in terms of minimization problem as in (25) and are chosen to minimize a bound on the test misclassification rate, which can produce sparse kernel machine [39]. Based on obtained using the model-trust minimization algorithm, the interim decision rule for the VAD at the first step (denoted by ) could be represented as shownintheblockas of Fig. 2 using the maximum a posteriori (MAP) criterion as given by: (26) where is an experimentally chosen constant. As an example, and the corresponding probability are shown in Fig. 3 along with the manual labeling of each frame. As can be seen in Fig. 3, offers a robust VAD performance under non-stationary noise conditions especially for minimizing missing speech. 
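The first step can be summarized in a short sketch that mirrors (16)-(26): long-term recursive smoothing of the PLD, an MCRA-style noise-PLD estimate controlled by a minimum-tracking speech presence probability, averaging of the per-bin LT-PLDR over frequency, a logistic (sigmoid) mapping to an a posteriori probability, and a MAP-style threshold decision. This is a minimal sketch under stated assumptions: the class and variable names, the minimum-search window length, the sigmoid parameters (which stand in for the discriminatively trained values of (25)), and the decision threshold are illustrative, not the trained values of the paper.

```python
import numpy as np
from collections import deque

class LongTermPLDR:
    """First-step LT-PLDR sketch with illustrative names and constants."""

    def __init__(self, n_bins, beta_lt=0.9, alpha_d=0.95, alpha_p=0.2,
                 delta=2.0, win_len=150, a=-4.0, b=2.0, eta=0.5):
        self.lt_pld = np.zeros(n_bins)           # smoothed LT-PLD, cf. (16)
        self.noise_pld = np.full(n_bins, 1e-6)   # PLD of noise, cf. (18)
        self.spp = np.zeros(n_bins)              # speech presence prob., cf. (20)
        self.minima = deque(maxlen=win_len)      # window for local-minimum search
        self.beta_lt, self.alpha_d, self.alpha_p = beta_lt, alpha_d, alpha_p
        self.delta, self.a, self.b, self.eta = delta, a, b, eta

    def step(self, pld):
        pld = np.asarray(pld, dtype=float)

        # (16): long-term recursive smoothing of the observed PLD
        self.lt_pld = self.beta_lt * self.lt_pld + (1.0 - self.beta_lt) * pld

        # (21)-(22): ratio of the observed PLD to its local minimum gives a
        # rough per-bin speech/non-speech indicator
        self.minima.append(pld.copy())
        local_min = np.minimum.reduce(list(self.minima))
        indicator = (pld / np.maximum(local_min, 1e-12)) > self.delta

        # (20): recursively smoothed speech presence probability per bin
        self.spp = self.alpha_p * self.spp + (1.0 - self.alpha_p) * indicator

        # (18)-(19): MCRA-style noise-PLD update with a time-varying coefficient
        alpha_t = self.alpha_d + (1.0 - self.alpha_d) * self.spp
        self.noise_pld = alpha_t * self.noise_pld + (1.0 - alpha_t) * pld

        # (17), (23): per-bin LT-PLDR averaged over frequency for this frame
        lt_pldr = float(np.mean(self.lt_pld / np.maximum(self.noise_pld, 1e-12)))

        # (24): logistic mapping to an a posteriori probability
        prob = 1.0 / (1.0 + np.exp(self.a * lt_pldr + self.b))

        # (26): MAP-style decision against an assumed threshold
        return prob > self.eta, prob, lt_pldr
```

Because of the large smoothing constant, the resulting probability evolves slowly, which is exactly the long-term behavior illustrated in Fig. 3.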
However, it can also be seen that the LT-PLDR tends to falsely detect the short pause regions between words and syllables as speech, since it uses a high smoothing parameter as designated in (16).

C. ST-PLDR

While the use of a large smoothing parameter reduces the fluctuation in estimating the signal power and thus keeps the missing error rate low, it eventually results in a large false-alarm rate in the short pause regions. Due to this drawback of the LT-PLDR for short pauses, we propose a technique to derive an ST-PLDR using a low smoothing parameter, which is better suited to characterizing short-time variations in speech. As in the case of the LT-PLD, the short-time smoothed PLD between the two microphones is derived using a low smoothing parameter such that (27), where the short-term PLD (ST-PLD) is relatively adequate for a short duration of time. In the following, the ST-PLDR is derived using the ST-PLD at each frequency bin such that (28)

6 1074 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 Fig. 3. An example of derived at the babble noisy signal (approx. 10 db SNR). Noise is located in front of the dummy head with a distance of 5 m. where denotes the estimated PLD of noise, which is estimated during noise only periods. The main objective behind the VAD decision based on the ST-PLDR is to reduce the false alarm rate in the short-pause regions and therefore is to modify the VAD decision result derived from the LT-PLDR. To allow the ST-PLDR to be suited for short-time variation of speech with the dynamic attributes, is separately estimated as follows: each frequency bin as in (28), the ST-PLDR for the VAD on the th frame is averaged by the following: (32) Insimilarmannerproposedin(24),theST-PLDR on the th frame is converted into the aposteriorispp of the hypothesis using the logistic regression model as shown in the block of of Fig. 2 as follows: with Here, is calculated by (29) (30) is the time-varying smoothing parameter and (33) where the principal parameters and for fitting into probability values between 0 and 1 are similarly estimated as in (25). Based on, the VAD decision with the decision threshold experimentally chosen as 1.5 for the short pause regions at the first step can be expressed by (34) (31) where is the SPP computed in (24). Note that since shows good performance in detecting speech presence regions, we use in calculating in (29). Specifically, in order to avoid the speech component of the observed PLD in updating the noise PLD in (30), is multiplied to. Compared to the constant value to compute the time-vary smoothing parameter in updating, is chosen as 0.3. This is because rapidly varies for quick update of. With the ST-PLDR derived in Summarizing the above procedures, and its corresponding probability have a significant advantage for detecting short-pauses between words due to adoption of and, as illustrated in Fig. 4. However, it can be seen that tends to fluctuate highly during long pauses in speech, which results in more false-alarm cases. D. Second step for VAD decision Based on these observations of the proposed two parameters, we perform the final VAD decision using both the LT-PLDR and ST-PLDR at the second step. Indeed, while minimizing cases of missing speech by avoiding miss-classifications of speech as noise, we attempt to reduce false classification of

7 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1075 Fig. 4. An example of derived at the babble noisy signal (approx. 10 db SNR). Noise is located in front of the dummy head with a distance of 5 m. noise as speech with the help of.todothis,wepresent a technique to combine the VAD decisions derived using the two PLDRs in the th frame to form the final decision at the second step as shown in of Fig. 2: if if (35) where at the second step is changed as in the case of at the first step and is same as if. An example of the result of the proposed method in conjunction with the manual label and speech waveform is shown in Fig. 5. This result indicates that the proposed VAD technique avoids the aforementioned problems and works well in adverse noise environments. As shown in Fig. 2, the final decision of VAD is performed during the second step by combining the interim VAD decision results of and derived at the first step for further performance improvement. IV. EXPERIMENTS This section describes the performance evaluation of the proposed two-step PLDR technique. In order to assess the proposed method, we carried out acoustic experiments at different distances and azimuth angles between the speech source and the noise source under various noise environments. For an objective comparison, the proposed algorithm was compared with a number of standardized VAD algorithms, including the European Telecommunications Standards Institute adaptive multirate (ETSI AMR) VAD option 2 [40] as well as the conventional two-microphone VAD techniques based on the MSC [30] and the dual-channel normalized PLD [35]. In addition, the wellknown, state-of-the-art multiple-statistical model-based VAD (MSM) was included in the performance comparison [9]. Fig. 5. Proposed result of at the babble noisy signal (approx. 10 db SNR). Noise is located in front of the dummy head with a distance of 5 m. (a) Waveform of the primary microphone signal (b) Waveform of the secondary microphone signal (c) Corresponding manual VAD (0: noise, 1: speech) (d) VAD result of the proposed two-step PLDR. A. Experimental Setup For the objective evaluation, we investigated the speech hit rates and non-speech hit rates of each algorithm for both speech and non-speech, where and are defined as the ratio of correct speech and non-speech decisions to the hand-labeling speech frames and non-speech frames, respectively. In order to simulate various noise environments and practical noisy conditions, noisy signals were recorded at two microphones in an office with a room size of m.the distance between the primary microphone and the secondary microphone was set to 12 cm, which follows up the configuration of a contemporary smart phone with two microphones. For the test, noisy sentences were recorded at various distances

8 1076 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 Fig. 6. The geographical placement of the sound source about the dummy head. of1m,3m,5m,and7mandatazimuthanglesof,, and between the speech source at the dummy head and the noise source. For easy comprehension, the overall geographical placement of the sound and noise sources is illustrated in Fig. 6. To simulate noisy conditions, five different noise sources such as babble, destroyer engine, factory, HF-channel, and white from the NOISEX-92 database [41] as well as office generated from actual recording were used. Among them, babble and office can be categorized into non-stationary noise [42]. The total samples were composed of 846 s long speech data originally recorded by four males and four females sampled at 8 khz for the application of narrow-band speech communication scenario. In order to evaluate and for speech and non-speech, we made reference decisions on the clean speech signal by labeling it manually at every 10 ms. The proportion of the hand-marked active speech frame was 58.2%, which was consisted of 44.5% voiced sounds and 13.4% unvoiced sounds. Note that settings on implementation of the algorithm such as sampling frequency and framesize can be easily changed since the proposed VAD decision rule is similar to the literature with the DFT baseline [7], [9]. For real-time processing, the windowed signal by a trapezoidal window of length 13 ms was transformed to 128 fast Fourier transform (FFT) coefficients after zero-padding. Thus, the frame size was 10 ms and the frame shift was 3 ms, respectively. The window length for the local minima search was set to 1.5 s. For evaluating the training phase through the model-trust algorithm and validating the model, we used 10-fold cross-validation at which total data was partitioned 10 equally sized segments [43]. Using these segment sets, 10-fold cross-validation was performed in the noise data we did not use during the training. Based on the results from 10-fold cross-validation experiment, the parameters and for the probability derived from the LT-PLDR were estimated in the discriminative training phase employing the model-trust Fig. 7. Comparison of the VAD performance under babble noise environment with approx. 11 db SNR. Noise is located in front of the dummy head with a distance of 5 m. (a) Waveform of the primary microphone signal (b) Corresponding manual VAD (0: noise, 1: speech) (c) VAD result of AMR option2 (d) VAD result of the MSM-based VAD (e) VAD result of two-channel MSC (f) VAD result of the proposed two-step PLDR. minimization algorithm. Also, the coefficients and in order to transform the ST-PLDR into the probability were obtained in a similar training manner. Note that the mean values of,,,and were,,,and, respectively. Also, standard deviation values were,,,and for,,,and, which looks very small and thus implies the robust applicability of our algorithm. B. Experimental Results Next, we evaluated the performance of the proposed approach compared to the aforementioned well-known VAD techniques. For convenience, we showed the resulting average accuracy after 10 cross-validation. First, in order to take advantage of the two-step approach, we examined the detection performance compared to option 2 of the AMR VAD, the multiple-statistical model-based VAD, the dual-microphone MSC-based VAD algorithm, and the dual-channel PLD-based algorithm. Figs. 
7 and 8 illustrate the detection results for the babble and office noises, respectively, where the noise source was located in front of the dummy head at a distance of 5 m (corresponding to SNR dB). From the figures, it can be seen that the proposed two-step PLDR-based method has superior performance during both long noise periods and short pause periods, while the conventional methods are inferior in detecting speech. In particular, the proposed method shows outstanding detection capability in both onset and offset regions, which are known to be difficult to detect, especially in non-stationary noise conditions. To determine the detection accuracy in terms of speech and non-speech in a situation in which the noise source was placed at, we conducted the VAD experiment with six noise

9 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1077 TABLE I COMPARISON OF VOICE ACTIVITY IN TERMS OF SPEECH HIT RATES AND NON-SPEECH HIT RATES AMONG THE METHODS OF CONVENTIONAL MSM-BASED VAD [9], AMR VAD OPTION2 [40],TWO-CHANNEL MSC [30], DUAL-CHANNEL PLD [35], AND PROPOSED TWO-STEP PLDR (LOCATION OF NOISE SOURCE ABOUT THE DUMMY HEAD: ). NOTE THAT WINNING RESULTS IN TERMS OF AND ARE HIGHLIGHTED Fig. 8. Comparison of the VAD performance under office noise environment with approx. 10 db SNR. Noise is located in front of the dummy head with a distance of 5 m. (a) Waveform of the primary microphone signal (b) Corresponding manual VAD (0: noise, 1: speech) (c) VAD result of AMR option2 (d) VAD result of the MSM-based VAD (e) VAD result of two-channel MSC (f) VAD result of the proposed two-step PLDR. types and at different distances as listed in the previous subsection, and the results are summarized in Table I. We confirmed that the proposed two-step PLDR approach is superior to the conventional single- and two-microphone-based VAD techniques for all tested conditions. Note that the proposed method shows outstanding improvement in performance in terms of the probability of the detection for speech, especially for nonstationary noises such as babble and office noises. In particular, it is evident that the proposed algorithm outperformed the conventional dual-microphone MSC-based VAD technique [30], which implies that the proposed two-step PLDR technique is likely to address the issue raised by the MSC-based VAD as stated in Introduction Section. However, since many multiple microphone-based algorithms such as beamforming require the aprioriknowledge of the directivity of the noise source, which is difficult to estimate and often not possible in practice, it is difficult to ensure robust performance. Therefore, robust performance in dealing with a variety of noise signals must be consistently demonstrated. To test the robust performance of the proposed algorithm in terms of the directivity of the noise source, we carried out experiments by varying the azimuth angles between the speech source and the noise source. We examined the detection performance of the proposed method at an azimuth angle of and the results are given in Table II. The proposed method outperformed all other algorithms in six noise conditions. It is evident from the results that the proposed two-step technique is effective in enhancing detection performance, no matter where noise is located. As this tendency was observed at the azimuth, it can be seen that the proposed technique exhibits robust performance even when there is a difference in power level between two microphones and the assumption of a diffused noise field is not met and the set parameters are not sensitive to the direction of noise. What remains is to test the performance at the azimuth, the noise source was placed in a position opposite that used in the case and located back with 5 m from the dummy head (SNR 6 db). Representative results for babble and office noises are plotted in Figs. 9 and 10, respectively, and it can be seen that two-step PLDR technique in detecting both noise-only periods and short pause regions in nonstationary noise conditions is superior. The performances in terms of the probability of the detection for the speech and non-speech for various azimuth angles are given in Table III, which shows the outstanding performance. 
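As a reference for how entries of this kind are obtained, the following small helper computes the speech and non-speech hit rates of a frame-level decision sequence against a hand-labeled reference, following the definitions given in Section IV-A; the function and argument names are assumptions for illustration.

```python
import numpy as np

def hit_rates(vad_decisions, reference_labels):
    """Speech / non-speech hit rates against a hand-labeled reference.

    vad_decisions, reference_labels : per-frame boolean arrays (True = speech).
    Returns (speech_hit_rate, nonspeech_hit_rate): the fraction of reference
    speech frames detected as speech, and the fraction of reference non-speech
    frames detected as non-speech.
    """
    vad = np.asarray(vad_decisions, dtype=bool)
    ref = np.asarray(reference_labels, dtype=bool)
    speech_hr = np.mean(vad[ref]) if ref.any() else float("nan")
    nonspeech_hr = np.mean(~vad[~ref]) if (~ref).any() else float("nan")
    return speech_hr, nonspeech_hr
```

Applied to 10 ms frame decisions of the kind described in Section IV-A, such a helper would yield the two percentages of the kind reported in the tables.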
In addition, Table IV, which summarizes the performances over the entire set of environments, demonstrates that the proposed algorithm is superior to the conventional VAD methods in almost all conditions. This observation confirms that the proposed two-step PLDR-based VAD technique is not sensitive to the location of the noise source. On the other hand, the sensitivity to the two smoothing parameters can be an important factor in the performance of the proposed method. We experimentally chose 0.9 and 0.3 to ensure robust performance over diverse acoustic environments; these values yielded the best performance regardless of the type and intensity of noise.

C. Application to Speech Enhancement

As can be seen from the various experimental results in the previous subsection, the proposed two-step PLDR-based VAD approach shows robust performance under highly varying noise environments. In order to demonstrate the effectiveness of

10 1078 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 TABLE II COMPARISON OF VOICE ACTIVITY IN TERMS OF SPEECH HIT RATES AND NON-SPEECH HIT RATES AMONG THE METHODS OF CONVENTIONAL MSM-BASED VAD[9],AMRVADOPTION2 [40], TWO-CHANNEL MSC [30], DUAL-CHANNEL PLD [35], AND PROPOSED TWO-STEP PLDR (LOCATION OF NOISE SOURCE ABOUT THE DUMMY HEAD: ). NOTE THAT WINNING RESULTS IN TERMS OF AND ARE HIGHLIGHTED Fig. 9. Comparison of the VAD performance under babble noise environment withapprox.6dbsnr.noiseislocatedat about the dummy head with a distance of 5 m. (a) Waveform of the primary microphone signal (b) Corresponding manual VAD (0: noise, 1: speech) (c) VAD result of AMR option2 (d) VAD result of the MSM-based VAD (e) VAD result of two-channel MSC (f) VAD result of the proposed two-step PLDR. the proposed method, we investigated the overall speech quality when the proposed two-step PLDR-based VAD technique is incorporated in estimating the noise power in the speech enhancement system. As a target platform for speech enhancement system, we employed the state-of-the-art speech enhancement algorithm based on MCRA noise estimation incorporating second-order conditional maximum a posteriori (CMAP) criterion [11]. For verifying the performance by taking advantage of the proposed two-step PLDR-based VAD in updating the noise power at the conventional second-order CMAP-based algorithm, we used the composite measures proposed by Hu and Loizou [44] for the objective evaluation of speech quality. The composite measure, which is known to show the significantly high correlation with the mean opinion score (MOS) of subjective speech quality perceived by the listeners, is defined as a combination of representative objective evaluation measures as following: (36) Fig. 10. Comparison of the VAD performance under office noise environment with approx. 6 db SNR. Noise is located at about the dummy head with adistanceof5m.(a) Waveform of the primary microphone signal (b) Corresponding manual VAD (0: noise, 1: speech) (c) VAD result of AMR option2 (d) VAD result of the MSM-based VAD (e) VAD result of two-channel MSC (f) VAD result of the proposed two-step PLDR. where,,and mean the values obtained by the perceptual evaluation of speech quality (PESQ), the log-likelihood ratio (LLR), the weighted-slope spectral distance (WSS), respectively. Table V summarizes the averaged results of speech quality in terms of PESQ, LLR, WSS, and under various noise conditions and azimuth angles. As can be seen in Table V, we can confirm that the proposed two-step PLDR-based VAD consistently improves the performance of speech enhancement in terms of the speech quality by incorporating the proposed VAD approach in updating the noise power estimation within the conventional speech enhancement baseline. In order to verify the improvement of the performance, we compared the speech spectrograms between the output signal processed by the conventional second-order CMAP-based algorithm and the output signal enhanced by the second-order CMAP incorporating the two-step PLDR-based VAD algorithm. Fig. 11 shows performance comparison in terms of speech spectrograms of which the speech sentence

11 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1079 TABLE III COMPARISON OF VOICE ACTIVITY IN TERMS OF SPEECH HIT RATES AND NON-SPEECH HIT RATES AMONG THE METHODS OF CONVENTIONAL MSM-BASED VAD [9], AMR VAD OPTION2 [40],TWO-CHANNEL MSC [30], DUAL-CHANNEL PLD [35], AND PROPOSED TWO-STEP PLDR (LOCATION OF NOISE SOURCE ABOUT THE DUMMY HEAD: ). NOTE THAT WINNING RESULTS IN TERMS OF AND ARE HIGHLIGHTED TABLE IV SUMMARY OF VAD IN TERMS OF OF SPEECH AND OF NON-SPEECH BY AVERAGING AND OVER GIVEN AZIMUTH ANGLES AND DISTANCES. NOTE THAT WINNING RESULTS IN TERMS OF AND ARE HIGHLIGHTED TABLE V RESULTS IN TERMS OF SPEECH QUALITY PESQ, LLR, WSS, AND AVERAGED FOR VARIOUS NOISE CONDITIONS AND AZIMUTH ANGLES, OBTAINED USING CONVENTIONAL SECOND-ORDER CMAP [11] AND SECOND-ORDER CMAP INCORPORATING TWO-STEP PLDR-BASED VAD (WITH 95% CONFIDENCE INTERVAL) was corrupted with babble noise located front with 1 m from the dummy head (i.e., SNR db). As can be seen in the figure, the proposed two step PLDR-based VAD clearly contributes on the performance improvement of the second-order CMAP-based speech enhancement technique under the nonstationary noise environment. D. Computational Complexity and Discussion Computational complexity is considered to be one of the crucial factors in designing systems for mobile devices, since complexity increases power consumption. While the proposed method is superior to conventional single- or dual-microphone-based algorithms, computational complexity should be assessed for the purpose of practical implementation. In order to evaluate the additional computational burden, we compared the computational complexity of the proposed two-step algorithm with those of the MSM-based VAD, the dual-microphone MSC-based VAD technique, and the dual-channel normalized PLD-based VAD algorithm. For a fair comparison, the computation steps were divided into main VAD and feature extraction part. Actually, a single FFT routine in the external feature can be ignored since the FFT can be reused in the forthcoming noise suppression module [12]. A brief summary of the computational cost required by each algorithm in terms of million instructions per second (MIPS), which is based on the TXS320C55X [45], [46], is presented

12 1080 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 6, JUNE 2014 Fig. 11. Comparison of speech spectrograms (babble noise, SNR db). Noise was located at about the dummy head with a distance of 1 m. (a) Speech spectrogram of the clean speech signal (b) Speech spectrogram of the noisy speech signal (c) Speech spectrogram of the output signal processed by the conventional second-order CMAP [11] (d) Speech spectrogram of the output signal processed by the second-order CMAP incorporating the two-step PLDR-based VAD. TABLE VI COMPARISON OF COMPUTATIONAL COMPLEXITY IN TERMS OF MIPS PER FRAME ( ms) in Table VI. Among the existing methods, the MSM-based VAD has the highest computational load, followed by the dual-microphone MSC-based VAD technique. The computational burden of the main part of the VAD decision claimed by the multiple-statistical models is larger than those of both the MSC-based VAD and the proposed technique. In an aspect of the computation of the external feature module, the proposed two-step and MSC-based method and the normalized PLD-based VAD are twice bigger than the VAD technique based on the MSM since their techniques require dual-microphones and thus the forward FFT routine is implemented twice. However, considering the main module in the VAD, the proposed two-step scheme calls for a lower computational burden compared to the MSM-based VAD and the MSC-based scheme. This can be attributed to the fact that the MSM-based VAD [9] estimates the noise power spectrum of the noisy signal and SNR parameters such as the aposteriorisnr, apriorisnr, and the likelihood ratio according to multiple-statistical models, while the proposed method simply utilizes the PLDR between the input signals and the noise signals between the primary microphone and the secondary microphone. In this regard, the proposed two-step PLDR approach could be considered to be a simple but effective algorithm since it could be efficiently implemented without significant additional computation cost. V. CONCLUSIONS In this paper, we proposed a novel dual-microphone two-step PLDR technique based on the relative comparison between the PLD of input signals and the PLD of noise estimated during periods without speech. The proposed two-step PLDR approach is composed of two main parts in which the LT-PLDR and ST-PLDR are used to characterize the long-term evolution and the short-term variation, respectively, at the first step and are incorporated into a combined decision rule for VAD at the second step. In order to minimize the missing cases of speech in the first step, the LT-PLDR is derived using a large smoothing parameter for calculating the PLD and is changed to the aposteriori probability for VAD. The ST-PLDR is obtained using a small smoothing parameter in order to detect short pause intervals and to decrease the false-alarm rate and results in the a posteriori SNR for VAD. At the second step, based on the decision by the aposterioriprobability derived from the LT-PLDR and the ST-PLDR, the final VAD decision rule is established and the VAD result from the LT-PLDR is modified by the decision from the ST-PLDR when the periods of speech presence are detected by the LT-PLDR. This two-step framework allows for the proposed method to provide reliable VAD performances under various acoustical conditions incorporating nonstationary noise environments. 
Through extensive experiments under various noise environments, the proposed dual-microphone VAD technique was found to significantly improve the performance of the VAD compared to conventional standardized VAD algorithms and representative single- and dual-microphone VAD algorithms. The proposed method showed outstanding results in terms of performance improvement in nonstationary noise conditions, including babble and office noises. Furthermore, through simulation of the computational complexity, we confirmed that the proposed two-step VAD algorithms can be implemented simply and efficiently in the smart phone system with little additional computational cost. REFERENCES [1] L.R.RabinerandM.R.Sambur, Voiced-unvoiced-slience detection using Itakura LPC distance measure, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 1977, pp [2] J. D. Hoyt and H. Wechsler, Detection of human speech in structured noise, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.,May 1994, pp [3] J. C. Junqua, B. Reaves, and B. Mark, A study of endpoint detection algorithms in adverse conditions: Incidence on a DTW and HMM recognize, in Proc. Eurospeech, Sep. 1991, pp [4] J.A.HaighandJ.S.Mason, Robustvoiceactivitydetectionusing cepstral feature, in Proc. IEEE TELCON, Oct. 1993, pp [5] R. Tucker, Voice activity detection using a periodicity measure, in Proc. Inst. Electr. Eng, Aug. 1992, vol. 139, pp [6] ITU-T, A slience compression scheme for G.729 optimized for terminals conforming to recommendation V.70, ITU-T Rec. G. 729, Annex B, [7] J. Sohn, N. S. Kim, and W. Sung, A statistical model-based voice activity detection, IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1 3, Jan [8] Y. D. Cho, K. Al-Naimi, and A. Kondoz, Improved voice activity detection based on a smoothed statistical likelihood ratio, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2001, pp [9] J.-H. Chang, N. S. Kim, and S. K. Mitra, Voice activity detection based on multiple statistical models, IEEE Trans. Signal Process., vol. 54, no. 6, pp , Jun

13 CHOI AND CHANG: DUAL-MICROPHONE VAD TECHNIQUE BASED ON TWO-STEP PLDR 1081 [10] J. W. Shin, J.-H. Chang, and N. S. Kim, Voice activity detection based on a family of parametric distributions, Pattern Recognition Lett., vol. 28, no. 11, pp , Aug [11] J.-M. Kum and J.-H. Chang, Speech enhancement based on minima controlled recursive averaging incorporating second-order conditional MAP criterion, IEEE Signal Process. Lett., vol. 16, no. 7, pp , Jul [12] N. S. Kim and J.-H. Chang, Spectral enhancement based on global soft decision, IEEE Signal Process. Lett., vol. 7, no. 5, pp , May [13] R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp , Jul [14] I. Cohen and B. Berdugo, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Process. Lett., vol. 9, no. 1, pp , Jan [15] K. Yao and S. Nakamura, Sequential noise compensation by sequential Monte Carlo method, in Proc. Neural Inf. Process. Syst., Dec. 2001, pp [16] B. Raj, R. Singh, and R. Stern, On tracking noise with linear dynamical system models, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, pp [17] M. Fujimoto and S. Nakamura, Sequential non-stationary noise tracking using particle filtering with switching dynamical system, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2006, pp [18] M. Fujimoto and K. Ishizuka, Noise robust voice activity detection based on switching Kalman filtering, in Proc. Eurospeech, Aug. 2007, pp [19] L. J. Griffiths and C. W. Jim, An alternative approach to linearly constrained adaptive beamformer, IEEE Trans. Antennas Propag., vol. AP-30, no. 1, pp , Jan [20] D. R. Campbell and P. W. Shields, Speech enhancement using subband adaptive Griffiths-Jim signal processing, Speech Commun., vol. 39, pp , Jan [21] B. D. Van and K. M. Buckley, Beamforming: A versatile approach to spatial filtering, IEEE ASSP Mag., vol. 5, pp. 4 24, Apr [22] H. Krim and M. Viberg, Two decades of array signal processing research, IEEE Signal Process. Mag., vol. 13, no. 4, pp , Jul [23] J. Chen and W. Ser, Speech detection using microphone array, Electron. Lett., vol. 36, no. 3, pp , Jan [24] Y. Hioka and N. Hamada, Voice activity detection with array signal processing in the wavelet domain, in Proc. 6th Eur. Signal Process. Conf., Sep. 2002, vol. I, pp [25] I. Potamitis, Estimation of speech presence probability in the field of microphone array, IEEE Signal Process. Lett., vol. 11, no. 12, pp , Dec [26] T. Pirinen and A. Visa, Signal independent wideband activity detection features for microphone arrays, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2006, pp [27] J. E. Rubio, K. Ishizuka, H. Sawada, S. Araki, T. Nakatani, and M. Fujimoto, Two-microphone voice activity detection based on the homogeneity of the direction of arrival estimates, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2007, pp [28] J. B. Allen, D. A. Berkley, and J. Blauert, Multi-microphone signal processing technique to remove room reverberation from speech signals, J. Acoust Soc. Amer., vol. 62, no. 4, pp , Oct [29] R. Le Bouquin-Jeannès and G. Faucon, Using the coherence function for noise reduction, Inst. Electr. Eng. Proc.-I Commun., Speech, Vis., vol. 139, no. 3, pp , Jun [30] R. Le Bouquin-Jeannès and G. Faucon, Study of a voice activity detector and its influence on a noise reduction system, Speech Commun., vol. 16, pp , Apr [31] R. 
Le Bouquin-Jeannès, A. A. Azirani, and G. Faucon, Enhancement of speech degraded by coherent and incoherent noise using a crossspectral estimator, IEEE Trans. Speech Audio Process., vol. 5, no. 5, pp , Sep [32] C. Nelke, C. Beaugeant, and P. Vary, Dual microphone noise psd estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013, pp [33] N. Yousefian, M. Rahmani, and A. Akbari, Power level difference as a criterion for speech enhancement, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2009, pp [34] N. Yousefian, A. Akbari, and M. Rahmani, Using power level difference for near field dual-microphone speech enhancement, Appl. Acoust., vol. 70, pp , Dec [35] M. Jeub, C. Herglotz, C. Nelke, C. Beaugeant, and P. Vary, Noise reduction for dual-microphone mobile phones exploiting power level differences, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 2012, pp [36] Z.-H. Fu, F. Fan, and J.-D. Huang, Dual-microphone noise reduction for mobile phone application, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process, May 2013, pp [37] J.-H. Chang, Q.-H. Jo, D. K. Kim, and N. S. Kim, Global soft decision employing support vector machine for speech enhancement, IEEE Signal Process. Lett., vol. 16, no. 1, pp , Jan [38] J. Platt, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, in Advances in Large Margin Classifiers. Cambridge, MA, USA: MIT Press, [39]G.Wahba,X.Lin,F.Gao,D.Xiang,R.Klein,andB.Klein,The Bias-Variance Trade-Off and The Randomized GACV, M.Kearns,S. Solla, and D. Cohn, Eds. Cambridge, MA, USA: MIT Press, 1999, vol. 11, pp , Advances in Neural Information Processing Systems, Proceedings of the 1998 Conference. [40] Voice activity detector (VAD) for adaptive multi-rate (AMR) speech traffic channels, ETSI EN v7.1.1, ETSI, Dec [41] A. Varga and H. Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Commun., vol. 12, no. 3, pp , Jul [42] N. Krishnamurthy and J. Hansen, Babble noise: Modeling, analysis, and applications, IEEE Audio, Speech, Lang. Process., vol.17,no.7, pp , Sep [43] G. McLachlan, K.-A. Do, and C. Ambroise, Analyzing Microarray Gene Expression Data. New York, NY, USA: Wiley, [44] Y. Hu and P. Loizou, Evaluation of objective quality measures for speech enhancement, IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 1, pp , Jan [45] TI, TMS320C55x DSP library programmer s reference,. Dallas, TX, USA, [46] J.-H. Choi and J.-H. Chang, On using acoustic environment classification for statistical model-based speech enhancement, Speech Commun., vol. 54, no. 3, pp , Mar Jae-Hun Choi was born in Seoul, Korea, in He received the B.S. and M.S. degrees in electronic engineering from Inha University, Incheon, Korea in 2007 and 2010, respectively. Since 2011, he has been pursuing the Ph.D. degree at the department of electronics computer engineering, Hanyang University, Seoul, Korea. His research interests include speech enhancement, voice activity detection, machine learning applied to speech signal processing. Joon-Hyuk Chang received the B.S. degree in electronics engineering from Kyungpook National University, Daegu, Korea in 1998 and the M.S. and Ph.D. degrees in electrical engineering from Seoul National University, Korea, in 2000 and 2004, respectively. 
From March 2000 to April 2005, he was with Netdus Corp., Seoul, as a chief engineer. From May 2004 to April 2005, he was with the University of California, Santa Barbara, in a postdoctoral position to work on adaptive signal processing and audio coding. In May 2005, he joined Korea Institute of Science and Technology, Seoul, as a Research Scientist to work on speech recognition. From August 2005 to February 2011, he was an assistant professor in the school of Electronic Engineering at Inha University, Incheon, Korea. Currently, he is an associate professor in the School of Electronic Engineering at Hanyang University, Seoul, Korea. His research interests are in speech coding, speech enhancement, speech recognition, audio coding, and adaptive signal processing. He is a senior member of IEEE. He is a winner of IEEE/IEEK IT young engineer of the year He is serving as Editor-in-chief of the Signal Processing Society Journal of the IEEK.


More information

ROBUST echo cancellation requires a method for adjusting

ROBUST echo cancellation requires a method for adjusting 1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,

More information

THE problem of acoustic echo cancellation (AEC) was

THE problem of acoustic echo cancellation (AEC) was IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 6, NOVEMBER 2005 1231 Acoustic Echo Cancellation and Doubletalk Detection Using Estimated Loudspeaker Impulse Responses Per Åhgren Abstract

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Emanuël A. P. Habets, Jacob Benesty, and Patrick A. Naylor. Presented by Amir Kiperwas

Emanuël A. P. Habets, Jacob Benesty, and Patrick A. Naylor. Presented by Amir Kiperwas Emanuël A. P. Habets, Jacob Benesty, and Patrick A. Naylor Presented by Amir Kiperwas 1 M-element microphone array One desired source One undesired source Ambient noise field Signals: Broadband Mutually

More information

SPECTRAL COMBINING FOR MICROPHONE DIVERSITY SYSTEMS

SPECTRAL COMBINING FOR MICROPHONE DIVERSITY SYSTEMS 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 SPECTRAL COMBINING FOR MICROPHONE DIVERSITY SYSTEMS Jürgen Freudenberger, Sebastian Stenzel, Benjamin Venditti

More information

Voice Activity Detection for Speech Enhancement Applications

Voice Activity Detection for Speech Enhancement Applications Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity

More information

Multiple Sound Sources Localization Using Energetic Analysis Method

Multiple Sound Sources Localization Using Energetic Analysis Method VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova

More information

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Speech Enhancement for Nonstationary Noise Environments

Speech Enhancement for Nonstationary Noise Environments Signal & Image Processing : An International Journal (SIPIJ) Vol., No.4, December Speech Enhancement for Nonstationary Noise Environments Sandhya Hawaldar and Manasi Dixit Department of Electronics, KIT

More information

IN REVERBERANT and noisy environments, multi-channel

IN REVERBERANT and noisy environments, multi-channel 684 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 11, NO. 6, NOVEMBER 2003 Analysis of Two-Channel Generalized Sidelobe Canceller (GSC) With Post-Filtering Israel Cohen, Senior Member, IEEE Abstract

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech

More information

IMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM

IMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM IMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM Mr. M. Mathivanan Associate Professor/ECE Selvam College of Technology Namakkal, Tamilnadu, India Dr. S.Chenthur

More information

Speech Enhancement Using a Mixture-Maximum Model

Speech Enhancement Using a Mixture-Maximum Model IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

Speaker and Noise Independent Voice Activity Detection

Speaker and Noise Independent Voice Activity Detection Speaker and Noise Independent Voice Activity Detection François G. Germain, Dennis L. Sun,2, Gautham J. Mysore 3 Center for Computer Research in Music and Acoustics, Stanford University, CA 9435 2 Department

More information

Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B.

Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya 2, B. Yamuna 2, H. Divya 2, B. Shiva Kumar 2, B. www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 4 April 2015, Page No. 11143-11147 Speech Enhancement Using Beamforming Dr. G. Ramesh Babu 1, D. Lavanya

More information

MULTICHANNEL systems are often used for

MULTICHANNEL systems are often used for IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 5, MAY 2004 1149 Multichannel Post-Filtering in Nonstationary Noise Environments Israel Cohen, Senior Member, IEEE Abstract In this paper, we present

More information

Real time noise-speech discrimination in time domain for speech recognition application

Real time noise-speech discrimination in time domain for speech recognition application University of Malaya From the SelectedWorks of Mokhtar Norrima January 4, 2011 Real time noise-speech discrimination in time domain for speech recognition application Norrima Mokhtar, University of Malaya

More information

Noise Estimation based on Standard Deviation and Sigmoid Function Using a Posteriori Signal to Noise Ratio in Nonstationary Noisy Environments

Noise Estimation based on Standard Deviation and Sigmoid Function Using a Posteriori Signal to Noise Ratio in Nonstationary Noisy Environments 88 International Journal of Control, Automation, and Systems, vol. 6, no. 6, pp. 88-87, December 008 Noise Estimation based on Standard Deviation and Sigmoid Function Using a Posteriori Signal to Noise

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging

Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging 466 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 11, NO. 5, SEPTEMBER 2003 Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging Israel Cohen Abstract

More information

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,

More information

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P On Factorizing Spectral Dynamics for Robust Speech Recognition a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-33 June 23 Iain McCowan a Hemant Misra a,b to appear in

More information

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering

More information

Calibration of Microphone Arrays for Improved Speech Recognition

Calibration of Microphone Arrays for Improved Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present

More information

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS

CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 66 CHAPTER 4 VOICE ACTIVITY DETECTION ALGORITHMS 4.1 INTRODUCTION New frontiers of speech technology are demanding increased levels of performance in many areas. In the advent of Wireless Communications

More information

AS DIGITAL speech communication devices, such as

AS DIGITAL speech communication devices, such as IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 4, MAY 2012 1383 Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay Timo Gerkmann, Member, IEEE,

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Informed Spatial Filtering for Sound Extraction Using Distributed Microphone Arrays

Informed Spatial Filtering for Sound Extraction Using Distributed Microphone Arrays IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 7, JULY 2014 1195 Informed Spatial Filtering for Sound Extraction Using Distributed Microphone Arrays Maja Taseska, Student

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

arxiv: v1 [cs.sd] 4 Dec 2018

arxiv: v1 [cs.sd] 4 Dec 2018 LOCALIZATION AND TRACKING OF AN ACOUSTIC SOURCE USING A DIAGONAL UNLOADING BEAMFORMING AND A KALMAN FILTER Daniele Salvati, Carlo Drioli, Gian Luca Foresti Department of Mathematics, Computer Science and

More information

HUMAN speech is frequently encountered in several

HUMAN speech is frequently encountered in several 1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,

More information

Can binary masks improve intelligibility?

Can binary masks improve intelligibility? Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics

Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics 504 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 5, JULY 2001 Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics Rainer Martin, Senior Member, IEEE

More information

ARTICLE IN PRESS. Signal Processing

ARTICLE IN PRESS. Signal Processing Signal Processing 9 (2) 737 74 Contents lists available at ScienceDirect Signal Processing journal homepage: www.elsevier.com/locate/sigpro Fast communication Double-talk detection based on soft decision

More information

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

Speech Signal Enhancement Techniques

Speech Signal Enhancement Techniques Speech Signal Enhancement Techniques Chouki Zegar 1, Abdelhakim Dahimene 2 1,2 Institute of Electrical and Electronic Engineering, University of Boumerdes, Algeria inelectr@yahoo.fr, dahimenehakim@yahoo.fr

More information

A Survey and Evaluation of Voice Activity Detection Algorithms

A Survey and Evaluation of Voice Activity Detection Algorithms A Survey and Evaluation of Voice Activity Detection Algorithms Seshashyama Sameeraj Meduri (ssme09@student.bth.se, 861003-7577) Rufus Ananth (anru09@student.bth.se, 861129-5018) Examiner: Dr. Sven Johansson

More information

A multi-class method for detecting audio events in news broadcasts

A multi-class method for detecting audio events in news broadcasts A multi-class method for detecting audio events in news broadcasts Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis Computational Intelligence Laboratory, Institute of Informatics and

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

Voice Activity Detection Using Spectral Entropy. in Bark-Scale Wavelet Domain

Voice Activity Detection Using Spectral Entropy. in Bark-Scale Wavelet Domain Voice Activity Detection Using Spectral Entropy in Bark-Scale Wavelet Domain 王坤卿 Kun-ching Wang, 侯圳嶺 Tzuen-lin Hou 實踐大學資訊科技與通訊學系 Department of Information Technology & Communication Shin Chien University

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Probability of Error Calculation of OFDM Systems With Frequency Offset

Probability of Error Calculation of OFDM Systems With Frequency Offset 1884 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 49, NO. 11, NOVEMBER 2001 Probability of Error Calculation of OFDM Systems With Frequency Offset K. Sathananthan and C. Tellambura Abstract Orthogonal frequency-division

More information

546 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 4, MAY /$ IEEE

546 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 4, MAY /$ IEEE 546 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL 17, NO 4, MAY 2009 Relative Transfer Function Identification Using Convolutive Transfer Function Approximation Ronen Talmon, Israel

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA. Qipeng Gong, Benoit Champagne and Peter Kabal

NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA. Qipeng Gong, Benoit Champagne and Peter Kabal NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA Qipeng Gong, Benoit Champagne and Peter Kabal Department of Electrical & Computer Engineering, McGill University 3480 University St.,

More information

1856 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 7, SEPTEMBER /$ IEEE

1856 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 7, SEPTEMBER /$ IEEE 1856 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 7, SEPTEMBER 2010 Sequential Organization of Speech in Reverberant Environments by Integrating Monaural Grouping and Binaural

More information

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991 RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response

More information

Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice

Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice Yanmeng Guo, Qiang Fu, and Yonghong Yan ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences Beijing

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Single channel noise reduction

Single channel noise reduction Single channel noise reduction Basics and processing used for ETSI STF 94 ETSI Workshop on Speech and Noise in Wideband Communication Claude Marro France Telecom ETSI 007. All rights reserved Outline Scope

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License Title Non-intrusive intelligibility prediction for Mandarin speech in noise Author(s) Chen, F; Guan, T Citation The 213 IEEE Region 1 Conference (TENCON 213), Xi'an, China, 22-25 October 213. In Conference

More information

COM 12 C 288 E October 2011 English only Original: English

COM 12 C 288 E October 2011 English only Original: English Question(s): 9/12 Source: Title: INTERNATIONAL TELECOMMUNICATION UNION TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2009-2012 Audience STUDY GROUP 12 CONTRIBUTION 288 P.ONRA Contribution Additional

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION. Changkyu Choi, Seungho Choi, and Sang-Ryong Kim

SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION. Changkyu Choi, Seungho Choi, and Sang-Ryong Kim SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION Changkyu Choi, Seungho Choi, and Sang-Ryong Kim Human & Computer Interaction Laboratory Samsung Advanced Institute of Technology

More information

Keywords Decomposition; Reconstruction; SNR; Speech signal; Super soft Thresholding.

Keywords Decomposition; Reconstruction; SNR; Speech signal; Super soft Thresholding. Volume 5, Issue 2, February 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Speech Enhancement

More information

ANUMBER of estimators of the signal magnitude spectrum

ANUMBER of estimators of the signal magnitude spectrum IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 5, JULY 2011 1123 Estimators of the Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty Yang Lu and Philipos

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

RESEARCH ON METHODS FOR ANALYZING AND PROCESSING SIGNALS USED BY INTERCEPTION SYSTEMS WITH SPECIAL APPLICATIONS

RESEARCH ON METHODS FOR ANALYZING AND PROCESSING SIGNALS USED BY INTERCEPTION SYSTEMS WITH SPECIAL APPLICATIONS Abstract of Doctorate Thesis RESEARCH ON METHODS FOR ANALYZING AND PROCESSING SIGNALS USED BY INTERCEPTION SYSTEMS WITH SPECIAL APPLICATIONS PhD Coordinator: Prof. Dr. Eng. Radu MUNTEANU Author: Radu MITRAN

More information

Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks

Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks Australian Journal of Basic and Applied Sciences, 4(7): 2093-2098, 2010 ISSN 1991-8178 Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks 1 Mojtaba Bandarabadi,

More information

Local Oscillators Phase Noise Cancellation Methods

Local Oscillators Phase Noise Cancellation Methods IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834, p- ISSN: 2278-8735. Volume 5, Issue 1 (Jan. - Feb. 2013), PP 19-24 Local Oscillators Phase Noise Cancellation Methods

More information

SNR Estimation in Nakagami-m Fading With Diversity Combining and Its Application to Turbo Decoding

SNR Estimation in Nakagami-m Fading With Diversity Combining and Its Application to Turbo Decoding IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 11, NOVEMBER 2002 1719 SNR Estimation in Nakagami-m Fading With Diversity Combining Its Application to Turbo Decoding A. Ramesh, A. Chockalingam, Laurence

More information

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition International Conference on Advanced Computer Science and Electronics Information (ICACSEI 03) On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition Jongkuk Kim, Hernsoo Hahn Department

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY INTER-NOISE 216 WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY Shumpei SAKAI 1 ; Tetsuro MURAKAMI 2 ; Naoto SAKATA 3 ; Hirohumi NAKAJIMA 4 ; Kazuhiro NAKADAI

More information

A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference

A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference 2006 IEEE Ninth International Symposium on Spread Spectrum Techniques and Applications A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference Norman C. Beaulieu, Fellow,

More information

Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W.

Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W. Joint dereverberation and residual echo suppression of speech signals in noisy environments Habets, E.A.P.; Gannot, S.; Cohen, I.; Sommen, P.C.W. Published in: IEEE Transactions on Audio, Speech, and Language

More information

Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation

Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation 1 Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation Zhangli Chen* and Volker Hohmann Abstract This paper describes an online algorithm for enhancing monaural

More information

Modulation Spectrum Power-law Expansion for Robust Speech Recognition

Modulation Spectrum Power-law Expansion for Robust Speech Recognition Modulation Spectrum Power-law Expansion for Robust Speech Recognition Hao-Teng Fan, Zi-Hao Ye and Jeih-weih Hung Department of Electrical Engineering, National Chi Nan University, Nantou, Taiwan E-mail:

More information

THE EFFECT of multipath fading in wireless systems can

THE EFFECT of multipath fading in wireless systems can IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 47, NO. 1, FEBRUARY 1998 119 The Diversity Gain of Transmit Diversity in Wireless Systems with Rayleigh Fading Jack H. Winters, Fellow, IEEE Abstract In

More information

DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS

DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS John Yong Jia Chen (Department of Electrical Engineering, San José State University, San José, California,

More information

Title. Author(s)Sugiyama, Akihiko; Kato, Masanori; Serizawa, Masahir. Issue Date Doc URL. Type. Note. File Information

Title. Author(s)Sugiyama, Akihiko; Kato, Masanori; Serizawa, Masahir. Issue Date Doc URL. Type. Note. File Information Title A Low-Distortion Noise Canceller with an SNR-Modifie Author(s)Sugiyama, Akihiko; Kato, Masanori; Serizawa, Masahir Proceedings : APSIPA ASC 9 : Asia-Pacific Signal Citationand Conference: -5 Issue

More information