
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 12, NO. 1, JANUARY 2004

Signal Modification for Robust Speech Coding

Nam Soo Kim, Member, IEEE, and Joon-Hyuk Chang, Member, IEEE

Abstract: Usually, the performance of a low-bit-rate speech coder degrades seriously in the presence of various interfering signals such as background noise, acoustic echo, co-talkers' speech, and other unwanted signals. This comes from the mismatch between the input signal and the assumed speech production model on which the design of the given speech coder is based. In this paper, we present an approach to modify the input signal such that it can be coded more effectively within the generalized analysis-by-synthesis framework. Signal modification in the presented approach is performed according to a criterion which makes a compromise between the modification and coder quantization errors. The coder-decoder (CODEC) characteristic is described in terms of a transfer matrix, and an on-line method using the recursive least squares (RLS) technique is proposed to estimate it. Since each part of the speech signal is affected differently by the modification, we also devise an adaptive method based on the signal-to-quantization noise ratio (SQNR). In contrast to conventional modification techniques, our approach can be implemented as a simple front-end for any analysis-by-synthesis type coder.

Index Terms: Low-bit-rate speech coding, signal modification.

Manuscript received January 21, 2002; revised August 8, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Peter Vary. The authors are with the School of Electrical Engineering and INMC, Seoul National University, Seoul, Korea (nkim@snu.ac.kr). Digital Object Identifier /TSA

I. INTRODUCTION

IN RECENT YEARS, there has been a great deal of interest in low-bit-rate speech coding techniques [1]. For an efficient use of the limited bandwidth resources, it is indispensable to describe the speech signal with a minimal number of bits. Low-bit-rate speech coding has become available due to a simplified speech production model in which the vocal tract and the excitation are treated separately. Each coding technique is classified according to the way it characterizes the excitation signal, while linear prediction analysis is usually applied to express the vocal tract. In the code excited linear prediction (CELP) speech coders, the excitation is selected from a collection of codeword vectors [2]-[4]. Other successful approaches are based on a parametric representation of the excitation signal, such as the sinusoidal model and waveform interpolation [5]-[8]. In the sinusoidal model, the excitation signal is described in terms of a number of sine waves, where the frequency, amplitude, and phase of each sine wave are estimated or predicted from the input speech [5]. In contrast, the excitation signal in the waveform interpolation approach is reconstructed by interpolating a set of pitch cycle waveforms, referred to as the characteristic waveforms, based on the assumption that their general shape evolves slowly [8].

Since the generation of speech signals should be represented in terms of a simple parametric model in order to achieve high coding gain with a limited bit budget, a robust way to estimate the relevant parameters is required. The criterion usually adopted for parameter estimation aims at minimizing the waveform matching error. Due to the difficulty of obtaining an optimal solution analytically, the closed-loop analysis technique is applied, in which all the possible parameter values are tried to reconstruct the original signal and the one that minimizes the matching error is selected as the optimal solution. This closed-loop analysis technique is referred to as the analysis-by-synthesis approach, and almost all the low-bit-rate speech coders in use today are based on it [1]. In [6], speech is represented as a sum of harmonic sine waves, and the fundamental frequency is searched so as to minimize the mean squared error between the original and synthetic spectra. In CELP-based coders, on the other hand, all the fixed-codebook entries are used in turn to form the excitation signal, and the one that results in the smallest mean squared difference between the input and synthesized signals is selected [2]-[4]. Furthermore, it has recently been reported that analysis-by-synthesis vector quantization of the rapidly evolving waveform (REW) parameters enhances the performance of a waveform interpolation coder [9]. When measuring the waveform matching error, a perceptual weighting filter is usually applied so that the error better reflects the characteristic of human hearing known as the masking effect [10]. Due to the incorporation of the perceptual weighting filter, the quantization noise is shaped such that it becomes minimally audible.

In general, the performance of a low-bit-rate speech coder degrades seriously under adverse environments. Undesired distortions are frequently observed in the reconstructed signal in the presence of background noise, acoustic echo, music, or interfering speakers' speech. This can be considered from two aspects. First, the interfering signals cannot be coded effectively by a coder that is designed on the basis of a simplified human speech production model. Second, the presence of interfering signals makes it difficult to obtain an exact estimate of the parameters and is likely to lead to a poor solution. The distortion can be somewhat mitigated under the analysis-by-synthesis framework due to its waveform matching property. However, the codebooks used in the coder are trained on a large amount of speech data, and the ranges for parameter search are specified to fit pure speech signals. Therefore, deviation of the input signal from the assumed speech production model is still a major cause of performance degradation, even in analysis-by-synthesis coders.

A straightforward way to reduce the unwanted distortions is to employ a speech enhancement technique such as spectral subtraction, Kalman filtering, or model-based enhancement [11]-[14]. Speech enhancement algorithms increase the signal-to-noise ratio (SNR) and can be used as a signal pre-processor for the given

speech coder. Even though these enhancement techniques have been found effective in the presence of stationary background noise, they are not capable of handling interfering signals such as acoustic echoes, music, or co-talkers' speech, and they are usually developed irrespective of the speech coder characteristics.

An alternative approach is the generalized analysis-by-synthesis technique, where the original input speech signal is modified such that it can be coded more effectively [1]. In a generalized analysis-by-synthesis approach, it is important that the modification should not bring about any audible distortion. In [15], the input speech signal is modified to have a smooth time-delay trajectory, which saves the bits required for quantizing the pitch periods in each frame and incurs errors of no perceptual importance. Another pre-processing algorithm for CELP coders is proposed in [16], where the input signal is perturbed such that the perturbed signal is subjectively indistinguishable from the original while the prediction gain obtained with the quantized linear prediction coefficients is maximized.

In this paper, we present an approach to modify the signal applied as an input to a given speech coder. A criterion is introduced which compromises between two types of distortion, i.e., the modification error and the quantization noise. Optimization of the given problem becomes possible by approximating the input-output characteristic of a coder-decoder (CODEC) pair with a simple transfer matrix. Estimation of the transfer matrix is treated as a system identification problem, and we employ the well-known least squares technique, which can be implemented in either batch or sequential mode. This idea was originally proposed in [17], and this paper gives an in-depth description as well as a thorough analysis of the approach. Moreover, an adaptive way of controlling the two kinds of errors is proposed based on extensive experiments on performance evaluation. In contrast to the conventional generalized analysis-by-synthesis techniques, in which the input modification is tightly coupled to the inherent CODEC characteristic, our approach can be implemented as a simple front-end for any analysis-by-synthesis type coder.

The organization of this paper is as follows. Following this section, a detailed description of the signal modification approach is given in Section II. The least squares algorithm for transfer matrix estimation and an on-line implementation of the presented approach are given in Sections III and IV, respectively. In Section V, a number of experiments are conducted to evaluate the performance, and finally, in Section VI, some concluding remarks are drawn.

II. SIGNAL MODIFICATION AS A CODEC FRONT-END

Fig. 1 shows the overall structure of a speech coder built on the basis of the generalized analysis-by-synthesis paradigm. Analysis of the input signal and quantization of the relevant parameters are carried out in the speech coder, and the quantized parameters are transformed into a bit stream and then transmitted to the receiver. As in most analysis-by-synthesis speech coding techniques, these quantized parameters are also passed through the decoder in order to reconstruct the original signal. This CODEC structure forms a system for which the ideal transfer function would be the identity mapping, implying that the original input signal can be perfectly reconstructed without any quantization error.

Fig. 1. Overall structure of the generalized analysis-by-synthesis speech coder.

Under the generalized analysis-by-synthesis paradigm, the input signal is modified before being fed to the coder so that it can be reconstructed at the receiver side with minimal distortion. In this approach, it is crucial to make the modified signal nearly the same as the original input speech in terms of human auditory perception. For instance, it is well known that the human auditory system is insensitive to a certain degree of time-delay difference.

In this section, we present an approach to modify the input signal. For this, we treat the CODEC as if it were a black box and take advantage of its input-output characteristics. Let $\mathbf{x} = [x_0, x_1, \ldots, x_{N-1}]^T$, with $T$ denoting matrix transposition, be a vector which represents a frame of $N$ speech samples applied as an input to a speech coder. Even though a time-domain approach is possible, it is found beneficial to modify the signal in a transform domain. In this paper, we use both the discrete Fourier transform (DFT) and the discrete cosine transform (DCT) for the signal representation in the transform domain. The frame $\mathbf{x}$ is transformed into a vector $\mathbf{X} = [X_0, X_1, \ldots, X_{M-1}]^T$, where $M$ denotes the number of coefficients. With the DFT,

\[ X_k = \sum_{n=0}^{N-1} x_n e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, M-1 \tag{1} \]

in which $N$ is assumed to be an even number. The effective number of DFT coefficients is $M = N/2 + 1$, and each coefficient is a complex number except for $X_0$ and $X_{N/2}$. The inverse DFT (IDFT) is given by

\[ x_n = \frac{1}{N} \left[ X_0 + (-1)^n X_{N/2} + 2\,\mathrm{Re} \sum_{k=1}^{N/2-1} X_k e^{j 2\pi k n / N} \right], \quad n = 0, 1, \ldots, N-1. \tag{2} \]

On the other hand, if we apply the DCT,

\[ X_k = c_k \sum_{n=0}^{N-1} x_n \cos\frac{\pi (2n+1) k}{2N}, \quad k = 0, 1, \ldots, M-1, \qquad c_0 = \sqrt{1/N}, \; c_k = \sqrt{2/N} \;\; (k \geq 1). \tag{3} \]
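The two frame representations can be illustrated with a minimal sketch. It assumes the standard unnormalized DFT restricted to its $N/2 + 1$ non-redundant coefficients and the orthonormal DCT-II; the normalization is an assumption, since the text fixes only the structure of the transforms. The function names `dft_half`, `idft_half`, `dct`, and `idct` are illustrative, and the direct $O(N^2)$ sums favor clarity over speed.

```python
import cmath
import math


def dft_half(x):
    """Return the M = N/2 + 1 non-redundant DFT coefficients of a real frame."""
    n_samples = len(x)
    if n_samples % 2:
        raise ValueError("frame length N must be even")
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples // 2 + 1)
    ]


def idft_half(coeffs):
    """Rebuild the real frame from its half spectrum via conjugate symmetry."""
    n_samples = 2 * (len(coeffs) - 1)
    frame = []
    for n in range(n_samples):
        # DC and Nyquist terms are real; the other terms count twice.
        acc = coeffs[0].real + coeffs[-1].real * (-1) ** n
        for k in range(1, n_samples // 2):
            term = coeffs[k] * cmath.exp(2j * math.pi * k * n / n_samples)
            acc += 2.0 * term.real
        frame.append(acc / n_samples)
    return frame


def _dct_scale(k, n_samples):
    """Orthonormal DCT-II scaling constant c_k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n_samples)


def dct(x):
    """Orthonormal DCT-II: N real coefficients for an N-sample frame."""
    n_samples = len(x)
    return [
        _dct_scale(k, n_samples) * sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n_samples = len(coeffs)
    return [
        sum(_dct_scale(k, n_samples) * coeffs[k]
            * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for k in range(n_samples))
        for n in range(n_samples)
    ]
```

For an 8-sample frame, `dft_half` returns 5 complex coefficients while `dct` returns 8 real ones, and both inverses recover the frame up to rounding error.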

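In the DCT of (3), all the coefficients are real numbers and $M = N$. The inverse DCT (IDCT) is given by

\[ x_n = \sum_{k=0}^{N-1} c_k X_k \cos\frac{\pi (2n+1) k}{2N}, \quad n = 0, 1, \ldots, N-1 \tag{4} \]

with the normalization constants $c_0 = \sqrt{1/N}$ and $c_k = \sqrt{2/N}$ for $k \geq 1$.

In the presented approach, prior to applying $\mathbf{x}$ to the encoder, we modify $\mathbf{x}$ such that the modified vector better fits the speech coder. Let $\tilde{\mathbf{x}}$ be the signal samples obtained by modifying $\mathbf{x}$, and let $\mathbf{y}$ be the output vector which is produced when $\tilde{\mathbf{x}}$ is applied to the coder and then re-synthesized in the decoder. Also let $\tilde{\mathbf{X}}$ and $\mathbf{Y}$ be the transform domain representations of $\tilde{\mathbf{x}}$ and $\mathbf{y}$, respectively. Without loss of generality, we assume that

\[ \mathbf{y} = f(\tilde{\mathbf{x}}_a), \qquad \tilde{\mathbf{x}}_a = [\tilde{\mathbf{x}}_p^T, \tilde{\mathbf{x}}^T, \tilde{\mathbf{x}}_f^T]^T \tag{5} \]

where $\tilde{\mathbf{x}}_a$ is an augmented input vector and $f$ represents the transfer function that models the input-output characteristic of the CODEC. In (5), $\tilde{\mathbf{x}}$ represents the input data in the current frame. On the other hand, $\tilde{\mathbf{x}}_p$ stands for the previous data, and $\tilde{\mathbf{x}}_f$ consists of the future input samples, which are usually referred to as the look-ahead data. In low-bit-rate speech coding techniques, the coding parameters extracted from a frame, such as the pitch, the line spectrum frequencies (LSFs), and the excitation signal, depend not only on the samples in that frame but also on the past and look-ahead data. The transfer function $f$ is generally highly nonlinear. For simplicity, we approximate (5) in the transform domain by

\[ \mathbf{Y} = \mathbf{H} \tilde{\mathbf{X}} \tag{6} \]

where $\mathbf{H} = \mathrm{diag}(H_0, H_1, \ldots, H_{M-1})$ is the $M \times M$ diagonal transfer matrix, and $\tilde{\mathbf{X}}$ and $\mathbf{Y}$ are the transform domain vectors for $\tilde{\mathbf{x}}$ and $\mathbf{y}$, respectively. Equation (6) shows that the CODEC is approximated by a linear system model. In practice, the input-output relationship of a speech CODEC is too complicated to be expressed in a form as simple as (6). However, if we focus on a short interval of speech, this linear system model can be considered a good approximation to the real CODEC transfer function. Moreover, since the CODEC output is mostly affected by the input in the corresponding frame, the effects of the past and future input data can be ignored without introducing a large modeling error.

Modification of the input vector $\mathbf{X}$ is achieved according to the following criterion:

\[ \hat{\mathbf{X}} = \arg\min_{\tilde{\mathbf{X}}} J(\tilde{\mathbf{X}}) \tag{7} \]

where $\hat{\mathbf{X}}$ denotes the desired modified vector. Here, the objective function $J$ is given by

\[ J(\tilde{\mathbf{X}}) = D(\mathbf{X}, \tilde{\mathbf{X}}) + \lambda D(\tilde{\mathbf{X}}, \mathbf{H}\tilde{\mathbf{X}}) \tag{8} \]

in which $D(\mathbf{A}, \mathbf{B})$ is a distance measure between the two vectors $\mathbf{A}$ and $\mathbf{B}$, and $\lambda$ is a positive constant. From (8), it is noted that $J$ is expressed in terms of two distortions: one is caused by the modification, and the other is the quantization error which comes from the speech coder. The positive constant $\lambda$ compromises between these two types of distortion, and it should be carefully determined. Clearly, the optimal solution depends on how we choose the distance measure $D$. For developing an effective signal modification method, it is important to make $D$ closely related to human auditory perception. In the following, we present three distance measures which are usually applied to compute the distance between two spectra, and the solution for signal modification is given for each case.

Case 1, Linear Spectral Distance: The linear spectral distance is given by

\[ D(\mathbf{X}, \tilde{\mathbf{X}}) = \| \mathbf{X} - \tilde{\mathbf{X}} \|^2 \tag{9} \]

where $\| \cdot \|$ indicates the Euclidean norm of a vector. In this case, $J$ is written as

\[ J(\tilde{\mathbf{X}}) = \| \mathbf{X} - \tilde{\mathbf{X}} \|^2 + \lambda \| \mathbf{G} \tilde{\mathbf{X}} \|^2 \tag{10} \]

where $\mathbf{G} = \mathbf{I} - \mathbf{H}$, with $\mathbf{I}$ being the identity matrix. Since $J$ is a quadratic function of $\tilde{\mathbf{X}}$, differentiating it with respect to $\tilde{\mathbf{X}}$ and then equating to zero leads us to

\[ -(\mathbf{X} - \hat{\mathbf{X}}) + \lambda \mathbf{G}^H \mathbf{G} \hat{\mathbf{X}} = \mathbf{0} \tag{11} \]

in which $H$ in the superscript means the Hermitian operation. From (11), it is easy to show that

\[ \hat{\mathbf{X}} = (\mathbf{I} + \lambda \mathbf{G}^H \mathbf{G})^{-1} \mathbf{X}. \tag{12} \]

Since $\mathbf{H}$ is a diagonal matrix, (12) can be described component-wise as follows:

\[ \hat{X}_k = \frac{X_k}{1 + \lambda |1 - H_k|^2}, \quad k = 0, 1, \ldots, M-1. \tag{13} \]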
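The component-wise rule of (13) can be sketched in a few lines. This is a minimal illustration, assuming the diagonal of the transfer matrix is available as a list; the names `modify_spectrum` and `objective` are hypothetical and chosen for this example only. The second function evaluates the linear-distance objective of (10) so that the minimizing property can be checked numerically.

```python
def modify_spectrum(spectrum, transfer_diag, lam):
    """Scale each coefficient by 1 / (1 + lam * |1 - H_k|^2), as in (13)."""
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    return [
        x_k / (1.0 + lam * abs(1.0 - h_k) ** 2)
        for x_k, h_k in zip(spectrum, transfer_diag)
    ]


def objective(spectrum, modified, transfer_diag, lam):
    """Modification error plus lam times the modeled quantization error, as in (10)."""
    modification = sum(abs(x - m) ** 2 for x, m in zip(spectrum, modified))
    quantization = sum(
        abs((1.0 - h) * m) ** 2 for h, m in zip(transfer_diag, modified)
    )
    return modification + lam * quantization


# Coefficients that the CODEC model reproduces exactly (H_k = 1) are untouched;
# the others are attenuated more strongly as lam or |1 - H_k| grows.
example = modify_spectrum([1 + 1j, 2.0, -0.5j], [1.0, 0.5, 0.8 + 0.1j], 2.0)
```

Because the rule only rescales each coefficient by a real factor between 0 and 1, the phase of a DFT coefficient and the sign of a DCT coefficient are preserved.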

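Case 2, Weighted Linear Spectral Distance: Given a positive-definite matrix $\mathbf{W}$, the weighted linear spectral distance is defined by

\[ D(\mathbf{X}, \tilde{\mathbf{X}}) = (\mathbf{X} - \tilde{\mathbf{X}})^H \mathbf{W} (\mathbf{X} - \tilde{\mathbf{X}}). \tag{14} \]

Consequently, with $\mathbf{G} = \mathbf{I} - \mathbf{H}$,

\[ D(\tilde{\mathbf{X}}, \mathbf{H}\tilde{\mathbf{X}}) = \tilde{\mathbf{X}}^H \mathbf{G}^H \mathbf{W} \mathbf{G} \tilde{\mathbf{X}} \tag{15} \]

and

\[ J(\tilde{\mathbf{X}}) = (\mathbf{X} - \tilde{\mathbf{X}})^H \mathbf{W} (\mathbf{X} - \tilde{\mathbf{X}}) + \lambda \tilde{\mathbf{X}}^H \mathbf{G}^H \mathbf{W} \mathbf{G} \tilde{\mathbf{X}}. \tag{16} \]

As in the linear spectral distance case, $J$ becomes a quadratic function of $\tilde{\mathbf{X}}$, and from (16) it can be shown that

\[ \hat{\mathbf{X}} = (\mathbf{W} + \lambda \mathbf{G}^H \mathbf{W} \mathbf{G})^{-1} \mathbf{W} \mathbf{X}. \tag{17} \]

Since $\mathbf{W}$ is positive-definite, it is possible to make the inverse matrix on the right-hand side of (17) exist with an appropriate choice for $\lambda$. Usually, $\mathbf{W}$ represents a perceptual weighting filter used for quantization noise shaping in the linear prediction based analysis-by-synthesis coding techniques. It is noted that if $\mathbf{W}$ is a diagonal matrix, (17) becomes the same as (13), which indicates that a diagonal weighting matrix does not affect signal modification.

Case 3, Log Spectral Distance: The definition of the log spectral distance is given by

\[ D(\mathbf{X}, \tilde{\mathbf{X}}) = \sum_{k=0}^{M-1} \left( \log |X_k| - \log |\tilde{X}_k| \right)^2. \tag{18} \]

With this definition, $J$ is written as

\[ J(\tilde{\mathbf{X}}) = \sum_{k=0}^{M-1} \left( \log |X_k| - \log |\tilde{X}_k| \right)^2 + \lambda \sum_{k=0}^{M-1} \left( \log |H_k| \right)^2. \tag{19} \]

The second term of (19) does not depend on $\tilde{\mathbf{X}}$, so from (19) it is not difficult to see that

\[ \hat{\mathbf{X}} = \mathbf{X} \tag{20} \]

i.e., the optimal solution is the input signal itself and no modification is necessary.

III. TRANSFER MATRIX ESTIMATION

Estimation of the transfer matrix $\mathbf{H}$ can be treated as a system identification problem. Suppose that we apply $L$ input vectors, $\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2, \ldots, \tilde{\mathbf{X}}_L$, to the speech CODEC and obtain the corresponding output vectors, $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_L$. Our approach is based on the method of least squares, which is described as follows:

\[ \hat{\mathbf{H}} = \arg\min_{\mathbf{H}} E(\mathbf{H}), \qquad E(\mathbf{H}) = \sum_{l=1}^{L} w_l \| \mathbf{Y}_l - \mathbf{H} \tilde{\mathbf{X}}_l \|^2 \tag{21} \]

in which $\hat{\mathbf{H}}$ is the least squares estimate for $\mathbf{H}$ and $w_l$ indicates the weighting factor assigned to the $l$th input-output pair. As shown in (21), and because $\mathbf{H}$ is diagonal, $E(\mathbf{H})$ is decomposed into $M$ separate functions, $E(\mathbf{H}) = \sum_{k=0}^{M-1} E_k(H_k)$, where

\[ E_k(H_k) = \sum_{l=1}^{L} w_l \, | Y_{l,k} - H_k \tilde{X}_{l,k} |^2 \tag{22} \]

with $\tilde{X}_{l,k}$ and $Y_{l,k}$ denoting the $k$th components of $\tilde{\mathbf{X}}_l$ and $\mathbf{Y}_l$. Differentiating $E_k$ with respect to $H_k$, we get

\[ \frac{\partial E_k}{\partial H_k^*} = -\sum_{l=1}^{L} w_l \, \tilde{X}_{l,k}^* \left( Y_{l,k} - H_k \tilde{X}_{l,k} \right) \tag{23} \]

for $k = 0, 1, \ldots, M-1$, and equating each differential to zero results in

\[ \hat{H}_k = \frac{\sum_{l=1}^{L} w_l \, \tilde{X}_{l,k}^* Y_{l,k}}{\sum_{l=1}^{L} w_l \, |\tilde{X}_{l,k}|^2}. \tag{24} \]

The least squares estimation technique can also be implemented in a sequential manner, which we call the recursive least squares (RLS) method [18]. Let

\[ \hat{\mathbf{H}}(n) = \mathrm{diag}\left( \hat{H}_0(n), \hat{H}_1(n), \ldots, \hat{H}_{M-1}(n) \right) \tag{25} \]

be the least squares estimate for $\mathbf{H}$ derived based on $n$ input-output pairs, $\{ (\tilde{\mathbf{X}}_l, \mathbf{Y}_l) \}_{l=1}^{n}$. Then, for $k = 0, 1, \ldots, M-1$,

\[ \hat{H}_k(n) = \hat{H}_k(n-1) + K_k(n) \left[ Y_{n,k} - \hat{H}_k(n-1) \tilde{X}_{n,k} \right]. \tag{26} \]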
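The estimation of the diagonal transfer matrix can be sketched as follows. The batch function implements the closed form of (24) directly. The recursive class uses one common scalar form of the RLS update with an optional exponential forgetting factor; the exact recursion, the identity initialization, and the small initial power are assumptions of this sketch, and the names `batch_estimate` and `DiagonalRLS` are hypothetical.

```python
def batch_estimate(inputs, outputs, weights=None):
    """Weighted least squares estimate of each diagonal entry, as in (24)."""
    weights = [1.0] * len(inputs) if weights is None else weights
    estimate = []
    for k in range(len(inputs[0])):
        cross = sum(w * complex(x[k]).conjugate() * y[k]
                    for w, x, y in zip(weights, inputs, outputs))
        power = sum(w * abs(x[k]) ** 2 for w, x in zip(weights, inputs))
        # Fall back to the identity mapping when a coefficient has no energy.
        estimate.append(cross / power if power > 0.0 else 1.0 + 0j)
    return estimate


class DiagonalRLS:
    """Per-coefficient recursive least squares with exponential forgetting."""

    def __init__(self, size, forgetting=1.0, init_power=1e-9):
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in (0, 1]")
        self.forgetting = forgetting
        self.power = [init_power] * size   # accumulated input power
        self.h = [1.0 + 0j] * size         # start from the identity mapping

    def update(self, x, y):
        """Fold one input-output pair of spectra into the estimate."""
        for k in range(len(self.h)):
            self.power[k] = self.forgetting * self.power[k] + abs(x[k]) ** 2
            gain = complex(x[k]).conjugate() / self.power[k]
            # Correct the estimate by the gain times the prediction error.
            self.h[k] += gain * (y[k] - self.h[k] * x[k])
        return list(self.h)
```

With `forgetting=1.0` the recursion reproduces the unweighted batch estimate up to the tiny bias introduced by `init_power`; values below 1 discount older frames geometrically, which lets the estimate follow a slowly varying CODEC characteristic.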

In (26), the gain $K_k(n)$ is given by

\[ K_k(n) = \frac{w_n \tilde{X}_{n,k}^*}{\Phi_k(n)}. \tag{27} \]

In (27), $\Phi_k(n)$ is the accumulated weighted input power, updated as

\[ \Phi_k(n) = \Phi_k(n-1) + w_n |\tilde{X}_{n,k}|^2. \tag{28} \]

Moreover, if we incorporate the exponential forgetting scheme, (27) and (28) are modified as follows:

\[ K_k(n) = \frac{\tilde{X}_{n,k}^*}{\Phi_k(n)}, \qquad \Phi_k(n) = \rho \, \Phi_k(n-1) + |\tilde{X}_{n,k}|^2 \tag{29} \]

with $\rho$ representing the given forgetting factor, which lies in the range (0, 1). The RLS technique with an appropriate forgetting scheme enables us to track the time-varying transfer function of a given speech CODEC.

IV. ON-LINE IMPLEMENTATION

From the previous sections, it is shown that the transfer matrix $\mathbf{H}$ should be estimated prior to input modification. Since, however, the CODEC input-output characteristic is usually time-varying, depending on the given speech frame, identification of $\mathbf{H}$ as well as input modification should be performed in an on-line fashion. A simple solution is to apply the input speech frame to the CODEC twice, once for estimating $\mathbf{H}$ by means of the RLS technique and then again for the actual speech quantization. However, this scheme requires not only a large amount of computation but also some modification to the CODEC operation. Since our purpose is to achieve a proper input modification while maintaining the original CODEC operation, we predict the transfer matrix $\mathbf{H}$ at a given time using the previous data. This is based on the assumption that the transfer characteristic of a CODEC evolves slowly. Let $\hat{\mathbf{H}}(t)$ be the estimate for $\mathbf{H}$ at time $t$. Then, input modification and transfer matrix estimation are carried out simultaneously as follows.

For $t = 1, 2, \ldots$:
i) Modify the input speech frame $\mathbf{X}_t$ according to (13) with $\mathbf{H} = \hat{\mathbf{H}}(t-1)$.
ii) Obtain the CODEC output $\mathbf{Y}_t$ by applying the modified input vector $\tilde{\mathbf{X}}_t$.
iii) Estimate $\hat{\mathbf{H}}(t)$ based on the RLS approach by treating $(\tilde{\mathbf{X}}_t, \mathbf{Y}_t)$ as a new input-output pair.

V. EXPERIMENTS AND RESULTS

In this section, we perform a number of experiments on the presented signal modification technique. As a target speech coder, we employed the G.729 CS-ACELP, which is a toll-quality 8 kb/s speech coder [4]. A total of 96 sentences spoken by four male and four female speakers were used as the evaluation data. Each sentence was sampled at 8 kHz, and the frame size was set to 10 ms.

A. Coder Performance in Noisy Environments

For the purpose of analyzing the CODEC transfer characteristics, we first measured the signal-to-quantization noise ratio (SQNR) produced by the speech coder under various background noise conditions. If $\mathbf{x}_t$ represents the speech samples of the $t$th input frame, and $f$ denotes the real CODEC transfer function, the average SQNR is computed as follows:

\[ \mathrm{SQNR} = \frac{1}{T} \sum_{t=1}^{T} 10 \log_{10} \frac{\| \mathbf{x}_t \|^2}{\| \mathbf{x}_t - f(\mathbf{x}_t) \|^2} \tag{30} \]

where $T$ is the total number of frames. The average SQNR of the target coder for clean speech was found to be 12.85 dB. For the purpose of simulating noisy environments, speech samples were corrupted by three kinds of noise sources, white, babble, and pink noise, extracted from the NOISEX-92 database [19]. These noises were added to the clean speech waveforms at various SNRs. The results are shown in Fig. 2, from which it is evident that the performance deteriorates rapidly as the SNR decreases under all the noise conditions. Among the three types of noise, white noise had the greatest effect on the coder performance. In contrast, only a mild degradation in performance was observed with babble noise.

Fig. 2. CODEC performance in noisy environments.

B. System Identification of CODEC Transfer Function

In the presented signal modification approach, the real CODEC transfer function is approximated by a locally linear

transfer matrix $\mathbf{H}$ as given by (6). For that reason, it is of great importance to estimate $\mathbf{H}$ so that it provides a realistic description of the CODEC transfer characteristic. In the present work, $\mathbf{H}$ is estimated by means of the RLS technique, which is found suitable for on-line implementation. Several experiments were conducted to verify how closely the estimated $\hat{\mathbf{H}}$ could approximate the real transfer function. The performance was described in terms of the approximation error, indicating the difference between the true CODEC output and the signal predicted based on the estimated $\hat{\mathbf{H}}$. By varying the forgetting factor $\rho$, we applied the RLS algorithm given by (26) to estimate the diagonal transfer matrix. Let $\mathbf{X}_t$ denote the spectrum of the $t$th input signal frame and $\mathbf{Y}_t$ be the corresponding CODEC output spectrum. Here, these spectra are represented by either the DFT or DCT coefficients. Since, in our on-line implementation, $\hat{\mathbf{H}}(t-1)\mathbf{X}_t$ is used to approximate the CODEC output $\mathbf{Y}_t$, a relative measure of the approximation error, which we call the signal-to-approximation noise ratio (SANR), is given as follows:

\[ \mathrm{SANR} = \frac{1}{T} \sum_{t=1}^{T} 10 \log_{10} \frac{\| \mathbf{Y}_t \|^2}{\| \mathbf{Y}_t - \hat{\mathbf{H}}(t-1) \mathbf{X}_t \|^2}. \tag{31} \]

SANR was computed over both the clean and noisy speech databases. The noisy speech samples were generated by adding babble noise to the clean speech waveforms at an SNR of 10 dB. Figs. 3 and 4 show the results for the clean and noisy speech, respectively. From the results, we can see that the DFT was more efficient than the DCT in approximating the true CODEC transfer function, but the difference in performance became smaller as the forgetting factor got closer to 1. It is noted that the performance achieved with the DFT coefficients degraded more seriously in the presence of background noise than that achieved with the DCT. Moreover, it is also worth mentioning that SANR increased as the forgetting factor approached 1. This phenomenon implies that the given CODEC transfer characteristic can be considered to evolve slowly.

Fig. 3. Approximation error of CODEC transfer function with clean speech.

Fig. 4. Approximation error of CODEC transfer function with speech corrupted by babble noise at SNR = 10 dB.

C. Experiments on Signal Modification

Performance of the signal modification approach presented in Section II was evaluated in both clean and noisy environments. Since the present approach makes a trade-off between the modification error and the coder quantization error, the performance is described in terms of two distortion measures. One is the signal-to-modification noise ratio (SMNR), defined as follows:

\[ \mathrm{SMNR} = \frac{1}{T} \sum_{t=1}^{T} 10 \log_{10} \frac{\| \mathbf{X}_t \|^2}{\| \mathbf{X}_t - \tilde{\mathbf{X}}_t \|^2} \tag{32} \]

in which $\tilde{\mathbf{X}}_t$ is the signal spectrum obtained by modifying the original input signal spectrum $\mathbf{X}_t$. The other distortion measure is the conventional SQNR, given by

\[ \mathrm{SQNR} = \frac{1}{T} \sum_{t=1}^{T} 10 \log_{10} \frac{\| \tilde{\mathbf{X}}_t \|^2}{\| \tilde{\mathbf{X}}_t - \tilde{\mathbf{Y}}_t \|^2} \tag{33} \]

where $\tilde{\mathbf{Y}}_t$ represents the output when the modified spectrum $\tilde{\mathbf{X}}_t$ is applied to the CODEC as an input. Signal modification was done according to (13), derived from the linear spectral distance, by varying the constant $\lambda$. The transfer matrix $\mathbf{H}$ was estimated based on the RLS method with the forgetting factor $\rho$. Fig. 5 gives the results obtained from the clean speech data. As expected, SMNR was high when $\lambda$ was small and vice versa. The SQNR increased as $\lambda$ grew, but the slope of the increase was not as steep as in the case of the SMNR. SQNRs obtained from both

the DFT and DCT spectral representations were found to be almost the same for all values of $\lambda$. As for the modification error, the DFT produced a higher SMNR than the DCT if $\lambda$ was small, while the DCT performed better than the DFT with a large $\lambda$.

Fig. 5. SMNR and SQNR of the modification approach with clean speech.

In order to examine how much the signal modification scheme affects the overall CODEC performance, we also evaluated the overall SQNR (OSQNR), defined by

\[ \mathrm{OSQNR} = \frac{1}{T} \sum_{t=1}^{T} 10 \log_{10} \frac{\| \mathbf{X}_t \|^2}{\| \mathbf{X}_t - \tilde{\mathbf{Y}}_t \|^2} \tag{34} \]

where $\mathbf{X}_t$ is the original input spectrum of the $t$th frame and $\tilde{\mathbf{Y}}_t$ is the CODEC output obtained from the modified input. OSQNR takes into account both the modification and quantization errors. In Fig. 6, we show the overall CODEC performance when clean speech signals were applied. OSQNR was found to be lower with signal modification than the original CODEC performance.

Fig. 6. Overall SQNR of the modification approach with clean speech.

Next, several experiments on noisy speech data were carried out. For these experiments, white and babble noises were added to the clean speech waveforms while keeping the SNR at 10 dB. The results are shown in Figs. 7-10, where it is observed that, as $\lambda$ grew, the SQNR increased more rapidly than that obtained with the clean speech data.

Fig. 7. SMNR and SQNR of the modification approach with the noisy speech corrupted by the white noise at SNR = 10 dB.

Fig. 8. Overall SQNR of the modification approach with the noisy speech corrupted by the white noise at SNR = 10 dB.

Fig. 9. SMNR and SQNR of the modification approach with the noisy speech corrupted by the babble noise at SNR = 10 dB.

Fig. 10. Overall SQNR of the modification approach with the noisy speech corrupted by the babble noise at SNR = 10 dB.

D. Adaptive Signal Modification

In the presented signal modification technique, the balance between modification and quantization errors is controlled by the constant $\lambda$. If $\lambda$ is large, more emphasis is placed on the quantization error and a larger modification of the input signal is allowed. In some parts of the given speech signal, even a slight modification produces an audible distortion which degrades the perceptual quality. In contrast, other parts remain almost perceptually indistinguishable from the original signal or cause no degradation in aspects of speech quality such as intelligibility and naturalness. Therefore, it is desirable to apply an adaptive approach to modification depending on the given signal. One possible way is to classify each frame as speech or nonspeech and then apply a different value of $\lambda$ depending on the classification result. Even though a voice activity detection (VAD) algorithm [20] can be used for such a hard decision task, it is useful only in stationary noise environments. Moreover, as in many speech enhancement techniques, a soft decision method is more helpful for avoiding abrupt discontinuities in the spectral components [14].

Here, we propose an adaptive approach to determine the constant $\lambda$ based on the CODEC characteristic. To devise the approach, we performed an experiment where a speech signal corrupted by babble noise at SNR = 10 dB was passed through the CODEC and a smoothed SQNR was computed in each frame. If $\overline{\mathrm{SQNR}}(t)$ represents the smoothed SQNR at time $t$, smoothing was done as follows:

\[ \overline{\mathrm{SQNR}}(t) = \beta \, \overline{\mathrm{SQNR}}(t-1) + (1 - \beta) \, 10 \log_{10} \frac{\| \mathbf{x}_t \|^2}{\| \mathbf{x}_t - \mathbf{y}_t \|^2} \tag{35} \]

in which $\mathbf{x}_t$ and $\mathbf{y}_t$ are the input signal in the $t$th frame and the corresponding CODEC output, respectively, and $\beta$ is the smoothing coefficient, which was set to 0.98 in the experiment. Fig. 11 plots the smoothed SQNR curve in conjunction with the original input speech samples. From the results, it is evident that the smoothed SQNR is high for the speech periods while it is low during the nonspeech periods. Since modification of the input signal during the speech period is considered more likely to produce undesired distortions, $\lambda$ should be selected adaptively according to whether the given input signal frame is classified into the speech or nonspeech period.

Fig. 11. Smoothed SQNR of the noisy speech corrupted by the babble noise at SNR = 10 dB. (a) Input signal waveform. (b) Smoothed SQNR.

Based on the observation given by Fig. 11, we propose a novel approach to determine $\lambda$ by means of the computed SQNR. If $\hat{\lambda}(t)$ denotes the estimate for $\lambda$ in the $t$th frame, it is given as follows:

\[ \hat{\lambda}(t) = g\left( \overline{\mathrm{SQNR}}(t) \right) \tag{36} \]

with

\[ g(s) = \frac{\lambda_{\max}}{1 + \exp\left[ \alpha (s - \delta) \right]} \tag{37} \]

in which $\alpha$ ($> 0$) is the slope parameter, $\delta$ is an offset, and $\lambda_{\max}$ is the maximum possible value of $\lambda$; all these parameters are found experimentally. It is noted that the sigmoid-type function of (37) makes the constant $\lambda$ decrease monotonically as the SQNR increases while limiting its value to the interval $(0, \lambda_{\max})$.
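The on-line procedure of Section IV combined with the adaptive constant of (35)-(37) can be sketched as a single loop. This is an illustrative sketch under several assumptions: `codec` is a hypothetical callable that maps a list of transform coefficients to the coefficients of the decoded frame (the forward and inverse transforms around a real coder are omitted), the smoothed SQNR from earlier frames drives the constant for the current frame so that the loop stays causal, and the initial values `init_sqnr`, the identity transfer estimate, and the small initial power are arbitrary choices. The function names are not from the original work.

```python
import math


def sqnr_db(reference, output, floor=1e-12):
    """Per-frame SQNR in dB between a CODEC input and its output."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - o) ** 2 for r, o in zip(reference, output))
    return 10.0 * math.log10(max(signal, floor) / max(noise, floor))


def adaptive_lambda(smoothed_sqnr, slope, offset, lam_max):
    """Sigmoid of (37): close to lam_max at low SQNR, close to 0 at high SQNR."""
    exponent = min(slope * (smoothed_sqnr - offset), 700.0)  # avoid overflow
    return lam_max / (1.0 + math.exp(exponent))


def run_online(frames, codec, slope, offset, lam_max,
               beta=0.98, forgetting=0.95, init_sqnr=0.0):
    """Modify, code, and re-estimate the transfer matrix frame by frame."""
    size = len(frames[0])
    h = [1.0 + 0j] * size      # assumed initial estimate: identity mapping
    power = [1e-9] * size      # accumulated input power per coefficient
    smoothed = init_sqnr
    outputs, lambdas = [], []
    for x in frames:
        lam = adaptive_lambda(smoothed, slope, offset, lam_max)
        # Step i: modify the frame with the previous transfer estimate (13).
        x_mod = [xk / (1.0 + lam * abs(1.0 - hk) ** 2) for xk, hk in zip(x, h)]
        # Step ii: pass the modified frame through the CODEC.
        y = codec(x_mod)
        # Step iii: RLS update with exponential forgetting.
        for k in range(size):
            power[k] = forgetting * power[k] + abs(x_mod[k]) ** 2
            gain = complex(x_mod[k]).conjugate() / power[k]
            h[k] += gain * (y[k] - h[k] * x_mod[k])
        # Smoothed SQNR of (35), used for the next frame's constant.
        smoothed = beta * smoothed + (1.0 - beta) * sqnr_db(x_mod, y)
        outputs.append(y)
        lambdas.append(lam)
    return outputs, lambdas
```

A toy stand-in such as `lambda v: [round(c / 0.5) * 0.5 for c in v]` (a uniform quantizer) is enough to exercise the loop; with a real coder, `codec` would wrap the inverse transform, the encoder-decoder pair, and the forward transform.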

E. Subjective Listening Tests

For the purpose of evaluating the subjective quality of the presented signal modification algorithm, we carried out a set of informal listening tests. Eight test sentences spoken by the same number of speakers, i.e., one for each speaker, were selected and then used for quality measurement. Subjective opinion scores were given by a group of ten listeners and then averaged to yield the mean opinion score (MOS) results. In order to make the input data deviate from pure speech, we added the white, babble, and high frequency channel (HFC) noises from the NOISEX-92 database to the clean speech signals at varying SNRs. The HFC noise in the NOISEX-92 database was collected by recording the noise in a high frequency radio channel after demodulation. In addition, two types of interfering signals, one being a co-talker's speech and the other background music, were also applied to degrade the input speech quality. The MOS results are shown in Table I, where SM DFT and SM DCT represent the signal modification algorithm with the DFT and DCT, respectively. Signal modification was performed adaptively according to the sigmoid-type function given by (37) with fixed values of $\alpha$, $\delta$, and $\lambda_{\max}$.

TABLE I. RESULTS FOR SUBJECTIVE LISTENING TEST; MOS

From Table I, it is evident that the input signal modification algorithm is effective in reducing the distortion or listener fatigue caused by the background noise. The performance improvement was found to be greater for the white and HFC noise environments than for the other cases. It is interesting to see that the performance of the SM DCT was better than that of the SM DFT, even though the former was found inferior to the latter in terms of modeling capability, as can be seen from our previous experiments, e.g., Figs. 3 and 4. It is also noted that, without any interfering signals, the results obtained with a signal modification scheme were even slightly better than those of the original CODEC.

VI. CONCLUSIONS

We have presented an approach to input signal modification to be used as the front-end for a low-bit-rate speech coder. A simplified system model of the given CODEC transfer function makes it possible to convert the signal modification issue into a mathematically tractable optimization problem. The objective function of this optimization task is described as a compromise between the modification and quantization errors, and we have derived the solutions associated with various distortion measures. Based on a quasistationarity assumption for the transfer function, the system model parameters are identified by means of the RLS approach. Since each part of the speech signal is affected differently by the modification, we have also devised an adaptive method based on the smoothed estimate of the SQNR. From a number of experiments, the presented modification approach has been found to improve the perceived speech quality, especially when the input signal is corrupted by interfering signals such as background noise, music, and speech from other speakers.

REFERENCES

[1] W. B. Kleijn and K. K. Paliwal, Speech Coding and Synthesis. New York: Elsevier, 1995.
[2] M. Schroeder and B. Atal, "Code-excited linear prediction (CELP): High quality speech at very low bit rates," in IEEE Int. Conf. Acoust., Speech, Signal Processing, 1985.
[3] P. Kroon, E. F. Deprettere, and R. J. Sluyter, "Regular-pulse excitation: A novel approach to effective and efficient multipulse coding of speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 5, May 1986.
[4] R. Salami et al., "Design and description of CS-ACELP: A toll quality 8 kb/s speech coder," IEEE Trans. Speech Audio Processing, vol. 6, Mar. 1998.
[5] R. J. McAulay and T. F. Quatieri, "Speech analysis-synthesis based on a sinusoidal representation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, Apr. 1986.
[6] D. Griffin and J. S. Lim, "Multiband excitation vocoder," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, Aug. 1988.
[7] W. B. Kleijn, "Encoding speech using prototype waveforms," IEEE Trans. Speech Audio Processing, vol. 1, July 1993.
[8] W. B. Kleijn and J. Haagen, "Transformation and decomposition of the speech signal for coding," IEEE Signal Processing Lett., vol. 1, Sept. 1994.
[9] O. Gottesman and A. Gersho, "Enhancing waveform interpolative coding with weighted REW parametric quantization," in Proc. IEEE Speech Coding Workshop, Sept. 2000.
[10] B. Atal and M. Schroeder, "Predictive coding of speech signals and subjective error criteria," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, Mar. 1979.
[11] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, Dec. 1984.
[12] J. Gibson, B. Koo, and S. Gray, "Filtering of colored noise for speech enhancement and coding," IEEE Trans. Signal Processing, vol. 39, Aug. 1991.
[13] Y. Ephraim, "Statistical-model-based speech enhancement systems," Proc. IEEE, vol. 80, 1992.
[14] N. S. Kim and J.-H. Chang, "Spectral enhancement based on global soft decision," IEEE Signal Processing Lett., vol. 7, May 2000.
[15] W. B. Kleijn, R. P. Ramachandran, and P. Kroon, "Interpolation of the pitch-predictor parameters in analysis-by-synthesis coders," IEEE Trans. Speech Audio Processing, vol. 2, pp. 42-54, Jan. 1994.
[16] J. Jensen, S. H. Jensen, and E. Hansen, "A perturbation-based pre-processing algorithm for CELP-coders," in Proc. IEEE Speech Coding Workshop, June 1999.
[17] N. S. Kim and J.-H. Chang, "A preprocessor for low-bit-rate speech coding," IEEE Signal Processing Lett., vol. 9, Oct. 2002.
[18] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[19] A. P. Varga, H. J. M. Steeneken, T. Tomlinson, and D. Jones, "The NOISEX-92 Study on the Effect of Additive Noise on Automatic Speech Recognition," DRA Speech Res. Unit, 1992.
[20] J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Processing Lett., vol. 6, pp. 1-2, Jan. 1999.

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 12, NO 1, JANUARY 2004

Nam Soo Kim (M'88) received the BS degree in electronics engineering from Seoul National University (SNU), Seoul, Korea, in 1988, and the MS and PhD degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST) in 1990 and 1994, respectively. From 1994 to 1998, he was with Samsung Advanced Institute of Technology (SAIT) as a Senior Member of Technical Staff. Since 1998, he has been with the School of Electrical Engineering, SNU, where he is currently an Associate Professor. His research areas include speech signal processing, speech recognition, speech/audio coding, speech synthesis, adaptive signal processing, machine learning, and mobile communication.

Joon-Hyuk Chang (M'02) received the BS degree in electronics engineering from Kyungpook National University, Korea, in 1998, and the MS degree in electrical engineering from Seoul National University, Seoul, Korea, in 2000. He is currently pursuing the PhD degree in electrical engineering at Seoul National University. Since 2000, he has been with Netdus Corp., Seoul, as an Associate Engineer. Since January 2003, he has served as CTO at Netdus. His research areas include speech signal processing, speech/image coding, speech enhancement, and adaptive filtering.

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Speech Enhancement Using a Mixture-Maximum Model

Speech Enhancement Using a Mixture-Maximum Model IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

Flexible and Scalable Transform-Domain Codebook for High Bit Rate CELP Coders

Flexible and Scalable Transform-Domain Codebook for High Bit Rate CELP Coders Flexible and Scalable Transform-Domain Codebook for High Bit Rate CELP Coders Václav Eksler, Bruno Bessette, Milan Jelínek, Tommy Vaillancourt University of Sherbrooke, VoiceAge Corporation Montreal, QC,

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

On-Line Dead-Time Compensation Method Based on Time Delay Control

On-Line Dead-Time Compensation Method Based on Time Delay Control IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 11, NO. 2, MARCH 2003 279 On-Line Dead-Time Compensation Method Based on Time Delay Control Hyun-Soo Kim, Kyeong-Hwa Kim, and Myung-Joong Youn Abstract

More information

6/29 Vol.7, No.2, February 2012

6/29 Vol.7, No.2, February 2012 Synthesis Filter/Decoder Structures in Speech Codecs Jerry D. Gibson, Electrical & Computer Engineering, UC Santa Barbara, CA, USA gibson@ece.ucsb.edu Abstract Using the Shannon backward channel result

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Hungarian Speech Synthesis Using a Phase Exact HNM Approach

Hungarian Speech Synthesis Using a Phase Exact HNM Approach Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu

More information

HUMAN speech is frequently encountered in several

HUMAN speech is frequently encountered in several 1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 12, DECEMBER

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 12, DECEMBER IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 12, DECEMBER 2002 1865 Transactions Letters Fast Initialization of Nyquist Echo Cancelers Using Circular Convolution Technique Minho Cheong, Student Member,

More information

techniques are means of reducing the bandwidth needed to represent the human voice. In mobile

techniques are means of reducing the bandwidth needed to represent the human voice. In mobile 8 2. LITERATURE SURVEY The available radio spectrum for the wireless radio communication is very limited hence to accommodate maximum number of users the speech is compressed. The speech compression techniques

More information

Time-Frequency Distributions for Automatic Speech Recognition

Time-Frequency Distributions for Automatic Speech Recognition 196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,

More information

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley University of California Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Professors : N.Morgan / B.Gold EE225D Spring,1999 Medium & High Rate Coding Lecture 26

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

On the Estimation of Interleaved Pulse Train Phases

On the Estimation of Interleaved Pulse Train Phases 3420 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 48, NO. 12, DECEMBER 2000 On the Estimation of Interleaved Pulse Train Phases Tanya L. Conroy and John B. Moore, Fellow, IEEE Abstract Some signals are

More information

ROBUST echo cancellation requires a method for adjusting

ROBUST echo cancellation requires a method for adjusting 1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

A Closed-loop Multimode Variable Bit Rate Characteristic Waveform Interpolation Coder

A Closed-loop Multimode Variable Bit Rate Characteristic Waveform Interpolation Coder A Closed-loop Multimode Variable Bit Rate Characteristic Waveform Interpolation Coder Jing Wang, Jingg Kuang, and Shenghui Zhao Research Center of Digital Communication Technology,Department of Electronic

More information

Optimal Adaptive Filtering Technique for Tamil Speech Enhancement

Optimal Adaptive Filtering Technique for Tamil Speech Enhancement Optimal Adaptive Filtering Technique for Tamil Speech Enhancement Vimala.C Project Fellow, Department of Computer Science Avinashilingam Institute for Home Science and Higher Education and Women Coimbatore,

More information

FOURIER analysis is a well-known method for nonparametric

FOURIER analysis is a well-known method for nonparametric 386 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 1, FEBRUARY 2005 Resonator-Based Nonparametric Identification of Linear Systems László Sujbert, Member, IEEE, Gábor Péceli, Fellow,

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators

Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators 374 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 52, NO. 2, MARCH 2003 Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators Jenq-Tay Yuan

More information

Transcoding of Narrowband to Wideband Speech

Transcoding of Narrowband to Wideband Speech University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2005 Transcoding of Narrowband to Wideband Speech Christian H. Ritz University

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

SNR Scalability, Multiple Descriptions, and Perceptual Distortion Measures

SNR Scalability, Multiple Descriptions, and Perceptual Distortion Measures SNR Scalability, Multiple Descriptions, Perceptual Distortion Measures Jerry D. Gibson Department of Electrical & Computer Engineering University of California, Santa Barbara gibson@mat.ucsb.edu Abstract

More information

HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS

HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS Sean Enderby and Zlatko Baracskai Department of Digital Media Technology Birmingham City University Birmingham, UK ABSTRACT In this paper several

More information

NOISE SHAPING IN AN ITU-T G.711-INTEROPERABLE EMBEDDED CODEC

NOISE SHAPING IN AN ITU-T G.711-INTEROPERABLE EMBEDDED CODEC NOISE SHAPING IN AN ITU-T G.711-INTEROPERABLE EMBEDDED CODEC Jimmy Lapierre 1, Roch Lefebvre 1, Bruno Bessette 1, Vladimir Malenovsky 1, Redwan Salami 2 1 Université de Sherbrooke, Sherbrooke (Québec),

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Phase estimation in speech enhancement unimportant, important, or impossible?

Phase estimation in speech enhancement unimportant, important, or impossible? IEEE 7-th Convention of Electrical and Electronics Engineers in Israel Phase estimation in speech enhancement unimportant, important, or impossible? Timo Gerkmann, Martin Krawczyk, and Robert Rehr Speech

More information

An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec

An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec Akira Nishimura 1 1 Department of Media and Cultural Studies, Tokyo University of Information Sciences,

More information

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Chanwoo Kim 1 and Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies

More information

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Sana Alaya, Novlène Zoghlami and Zied Lachiri Signal, Image and Information Technology Laboratory National Engineering School

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

SPEECH ENHANCEMENT WITH SIGNAL SUBSPACE FILTER BASED ON PERCEPTUAL POST FILTERING

SPEECH ENHANCEMENT WITH SIGNAL SUBSPACE FILTER BASED ON PERCEPTUAL POST FILTERING SPEECH ENHANCEMENT WITH SIGNAL SUBSPACE FILTER BASED ON PERCEPTUAL POST FILTERING K.Ramalakshmi Assistant Professor, Dept of CSE Sri Ramakrishna Institute of Technology, Coimbatore R.N.Devendra Kumar Assistant

More information

Speech Enhancement Based on Audible Noise Suppression

Speech Enhancement Based on Audible Noise Suppression IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 5, NO. 6, NOVEMBER 1997 497 Speech Enhancement Based on Audible Noise Suppression Dionysis E. Tsoukalas, John N. Mourjopoulos, Member, IEEE, and George

More information

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Vol., No. 6, 0 Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Zhixin Chen ILX Lightwave Corporation Bozeman, Montana, USA chen.zhixin.mt@gmail.com Abstract This paper

More information

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,

More information

ADDITIVE synthesis [1] is the original spectrum modeling

ADDITIVE synthesis [1] is the original spectrum modeling IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 851 Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech Laurent Girin, Member, IEEE, Mohammad Firouzmand,

More information

Wideband Speech Coding & Its Application

Wideband Speech Coding & Its Application Wideband Speech Coding & Its Application Apeksha B. landge. M.E. [student] Aditya Engineering College Beed Prof. Amir Lodhi. Guide & HOD, Aditya Engineering College Beed ABSTRACT: Increasing the bandwidth

More information

Speech Coding using Linear Prediction

Speech Coding using Linear Prediction Speech Coding using Linear Prediction Jesper Kjær Nielsen Aalborg University and Bang & Olufsen jkn@es.aau.dk September 10, 2015 1 Background Speech is generated when air is pushed from the lungs through

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Enhancement of Speech in Noisy Conditions

Enhancement of Speech in Noisy Conditions Enhancement of Speech in Noisy Conditions Anuprita P Pawar 1, Asst.Prof.Kirtimalini.B.Choudhari 2 PG Student, Dept. of Electronics and Telecommunication, AISSMS C.O.E., Pune University, India 1 Assistant

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

MULTICARRIER communication systems are promising

MULTICARRIER communication systems are promising 1658 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 52, NO. 10, OCTOBER 2004 Transmit Power Allocation for BER Performance Improvement in Multicarrier Systems Chang Soon Park, Student Member, IEEE, and Kwang

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor

More information

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License Title Non-intrusive intelligibility prediction for Mandarin speech in noise Author(s) Chen, F; Guan, T Citation The 213 IEEE Region 1 Conference (TENCON 213), Xi'an, China, 22-25 October 213. In Conference

More information

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,

More information

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition International Conference on Advanced Computer Science and Electronics Information (ICACSEI 03) On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition Jongkuk Kim, Hernsoo Hahn Department

More information

TIME encoding of a band-limited function,,

TIME encoding of a band-limited function,, 672 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 8, AUGUST 2006 Time Encoding Machines With Multiplicative Coupling, Feedforward, and Feedback Aurel A. Lazar, Fellow, IEEE

More information

Synthesis Algorithms and Validation

Chapter 5: Synthesis Algorithms and Validation. An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

Adaptive Filters Application of Linear Prediction

Gerhard Schmidt, Christian-Albrechts-Universität zu Kiel, Faculty of Engineering, Electrical Engineering and Information Technology, Digital Signal Processing

for Single-Tone Frequency Tracking H. C. So Department of Computer Engineering & Information Technology, City University of Hong Kong,

A Comparative Study of Three Recursive Least Squares Algorithms for Single-Tone Frequency Tracking. H. C. So, Department of Computer Engineering & Information Technology, City University of Hong Kong, Tat

Interpolation Error in Waveform Table Lookup

Carnegie Mellon University, Research Showcase @ CMU, Computer Science Department, School of Computer Science, 1998. Roger B. Dannenberg, Carnegie Mellon University

Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.

L105/205 Phonetics, Scarborough, Handout 7, 10/18/05. Spectral Analysis 1. There are

Transmit Power Allocation for BER Performance Improvement in Multicarrier Systems

Chang Soon Park and Kwang Bok (Ed) Lee, School of Electrical Engineering and Computer Science, Seoul National University, parkcs@mobile.snu.ac.kr,

IMPROVED SPEECH QUALITY FOR VMR - WB SPEECH CODING USING EFFICIENT NOISE ESTIMATION ALGORITHM

Mr. M. Mathivanan, Associate Professor/ECE, Selvam College of Technology, Namakkal, Tamilnadu, India. Dr. S.Chenthur

Estimation of Non-stationary Noise Power Spectrum using DWT

Haripriya.R.P., Department of Electronics & Communication Engineering, Mar Baselios College of Engineering & Technology, Kerala, India. Lani Rachel

SPEECH enhancement has many applications in voice

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 8, AUGUST 1998, p. 1072. Subband Kalman Filtering for Speech Enhancement. Wen-Rong Wu, Member, IEEE, and Po-Cheng

An Equalization Technique for Orthogonal Frequency-Division Multiplexing Systems in Time-Variant Multipath Channels

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL 47, NO 1, JANUARY 1999, p. 27. Won Gi Jeon, Student

Modulator Domain Adaptive Gain Equalizer for Speech Enhancement

Ravindra D. Dhage (M.E. Scholar), Prof. Pravinkumar R. Badadapure (Professor). Abstract: This paper presents a speech enhancement method for personal

Audio Compression using the MLT and SPIHT

Mohammed Raad, Alfred Mertins and Ian Burnett, School of Electrical, Computer and Telecommunications Engineering, University Of Wollongong, Northfields Ave, Wollongong

Audio Imputation Using the Non-negative Hidden Markov Model

Jinyu Han (1), Gautham J. Mysore (2), and Bryan Pardo (1). (1) EECS Department, Northwestern University; (2) Advanced Technology Labs, Adobe Systems Inc.

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

DeLiang Wang, Perception & Neurodynamics Lab, The Ohio State University. Outline of presentation: Introduction, Human performance, Reverberation

TRANSFORMS / WAVELETS

Transform Analysis. Signal processing using a transform analysis for calculations is a technique used to simplify or accelerate problem solution. For example, instead of dividing two

MITIGATING INTERFERENCE TO GPS OPERATION USING VARIABLE FORGETTING FACTOR BASED RECURSIVE LEAST SQUARES ESTIMATION

Aseel AlRikabi and Taher AlSharabati, Al-Ahliyya Amman University/Electronics and Communications