DECOMPOSITION OF SPEECH INTO VOICED AND UNVOICED COMPONENTS BASED ON A KALMAN FILTERBANK

Size: px
Start display at page:

Download "DECOMPOSITION OF SPEECH INTO VOICED AND UNVOICED COMPONENTS BASED ON A KALMAN FILTERBANK"

Transcription

1 DECOMPOSITIO OF SPEECH ITO VOICED AD UVOICED COMPOETS BASED O A KALMA FILTERBAK Mark Thomson, Simon Boland, Michael Smithers 3, Mike Wu & Julien Epps Motorola Labs, Botany, SW 09 Cross Avaya R & D, orth Ryde, SW 3 3 Dolby Laboratories, San Francisco, USA ABSTRACT: We present a novel method for decomposing speech into signals representing the voiced and unvoiced components of speech. The method involves first demodulating the variations in spectral envelope, energy and pitch, and then applying a bank of Kalman filters to separate the harmonic and non-harmonic components of the signal. The use of Kalman filters relies on a state-space representation of the composite signal, and provides a way to accurately estimate the harmonic component without the large delay required by a linear phase comb filter. However it also requires a priori knowledge of the variance of the unvoiced component and the state transition parameters. We present a novel method to accurately determine these parameters based on a variant of the Expectation-Maximisation algorithm. ITRODUCTIO The distinction between voiced and unvoiced sounds is important in many areas of speech technology. In speech coding, for example, different mechanisms are often used to encode the voiced and unvoiced parts of speech (Kleijn and Haagen, 994). In some methods of speech enhancement, the quasiperiodic nature of voiced speech is used to design of an optimal filter that separates speech from additive noise (Goh et al, 999). In speech recognition, knowledge of the temporal structure of the cycles of voiced speech can be used to process the speech in such a way that the impact of additive noise on feature extraction is reduced (Macho and Cheng, 00). Knowledge of the pitch of voiced speech is also useful in speech recognition for tonal languages, such as Mandarin (Zhang et al, 00). In some applications, it is sufficient to assume that any particular segment of speech is either purely voiced or purely unvoiced, and to classify segments into one of these two categories. This was true, for example, in early low bit rate vocoders (Campbell and Tremain, 986). However, in reality many segments of speech contain both quasiperiodic and noise-like energy, and many processing methods are designed to exploit this. In some cases, what is required is simply a determination of the degree of voicing. In mixed excitation linear prediction (MELP) coding (McCree and Barnwell, 995) and multiband excitation (MBE) coding (Griffin and Lim, 988), for example, a frequency-dependent measure of the strength of voicing is used to control the relative amount of periodic and non-periodic energy in the excitation of a linear prediction filter. In other cases, an attempt is made to explicitly separate the voiced and unvoiced components. In codebook-excited linear predictive (CELP) speech coders, this is achieved through an analysis-bysynthesis procedure (Gerson and Jasiuk, 99). Speech is generated by exciting a short-term linear prediction filter with a combination of signals from both an adaptive codebook, representing voiced energy, and a fixed codebook representing unvoiced energy. Minimisation of the perceptually weighted difference between the synthesised and input speech is used to estimate the two components. Several alternative approaches are possible. One is to use a linear comb filter to isolate the voiced component based on its harmonic structure. This is similar to the practice of using a low pass filter to separate slowly evolving and rapidly evolving components of the pitch cycle in interpolation-based coding (Kleijn and Haagen, 994). One limitation of this approach, however, is that its effectiveness depends on having a filter with a sharp roll-off, which requires a long impulse response. The implication of this is that the decomposition process requires a relatively large delay, which is Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page 8

2 undesirable in some applications, such as speech coding, and also creates difficulties in dealing with rapid transitions. Achieving good decomposition without a large delay requires the use of more a priori knowledge about signal behaviour. One approach is to impose a deterministic parametric model on the evolution of the harmonic coefficients (Stylianou, 996). However the signal model is then highly non-linear, and parameter estimation becomes very complex. Stochastic models of signal evolution have been suggested by both Gruber and Tödtli (994) and Stachurski (997). However both also involve very complex estimation processes. In this paper we present a new method of decomposition that is also based on a stochastic model, but which is much simpler to implement, and also permits more control over the behaviour of the decomposition. The approach involves using a bank of Kalman filters, each corresponding to one sample in a normalised pitch period. SIGAL MODELIG AD ESTIMATIO In keeping with usual practice, we represent speech as the response of an autoregressive (AR) system, representing the vocal tract filter, to an input signal representing the acoustic energy generated by both vocal fold vibration and turbulent airflow: M z k = ai zk i + g. yk i= () and y k + vk = () where g is a gain factor, x k is a quasiperiodic signal, and v k is an uncorrelated Gaussian random variable with variance σ v. The response of the vocal tract filter to each of the two components, x k and v k, constitute the voiced and unvoiced components of speech respectively. Fundamental to our method of decomposing z k into these components is the way that x k is modelled. The component is assumed to evolve according to = α (3) T + wk T where T is the period of x k, w k is an uncorrelated Gaussian random variable with variance α is a gain value. Based on this model, the overall decomposition process is depicted in Figure. In our implementation, processing is carried out on a frame-by-frame basis with frames of 0ms duration. We begin by demodulating the variation in the energy, spectral envelope, and pitch of the signal. Energy is estimated on a subframe basis (4 subframes/frame). Demodulation of the spectral envelope variation is achieved by applying an inverse filter estimated once per frame by linear prediction. The linear prediction residual is used to estimate the pitch period, and the period is used to time-warp the signal to a fixed period. The demodulated signal is an approximation, ŷ k, of y k. Equations () and (3) together constitute a state space representation of this signal, with w k representing the process noise and v k the observation noise. Based on this, a Kalman filter can be used to estimate the state variable x k, by means of the following recursion. σ w, and k k = α + K( yˆ α ) (4) k k k Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page 9

3 Parameter Estimation z k Demodulation ŷ k α Kalman Filterbank σ v k Modulation Voiced component Energy Period LPC parameters + _ Modulation Unvoiced component Figure : Decomposition System where K = Σ k k T ( Σ k + σ v ) (5) is the Kalman gain, Σ k k T = α Σ + σ w (6) is the variance of the error in the predicted state estimate, k, and Σ k k = ( K) Σ k (7) is the variance in the error in the filtered state estimate, k k. σ w may be chosen to control the rate at which the estimated quasiperiodic component evolves. However α and σ v must be estimated from the input data. We describe a new method to do this in the next section. Since the state variable is different for each sample in the period, estimation of the entire period essentially constitutes a bank of multiple scalar Kalman filters. The smoothing form of the Kalman filter may also be used to take advantage of future pitch cycles to estimate each current sample. The observation noise is estimated as vˆ k = ( yˆ k k ). The estimated quasiperiodic and noisy components can then be remodulated using the estimated period, LPC filter and energy to produce the voiced and unvoiced components of the speech. For the decomposition to work effectively, it is essential that when quasiperiodic energy is present in the signal, its period be known accurately. This requires not only that the resolution of the period estimate be sufficiently high, but also that the estimation method be able to track variations in period over sufficiently short time intervals. In order to ensure that x k and T are always optimally aligned, it needs to be possible to track variations in period within a pitch cycle. To achieve this we have used a dynamic programming approach, with a path metric composed of an accumulated average magnitude difference function with an additional term to penalise inappropriate variations in period. The model on which our method is based has some similarity to those in Gruber and Tödtli, J. (994). Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page 0

4 However the use here of a time domain state representation makes it possible to use only scalar Kalman filter estimators, resulting in significantly lower complexity. In addition, the methods in Gruber and Tödtli (994) explicitly assumed that σ v is known in advance, and made no allowance for an explicit state transition gain α. The latter point is particularly important in decomposing speech, because the overall amplitude of consecutive cycles can change more rapidly than their shape. The model developed in Stachurski (997) is almost identical to that described by () and (3), but again there was no allowance for a variable transition gain, and also no provision for explicitly controlling σ w. In addition, because σ v was not known or determined prior to decomposition, it was not possible to use a Kalman filter for signal estimation. Instead a much more complex algorithm was proposed based on singular value decomposition. ESTIMATIO OF DYAMICAL SYSTEM PARAMETERS Good estimates of α and σ v are critically important in order for the decomposition described above to be effective. An iterative method for determining parameters, θ, of a general linear dynamic system from observations of its output was developed by Digalakis et al (994) based on the Expectation Maximization (EM) algorithm. Each iteration involves maximizing the expected joint log likelihood of the observed data sequence and the unknown state sequence conditioned on the observed data and the previous estimate of θ. In our application, we only require estimates of α and σ v. Using the procedure described by Digalakis et al, the values that maximize the expected joint log likelihood are: [ Σ ] + k T α = E{ x } (8) ( yk yk ) σ v = ˆ (9) k = where represents a fixed interval over which α and σ v are assumed constant, and Σ is the covariance of ( T ). The expectation in (8) should be understood to be conditioned on both the observed data up to and initial estimates of α and σ v. The effectiveness of the recursion defined by (8) and (9) depends significantly on the accuracy of the initial estimates of α and σ v. Inaccurate starting values will lead to slow convergence, and may cause the algorithm to converge to a local optimum. Although no method to obtain initial estimates was suggested by Digalakis et al (994), this was not a significant problem there since the application of interest was training of acoustic models for speech recognition. In that situation, estimation occurs off-line. However in the current application, α and σ v vary throughout the speech waveform, and must be estimated in real-time. We present here a method to obtain these values using only the observed data and past values of the estimated state sequence. The method is derived from the recursion equations above, and relies on an assumption that the interval over which α and σ v are constant is no more that one period. Although, in principal these estimates may be used as initial values for subsequent EM iterations, in our experience they are generally sufficiently accurate themselves, without resorting to further recursion. To estimate α, we first note that since v k is uncorrelated with T the expectation in the numerator of (8) can be written as E {( yk vk ) T } = yk. The smoothed state estimate x ˆ k T, which Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page

5 also appears in the denominator, is not known a priori. However, in the mean over the interval from K, x ˆ k T is well approximated by the filtered estimate, k T k T. In addition, the error variance, Σ can be expected to be small compared with k T yk. Thus α can be approximated by α (0) Provided the summation interval in (0) is no more than one period, all terms in the right hand side are known. Using α computed from (0), σ v can be found as follows. Again assuming that k, then x ˆ k in (9) is equivalent to x ˆ k k. Using (4) to compute this value results in k σ v v = ( y k αyk k T k T ) Σ k + σ v σ () Σ is determined from (6). () can be manipulated to produce a quadratic in v = the signal is not noise-free ( σ 0 ), the value of σ v that satisfies this is [ y αy x ] Σ k v = k k ˆk T k T σ v. Assuming that σ () RESULTS AD DISCUSSIO Figure illustrates the application of our algorithm to a segment of speech consisting of a dominant unvoiced component followed by a dominant voiced component. The smoothing form of the Kalman filter was used with a two period look-ahead. The results show that the algorithm successfully decomposes the speech, with strong attenuation of noisy energy in the voiced component, and no visible harmonic energy in the unvoiced component. The presence of unvoiced signal energy during segments that would generally be classified as voiced is significant. Listening tests indicate that the unvoiced component retains the intelligibility of the original speech, but with a whispered quality. COCLUSIOS We have presented a novel method for decomposing speech into voiced and unvoiced components in the time domain. The algorithm is distinctive in its use of a Kalman filterbank, based on dynamical system parameters estimated on-line using a form of the Expectation-Maximisation algorithm. ACKOWLEDGEMETS This work was performed while the authors were all with Motorola. Simon Boland is now with Cross Avaya Research and Development and Michael Smithers is with Dolby Laboratories. REFERECES Campbell, J. P. Jr & Tremain, T. R. (986), Voiced/unvoiced classification of speech with applications to the US Government LPC0E Algorithm, Proceedings of the International Conference on Acoustic Speech and Signal Processing, pp Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page

6 Figure : (top to bottom) speech waveform, estimated voiced component, estimated unvoiced component Digalakis, V., Rohlicek, J. R. & Ostendorf, M. (993), ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition, IEEE Transactions on Speech and Audio Processing, Vol. o. 4, pp Gerson, I. A. & Jasiuk, M. A. (99), Techniques for improving the performance of CELP-type speech coders, IEEE Journal on Selected Areas in Communications, Vol. 0, o. 5, pp Goh, Z., Tan, K.-C. & Tan, B. T. G. (999), Kalman filtering speech enhancement method based on a voiced-unvoiced speech modei, IEEE Transactions on Speech and Audio Processing, Vol. 7, o. 5, pp Griffin, D. W. & Lim, J. S (988)., Multiband excitation vocoder, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol 36, o. 8, pp Gruber, P. & Tödtli, J. (994), Estimation of quasiperiodic signal parameters by means of dynamic signal models, IEEE Transactions on Signal Processing, Vol. 4, o. 3, pp Kleijn, W. B. & Haagen, J. (994), Transformation and decomposition of the speech signal for coding, IEEE Signal Processing Letters, Vol., o. 9, pp Macho, D., & Cheng, Y.-M. (00), SR-dependent waveform processing for improving the robustness of ASR front-end, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vol., pp McCree, A. V. & Barnwell, T. P. III (995), A mixed excitation LPC vocoder model for low bit rate speech coding, IEEE Transactions on Speech and Audio Processing, Vol. 3, o. 4, pp Stachurski, J. (997), A Pitch Pulse Evolution Model for Linear Predictive Coding of Speech, Ph.D. Thesis, McGill University, Montreal, Canada. Stylianou, Y. (996), Efficient decomposition of speech signals into a deterministic and a stochastic part, Proceedings of the International Symposium on Signal Processing and its Applications, Vol., pp Zhang, Y., Madievski, A., Lawrence, J. & Song, J., A study of tone statistics in Chinese names, Speech Communication, Vol. 36, pp Melbourne, December to 5, 00. Australian Speech Science & Technology Association Inc. Accepted after full review page 3

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

ACCURATE SPEECH DECOMPOSITION INTO PERIODIC AND APERIODIC COMPONENTS BASED ON DISCRETE HARMONIC TRANSFORM

ACCURATE SPEECH DECOMPOSITION INTO PERIODIC AND APERIODIC COMPONENTS BASED ON DISCRETE HARMONIC TRANSFORM 5th European Signal Processing Conference (EUSIPCO 007), Poznan, Poland, September 3-7, 007, copyright by EURASIP ACCURATE SPEECH DECOMPOSITIO ITO PERIODIC AD APERIODIC COMPOETS BASED O DISCRETE HARMOIC

More information

Analysis/synthesis coding

Analysis/synthesis coding TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

Improved signal analysis and time-synchronous reconstruction in waveform interpolation coding

Improved signal analysis and time-synchronous reconstruction in waveform interpolation coding University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2000 Improved signal analysis and time-synchronous reconstruction in waveform

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Low Bit Rate Speech Coding

Low Bit Rate Speech Coding Low Bit Rate Speech Coding Jaspreet Singh 1, Mayank Kumar 2 1 Asst. Prof.ECE, RIMT Bareilly, 2 Asst. Prof.ECE, RIMT Bareilly ABSTRACT Despite enormous advances in digital communication, the voice is still

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

Tracking of Rapidly Time-Varying Sparse Underwater Acoustic Communication Channels

Tracking of Rapidly Time-Varying Sparse Underwater Acoustic Communication Channels Tracking of Rapidly Time-Varying Sparse Underwater Acoustic Communication Channels Weichang Li WHOI Mail Stop 9, Woods Hole, MA 02543 phone: (508) 289-3680 fax: (508) 457-2194 email: wli@whoi.edu James

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

Hungarian Speech Synthesis Using a Phase Exact HNM Approach

Hungarian Speech Synthesis Using a Phase Exact HNM Approach Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Comparison of CELP speech coder with a wavelet method

Comparison of CELP speech coder with a wavelet method University of Kentucky UKnowledge University of Kentucky Master's Theses Graduate School 2006 Comparison of CELP speech coder with a wavelet method Sriram Nagaswamy University of Kentucky, sriramn@gmail.com

More information

IN RECENT YEARS, there has been a great deal of interest

IN RECENT YEARS, there has been a great deal of interest IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 12, NO 1, JANUARY 2004 9 Signal Modification for Robust Speech Coding Nam Soo Kim, Member, IEEE, and Joon-Hyuk Chang, Member, IEEE Abstract Usually,

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

A Spectral Conversion Approach to Single- Channel Speech Enhancement

A Spectral Conversion Approach to Single- Channel Speech Enhancement University of Pennsylvania ScholarlyCommons Departmental Papers (ESE) Department of Electrical & Systems Engineering May 2007 A Spectral Conversion Approach to Single- Channel Speech Enhancement Athanasios

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION

ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION Tenkasi Ramabadran and Mark Jasiuk Motorola Labs, Motorola Inc., 1301 East Algonquin Road, Schaumburg, IL 60196,

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

Scalable speech coding spanning the 4 Kbps divide

Scalable speech coding spanning the 4 Kbps divide University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2003 Scalable speech coding spanning the 4 Kbps divide J Lukasiak University

More information

A Parametric Model for Spectral Sound Synthesis of Musical Sounds

A Parametric Model for Spectral Sound Synthesis of Musical Sounds A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick

More information

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University

More information

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory

More information

APPLICATIONS OF DSP OBJECTIVES

APPLICATIONS OF DSP OBJECTIVES APPLICATIONS OF DSP OBJECTIVES This lecture will discuss the following: Introduce analog and digital waveform coding Introduce Pulse Coded Modulation Consider speech-coding principles Introduce the channel

More information

Page 0 of 23. MELP Vocoder

Page 0 of 23. MELP Vocoder Page 0 of 23 MELP Vocoder Outline Introduction MELP Vocoder Features Algorithm Description Parameters & Comparison Page 1 of 23 Introduction Traditional pitched-excited LPC vocoders use either a periodic

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Speech Coding using Linear Prediction

Speech Coding using Linear Prediction Speech Coding using Linear Prediction Jesper Kjær Nielsen Aalborg University and Bang & Olufsen jkn@es.aau.dk September 10, 2015 1 Background Speech is generated when air is pushed from the lungs through

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Spanning the 4 kbps divide using pulse modeled residual

Spanning the 4 kbps divide using pulse modeled residual University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2002 Spanning the 4 kbps divide using pulse modeled residual J Lukasiak

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Lecture 9: Time & Pitch Scaling

Lecture 9: Time & Pitch Scaling ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 9: Time & Pitch Scaling 1. Time Scale Modification (TSM) 2. Time-Domain Approaches 3. The Phase Vocoder 4. Sinusoidal Approach Dan Ellis Dept. Electrical Engineering,

More information

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of COMPRESSIVE SAMPLING OF SPEECH SIGNALS by Mona Hussein Ramadan BS, Sebha University, 25 Submitted to the Graduate Faculty of Swanson School of Engineering in partial fulfillment of the requirements for

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

Modulator Domain Adaptive Gain Equalizer for Speech Enhancement

Modulator Domain Adaptive Gain Equalizer for Speech Enhancement Modulator Domain Adaptive Gain Equalizer for Speech Enhancement Ravindra d. Dhage, Prof. Pravinkumar R.Badadapure Abstract M.E Scholar, Professor. This paper presents a speech enhancement method for personal

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,

More information

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley University of California Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Professors : N.Morgan / B.Gold EE225D Spring,1999 Medium & High Rate Coding Lecture 26

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21 E85.267: Lecture 8 Source-Filter Processing E85.267: Lecture 8 Source-Filter Processing 21-4-1 1 / 21 Source-filter analysis/synthesis n f Spectral envelope Spectral envelope Analysis Source signal n 1

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Automatic Transcription of Monophonic Audio to MIDI

Automatic Transcription of Monophonic Audio to MIDI Automatic Transcription of Monophonic Audio to MIDI Jiří Vass 1 and Hadas Ofir 2 1 Czech Technical University in Prague, Faculty of Electrical Engineering Department of Measurement vassj@fel.cvut.cz 2

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition

On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition International Conference on Advanced Computer Science and Electronics Information (ICACSEI 03) On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition Jongkuk Kim, Hernsoo Hahn Department

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation

More information

The Channel Vocoder (analyzer):

The Channel Vocoder (analyzer): Vocoders 1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, Each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear phase FIR filter are used.

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Sinusoidal Modelling in Speech Synthesis, A Survey.

Sinusoidal Modelling in Speech Synthesis, A Survey. Sinusoidal Modelling in Speech Synthesis, A Survey. A.S. Visagie, J.A. du Preez Dept. of Electrical and Electronic Engineering University of Stellenbosch, 7600, Stellenbosch avisagie@dsp.sun.ac.za, dupreez@dsp.sun.ac.za

More information

Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding

Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding Nanda Prasetiyo Koestoer B. Eng (Hon) (1998) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Robust Algorithms For Speech Reconstruction On Mobile Devices

Robust Algorithms For Speech Reconstruction On Mobile Devices Robust Algorithms For Speech Reconstruction On Mobile Devices XU SHAO A Thesis presented for the degree of Doctor of Philosophy Speech Group School of Computing Sciences University of East Anglia England

More information

A METHOD OF SPEECH PERIODICITY ENHANCEMENT BASED ON TRANSFORM-DOMAIN SIGNAL DECOMPOSITION

A METHOD OF SPEECH PERIODICITY ENHANCEMENT BASED ON TRANSFORM-DOMAIN SIGNAL DECOMPOSITION 8th European Signal Processing Conference (EUSIPCO-2) Aalborg, Denmark, August 23-27, 2 A METHOD OF SPEECH PERIODICITY ENHANCEMENT BASED ON TRANSFORM-DOMAIN SIGNAL DECOMPOSITION Feng Huang, Tan Lee and

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

Defense Technical Information Center Compilation Part Notice

Defense Technical Information Center Compilation Part Notice UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP010883 TITLE: The Turkish Narrow Band Voice Coding and Noise Pre-Processing NATO Candidate DISTRIBUTION: Approved for public

More information

HUMAN speech is frequently encountered in several

HUMAN speech is frequently encountered in several 1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

Converting Speaking Voice into Singing Voice

Converting Speaking Voice into Singing Voice Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

Digital Signal Representation of Speech Signal

Digital Signal Representation of Speech Signal Digital Signal Representation of Speech Signal Mrs. Smita Chopde 1, Mrs. Pushpa U S 2 1,2. EXTC Department, Mumbai University Abstract Delta modulation is a waveform coding techniques which the data rate

More information

ADAPTIVE IDENTIFICATION OF TIME-VARYING IMPULSE RESPONSE OF UNDERWATER ACOUSTIC COMMUNICATION CHANNEL IWONA KOCHAŃSKA

ADAPTIVE IDENTIFICATION OF TIME-VARYING IMPULSE RESPONSE OF UNDERWATER ACOUSTIC COMMUNICATION CHANNEL IWONA KOCHAŃSKA ADAPTIVE IDENTIFICATION OF TIME-VARYING IMPULSE RESPONSE OF UNDERWATER ACOUSTIC COMMUNICATION CHANNEL IWONA KOCHAŃSKA Gdańsk University of Technology Faculty of Electronics, Telecommuniations and Informatics

More information

INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING DESA-2 AND NOTCH FILTER. Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA

INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING DESA-2 AND NOTCH FILTER. Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING AND NOTCH FILTER Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA Tokyo University of Science Faculty of Science and Technology ABSTRACT

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Quantisation mechanisms in multi-protoype waveform coding

Quantisation mechanisms in multi-protoype waveform coding University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 1996 Quantisation mechanisms in multi-protoype waveform coding

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis Signal Analysis Music 27a: Signal Analysis Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD November 23, 215 Some tools we may want to use to automate analysis

More information

ROBUST echo cancellation requires a method for adjusting

ROBUST echo cancellation requires a method for adjusting 1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

ADAPTIVE NOISE LEVEL ESTIMATION

ADAPTIVE NOISE LEVEL ESTIMATION Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France

More information

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);

More information

VQ Source Models: Perceptual & Phase Issues

VQ Source Models: Perceptual & Phase Issues VQ Source Models: Perceptual & Phase Issues Dan Ellis & Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,ronw}@ee.columbia.edu

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Adaptive Filters Stochastic Processes The term stochastic process is broadly used to describe a random process that generates sequential signals such as

More information

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks C. S. Blackburn and S. J. Young Cambridge University Engineering Department (CUED), England email: csb@eng.cam.ac.uk

More information

Voice Conversion of Non-aligned Data using Unit Selection

Voice Conversion of Non-aligned Data using Unit Selection June 19 21, 2006 Barcelona, Spain TC-STAR Workshop on Speech-to-Speech Translation Voice Conversion of Non-aligned Data using Unit Selection Helenca Duxans, Daniel Erro, Javier Pérez, Ferran Diego, Antonio

More information

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE 1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract

More information

Adaptive Filters Wiener Filter

Adaptive Filters Wiener Filter Adaptive Filters Wiener Filter Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS

DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS John Yong Jia Chen (Department of Electrical Engineering, San José State University, San José, California,

More information

Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP

Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP Monika S.Yadav Vidarbha Institute of Technology Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, India monika.yadav@rediffmail.com

More information