DECOMPOSITION OF SPEECH INTO VOICED AND UNVOICED COMPONENTS BASED ON A KALMAN FILTERBANK
Mark Thomson, Simon Boland †, Michael Smithers 3, Mike Wu & Julien Epps
Motorola Labs, Botany, NSW 2019
† Avaya R & D, North Ryde, NSW 2113
3 Dolby Laboratories, San Francisco, USA

Melbourne, December 2 to 5, 2002. Australian Speech Science & Technology Association Inc. Accepted after full review.

ABSTRACT: We present a novel method for decomposing speech into signals representing the voiced and unvoiced components of speech. The method involves first demodulating the variations in spectral envelope, energy and pitch, and then applying a bank of Kalman filters to separate the harmonic and non-harmonic components of the signal. The use of Kalman filters relies on a state-space representation of the composite signal, and provides a way to accurately estimate the harmonic component without the large delay required by a linear phase comb filter. However, it also requires a priori knowledge of the variance of the unvoiced component and of the state transition parameters. We present a novel method to accurately determine these parameters based on a variant of the Expectation-Maximisation algorithm.

INTRODUCTION

The distinction between voiced and unvoiced sounds is important in many areas of speech technology. In speech coding, for example, different mechanisms are often used to encode the voiced and unvoiced parts of speech (Kleijn and Haagen, 1994). In some methods of speech enhancement, the quasiperiodic nature of voiced speech is used to design an optimal filter that separates speech from additive noise (Goh et al, 1999). In speech recognition, knowledge of the temporal structure of the cycles of voiced speech can be used to process the speech in such a way that the impact of additive noise on feature extraction is reduced (Macho and Cheng, 2001). Knowledge of the pitch of voiced speech is also useful in speech recognition for tonal languages, such as Mandarin (Zhang et al, 2002).
In some applications, it is sufficient to assume that any particular segment of speech is either purely voiced or purely unvoiced, and to classify segments into one of these two categories. This was true, for example, in early low bit rate vocoders (Campbell and Tremain, 1986). In reality, however, many segments of speech contain both quasiperiodic and noise-like energy, and many processing methods are designed to exploit this. In some cases, what is required is simply a determination of the degree of voicing. In mixed excitation linear prediction (MELP) coding (McCree and Barnwell, 1995) and multiband excitation (MBE) coding (Griffin and Lim, 1988), for example, a frequency-dependent measure of the strength of voicing is used to control the relative amount of periodic and non-periodic energy in the excitation of a linear prediction filter. In other cases, an attempt is made to explicitly separate the voiced and unvoiced components. In codebook-excited linear predictive (CELP) speech coders, this is achieved through an analysis-by-synthesis procedure (Gerson and Jasiuk, 1992). Speech is generated by exciting a short-term linear prediction filter with a combination of signals from both an adaptive codebook, representing voiced energy, and a fixed codebook, representing unvoiced energy. Minimisation of the perceptually weighted difference between the synthesised and input speech is used to estimate the two components. Several alternative approaches are possible. One is to use a linear comb filter to isolate the voiced component based on its harmonic structure. This is similar to the practice of using a low pass filter to separate slowly evolving and rapidly evolving components of the pitch cycle in interpolation-based coding (Kleijn and Haagen, 1994). One limitation of this approach, however, is that its effectiveness depends on having a filter with a sharp roll-off, which requires a long impulse response.
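As a concrete illustration of this limitation (not taken from the paper itself), the sketch below implements a simple linear-phase comb filter that estimates the harmonic component by averaging K pitch periods. The test signal, pitch period and tap count are invented for the example; the point is that the filter's group delay is (K−1)T/2 samples, so sharper harmonic selectivity (larger K) directly costs more delay.

```python
import numpy as np

def comb_filter_voiced(y, T, K=5):
    """Linear-phase FIR comb filter: estimate the harmonic component by
    averaging K pitch periods (taps at lags 0, T, ..., (K-1)*T)."""
    h = np.zeros((K - 1) * T + 1)
    h[::T] = 1.0 / K                        # K equal taps, one period apart
    return np.convolve(y, h, mode="same")   # 'same' centres the (K-1)*T/2 delay

# Toy signal: two harmonics with period T plus white noise.
rng = np.random.default_rng(0)
T = 80
n = np.arange(8 * T)
voiced = np.cos(2 * np.pi * n / T) + 0.5 * np.cos(4 * np.pi * n / T)
y = voiced + 0.3 * rng.standard_normal(n.size)

v_hat = comb_filter_voiced(y, T, K=5)
```

With K = 5 and T = 80 the filter already needs a look-ahead of 160 samples (20 ms at 8 kHz), which is why a comb filter sharp enough for good separation implies a large decomposition delay.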
The implication of this is that the decomposition process requires a relatively large delay, which is undesirable in some applications, such as speech coding, and also creates difficulties in dealing with rapid transitions. Achieving good decomposition without a large delay requires the use of more a priori knowledge about signal behaviour. One approach is to impose a deterministic parametric model on the evolution of the harmonic coefficients (Stylianou, 1996). However, the signal model is then highly non-linear, and parameter estimation becomes very complex. Stochastic models of signal evolution have been suggested by both Gruber and Tödtli (1994) and Stachurski (1997); however, both also involve very complex estimation processes. In this paper we present a new method of decomposition that is also based on a stochastic model, but which is much simpler to implement, and which also permits more control over the behaviour of the decomposition. The approach involves using a bank of Kalman filters, each corresponding to one sample in a normalised pitch period.

SIGNAL MODELING AND ESTIMATION

In keeping with usual practice, we represent speech as the response of an autoregressive (AR) system, representing the vocal tract filter, to an input signal representing the acoustic energy generated by both vocal fold vibration and turbulent airflow:

    z_k = Σ_{i=1}^{M} a_i z_{k−i} + g y_k        (1)

and

    y_k = x_k + v_k        (2)

where g is a gain factor, x_k is a quasiperiodic signal, and v_k is an uncorrelated Gaussian random variable with variance σ_v². The responses of the vocal tract filter to the two components, x_k and v_k, constitute the voiced and unvoiced components of speech respectively. Fundamental to our method of decomposing z_k into these components is the way that x_k is modelled. The component is assumed to evolve according to

    x_k = α x_{k−T} + w_k        (3)

where T is the period of x_k, w_k is an uncorrelated Gaussian random variable with variance σ_w², and α is a gain value. Based on this model, the overall decomposition process is depicted in Figure 1.
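The model of (2) and (3) can be exercised with a short simulation. This is a minimal sketch rather than the authors' code: the parameter values, the cosine-shaped first cycle and the function name `synthesize` are all illustrative assumptions.

```python
import numpy as np

def synthesize(alpha, sigma_w, sigma_v, T, n_periods, rng):
    """Draw a signal from the model y_k = x_k + v_k, where the quasiperiodic
    state evolves one period at a time: x_k = alpha * x_{k-T} + w_k."""
    x = np.zeros(T * n_periods)
    x[:T] = np.cos(2 * np.pi * np.arange(T) / T)   # arbitrary first cycle
    for k in range(T, x.size):
        x[k] = alpha * x[k - T] + sigma_w * rng.standard_normal()
    v = sigma_v * rng.standard_normal(x.size)      # unvoiced (observation) noise
    return x + v, x

rng = np.random.default_rng(1)
y, x = synthesize(alpha=0.98, sigma_w=0.02, sigma_v=0.3, T=80, n_periods=10, rng=rng)
```

With α close to 1 and small σ_w, consecutive cycles of x_k stay highly correlated while their amplitude drifts slowly, which is the behaviour the decomposition exploits.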
In our implementation, processing is carried out on a frame-by-frame basis with frames of 20 ms duration. We begin by demodulating the variation in the energy, spectral envelope, and pitch of the signal. Energy is estimated on a subframe basis (4 subframes/frame). Demodulation of the spectral envelope variation is achieved by applying an inverse filter estimated once per frame by linear prediction. The linear prediction residual is used to estimate the pitch period, and the period is used to time-warp the signal to a fixed period. The demodulated signal is an approximation, ŷ_k, of y_k. Equations (2) and (3) together constitute a state-space representation of this signal, with w_k representing the process noise and v_k the observation noise. Based on this, a Kalman filter can be used to estimate the state variable x_k by means of the following recursion:

    x̂_{k|k} = α x̂_{k−T|k−T} + K (ŷ_k − α x̂_{k−T|k−T})        (4)
[Figure 1: Decomposition System. The input z_k is demodulated (energy, period and LPC parameters are extracted and removed) to produce ŷ_k; a parameter estimation block supplies α and σ_v² to the Kalman filterbank, whose output and residual are remodulated to give the voiced and unvoiced components.]

where

    K = Σ_{k|k−T} (Σ_{k|k−T} + σ_v²)^{−1}        (5)

is the Kalman gain,

    Σ_{k|k−T} = α² Σ_{k−T|k−T} + σ_w²        (6)

is the variance of the error in the predicted state estimate, x̂_{k|k−T}, and

    Σ_{k|k} = (1 − K) Σ_{k|k−T}        (7)

is the variance of the error in the filtered state estimate, x̂_{k|k}. σ_w² may be chosen to control the rate at which the estimated quasiperiodic component evolves. However, α and σ_v² must be estimated from the input data; we describe a new method to do this in the next section. Since the state variable is different for each sample in the period, estimation of the entire period essentially constitutes a bank of multiple scalar Kalman filters. The smoothing form of the Kalman filter may also be used to take advantage of future pitch cycles in estimating each current sample. The observation noise is estimated as v̂_k = ŷ_k − x̂_{k|k}. The estimated quasiperiodic and noisy components can then be remodulated using the estimated period, LPC filter and energy to produce the voiced and unvoiced components of the speech.

For the decomposition to work effectively, it is essential that when quasiperiodic energy is present in the signal, its period be known accurately. This requires not only that the resolution of the period estimate be sufficiently high, but also that the estimation method be able to track variations in period over sufficiently short time intervals. In order to ensure that x_k and T are always optimally aligned, it needs to be possible to track variations in period within a pitch cycle. To achieve this we have used a dynamic programming approach, with a path metric composed of an accumulated average magnitude difference function with an additional term to penalise inappropriate variations in period. The model on which our method is based has some similarity to those in Gruber and Tödtli (1994).
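The recursion (4)–(7) amounts to a few lines per sample. The following is one possible scalar implementation, a sketch rather than the authors' code: it assumes α, σ_w² and σ_v² are given, that the signal has already been demodulated to a constant period T, and the function name and interface are invented for the example.

```python
import numpy as np

def kalman_filterbank(y, T, alpha, var_w, var_v):
    """One scalar Kalman filter per sample position in the normalised
    period, i.e. the recursion (4)-(7) applied at lag T."""
    x_hat = np.zeros_like(y, dtype=float)
    x_hat[:T] = y[:T]            # initialise the first period from the data
    P = np.full(T, var_v)        # filtered error variance, one per position
    for k in range(T, y.size):
        j = k % T                                 # position within the period
        P_pred = alpha**2 * P[j] + var_w          # eq (6): predicted variance
        K = P_pred / (P_pred + var_v)             # eq (5): Kalman gain
        x_pred = alpha * x_hat[k - T]             # prediction from one period ago
        x_hat[k] = x_pred + K * (y[k] - x_pred)   # eq (4): filtered state
        P[j] = (1.0 - K) * P_pred                 # eq (7): filtered variance
    return x_hat, y - x_hat      # quasiperiodic estimate and residual
```

Because the state for each position in the period is updated only once per cycle, the whole bank costs a handful of scalar operations per sample, in contrast to vector Kalman formulations.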
However, the use here of a time-domain state representation makes it possible to use only scalar Kalman filter estimators, resulting in significantly lower complexity. In addition, the methods in Gruber and Tödtli (1994) explicitly assumed that σ_v² is known in advance, and made no allowance for an explicit state transition gain α. The latter point is particularly important in decomposing speech, because the overall amplitude of consecutive cycles can change more rapidly than their shape. The model developed in Stachurski (1997) is almost identical to that described by (2) and (3), but again there was no allowance for a variable transition gain, and also no provision for explicitly controlling σ_w². In addition, because σ_v² was not known or determined prior to decomposition, it was not possible to use a Kalman filter for signal estimation. Instead a much more complex algorithm was proposed, based on singular value decomposition.

ESTIMATION OF DYNAMICAL SYSTEM PARAMETERS

Good estimates of α and σ_v² are critically important in order for the decomposition described above to be effective. An iterative method for determining the parameters, θ, of a general linear dynamic system from observations of its output was developed by Digalakis et al (1993), based on the Expectation-Maximisation (EM) algorithm. Each iteration involves maximising the expected joint log likelihood of the observed data sequence and the unknown state sequence, conditioned on the observed data and the previous estimate of θ. In our application, we only require estimates of α and σ_v². Using the procedure described by Digalakis et al, the values that maximise the expected joint log likelihood are:

    α = [ Σ_{k=1}^{N} E{x_k x_{k−T}} ] / [ Σ_{k=1}^{N} E{x²_{k−T}} ]        (8)

    σ_v² = (1/N) Σ_{k=1}^{N} (ŷ_k − x̂_k)²        (9)

where N represents a fixed interval over which α and σ_v² are assumed constant, x̂_k is the smoothed state estimate, and E{x_k x_{k−T}} = x̂_k x̂_{k−T} + Σ_{k,k−T}, where Σ_{k,k−T} is the covariance of (x_k, x_{k−T}). The expectations in (8) should be understood to be conditioned on both the observed data up to N and the initial estimates of α and σ_v².
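Given state estimates from a previous filtering or smoothing pass, one pass of (8) and (9) can be sketched as follows. This is an illustrative simplification rather than the paper's code: the cross-covariance term in the numerator of (8) is neglected here, and the function name and interface are hypothetical.

```python
import numpy as np

def update_parameters(y, x_hat, Sigma, T):
    """One re-estimation of alpha and var_v in the spirit of (8)-(9).
    x_hat holds state estimates and Sigma their error variances over the
    interval; the cross-covariance term of (8) is neglected here."""
    num = np.sum(x_hat[T:] * x_hat[:-T])          # approx. sum of E{x_k x_{k-T}}
    den = np.sum(x_hat[:-T] ** 2 + Sigma[:-T])    # sum of E{x_{k-T}^2}
    alpha = num / den                             # eq (8)
    var_v = np.mean((y[T:] - x_hat[T:]) ** 2)     # eq (9)
    return alpha, var_v
```

In an EM loop these updated values would be fed back into the Kalman recursion and the two steps iterated, which is the procedure whose sensitivity to initial values is discussed next.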
The effectiveness of the recursion defined by (8) and (9) depends significantly on the accuracy of the initial estimates of α and σ_v². Inaccurate starting values will lead to slow convergence, and may cause the algorithm to converge to a local optimum. Although no method of obtaining initial estimates was suggested by Digalakis et al (1993), this was not a significant problem there, since the application of interest was the training of acoustic models for speech recognition, where estimation occurs off-line. However, in the current application α and σ_v² vary throughout the speech waveform, and must be estimated in real time. We present here a method to obtain these values using only the observed data and past values of the estimated state sequence. The method is derived from the recursion equations above, and relies on the assumption that the interval over which α and σ_v² are constant is no more than one period. Although, in principle, these estimates may be used as initial values for subsequent EM iterations, in our experience they are generally sufficiently accurate in themselves, without resorting to further recursion.

To estimate α, we first note that since v_k is uncorrelated with x_{k−T}, the expectation in the numerator of (8) can be written as E{x_k x_{k−T}} = E{(y_k − v_k) x_{k−T}} = E{y_k x_{k−T}}. The smoothed state estimate x̂_{k−T}, which also appears in the denominator, is not known a priori. However, in the mean over the estimation interval, x̂_{k−T} is well approximated by the filtered estimate, x̂_{k−T|k−T}. In addition, the error variance, Σ_{k−T|k−T}, can be expected to be small compared with x̂_{k−T} ŷ_k. Thus α can be approximated by

    α ≈ [ Σ_k ŷ_k x̂_{k−T|k−T} ] / [ Σ_k x̂²_{k−T|k−T} ]        (10)

Provided the summation interval in (10) is no more than one period, all terms on the right hand side are known.

Using the α computed from (10), σ_v² can be found as follows. Again assuming that the interval is no more than one period, x̂_k in (9) is equivalent to the filtered estimate x̂_{k|k}. Using (4) to compute this value results in

    σ_v² = (1/N) Σ_k [ σ_v² / (Σ_{k|k−T} + σ_v²) ]² (ŷ_k − α x̂_{k−T|k−T})²        (11)

where Σ_{k|k−T} is determined from (6). (11) can be manipulated to produce a quadratic in σ_v. Assuming the signal is not noise-free (σ_v ≠ 0), the value of σ_v that satisfies this is the larger root,

    σ_v = ( d + [ d² − 4 Σ_{k|k−T} ]^{1/2} ) / 2,   d = [ (1/N) Σ_k (ŷ_k − α x̂_{k−T|k−T})² ]^{1/2}        (12)

RESULTS AND DISCUSSION

Figure 2 illustrates the application of our algorithm to a segment of speech consisting of a dominant unvoiced component followed by a dominant voiced component. The smoothing form of the Kalman filter was used, with a two-period look-ahead. The results show that the algorithm successfully decomposes the speech, with strong attenuation of noisy energy in the voiced component and no visible harmonic energy in the unvoiced component. The presence of unvoiced signal energy during segments that would generally be classified as voiced is significant. Listening tests indicate that the unvoiced component retains the intelligibility of the original speech, but with a whispered quality.

[Figure 2: (top to bottom) speech waveform, estimated voiced component, estimated unvoiced component.]

CONCLUSIONS

We have presented a novel method for decomposing speech into voiced and unvoiced components in the time domain. The algorithm is distinctive in its use of a Kalman filterbank, based on dynamical system parameters estimated on-line using a form of the Expectation-Maximisation algorithm.

ACKNOWLEDGEMENTS

This work was performed while the authors were all with Motorola.
Simon Boland is now with Avaya Research and Development, and Michael Smithers is with Dolby Laboratories.

REFERENCES

Campbell, J. P. Jr & Tremain, T. E. (1986), Voiced/unvoiced classification of speech with applications to the U.S. Government LPC-10E algorithm, Proceedings of the International Conference on Acoustics, Speech and Signal Processing.
Digalakis, V., Rohlicek, J. R. & Ostendorf, M. (1993), ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4.

Gerson, I. A. & Jasiuk, M. A. (1992), Techniques for improving the performance of CELP-type speech coders, IEEE Journal on Selected Areas in Communications, Vol. 10, No. 5.

Goh, Z., Tan, K.-C. & Tan, B. T. G. (1999), Kalman filtering speech enhancement method based on a voiced-unvoiced speech model, IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 5.

Griffin, D. W. & Lim, J. S. (1988), Multiband excitation vocoder, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 36, No. 8.

Gruber, P. & Tödtli, J. (1994), Estimation of quasiperiodic signal parameters by means of dynamic signal models, IEEE Transactions on Signal Processing, Vol. 42, No. 3.

Kleijn, W. B. & Haagen, J. (1994), Transformation and decomposition of the speech signal for coding, IEEE Signal Processing Letters, Vol. 1, No. 9.

Macho, D. & Cheng, Y.-M. (2001), SNR-dependent waveform processing for improving the robustness of ASR front-end, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vol. 1.

McCree, A. V. & Barnwell, T. P. III (1995), A mixed excitation LPC vocoder model for low bit rate speech coding, IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 4.

Stachurski, J. (1997), A Pitch Pulse Evolution Model for Linear Predictive Coding of Speech, Ph.D. Thesis, McGill University, Montreal, Canada.

Stylianou, Y. (1996), Efficient decomposition of speech signals into a deterministic and a stochastic part, Proceedings of the International Symposium on Signal Processing and its Applications.

Zhang, Y., Madievski, A., Lawrence, J. & Song, J. (2002), A study of tone statistics in Chinese names, Speech Communication, Vol. 36.
More informationAutomatic Transcription of Monophonic Audio to MIDI
Automatic Transcription of Monophonic Audio to MIDI Jiří Vass 1 and Hadas Ofir 2 1 Czech Technical University in Prague, Faculty of Electrical Engineering Department of Measurement vassj@fel.cvut.cz 2
More informationEnhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis
Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationOn a Classification of Voiced/Unvoiced by using SNR for Speech Recognition
International Conference on Advanced Computer Science and Electronics Information (ICACSEI 03) On a Classification of Voiced/Unvoiced by using SNR for Speech Recognition Jongkuk Kim, Hernsoo Hahn Department
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationEffects of Reverberation on Pitch, Onset/Offset, and Binaural Cues
Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation
More informationThe Channel Vocoder (analyzer):
Vocoders 1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, Each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear phase FIR filter are used.
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationSinusoidal Modelling in Speech Synthesis, A Survey.
Sinusoidal Modelling in Speech Synthesis, A Survey. A.S. Visagie, J.A. du Preez Dept. of Electrical and Electronic Engineering University of Stellenbosch, 7600, Stellenbosch avisagie@dsp.sun.ac.za, dupreez@dsp.sun.ac.za
More informationRobust Linear Prediction Analysis for Low Bit-Rate Speech Coding
Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding Nanda Prasetiyo Koestoer B. Eng (Hon) (1998) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationRobust Algorithms For Speech Reconstruction On Mobile Devices
Robust Algorithms For Speech Reconstruction On Mobile Devices XU SHAO A Thesis presented for the degree of Doctor of Philosophy Speech Group School of Computing Sciences University of East Anglia England
More informationA METHOD OF SPEECH PERIODICITY ENHANCEMENT BASED ON TRANSFORM-DOMAIN SIGNAL DECOMPOSITION
8th European Signal Processing Conference (EUSIPCO-2) Aalborg, Denmark, August 23-27, 2 A METHOD OF SPEECH PERIODICITY ENHANCEMENT BASED ON TRANSFORM-DOMAIN SIGNAL DECOMPOSITION Feng Huang, Tan Lee and
More informationMonophony/Polyphony Classification System using Fourier of Fourier Transform
International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye
More informationDrum Transcription Based on Independent Subspace Analysis
Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationDefense Technical Information Center Compilation Part Notice
UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP010883 TITLE: The Turkish Narrow Band Voice Coding and Noise Pre-Processing NATO Candidate DISTRIBUTION: Approved for public
More informationHUMAN speech is frequently encountered in several
1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,
More informationVocoder (LPC) Analysis by Variation of Input Parameters and Signals
ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of
More informationConverting Speaking Voice into Singing Voice
Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech
More informationMikko Myllymäki and Tuomas Virtanen
NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,
More informationDigital Signal Representation of Speech Signal
Digital Signal Representation of Speech Signal Mrs. Smita Chopde 1, Mrs. Pushpa U S 2 1,2. EXTC Department, Mumbai University Abstract Delta modulation is a waveform coding techniques which the data rate
More informationADAPTIVE IDENTIFICATION OF TIME-VARYING IMPULSE RESPONSE OF UNDERWATER ACOUSTIC COMMUNICATION CHANNEL IWONA KOCHAŃSKA
ADAPTIVE IDENTIFICATION OF TIME-VARYING IMPULSE RESPONSE OF UNDERWATER ACOUSTIC COMMUNICATION CHANNEL IWONA KOCHAŃSKA Gdańsk University of Technology Faculty of Electronics, Telecommuniations and Informatics
More informationINSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING DESA-2 AND NOTCH FILTER. Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA
INSTANTANEOUS FREQUENCY ESTIMATION FOR A SINUSOIDAL SIGNAL COMBINING AND NOTCH FILTER Yosuke SUGIURA, Keisuke USUKURA, Naoyuki AIKAWA Tokyo University of Science Faculty of Science and Technology ABSTRACT
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationQuantisation mechanisms in multi-protoype waveform coding
University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 1996 Quantisation mechanisms in multi-protoype waveform coding
More informationSpeech Coding in the Frequency Domain
Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.
More informationSignal segmentation and waveform characterization. Biosignal processing, S Autumn 2012
Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?
More informationSignal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis
Signal Analysis Music 27a: Signal Analysis Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD November 23, 215 Some tools we may want to use to automate analysis
More informationROBUST echo cancellation requires a method for adjusting
1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,
More informationRecent Advances in Acoustic Signal Extraction and Dereverberation
Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing
More informationADAPTIVE NOISE LEVEL ESTIMATION
Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France
More informationSpeech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065
Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);
More informationVQ Source Models: Perceptual & Phase Issues
VQ Source Models: Perceptual & Phase Issues Dan Ellis & Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,ronw}@ee.columbia.edu
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Adaptive Filters Stochastic Processes The term stochastic process is broadly used to describe a random process that generates sequential signals such as
More informationLearning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks
Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks C. S. Blackburn and S. J. Young Cambridge University Engineering Department (CUED), England email: csb@eng.cam.ac.uk
More informationVoice Conversion of Non-aligned Data using Unit Selection
June 19 21, 2006 Barcelona, Spain TC-STAR Workshop on Speech-to-Speech Translation Voice Conversion of Non-aligned Data using Unit Selection Helenca Duxans, Daniel Erro, Javier Pérez, Ferran Diego, Antonio
More informationEpoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE
1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract
More informationAdaptive Filters Wiener Filter
Adaptive Filters Wiener Filter Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationDESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS
DESIGN AND IMPLEMENTATION OF AN ALGORITHM FOR MODULATION IDENTIFICATION OF ANALOG AND DIGITAL SIGNALS John Yong Jia Chen (Department of Electrical Engineering, San José State University, San José, California,
More informationSpeech Coding Technique And Analysis Of Speech Codec Using CS-ACELP
Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP Monika S.Yadav Vidarbha Institute of Technology Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, India monika.yadav@rediffmail.com
More information