Glottal inverse filtering based on quadratic programming

INTERSPEECH 2015

Glottal inverse filtering based on quadratic programming

Manu Airaksinen 1, Tom Bäckström 2, Paavo Alku 1
1 Department of Signal Processing and Acoustics, Aalto University, Finland
2 International Audio Laboratories Erlangen, Friedrich-Alexander University (FAU), Germany
manu.airaksinen@aalto.fi, tom.backstrom@audiolabs-erlangen.de

Abstract

This study presents a novel quadratic programming based approach to glottal inverse filtering. The proposed method jointly models the effects of the vocal tract and the lip radiation with a single filter whose coefficients are optimized using the quadratic programming framework. This allows the proposed method to estimate the glottal flow of speech directly, which mitigates the problem of non-flat closed phases in inverse filtering estimates. The proposed method was objectively evaluated using a synthetic, Liljencrants-Fant model based test set of sustained vowels containing a wide variety of phonation types and fundamental frequencies. The results indicate that the proposed method is robust to changes in f0, and state-of-the-art quality results were obtained for high-pitched voices, with f0 in the range of 330 to 450 Hz.

Index Terms: glottal inverse filtering, GIF, quadratic programming, voice source

1. Introduction

The glottal volume velocity waveform, or the glottal flow, is the main source of excitation for voiced human speech production. Obtaining knowledge about the excitation is important in fundamental speech research, but also in medicine (e.g. occupational voice care or speech pathology), phonetics (e.g. prosody), and neuroscience (e.g. brain responses evoked by speech). In addition, estimation of the glottal excitation has recently gained momentum in speech technology, especially in speech synthesis [1]. Glottal inverse filtering (GIF) is a computational method for estimating the glottal flow from a recorded microphone signal.
This approach assumes the so-called source-filter model of speech production, which is most commonly presented as a cascade of three processes: (1) a time-domain input that represents the glottal flow, (2) a digital filter representing the vocal tract transfer function, and (3) a differentiator that represents the lip radiation effect. GIF is performed by blindly applying antiresonances to the recorded acoustic pressure signal so that the effects of the vocal tract and the lip radiation are cancelled, ideally leaving the glottal flow intact. The approach is effective and non-invasive, which is key for automated solutions. Given these properties, methods utilizing various forms of linear prediction (LP) (e.g. conventional linear prediction [2] or discrete all-pole modeling [3]) for modeling the vocal tract have become a popular basis for relatively simple and computationally efficient GIF algorithms. In modeling the lip radiation effect, GIF algorithms typically utilize a fixed first-order differentiator [4]. In this study, the use of quadratic programming [5] in GIF is introduced to obtain a more thorough mathematical optimization model that is inspired by the principles of the Closed Phase Covariance (CP) GIF method [6]. The CP method computes the vocal tract model from samples that are located in the closed phase of the glottal cycle. This principle was developed further in the recent Quasi Closed Phase (QCP) method [7] by using temporally weighted linear prediction (WLP) as the vocal tract modelling technique. Instead of using a few samples located in the (true) closed phase, the QCP method takes advantage of all samples of the analysis frame and computes a WLP-based vocal tract model in which the contribution of samples located in the closed phase is emphasized in comparison to those that occur in the open phase.
Another key point of the current study is the unification of the lip radiation effect and the vocal tract transfer function within the optimization model. This is used to obtain a better estimate of a horizontal, near-zero closed phase for the glottal flow, which can be problematic for most state-of-the-art GIF methods [8, 9]. The principles of GIF within the source-filter model are discussed in Section 2, and their application to quadratic programming is presented in Section 3. Particular attention is paid to the selection of the optimization criterion of quadratic programming in Section 3.1, as well as to the practical implementation of the proposed method in Section 3.2.

2. Principles of GIF

The source-filter model of speech production is defined in the z-domain as:

S(z) = G(z)V(z)L(z), (1)

where S(z) is the speech signal, G(z) is the glottal excitation (depicted in the time domain in Fig. 1), V(z) is the vocal tract transfer function, and L(z) is the lip radiation effect that converts the air volume velocity waveform at the lips into an acoustic pressure waveform outside the lips. In most GIF methods L(z) is modeled as a fixed first-order differentiator [4]:

L(z) = 1 − α z^−1, α ≈ 1. (2)

When V(z) and L(z) are known, the glottal flow is obtained as:

G(z) = S(z) / (V(z)L(z)). (3)

Using the fixed lip radiation model of Eq. 2, however, can lead to a low-frequency distortion that creates an ascending or descending component in the closed phase of the estimated flow pulses (seen, e.g., in the female IAIF examples of Fig. 3). This phenomenon was analyzed in [9], where it was suggested to be due to the following factors. First, the assumption of an ideal flow-to-pressure conversion might not be precise, e.g., in cases where the recording microphone is not sufficiently in the far field [10].

Copyright 2015 ISCA, September 6-10, 2015, Dresden, Germany
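As a concrete illustration of Eqs. 1-3, the sketch below (Python/NumPy; all filter values are hypothetical, and a raised-cosine pulse train stands in for the true glottal flow) synthesizes a pressure signal through a one-formant vocal tract and the lip radiation differentiator, and then inverts both filters. With V(z) and L(z) known exactly, inverse filtering recovers the flow up to numerical error:

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
f0 = 100.0
T = int(fs / f0)                      # fundamental period in samples

# Hypothetical glottal flow: raised-cosine open phase, flat closed phase
# (a crude stand-in for an LF-model pulse).
n_open = int(0.6 * T)
pulse = np.zeros(T)
pulse[:n_open] = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n_open) / n_open))
g = np.tile(pulse, 20)

# Vocal tract V(z): all-pole filter with a single formant near 700 Hz.
r, fc = 0.97, 700.0
a_vt = [1.0, -2 * r * np.cos(2 * np.pi * fc / fs), r ** 2]

# Lip radiation L(z) = 1 - alpha z^-1 (Eq. 2), alpha close to one.
alpha = 0.99
s = lfilter([1.0, -alpha], [1.0], lfilter([1.0], a_vt, g))  # Eq. 1 in time

# Inverse filtering (Eq. 3): cancel V(z) and then L(z).
g_hat = lfilter(a_vt, [1.0, -alpha], s)
print(np.max(np.abs(g_hat - g)))      # ~0: exact recovery when V, L are known
```

The blind-estimation problem that GIF actually solves is exactly the gap this toy hides: in practice V(z) and α are unknown and must be estimated from s alone.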

Second, vocal tract estimation methods, such as LP or discrete all-pole modeling, focus mainly on obtaining good modeling performance at the formants, paying less attention to low frequencies. Small errors in the very low frequencies of the vocal tract model, however, can greatly affect the shape of the integrated waveform, especially with α-values of Eq. 2 that are very close to 1. In [9], a method was developed to automatically determine an α-value that aims to compensate for the error produced by the above-mentioned effects, but in the present study the approach taken is to merge V(z) and L(z) into a unified linear model

A(z) = V^−1(z) L̂^−1(z), (4)

where L̂(z) is an all-pole approximation of L(z) as proposed in [4], and A(z) is a linear FIR model. This leads to the following GIF model:

G(z) = S(z)A(z). (5)

An important thing to note about the linear model A(z) is that because it includes the FIR approximation of the integrator, its length must be within the range of a single fundamental period of the corresponding speech signal. This ensures that the approximated integrator does not leak information beyond a single period. The increased length of the filter enables estimating the zeros of the vocal tract transfer function, which are produced, for example, by the piriform fossa [11] and the nasal cavity.

3. Speech production model for quadratic programming

The speech production model presented in Eq. 5 can be represented in matrix notation as

Ŝâ = g, (6)

where Ŝ is a Toeplitz convolution matrix whose columns represent the input signal s at consecutive delays, g is the glottal flow, and â is the linear speech production model. To be specific, if â is assumed to correspond to the conventional LP model, the first coefficient of â must be unity, whereby

Ŝâ = s0 + Sa = g, (7)

where vector a contains all coefficients of â except the first one, s0 is the first column of Ŝ, and S has the remaining columns of Ŝ.
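The matrix form of Eqs. 6 and 7 can be checked numerically. The sketch below (toy values, not from the paper) builds the Toeplitz convolution matrix Ŝ and verifies the split into s0 + Sa = g:

```python
import numpy as np
from scipy.linalg import toeplitz

# Toy frame and inverse-filter coefficients (hypothetical values).
s = np.array([1.0, 0.5, -0.2, 0.3, 0.1, -0.4])
a_hat = np.array([1.0, -0.9, 0.4])   # first coefficient fixed to unity, as in LP

# S_hat: convolution (Toeplitz) matrix whose columns are s at consecutive delays.
m = len(a_hat)
S_hat = toeplitz(s, np.r_[s[0], np.zeros(m - 1)])

g = S_hat @ a_hat                                      # Eq. 6: S_hat a_hat = g
assert np.allclose(g, np.convolve(s, a_hat)[:len(s)])  # it really is a convolution

# Eq. 7: peel off the unity first coefficient: s0 + S a = g
s0, S, a = S_hat[:, 0], S_hat[:, 1:], a_hat[1:]
print(np.allclose(s0 + S @ a, g))  # True
```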
Now it can be observed that given a, the calculation of g is possible, and vice versa. Since the focus of our interest is the glottal flow g, the effect of a can be cancelled by analytic methods. With that objective, and by defining the null space of S as S⊥, whereby S⊥S = 0, Eq. 7 can be rewritten as:

S⊥Sa = 0 = S⊥(g − s0). (8)

The latter equality, 0 = S⊥(g − s0), does not contain a, whereby it fulfills the objective of presenting g without a, and this equation can thus be used equivalently with Eq. 7. As a second part of the quadratic speech production model, the glottal flow g is constrained to be non-negative (g ≥ 0). This is justified by the fact that speech is produced by exhaling, which produces a positive flow: the flow is zero during the glottal closed phase and otherwise positive. However, s is a pressure waveform and our objective is to obtain the flow waveform g. As explained in Section 2, pressure is the derivative of flow, which means that the zero level of g is ambiguous. Therefore an adaptive zero level δ must be applied, so that g + δu ≥ 0. The model is thus so far

{ S⊥(g − s0) = 0
{ g + δu ≥ 0, (9)

where u is a vector of ones.

Figure 1: Two cycles of a glottal flow waveform (top) and the corresponding glottal flow derivative waveform (bottom), superposed with the weighting functions w_CP and w_AME.

Note that Eq. 9 represents constraints in the sense that they define the feasible space of g. That is, all those glottal flows g which fulfill Eq. 9 could have originated from the defined speech production model.

3.1. Optimization criterion

Quadratic programming is the problem of optimizing a quadratic function of variables subject to linear constraints on these variables, i.e.:

min_x ( (1/2) x^T H x + f^T x ) (10)

s.t. { Ax = b
     { Ex ≤ d. (11)

It can be seen that the constraints of the speech production model of Eq.
9 are linear and can be applied in quadratic programming, but the optimization criterion that points to the best-fitting glottal flow waveform within these constraints is still missing. The optimization criterion is a quadratic function that contains a combined contribution of the norm-1 and the norm-2, and it can be formulated in numerous ways. GIF methods utilizing LP most commonly minimize the residual energy (only the norm-2) of the pre-emphasized speech signal, which yields a reasonable estimate of the vocal tract transfer function because the pre-emphasis approximately inverts the spectral tilt of the glottal flow [4]. Pre-emphasis is commonly performed as the first time derivative of the input signal (which is a pressure signal), which brings it to the domain of the second derivative of the glottal flow g (which is a volume velocity signal). Furthermore, our recent studies on WLP show that the Attenuated Main Excitation (AME) weighting greatly increases the estimation accuracy both in formant estimation [12] and in GIF [7]. The AME weighting function is a temporal waveform that downgrades the contribution of speech samples that are located in the vicinity of the main excitation of the vocal tract, near the glottal closure instants (GCIs). This suggests that a good optimization criterion could

Figure 2: Average error measures of the glottal source parameters (NAQ, QOQ, H1-H2, and HRF) for the tested GIF methods, overall and within the ranges 80 < f0 < 120 Hz, 120 < f0 < 330 Hz, and 330 < f0 < 450 Hz.

include the AME-weighted norm-2 of the second derivative of g:

min_{g,δ} ( γ1 ||W_AME CCg||_2^2 + γ2 δ^2 ), (12)

where W_AME = Diag(w_AME), w_AME is the AME weighting function shown in Fig. 1, C is a convolution matrix that approximates the time derivative, and γ1 and γ2 are optimization coefficients. As discussed in Section 2, the modeling of the lip radiation effect has its most prominent effects in the closed phase of the glottal flow estimate. Furthermore, the results obtained in [9] indicate that the norm-1 is robust for finding a suitable lip radiation model for conventional GIF methods. Thus, the minimization of the norm-1 of the exact closed phase of g would seem like a valuable addition to the optimization criterion. The physical interpretation of this is that when the vocal folds are closed, the air flow coming from the lungs is ideally zero. In the optimization model this can be expressed as:

min_{g,δ} ( γ1 ||W_AME CCg||_2^2 + γ2 δ^2 + γ3 ||W_CP g||_1 ), (13)

where W_CP = Diag(w_CP), w_CP is a weighting function with the same size as g (one for the samples of g located in the closed phase, and zero for the samples in the open phase), and γ3 is an optimization coefficient. It is important to note that the accurate determination of the closed phase is key for accurate estimation of the glottal flow. For example, the SEDREAMS algorithm [13] provides good estimates of the glottal closure and opening instants (GCIs and GOIs, respectively).

3.2. Practical application of the model

The quadratic speech production model is applied in practice by mapping the constraints and the optimization criterion to the standard formula of quadratic programming (Eqs. 10
and 11) by selecting:

x = [g; δ]
H = [ K^T K  0 ; 0  γ2 ],  K = γ1 W_AME CC
f = [γ3 w_CP; 0]
A = [S⊥ 0],  b = S⊥ s0
E = [−I −u],  d = 0.

The output of this model is the vector x that contains the estimated glottal flow g, with the zero-level term δ as its last coefficient, so that g_final = g + δu. The weighting functions W_CP and W_AME can be constructed as the AME function if the glottal closure (and, for W_CP, opening) instants are known. In the present study, W_CP was constructed from the GCIs and GOIs determined by the SEDREAMS algorithm, and W_AME was constructed by using the AME parameters [7] PQ = 0.05, DQ = 0.85, N_ramp = 7. The coefficient values used were γ1 = 10^4 (10^2 for real speech), γ2 = 10^5, and γ3 = 10^5. The filter order m was selected as the nearest even integer to 0.85 N_f, where N_f is the length of the fundamental period in samples. It follows from this high-order filter requirement that the analysis window needed for the proposed method is somewhat longer than in traditional LP applications. In the present study, 50-ms windows of speech sampled at 8 kHz were used.

4. Experiments

The objective evaluation of GIF methods is problematic because it is not possible to measure the real glottal flow signal from natural speech. To overcome this problem, the use of synthetic vowels, e.g. sustained vowels created according to the source-filter model, is a common way of obtaining test data for inverse filtering experiments. In the present study, the proposed quadratic programming based GIF method (denoted QPR) was objectively compared to existing state-of-the-art GIF methods. The evaluation was done using a database of sustained synthetic vowels. The vowels were inverse filtered using the selected GIF methods, and the obtained glottal flow estimates were automatically parameterized with selected glottal flow parameterizations. The average errors of these parameters were used as the objective measures.
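A minimal end-to-end sketch of this assembly is given below. Everything numeric here is a toy stand-in: tiny frame sizes, trivial weighting functions, unit γ values (not the paper's settings), and a general-purpose SLSQP solver in place of a dedicated QP solver. The point is only to show how the standard-form matrices fit together and that the constraints of Eq. 9 are satisfied at the solution:

```python
import numpy as np
from scipy.linalg import toeplitz, null_space
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, m = 16, 4                                   # toy frame length and filter order
s = rng.standard_normal(N)

S_hat = toeplitz(s, np.r_[s[0], np.zeros(m)])  # N x (m+1) convolution matrix
s0, S = S_hat[:, 0], S_hat[:, 1:]              # split as in Eq. 7
S_perp = null_space(S.T).T                     # rows span null space: S_perp @ S = 0

# Second-difference operator CC and toy weighting functions (not the paper's).
C = np.eye(N) - np.eye(N, k=-1)                # first-difference convolution matrix
CC = C @ C
w_ame = np.ones(N)                             # trivial AME weights for the sketch
w_cp = np.zeros(N); w_cp[:4] = 1.0             # hypothetical closed-phase region
g1, g2, g3 = 1.0, 1.0, 1.0                     # toy gamma values

# Standard-form QP data, following the mapping in the text. The norm-1
# closed-phase term becomes the linear term f because g + delta*u >= 0.
K = g1 * np.diag(w_ame) @ CC
H = np.zeros((N + 1, N + 1))
H[:N, :N] = K.T @ K
H[N, N] = g2
f = np.r_[g3 * w_cp, 0.0]
A = np.hstack([S_perp, np.zeros((S_perp.shape[0], 1))])
b = S_perp @ s0

cost = lambda x: 0.5 * x @ H @ x + f @ x
cons = [{"type": "eq",   "fun": lambda x: A @ x - b},     # S_perp (g - s0) = 0
        {"type": "ineq", "fun": lambda x: x[:N] + x[N]}]  # g + delta*u >= 0

x0 = np.r_[s0, max(0.0, -s0.min())]            # feasible start: g = s0, a = 0
res = minimize(cost, x0, jac=lambda x: H @ x + f, constraints=cons, method="SLSQP")
g_est, delta = res.x[:N], res.x[N]
print(res.success, np.all(g_est + delta >= -1e-6))
```

Because CC is full rank here, H is positive definite and the toy QP is strictly convex, so any compliant QP solver would return the same unique minimizer.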
The used parameters were the Normalized Amplitude Quotient (NAQ) [14], the Quasi Open Quotient (QOQ) [15], the Harmonic Richness Factor (HRF) [16], and H1-H2 [17]. NAQ measures the relative length of the glottal closing phase, QOQ measures the approximate length of the glottal open phase, and HRF and H1-H2 are measures that reflect the spectral decay of the waveform.

4.1. Test data

The test set was created by using the source-filter model of speech production, where the source was modeled according to the Liljencrants-Fant (LF) model [18], and the vocal tract was modeled as an 8th-order all-pole filter with four formants. Lip radiation was modeled by an ideal differentiator. The LF parameters used to form the excitations were interpolated between breathy and creaky phonations taken from [19] to form a total of 625 phonations. f0 was varied from 80 Hz to 450 Hz in 10-Hz increments, and the vocal tract was constructed as in [20], modeling the three vowels [a], [e], and [i]. In total, the test set contained 71 250 test sounds. Because of the long analysis filter of the proposed method (see Section 2), and the relatively low number of parameters in the vocal tract (8th-order IIR filter) and in the excitation

Figure 3: Representative real-speech examples of glottal flows estimated with the proposed method (QPR) and the IAIF method for a male and a female speaker with breathy, modal, and pressed phonation. Differentiated electroglottography (DEGG) signals are shown as reference.

(4-parameter LF model), ideal synthetic vowels (i.e. sustained phonation, no additive noise) were found to be easily overlearned by the method, meaning that the provided glottal flow estimates were easily transformed into waveforms resembling a Dirac delta function. This problem was mostly mitigated by adding a small Gaussian noise component to the synthetic vowels, yielding an SNR of 60 dB, but it was still observed that the overall waveforms produced by the proposed method on synthetic data were not of as high quality as with real speech (e.g. in Fig. 3).

4.2. Evaluation methods

The proposed QPR method was compared to four existing GIF methods: Quasi Closed Phase analysis (QCP) [7], Closed Phase covariance analysis (CP) [6], Iterative Adaptive Inverse Filtering (IAIF) [21], and Complex Cepstral Decomposition (CCD) [22]. All GIF methods used a sampling rate of 8 kHz, and the order of the vocal tract model was set to p = 10 in QCP, CP, and IAIF. The analysis frame duration was set to 30 ms, with a varying number of additional buffer samples on both sides of the frame, determined by the respective methods. The QPR method used 10 ms of extra samples on both sides of the analysis window, whereas the other methods used 2.5 ms. The CP method was implemented utilizing the covariance criterion in LP analysis, using two-pitch-period analysis for frames with f0 ≥ 200 Hz. The QCP method utilized the fixed AME parameters PQ = 0.05, DQ = 0.7, and N_ramp = 7. IAIF utilized a secondary prediction order of m = 4. The CCD method was obtained from the GLOAT toolbox [23].
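As an illustration of one of the glottal flow parameters used in the evaluation, NAQ can be computed from a single flow cycle as the ratio of the peak-to-peak flow amplitude to the product of the fundamental period and the magnitude of the negative peak of the flow derivative. The sketch below uses a hypothetical raised-cosine pulse rather than an LF pulse:

```python
import numpy as np

def naq(flow, fs, f0):
    """Normalized Amplitude Quotient: peak-to-peak flow amplitude divided by
    the product of the fundamental period and the magnitude of the negative
    peak of the flow derivative (a sketch of the measure of [14])."""
    T = 1.0 / f0                            # fundamental period in seconds
    f_ac = flow.max() - flow.min()          # AC amplitude of the flow
    d_peak = -np.diff(flow).min() * fs      # negative peak of d(flow)/dt
    return f_ac / (d_peak * T)

# Hypothetical test pulse: raised-cosine open phase, flat closed phase.
fs, f0 = 8000, 100
T = fs // f0
n_open = int(0.6 * T)
pulse = np.zeros(T)
pulse[:n_open] = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n_open) / n_open))
print(naq(pulse, fs, f0))                   # dimensionless, roughly 0.19 here
```

Being normalized by the fundamental period, NAQ is comparable across the wide f0 range of the test set, which is why it is a useful error measure here.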
GCI (and GOI) information for the QCP, CP, and CCD methods was obtained directly, without errors, from the reference signals.

5. Results

The results of the objective test are shown in Fig. 2. The results are divided into overall and f0-specific categories. The f0-specific categories are low (80-120 Hz), mid (120-330 Hz), and high (330-450 Hz). It can be seen that for the tested synthetic vowels, the proposed method is inferior to most of the compared methods in the low f0 range. However, as f0 increases, the performance of the proposed method remains relatively constant, whereas the other methods (excluding CCD) show significant deterioration in their performance. In the high f0 range, the proposed method produces results that improve on the state-of-the-art performance for the NAQ, H1-H2, and HRF parameters. Representative real-speech examples are presented in Fig. 3, where sustained vowels of a male and a female speaker with varying phonation types have been inverse filtered with QPR and IAIF. The differentiated electroglottography (DEGG) signal is also shown as a reference for the glottal closing and opening instants: it is conventionally interpreted that the negative peaks of the DEGG signal correspond to GCIs, and the positive peaks correspond to GOIs [13]. It can be seen that for the proposed QPR method, the closed phases of the estimated glottal flows correspond remarkably well with the DEGG reference. For IAIF the closed phase is commonly distorted, as discussed in Section 2.

6. Discussion

This study introduced a new method of glottal inverse filtering based on quadratic programming. The novelty of the proposed method lies particularly in modeling the vocal tract and the lip radiation effect with a single filter whose coefficients are optimized using the quadratic programming framework.
The proposed method directly estimates the glottal flow, as opposed to the more conventional approach of first estimating the glottal flow derivative waveform, after which the lip radiation effect is separately compensated. This ensures that the estimated glottal flow waveforms can be optimized to show more ideal behavior during the glottal closed phase, where the glottal airflow is assumed to be zero. The objective results obtained with the synthetic LF-model based test set suggest that the proposed method is robust with respect to the f0 of speech, and that it can produce state-of-the-art quality results for very high f0 values. It is important to note that the synthetic vowels of the test set were created with the speech production model that underlies the QCP, IAIF, and CP methods. In contrast, the proposed method and the CCD method have significantly different modeling approaches, which might bias the results in favor of the more similar methods. The modeling of the lip radiation effect in the conventional methods is also ideal for the synthetic vowels, which produces an additional bias against the proposed method in the performed evaluation.

7. Acknowledgements

The research leading to these results has received funding from the Academy of Finland (project no. 284671).

8. References

[1] T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, 2011.
[2] J. D. Markel and A. H. Gray Jr., Linear Prediction of Speech. Berlin: Springer-Verlag, 1976.
[3] A. El-Jaroudi and J. Makhoul, "Discrete all-pole modeling," IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 411-423, 1991.
[4] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice-Hall Signal Processing Series. Prentice-Hall, 1978.
[5] J. Nocedal and S. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering. New York: Springer, 2006.
[6] D. Wong, J. Markel, and A. Gray Jr., "Least squares glottal inverse filtering from the acoustic speech waveform," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 4, pp. 350-355, 1979.
[7] M. Airaksinen, T. Raitio, B. Story, and P. Alku, "Quasi closed phase glottal inverse filtering analysis with weighted linear prediction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 596-607, 2014.
[8] T. Koc, "Post-processing method for removing low-frequency bias in glottal inverse filtering," Electronics Letters, vol. 51, 2015.
[9] M. Airaksinen, T. Bäckström, and P. Alku, "Automatic estimation of the lip radiation effect in glottal inverse filtering," in Proc. Interspeech, 2014.
[10] J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd ed. Wiley-IEEE Press, 1999.
[11] J. Dang and K. Honda, "Acoustic characteristics of the piriform fossa in models and humans," The Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 456-465, 1997.
[12] P. Alku, J. Pohjalainen, M. Vainio, A.-M. Laukkanen, and B. H. Story, "Formant frequency estimation of high-pitched vowels using weighted linear prediction," The Journal of the Acoustical Society of America, vol. 134, no. 2, pp. 1295-1313, 2013.
[13] T. Drugman, M. Thomas, J. Gudnason, P. Naylor, and T. Dutoit, "Detection of glottal closure instants from speech signals: A quantitative review," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 994-1006, 2012.
[14] P. Alku, T. Bäckström, and E. Vilkman, "Normalized amplitude quotient for parametrization of the glottal flow," The Journal of the Acoustical Society of America, vol. 112, no. 2, pp. 701-710, 2002.
[15] T. Hacki, "Klassifizierung von Glottisdysfunktionen mit Hilfe der Elektroglottographie," Folia Phoniatrica, vol. 41, no. 1, 1989.
[16] D. G. Childers and C. K. Lee, "Vocal quality factors: Analysis, synthesis, and perception," The Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2394-2410, 1991.
[17] G. Fant, "The LF-model revisited. Transformations and frequency domain analysis," STL-QPSR, vol. 36, no. 2-3, pp. 119-156, 1995.
[18] G. Fant, J. Liljencrants, and Q. Lin, "A four-parameter model of glottal flow," STL-QPSR, vol. 26, no. 4, pp. 1-13, 1985.
[19] C. Gobl, "The voice source in speech communication - production and perception experiments involving inverse filtering and synthesis," Ph.D. dissertation, KTH, Speech Transmission and Music Acoustics, 2003.
[20] B. Gold and L. Rabiner, "Analysis of digital and analog formant synthesizers," IEEE Transactions on Audio and Electroacoustics, vol. 16, no. 1, pp. 81-94, 1968.
[21] P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992.
[22] T. Drugman, B. Bozkurt, and T. Dutoit, "A comparative study of glottal source estimation techniques," Computer Speech & Language, vol. 26, no. 1, pp. 20-34, 2012.
[23] T. Drugman, GLOttal Analysis Toolbox (GLOAT), 2012, downloaded November 2012. [Online]. Available: drugman/toolbox/


More information

Adaptive Filters Linear Prediction

Adaptive Filters Linear Prediction Adaptive Filters Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory Slide 1 Contents

More information

GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES

GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Clemson University TigerPrints All Dissertations Dissertations 5-2012 GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Yiqiao Chen Clemson University, rls_lms@yahoo.com

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

A perceptually and physiologically motivated voice source model

A perceptually and physiologically motivated voice source model INTERSPEECH 23 A perceptually and physiologically motivated voice source model Gang Chen, Marc Garellek 2,3, Jody Kreiman 3, Bruce R. Gerratt 3, Abeer Alwan Department of Electrical Engineering, University

More information

Quarterly Progress and Status Report. Acoustic properties of the Rothenberg mask

Quarterly Progress and Status Report. Acoustic properties of the Rothenberg mask Dept. for Speech, Music and Hearing Quarterly Progress and Status Report Acoustic properties of the Rothenberg mask Hertegård, S. and Gauffin, J. journal: STL-QPSR volume: 33 number: 2-3 year: 1992 pages:

More information

SOURCE-filter modeling of speech is based on exciting. Glottal Spectral Separation for Speech Synthesis

SOURCE-filter modeling of speech is based on exciting. Glottal Spectral Separation for Speech Synthesis IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 1 Glottal Spectral Separation for Speech Synthesis João P. Cabral, Korin Richmond, Member, IEEE, Junichi Yamagishi, Member, IEEE, and Steve Renals,

More information

A Review of Glottal Waveform Analysis

A Review of Glottal Waveform Analysis A Review of Glottal Waveform Analysis Jacqueline Walker and Peter Murphy Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland jacqueline.walker@ul.ie,peter.murphy@ul.ie

More information

WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS

WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS NORDIC ACOUSTICAL MEETING 12-14 JUNE 1996 HELSINKI WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS Helsinki University of Technology Laboratory of Acoustics and Audio

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

Synthesis Algorithms and Validation

Synthesis Algorithms and Validation Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

More information

SPEECH AND SPECTRAL ANALYSIS

SPEECH AND SPECTRAL ANALYSIS SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function

Determination of instants of significant excitation in speech using Hilbert envelope and group delay function Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

651 Analysis of LSF frame selection in voice conversion

651 Analysis of LSF frame selection in voice conversion 651 Analysis of LSF frame selection in voice conversion Elina Helander 1, Jani Nurminen 2, Moncef Gabbouj 1 1 Institute of Signal Processing, Tampere University of Technology, Finland 2 Noia Technology

More information

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory

More information

Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks

Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Using text and acoustic in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks Lauri Juvela

More information

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical

More information

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1 ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El

More information

Cumulative Impulse Strength for Epoch Extraction

Cumulative Impulse Strength for Epoch Extraction Cumulative Impulse Strength for Epoch Extraction Journal: IEEE Signal Processing Letters Manuscript ID SPL--.R Manuscript Type: Letter Date Submitted by the Author: n/a Complete List of Authors: Prathosh,

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

Vocal effort modification for singing synthesis

Vocal effort modification for singing synthesis INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Vocal effort modification for singing synthesis Olivier Perrotin, Christophe d Alessandro LIMSI, CNRS, Université Paris-Saclay, France olivier.perrotin@limsi.fr

More information

Subtractive Synthesis & Formant Synthesis

Subtractive Synthesis & Formant Synthesis Subtractive Synthesis & Formant Synthesis Prof Eduardo R Miranda Varèse-Gastprofessor eduardo.miranda@btinternet.com Electronic Music Studio TU Berlin Institute of Communications Research http://www.kgw.tu-berlin.de/

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

2007 Elsevier Science. Reprinted with permission from Elsevier.

2007 Elsevier Science. Reprinted with permission from Elsevier. Lehto L, Airas M, Björkner E, Sundberg J, Alku P, Comparison of two inverse filtering methods in parameterization of the glottal closing phase characteristics in different phonation types, Journal of Voice,

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz

Between physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz Between physics and perception signal models for high level audio processing Axel Röbel Analysis / synthesis team, IRCAM DAFx 2010 iem Graz Overview Introduction High level control of signal transformation

More information

TIME encoding of a band-limited function,,

TIME encoding of a band-limited function,, 672 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 8, AUGUST 2006 Time Encoding Machines With Multiplicative Coupling, Feedforward, and Feedback Aurel A. Lazar, Fellow, IEEE

More information

Fundamental frequency estimation of speech signals using MUSIC algorithm

Fundamental frequency estimation of speech signals using MUSIC algorithm Acoust. Sci. & Tech. 22, 4 (2) TECHNICAL REPORT Fundamental frequency estimation of speech signals using MUSIC algorithm Takahiro Murakami and Yoshihisa Ishida School of Science and Technology, Meiji University,,

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Research Article Linear Prediction Using Refined Autocorrelation Function

Research Article Linear Prediction Using Refined Autocorrelation Function Hindawi Publishing Corporation EURASIP Journal on Audio, Speech, and Music Processing Volume 27, Article ID 45962, 9 pages doi:.55/27/45962 Research Article Linear Prediction Using Refined Autocorrelation

More information

ScienceDirect. Accuracy of Jitter and Shimmer Measurements

ScienceDirect. Accuracy of Jitter and Shimmer Measurements Available online at www.sciencedirect.com ScienceDirect Procedia Technology 16 (2014 ) 1190 1199 CENTERIS 2014 - Conference on ENTERprise Information Systems / ProjMAN 2014 - International Conference on

More information

The source-filter model of speech production"

The source-filter model of speech production 24.915/24.963! Linguistic Phonetics! The source-filter model of speech production" Glottal airflow Output from lips 400 200 0.1 0.2 0.3 Time (in secs) 30 20 10 0 0 1000 2000 3000 Frequency (Hz) Source

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

CHAPTER 3. ACOUSTIC MEASURES OF GLOTTAL CHARACTERISTICS 39 and from periodic glottal sources (Shadle, 1985; Stevens, 1993). The ratio of the amplitude of the harmonics at 3 khz to the noise amplitude in

More information

VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL

VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL Narsimh Kamath Vishweshwara Rao Preeti Rao NIT Karnataka EE Dept, IIT-Bombay EE Dept, IIT-Bombay narsimh@gmail.com vishu@ee.iitb.ac.in

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

Hungarian Speech Synthesis Using a Phase Exact HNM Approach

Hungarian Speech Synthesis Using a Phase Exact HNM Approach Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Transforming High-Effort Voices Into Breathy Voices Using Adaptive Pre-Emphasis Linear Prediction

Transforming High-Effort Voices Into Breathy Voices Using Adaptive Pre-Emphasis Linear Prediction Transforming High-Effort Voices Into Breathy Voices Using Adaptive Pre-Emphasis Linear Prediction by Karl Ingram Nordstrom B.Eng., University of Victoria, 1995 M.A.Sc., University of Victoria, 2000 A Dissertation

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment

Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,

More information

USING A WHITE NOISE SOURCE TO CHARACTERIZE A GLOTTAL SOURCE WAVEFORM FOR IMPLEMENTATION IN A SPEECH SYNTHESIS SYSTEM

USING A WHITE NOISE SOURCE TO CHARACTERIZE A GLOTTAL SOURCE WAVEFORM FOR IMPLEMENTATION IN A SPEECH SYNTHESIS SYSTEM USING A WHITE NOISE SOURCE TO CHARACTERIZE A GLOTTAL SOURCE WAVEFORM FOR IMPLEMENTATION IN A SPEECH SYNTHESIS SYSTEM by Brandon R. Graham A report submitted in partial fulfillment of the requirements for

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

The NII speech synthesis entry for Blizzard Challenge 2016

The NII speech synthesis entry for Blizzard Challenge 2016 The NII speech synthesis entry for Blizzard Challenge 2016 Lauri Juvela 1, Xin Wang 2,3, Shinji Takaki 2, SangJin Kim 4, Manu Airaksinen 1, Junichi Yamagishi 2,3,5 1 Aalto University, Department of Signal

More information

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES J. Rauhala, The beating equalizer and its application to the synthesis and modification of piano tones, in Proceedings of the 1th International Conference on Digital Audio Effects, Bordeaux, France, 27,

More information

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis

Signal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis Signal Analysis Music 27a: Signal Analysis Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD November 23, 215 Some tools we may want to use to automate analysis

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/76252

More information

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum

SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor

More information

A Comparative Study of Formant Frequencies Estimation Techniques

A Comparative Study of Formant Frequencies Estimation Techniques A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION by DARYUSH MEHTA B.S., Electrical Engineering (23) University of Florida SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING

More information

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH A. Stráník, R. Čmejla Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague Abstract Acoustic

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

A Parametric Model for Spectral Sound Synthesis of Musical Sounds

A Parametric Model for Spectral Sound Synthesis of Musical Sounds A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

Epoch Extraction From Emotional Speech

Epoch Extraction From Emotional Speech Epoch Extraction From al Speech D Govind and S R M Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati Email:{dgovind,prasanna}@iitg.ernet.in Abstract

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

Pattern Recognition Part 2: Noise Suppression

Pattern Recognition Part 2: Noise Suppression Pattern Recognition Part 2: Noise Suppression Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering Digital Signal Processing

More information

A Physiologically Produced Impulsive UWB signal: Speech

A Physiologically Produced Impulsive UWB signal: Speech A Physiologically Produced Impulsive UWB signal: Speech Maria-Gabriella Di Benedetto University of Rome La Sapienza Faculty of Engineering Rome, Italy gaby@acts.ing.uniroma1.it http://acts.ing.uniroma1.it

More information

INTRODUCTION TO COMPUTER MUSIC PHYSICAL MODELS. Professor of Computer Science, Art, and Music. Copyright by Roger B.

INTRODUCTION TO COMPUTER MUSIC PHYSICAL MODELS. Professor of Computer Science, Art, and Music. Copyright by Roger B. INTRODUCTION TO COMPUTER MUSIC PHYSICAL MODELS Roger B. Dannenberg Professor of Computer Science, Art, and Music Copyright 2002-2013 by Roger B. Dannenberg 1 Introduction Many kinds of synthesis: Mathematical

More information

HUMAN speech is frequently encountered in several

HUMAN speech is frequently encountered in several 1948 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 7, SEPTEMBER 2012 Enhancement of Single-Channel Periodic Signals in the Time-Domain Jesper Rindom Jensen, Student Member,

More information

BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR

BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR BeBeC-2016-S9 BEAMFORMING WITHIN THE MODAL SOUND FIELD OF A VEHICLE INTERIOR Clemens Nau Daimler AG Béla-Barényi-Straße 1, 71063 Sindelfingen, Germany ABSTRACT Physically the conventional beamforming method

More information

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

Publication III. c 2008 Taylor & Francis/Informa Healthcare. Reprinted with permission.

Publication III. c 2008 Taylor & Francis/Informa Healthcare. Reprinted with permission. 113 Publication III Matti Airas, TKK Aparat: An Environment for Voice Inverse Filtering and Parameterization. Logopedics Phoniatrics Vocology, 33(1), pp. 49 64, 2008. c 2008 Taylor & FrancisInforma Healthcare.

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information