ACCURATE SPEECH DECOMPOSITION INTO PERIODIC AND APERIODIC COMPONENTS BASED ON DISCRETE HARMONIC TRANSFORM

15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 3-7, 2007, copyright by EURASIP

ACCURATE SPEECH DECOMPOSITION INTO PERIODIC AND APERIODIC COMPONENTS BASED ON DISCRETE HARMONIC TRANSFORM

Piotr Zubrycki and Alexander Petrovsky
Department of Real-Time Systems, Bialystok Technical University
Wiejska 45A street, 5-35 Bialystok, Poland
phone: (48 85), fax: (48 85), e-mail: palex@it.org.by

ABSTRACT

This paper presents a new method for decomposing the speech signal into periodic and aperiodic components. The proposed method is based on the Discrete Harmonic Transform (DHT), a transformation that analyses the signal spectrum in the harmonic domain and is able to synchronize its kernel with a time-varying pitch frequency. The system works without a priori knowledge of the pitch track. Unlike most applications, the proposed method estimates the change of the fundamental frequency within a frame before estimating the fundamental frequency itself. The periodic component is modelled as a sum of harmonically related sinusoids, whose amplitudes and initial phases are estimated accurately with the DHT. The aperiodic component is defined as the difference between the original speech and the estimated periodic component.

1. INTRODUCTION

The speech signal is generally regarded as a composition of two major components: a periodic (harmonic) one and an aperiodic (noise) one. Decomposing speech into these two basic components is a major challenge in many speech processing systems. The task lies in estimating the periodic and aperiodic components accurately enough that they can be analysed separately, which plays an important role in speech applications such as synthesis and coding. The periodic component is generated by the vibration of the vocal folds, while the aperiodic component is generated by the modulation of the air flow.
The modulated air flow is responsible for generating fricative and plosive sounds, but it is present in voiced sounds as well. The basic speech production model assumes that speech is either voiced or unvoiced. In this basic model the unvoiced part of speech is generated by passing white Gaussian noise through a linear filter representing the vocal tract characteristics, while voiced parts are modelled as a time-varying impulse train modulated by the vocal tract filter. The model thus assumes that no noise is present in the voiced parts of speech. In fact, real voiced speech contains some noise, and the speech signal can be viewed as a mixed-source signal with both periodic and aperiodic excitation. In the sinusoidal and noise speech models this mixed-source signal is generally modelled as [1]:

s(n) = \sum_{k=1}^{K} A_k(n) \cos \varphi_k(n) + r(n),    (1)

where A_k(n) is the instantaneous amplitude of the k-th harmonic, K is the number of harmonics present in the speech signal, r(n) is the noise component, and \varphi_k(n) is the instantaneous phase of the k-th harmonic, defined as:

\varphi_k(n) = \varphi_k(0) + \sum_{i=0}^{n} \frac{2\pi f_k(i)}{F_s},    (2)

where f_k is the instantaneous frequency of the k-th harmonic, F_s is the sampling frequency and \varphi_k(0) is the initial phase of the k-th harmonic. Sinusoidal speech modelling treats the speech signal as a sum of periodic and aperiodic components, where the periodic signal is defined as a sum of sinusoids with time-varying amplitudes and frequencies. If the f_k obey f_k = k f_0, where f_0 is the fundamental frequency, the sinusoids in the model are harmonically related and the model is called Harmonic+Noise. There are several variations of sinusoidal speech modelling [1, 2]. The sinusoidal speech model presented by McAulay and Quatieri [3] and further developed by George and Smith [4] treats voiced speech as a sum of harmonically related sinusoids with amplitudes and phases obtained directly from the Short-Time Fourier Transform (STFT) spectrum.
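The mixed-source model of Eq. (1) is easy to make concrete. Below is a minimal NumPy sketch (all parameter values are illustrative, not taken from the paper) that synthesises one frame as a sum of harmonically related sinusoids plus a noise residual:

```python
import numpy as np

# Sketch of the harmonic + noise model of Eq. (1): a sum of K harmonically
# related sinusoids plus a noise residual r(n). Parameter values are
# illustrative assumptions, not taken from the paper.
Fs = 8000          # sampling frequency [Hz]
N = 256            # frame length [samples]
f0 = 100.0         # fundamental frequency [Hz], constant within the frame
K = 6              # number of harmonics
n = np.arange(N)

rng = np.random.default_rng(0)
harmonic = np.zeros(N)
for k in range(1, K + 1):
    A_k = 1.0 / k                           # example amplitude envelope
    phi_k0 = rng.uniform(-np.pi, np.pi)     # initial phase phi_k(0)
    inc = 2 * np.pi * k * f0 / Fs           # per-sample phase increment
    phase = phi_k0 + np.cumsum(np.full(N, inc))  # accumulated phase, Eq. (2)
    harmonic += A_k * np.cos(phase)

noise = 0.05 * rng.standard_normal(N)       # aperiodic component r(n)
s = harmonic + noise                        # mixed-source frame, Eq. (1)
```

For a time-varying pitch the per-sample increment simply becomes a function of n, which is exactly the case the Harmonic Transform below is designed to handle.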
Unvoiced speech is modelled as a sum of randomly distributed sinusoids with random initial phases. Stylianou presented a more accurate approach to voiced speech modelling based on the harmonic+noise model [5]. In this approach the maximum voicing frequency is determined from an analysis of the speech spectrum, and the speech band is divided by this frequency into a lower, voiced band and a higher, unvoiced band. In the Multiband Excitation (MBE) vocoder presented by Griffin and Lim [6] the speech spectrum is divided into a set of bands with respect to the pitch frequency. Each band is analysed and a binary voiced/unvoiced decision is taken; voiced bands are modelled as sinusoids and unvoiced bands as band-limited noise. Periodic-aperiodic speech decomposition in the methods discussed above involves a binary voiced/unvoiced

decision, which is not valid from the speech production point of view. Yegnanarayana et al. [7] proposed a speech decomposition method which considers the voiced and noise components to be present in the whole speech band. The idea of that work is to use an iterative algorithm based on Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) pairs to estimate the noise component. Another decomposition method, using the Pitch-Scaled Harmonic Filter (PSHF), is presented by Jackson and Shadle [8]. The speech signal is windowed with a window length chosen from knowledge of the pitch frequency, so that the analysed segment contains an integer number of pitch cycles. The pitch-scaled frame length aligns the pitch harmonics with the frequency bins of the STFT and thus minimises leakage, but it complicates the windowing process. The PSHF algorithm performs the decomposition in the frequency domain by selecting only those STFT bins which are aligned with the pitch harmonics. The assumption most often made about the speech signal is local stationarity, i.e. that the parameters of the pitch harmonics are slowly varying and that locally these variations can be neglected. For real speech signals these variations can degrade the quality of the separation of the speech components, especially if the STFT is used as the spectral analysis tool. The accuracy of the decomposition can be improved if the nonstationarity of the speech signal is taken into account. In this paper we propose a new periodic-aperiodic decomposition method which assumes the periodic and aperiodic components to be present in the whole speech band, similar to the approach presented in [9].
The motivation of our approach to the speech separation problem was to develop a system able to separate the speech components accurately, taking the nonstationary nature of speech into account and requiring no a priori knowledge of the pitch frequency track. Our system uses the speech model defined by (1). The basic concept of our method lies in analysing the speech spectrum in the harmonic domain rather than the frequency domain, in order to estimate the model parameters accurately. For this purpose we have adopted the Harmonic Transform (HT) proposed by Zhang et al. [10]. The HT is a spectral analysis tool able to analyse a harmonic signal with time-varying frequency and produce a pulse-train spectrum in the harmonic domain. The first step of the designed system is the estimation, on a frame-by-frame basis, of the optimal change of the speech fundamental frequency using the HT. Once the optimal change of the pitch track is found, the fundamental frequency is estimated by analysing the harmonic-domain spectrum. On this basis the periodic component is estimated by selecting the HT local maxima corresponding to the pitch harmonics, and the aperiodic component is defined as the difference between the input speech and the estimated periodic component. The paper is organized as follows. In Section 2 we discuss the Harmonic Transform and define the speech model used in our system. In Section 3 the optimal pitch track estimation method is presented. In Section 4 we present the decomposition scheme, and in Section 5 some experimental results are given.

2. DISCRETE HARMONIC TRANSFORM

Most speech analysis applications based on sinusoidal modelling use the STFT spectrum to estimate the harmonic parameters under the assumption of local stationarity, i.e. that the fundamental frequency is constant within the analysis frame. For real speech signals this is often a coarse assumption.
In fact the fundamental frequency varies in time, and thus only the first several harmonics are distinguishable in the DFT spectrum (Fig. 1).

Figure 1: Harmonic Transform: a harmonic signal with 6 harmonics and a fundamental frequency changing from 00Hz to 0Hz (top), and the DFT (middle) and DHT (bottom) of this signal.

This fact degrades the performance of the STFT in the harmonic parameter estimation process. The basic concept of harmonic-domain spectral analysis is to carry out the analysis along the instantaneous harmonic frequencies rather than along fixed frequencies as in the STFT. Two main strategies are possible. The first is to time-warp the input signal, converting the time-varying frequency into a constant one, and then use the STFT. The second is to use a spectral analysis tool which transforms the input signal directly into the harmonic domain. Zhang et al. [10] proposed the Harmonic Transform (HT), a transformation with a built-in time-warping function. The HT of a signal s(t) is defined as:

S_{\varphi_u}(\omega) = \int s(t)\, \varphi'_u(t)\, e^{-j\omega\varphi_u(t)}\, dt,    (3)

where \varphi_u(t) is the unit phase function, i.e. the phase of the fundamental divided by its instantaneous frequency [10], and \varphi'_u(t) is the first-order derivative of \varphi_u(t). The Inverse Harmonic Transform is defined as:

s(t) = \frac{1}{2\pi} \int S_{\varphi_u}(\omega)\, e^{j\omega\varphi_u(t)}\, d\omega.    (4)

In real speech the fundamental frequency is slowly time-varying, i.e. it cannot change rapidly over a short time period. On this basis our approach assumes a linear frequency change within a given speech segment. The instantaneous phase \varphi(t) of a sinusoid with a linear frequency change is given by the known formula (the initial phase is omitted for simplicity):

\varphi(t) = 2\pi \left( f_0 t + \frac{\varepsilon t^2}{2} \right),    (5)

where f_0 is the initial frequency and \varepsilon = \Delta f_0 / T is the fundamental frequency change divided by the length of the segment (i.e. the time over which the frequency change occurs). For discrete-time signals and a segment length of N samples (T = N/F_s) this formula can be written as:

\varphi(n) = 2\pi \left( \frac{f_0 n}{F_s} + \frac{\Delta f_0 n^2}{2 N F_s} \right).    (6)

The initial fundamental frequency within a given segment can be written as:

f_0 = f_c - \frac{a f_c}{2}, \qquad a = \frac{\Delta f_0}{f_c},    (7)

where f_c is the central fundamental frequency within a given segment of length N. Substituting f_0 and \Delta f_0 in (6) using (7) we get:

\varphi(n) = \frac{2\pi f_c}{F_s} \alpha_a(n), \qquad \alpha_a(n) = \left( 1 - \frac{a}{2} + \frac{a n}{2N} \right) n.    (8)

Now, let us consider the Discrete Harmonic Transform for signals with a linearly changing fundamental frequency. The frequencies of the spectral lines of the Discrete Fourier Transform are defined as:

f_c = \frac{k F_s}{N}.    (9)

In the HT the central frequencies of the spectral lines are aligned with the frequencies of the DFT spectral lines. Using (9) in (8) we get:

\varphi(n) = \frac{2\pi k}{N} \alpha_a(n).    (10)

Finally we can define the Short-Time Discrete Harmonic Transform (STHT) for signals with a linear frequency change:

S_\alpha(k) = \sum_{n=0}^{N-1} s(n)\, \alpha'(n)\, e^{-j \frac{2\pi k}{N} \alpha(n)},    (11)

where \alpha'(n) is defined as:

\alpha'(n) = 1 - \frac{a}{2} + \frac{a n}{N}.    (12)

The inverse STHT is defined as:

s(n) = \frac{1}{N} \sum_{k=0}^{N-1} S_\alpha(k)\, e^{j \frac{2\pi k}{N} \alpha(n)}.    (13)

An example of the STFT spectrum and the STHT spectrum of a test signal is shown in Fig. 1. The input harmonic signal consists of 6 harmonics; the fundamental frequency changes linearly from 00Hz to 0Hz within a segment of 56 samples (F_s = 8000 Hz).
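To make the transform concrete, here is a minimal NumPy sketch of the STHT for a linear pitch change, assuming the warped axis alpha_a(n) = n(1 - a/2 + an/(2N)) with weight alpha'(n) = 1 - a/2 + an/N, consistent with the linear-chirp phase model; all signal parameters are illustrative:

```python
import numpy as np

def stht(s, a):
    """Short-Time Discrete Harmonic Transform for a linear fundamental
    frequency change. `a` is the relative frequency change within the
    frame; a = 0 reduces the transform to the plain DFT."""
    N = len(s)
    n = np.arange(N)
    alpha = n * (1 - a / 2 + a * n / (2 * N))   # warped time axis
    dalpha = 1 - a / 2 + a * n / N              # its discrete derivative
    k = np.arange(N)[:, None]
    return (s * dalpha * np.exp(-2j * np.pi * k * alpha / N)).sum(axis=1)

# Test signal in the spirit of the paper's example: 6 harmonics whose
# fundamental rises linearly across a 256-sample frame. The central pitch
# is placed on a DFT bin (125 Hz = 4 * Fs/N) so the effect is easy to see.
Fs, N = 8000, 256
fc, df = 125.0, 25.0                   # central pitch and total change [Hz]
f0 = fc - df / 2                       # initial frequency of the chirp
n = np.arange(N)
phase = 2 * np.pi * (f0 * n + df * n**2 / (2 * N)) / Fs
sig = sum(np.cos(k * phase) for k in range(1, 7))

S_matched = stht(sig, df / fc)         # kernel synchronised with the chirp
S_plain = stht(sig, 0.0)               # ordinary DFT for comparison
```

With the matched warping each harmonic collapses onto a single spectral line, while the plain DFT smears the higher harmonics over several bins, exactly the effect shown in Fig. 1.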
Note that only the first few harmonics can be distinguished in the STFT spectrum, while in the STHT spectrum all of the harmonics are visible. The second example is a comparison of spectrograms of a speech signal processed by the STFT and by the STHT, shown in Fig. 2.

Figure 2: Example spectrograms of the speech signal using the STFT (top) and the STHT (bottom).

3. PITCH TRACK ESTIMATION

The pair of transforms given by (11) and (13) allows harmonic signals to be analysed in the harmonic domain when the fundamental frequency track is known. In the case of speech, both the central fundamental frequency and its change are unknown. A block diagram of the pitch detection algorithm is shown in Fig. 3. The proposed algorithm starts by searching for the fundamental frequency change, examining the STHT spectrum for different unit phase functions (12), i.e. unit phase functions with different values of the parameter a. The optimal value of a is defined as the one which minimises the Spectral Flatness Measure:

a_{opt} = \arg\min_a \mathrm{SFM}(a), \qquad \mathrm{SFM}(a) = \frac{\left( \prod_{k=0}^{N-1} \left| STHT(a,k) \right| \right)^{1/N}}{\frac{1}{N} \sum_{k=0}^{N-1} \left| STHT(a,k) \right|},    (14)

where STHT(a,k) is the harmonic spectrum of a given speech segment for a given a, and |.| denotes absolute value. The minimal spectral flatness indicates the highest spectral concentration, which in our algorithm means an optimal fit between the signal and the STHT kernel; this in turn means that the optimal change of the speech fundamental frequency has been found for the given segment. Once this is done, the pitch frequency is estimated. The first step of this algorithm is the determination of pitch harmonic candidates f_i by peak picking of the STHT spectrum, based on the algorithm proposed in [11]. Pitch harmonic candidates with central frequencies located between

and 450 Hz are considered as pitch candidates. For each pitch candidate the algorithm tries to find its harmonics; if three of the first four harmonics cannot be found, the candidate is discarded. In order to prevent pitch doubling or halving, the following factor is computed for each candidate:

r = \frac{1}{n_{h\max}} \left( \sum_{n=1}^{n_{h\max}} a_n^2 \right)^2,

where a_n is the amplitude of the n-th harmonic of the pitch candidate and n_{h\max} is the number of all possible harmonics for the candidate. This formula can be viewed as the mean energy of the harmonic signal per single harmonic, multiplied by the energy carried by the signal. It prevents pitch halving, since the mean energy per harmonic is smaller for halved pitch candidates, and at the same time prevents pitch doubling, since the energy of the harmonic signal is higher for lower pitch candidates. The candidate with the greatest factor r is selected as the pitch for the given frame. Finally, the pitch value is refined using the formula:

f_r = \frac{1}{n_{h\max}} \sum_{n=1}^{n_{h\max}} \frac{f_n}{n},

where f_n is the frequency of the n-th harmonic candidate.

Figure 3: Pitch detection algorithm.

The described procedure estimates the central pitch frequency for a single frame. Further prevention of pitch halving or doubling is provided by a tracking buffer which stores the fundamental frequency estimates from several consecutive frames. The final pitch estimate is made for the frame in the middle of the tracking buffer, so the resulting estimate is delayed by several frames. In our system we used a buffer length of 5. As the tracking algorithm we use median filtering, which we found simple and robust against gross pitch errors.

4. PERIODIC-APERIODIC DECOMPOSITION

Speech decomposition in our system is performed in the time domain.
First the periodic component is estimated, and the aperiodic component is defined as the difference between the input speech signal and the estimated periodic component.

Figure 4: Example of the speech decomposition: original speech (top), estimated periodic (middle) and aperiodic (bottom) components.

On the basis of the speech model discussed in Section 2, the periodic component is defined as:

h(n) = \sum_{k=1}^{K} A_k \cos \left( k \varphi(n) + \varphi_k(0) \right),    (15)

where A_k is the amplitude of the k-th harmonic, \varphi(n) is the instantaneous phase defined in (8) with the central frequency f_c given by the pitch frequency, and \varphi_k(0) is the initial phase of the k-th harmonic. Unfortunately the pitch harmonics are not aligned with the spectral lines and thus cannot be estimated directly from the STHT spectrum. One possible solution to this problem is interpolation of adjacent STHT coefficients. In our system we propose a more accurate way to find the harmonic amplitudes and phases. In order to carry out the spectral analysis exactly at the frequencies aligned with the pitch harmonics, we use the same formula (8) as in (15). In this way we obtain a special case of the HT, which we have used in our previous work [12]. The DHT variant aligned with the pitch is defined as:

H(k) = \sum_{n=0}^{N-1} s(n)\, \alpha'(n)\, e^{-j \frac{2\pi k f_r}{F_s} \alpha(n)},

where f_r is the refined pitch frequency and k = 1, ..., K, with K the number of pitch harmonics. The amplitudes and phases of the harmonics can be computed directly from the H(k) coefficients:

A_k = \sqrt{ \mathrm{Re}^2\{H(k)\} + \mathrm{Im}^2\{H(k)\} }, \qquad \varphi_k(0) = \arctan \frac{\mathrm{Im}\{H(k)\}}{\mathrm{Re}\{H(k)\}},

where \mathrm{Re}\{\cdot\} and \mathrm{Im}\{\cdot\} stand for the real and imaginary parts respectively. The periodic component is generated using (15), and the aperiodic component is defined as:

r(n) = s(n) - h(n).

An example of the speech decomposition is given in Fig. 4.
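The whole decomposition step can be sketched as follows. This is a minimal illustration, not the authors' code: the 2/N amplitude normalisation and all signal parameters are assumptions of this sketch.

```python
import numpy as np

def decompose(s, fr, a, K, Fs):
    """Estimate the periodic component h(n) of frame s via the
    pitch-aligned DHT and return (h, r) with r(n) = s(n) - h(n)."""
    N = len(s)
    n = np.arange(N)
    alpha = n * (1 - a / 2 + a * n / (2 * N))   # warped time axis
    dalpha = 1 - a / 2 + a * n / N
    h = np.zeros(N)
    for k in range(1, K + 1):
        # pitch-aligned DHT coefficient of the k-th harmonic
        Hk = np.sum(s * dalpha * np.exp(-2j * np.pi * k * fr * alpha / Fs))
        A_k = 2.0 * np.abs(Hk) / N              # harmonic amplitude
        phi_k0 = np.angle(Hk)                   # initial phase
        h += A_k * np.cos(2 * np.pi * k * fr * alpha / Fs + phi_k0)
    return h, s - h                             # periodic, aperiodic parts

# Synthetic mixed-source frame (amplitudes, phases and noise level are
# illustrative): three harmonics of a chirped 125 Hz pitch plus weak noise.
Fs, N, fr, a = 8000, 256, 125.0, 0.2
n = np.arange(N)
alpha = n * (1 - a / 2 + a * n / (2 * N))
rng = np.random.default_rng(1)
amps = [1.0, 0.5, 0.25]
clean = sum(A * np.cos(2 * np.pi * (k + 1) * fr * alpha / Fs + 0.3 * k)
            for k, A in enumerate(amps))
noise = 0.02 * rng.standard_normal(N)
h, r = decompose(clean + noise, fr, a, K=3, Fs=Fs)
```

Because the analysis frequencies are multiples of the refined pitch rather than DFT bins, no interpolation between spectral lines is needed, and the residual is dominated by the added noise.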

5. EXPERIMENTAL RESULTS

In order to verify the proposed decomposition algorithm we performed a set of experiments on synthetic speech-like signals. The testing procedure was as follows: two sets of synthetic speech were prepared, one male (central frequency 0Hz) and one female (central frequency 00Hz). In order to verify the performance of the Short-Time Harmonic Transform, different fundamental frequency changes were used in both sets; the change parameters were chosen randomly within boundaries set so as not to exceed 30% of the central fundamental frequency within a test frame. We tested the algorithm at several Harmonic-to-Noise Ratios (HNR) by adding noise of different energies to the input signal. The results of the experiment are shown in Table 1.

Table 1: Results of the experiments: for each central pitch frequency and input HNR [dB], the measured HNR [dB] after decomposition and the SNR [dB] of the estimated periodic component.

In the table, the HNR column gives the original HNR of the input signal. After estimation of the periodic and aperiodic components the HNR was measured again; the mean of this measure is shown in the column Measured HNR. Finally, the quality of the estimated periodic component was assessed by its SNR, defined as the ratio of the estimated periodic component energy to the error signal energy, where the error signal is the difference between the original and estimated periodic components.

6. CONCLUSIONS

In this paper we proposed a new speech decomposition scheme based on the Harmonic Transform. For our purposes we developed two variants of the Short-Time Discrete Harmonic Transform for the case of a linear frequency change within the analysis frame.
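The two figures of merit can be computed with a pair of hypothetical helpers (the names are ours, not from the paper):

```python
import numpy as np

# HNR is the harmonic-to-noise energy ratio of a frame; the SNR of the
# estimated periodic component is its energy over the energy of the error
# against the true periodic part, as defined in the text above.

def hnr_db(harmonic, noise):
    """Harmonic-to-Noise Ratio in dB."""
    return 10 * np.log10(np.sum(harmonic**2) / np.sum(noise**2))

def snr_db(estimated, reference):
    """Energy of the estimated periodic component over the energy of the
    error signal (reference minus estimate), in dB."""
    err = reference - estimated
    return 10 * np.log10(np.sum(estimated**2) / np.sum(err**2))
```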
The first variant performs spectral analysis in the harmonic domain and is able to synchronize its kernel with the input signal. The second variant allows accurate estimation of the amplitudes and phases of the pitch harmonics, because its spectral lines are aligned with the pitch frequency. There are two main advantages of the STHT over conventional spectral analysis with the STFT. The first is the ability to estimate the fundamental frequency change without knowledge of the fundamental frequency itself. The second is the prevention of spectrum smearing, especially for higher-order harmonics, which is important when a spectral-domain fundamental frequency estimation algorithm is used. This makes the algorithm more robust for highly intonated as well as transient speech segments. The experiments demonstrate the robustness of the proposed approach.

7. ACKNOWLEDGEMENTS

This work was supported by Bialystok Technical University under the grant W/WI//05.

REFERENCES

[1] A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, New York: John Wiley & Sons, 1996.
[2] A.S. Spanias, "Speech coding: a tutorial review," Proc. IEEE, vol. 82, no. 10, 1994.
[3] R.J. McAulay and T.F. Quatieri, "Sinusoidal coding," in Speech Coding and Synthesis (W. Kleijn and K. Paliwal, eds.), Amsterdam: Elsevier Science Publishers, 1995.
[4] E.B. George and M.J.T. Smith, "Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model," IEEE Trans. on Speech and Audio Processing, vol. 5, no. 5, 1997.
[5] Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, 2001.
[6] D.W. Griffin and J.S. Lim, "Multiband excitation vocoder," IEEE Trans. on Acoust., Speech and Signal Processing, vol. ASSP-36, 1988.
[7] B. Yegnanarayana, C. d'Alessandro and V.
Darsinos, "An iterative algorithm for decomposition of speech signals into voiced and noise components," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 1, 1998.
[8] P.J.B. Jackson and C.H. Shadle, "Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech," IEEE Trans. on Speech and Audio Processing, vol. 9, no. 7, Oct. 2001.
[9] X. Serra, "Musical sound modeling with sinusoids plus noise," in Musical Signal Processing (C. Roads, S. Pope, A. Picialli and G. De Poli, eds.), Swets & Zeitlinger Publishers, 1997.
[10] F. Zhang, G. Bi and Y.Q. Chen, "Harmonic transform," IEE Proc. - Vision, Image and Signal Processing, vol. 151, no. 4, Aug. 2004.
[11] V. Sercov and A. Petrovsky, "The method of pitch frequency detection on the base of tuning to its harmonics," in Proc. 9th European Signal Processing Conference (EUSIPCO'98), vol. II, Rhodes, Greece, Sep. 8-11, 1998.
[12] V. Sercov and A. Petrovsky, "An improved speech model with allowance for time-varying pitch harmonic amplitudes and frequencies in low bit-rate MBE coders," in Proc. 6th European Conf. on Speech Communication and Technology (EUROSPEECH'99), Budapest, Hungary, 1999.


More information

FFT analysis in practice

FFT analysis in practice FFT analysis in practice Perception & Multimedia Computing Lecture 13 Rebecca Fiebrink Lecturer, Department of Computing Goldsmiths, University of London 1 Last Week Review of complex numbers: rectangular

More information

A Parametric Model for Spectral Sound Synthesis of Musical Sounds

A Parametric Model for Spectral Sound Synthesis of Musical Sounds A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick

More information

HIGH ACCURACY FRAME-BY-FRAME NON-STATIONARY SINUSOIDAL MODELLING

HIGH ACCURACY FRAME-BY-FRAME NON-STATIONARY SINUSOIDAL MODELLING HIGH ACCURACY FRAME-BY-FRAME NON-STATIONARY SINUSOIDAL MODELLING Jeremy J. Wells, Damian T. Murphy Audio Lab, Intelligent Systems Group, Department of Electronics University of York, YO10 5DD, UK {jjw100

More information

IMPROVED CODING OF TONAL COMPONENTS IN MPEG-4 AAC WITH SBR

IMPROVED CODING OF TONAL COMPONENTS IN MPEG-4 AAC WITH SBR IMPROVED CODING OF TONAL COMPONENTS IN MPEG-4 AAC WITH SBR Tomasz Żernici, Mare Domańsi, Poznań University of Technology, Chair of Multimedia Telecommunications and Microelectronics, Polana 3, 6-965, Poznań,

More information

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday. L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are

More information

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

Phase estimation in speech enhancement unimportant, important, or impossible?

Phase estimation in speech enhancement unimportant, important, or impossible? IEEE 7-th Convention of Electrical and Electronics Engineers in Israel Phase estimation in speech enhancement unimportant, important, or impossible? Timo Gerkmann, Martin Krawczyk, and Robert Rehr Speech

More information

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University

More information

Pitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech

Pitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 7, OCTOBER 2001 713 Pitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech Philip J. B. Jackson, Member,

More information

Impact Noise Suppression Using Spectral Phase Estimation

Impact Noise Suppression Using Spectral Phase Estimation Proceedings of APSIPA Annual Summit and Conference 2015 16-19 December 2015 Impact oise Suppression Using Spectral Phase Estimation Kohei FUJIKURA, Arata KAWAMURA, and Youji IIGUI Graduate School of Engineering

More information

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1

ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1 ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El

More information

Application of velvet noise and its variants for synthetic speech and singing (Revised and extended version with appendices)

Application of velvet noise and its variants for synthetic speech and singing (Revised and extended version with appendices) Application of velvet noise and its variants for synthetic speech and singing (Revised and extended version with appendices) (Compiled: 1:3 A.M., February, 18) Hideki Kawahara 1,a) Abstract: The Velvet

More information

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative

More information

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS ARCHIVES OF ACOUSTICS 29, 1, 1 21 (2004) HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS M. DZIUBIŃSKI and B. KOSTEK Multimedia Systems Department Gdańsk University of Technology Narutowicza

More information

ADDITIVE synthesis [1] is the original spectrum modeling

ADDITIVE synthesis [1] is the original spectrum modeling IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 851 Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech Laurent Girin, Member, IEEE, Mohammad Firouzmand,

More information

FOURIER analysis is a well-known method for nonparametric

FOURIER analysis is a well-known method for nonparametric 386 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 1, FEBRUARY 2005 Resonator-Based Nonparametric Identification of Linear Systems László Sujbert, Member, IEEE, Gábor Péceli, Fellow,

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Final Exam Practice Questions for Music 421, with Solutions

Final Exam Practice Questions for Music 421, with Solutions Final Exam Practice Questions for Music 4, with Solutions Elementary Fourier Relationships. For the window w = [/,,/ ], what is (a) the dc magnitude of the window transform? + (b) the magnitude at half

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information

Real-time fundamental frequency estimation by least-square fitting. IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p.

Real-time fundamental frequency estimation by least-square fitting. IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p. Title Real-time fundamental frequency estimation by least-square fitting Author(s) Choi, AKO Citation IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p. 201-205 Issued Date 1997 URL

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

T a large number of applications, and as a result has

T a large number of applications, and as a result has IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 8, AUGUST 1988 1223 Multiband Excitation Vocoder DANIEL W. GRIFFIN AND JAE S. LIM, FELLOW, IEEE AbstractIn this paper, we present

More information

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech

Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory

More information

A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER. Axel Röbel. IRCAM, Analysis-Synthesis Team, France

A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER. Axel Röbel. IRCAM, Analysis-Synthesis Team, France A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER Axel Röbel IRCAM, Analysis-Synthesis Team, France Axel.Roebel@ircam.fr ABSTRACT In this paper we propose a new method to reduce phase vocoder

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

SPEECH AND SPECTRAL ANALYSIS

SPEECH AND SPECTRAL ANALYSIS SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs

More information

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)

More information

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH A. Stráník, R. Čmejla Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague Abstract Acoustic

More information

HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH. George P. Kafentzis and Yannis Stylianou

HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH. George P. Kafentzis and Yannis Stylianou HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH George P. Kafentzis and Yannis Stylianou Multimedia Informatics Lab Department of Computer Science University of Crete, Greece ABSTRACT In this paper,

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

Measurement of RMS values of non-coherently sampled signals. Martin Novotny 1, Milos Sedlacek 2

Measurement of RMS values of non-coherently sampled signals. Martin Novotny 1, Milos Sedlacek 2 Measurement of values of non-coherently sampled signals Martin ovotny, Milos Sedlacek, Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Measurement Technicka, CZ-667 Prague,

More information

INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006

INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 1. Resonators and Filters INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 Different vibrating objects are tuned to specific frequencies; these frequencies at which a particular

More information

Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope

Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope Myeongsu Kang School of Computer Engineering and Information Technology Ulsan, South Korea ilmareboy@ulsan.ac.kr

More information

VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL

VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL Narsimh Kamath Vishweshwara Rao Preeti Rao NIT Karnataka EE Dept, IIT-Bombay EE Dept, IIT-Bombay narsimh@gmail.com vishu@ee.iitb.ac.in

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

ADAPTIVE NOISE LEVEL ESTIMATION

ADAPTIVE NOISE LEVEL ESTIMATION Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France

More information

ON BEDROSIAN CONDITION IN APPLICATION TO CHIRP SOUNDS

ON BEDROSIAN CONDITION IN APPLICATION TO CHIRP SOUNDS 15th European Signal Processing Conference (EUSIPCO 7), Poznan, Poland, September 3-7, 7, copyright by EURASIP ON BEDROSIAN CONDIION IN APPLICAION O CHIRP SOUNDS E. HERMANOWICZ 1 ) ) and M. ROJEWSKI Faculty

More information

Discrete Fourier Transform (DFT)

Discrete Fourier Transform (DFT) Amplitude Amplitude Discrete Fourier Transform (DFT) DFT transforms the time domain signal samples to the frequency domain components. DFT Signal Spectrum Time Frequency DFT is often used to do frequency

More information

Introduction to Wavelet Transform. Chapter 7 Instructor: Hossein Pourghassem

Introduction to Wavelet Transform. Chapter 7 Instructor: Hossein Pourghassem Introduction to Wavelet Transform Chapter 7 Instructor: Hossein Pourghassem Introduction Most of the signals in practice, are TIME-DOMAIN signals in their raw format. It means that measured signal is a

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

Speech Compression Using Voice Excited Linear Predictive Coding

Speech Compression Using Voice Excited Linear Predictive Coding Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

AM-FM demodulation using zero crossings and local peaks

AM-FM demodulation using zero crossings and local peaks AM-FM demodulation using zero crossings and local peaks K.V.S. Narayana and T.V. Sreenivas Department of Electrical Communication Engineering Indian Institute of Science, Bangalore, India 52 Phone: +9

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

Frequency Domain Representation of Signals

Frequency Domain Representation of Signals Frequency Domain Representation of Signals The Discrete Fourier Transform (DFT) of a sampled time domain waveform x n x 0, x 1,..., x 1 is a set of Fourier Coefficients whose samples are 1 n0 X k X0, X

More information

Linear Frequency Modulation (FM) Chirp Signal. Chirp Signal cont. CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis

Linear Frequency Modulation (FM) Chirp Signal. Chirp Signal cont. CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis Linear Frequency Modulation (FM) CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 26, 29 Till now we

More information

SAMPLING THEORY. Representing continuous signals with discrete numbers

SAMPLING THEORY. Representing continuous signals with discrete numbers SAMPLING THEORY Representing continuous signals with discrete numbers Roger B. Dannenberg Professor of Computer Science, Art, and Music Carnegie Mellon University ICM Week 3 Copyright 2002-2013 by Roger

More information

METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS

METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS Jeremy J. Wells Audio Lab, Department of Electronics, University of York, YO10 5DD York, UK jjw100@ohm.york.ac.uk

More information

Spectrum. Additive Synthesis. Additive Synthesis Caveat. Music 270a: Modulation

Spectrum. Additive Synthesis. Additive Synthesis Caveat. Music 270a: Modulation Spectrum Music 7a: Modulation Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD) October 3, 7 When sinusoids of different frequencies are added together, the

More information

Estimation of Sinusoidally Modulated Signal Parameters Based on the Inverse Radon Transform

Estimation of Sinusoidally Modulated Signal Parameters Based on the Inverse Radon Transform Estimation of Sinusoidally Modulated Signal Parameters Based on the Inverse Radon Transform Miloš Daković, Ljubiša Stanković Faculty of Electrical Engineering, University of Montenegro, Podgorica, Montenegro

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Instantaneous Higher Order Phase Derivatives

Instantaneous Higher Order Phase Derivatives Digital Signal Processing 12, 416 428 (2002) doi:10.1006/dspr.2002.0456 Instantaneous Higher Order Phase Derivatives Douglas J. Nelson National Security Agency, Fort George G. Meade, Maryland 20755 E-mail:

More information

The Channel Vocoder (analyzer):

The Channel Vocoder (analyzer): Vocoders 1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, Each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear phase FIR filter are used.

More information

Chapter 7. Frequency-Domain Representations 语音信号的频域表征

Chapter 7. Frequency-Domain Representations 语音信号的频域表征 Chapter 7 Frequency-Domain Representations 语音信号的频域表征 1 General Discrete-Time Model of Speech Production Voiced Speech: A V P(z)G(z)V(z)R(z) Unvoiced Speech: A N N(z)V(z)R(z) 2 DTFT and DFT of Speech The

More information

TIME FREQUENCY ANALYSIS OF TRANSIENT NVH PHENOMENA IN VEHICLES

TIME FREQUENCY ANALYSIS OF TRANSIENT NVH PHENOMENA IN VEHICLES TIME FREQUENCY ANALYSIS OF TRANSIENT NVH PHENOMENA IN VEHICLES K Becker 1, S J Walsh 2, J Niermann 3 1 Institute of Automotive Engineering, University of Applied Sciences Cologne, Germany 2 Dept. of Aeronautical

More information

Topic 2. Signal Processing Review. (Some slides are adapted from Bryan Pardo s course slides on Machine Perception of Music)

Topic 2. Signal Processing Review. (Some slides are adapted from Bryan Pardo s course slides on Machine Perception of Music) Topic 2 Signal Processing Review (Some slides are adapted from Bryan Pardo s course slides on Machine Perception of Music) Recording Sound Mechanical Vibration Pressure Waves Motion->Voltage Transducer

More information

Correspondence. Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm

Correspondence. Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 3, MAY 1999 333 Correspondence Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm Sassan Ahmadi and Andreas

More information

Synthesis Algorithms and Validation

Synthesis Algorithms and Validation Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

More information

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES J. Rauhala, The beating equalizer and its application to the synthesis and modification of piano tones, in Proceedings of the 1th International Conference on Digital Audio Effects, Bordeaux, France, 27,

More information

I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes

I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol.345, pp.523-528.

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information