ACCURATE SPEECH DECOMPOSITION INTO PERIODIC AND APERIODIC COMPONENTS BASED ON DISCRETE HARMONIC TRANSFORM
15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 3-7, 2007, copyright by EURASIP

Piotr Zubrycki and Alexander Petrovsky
Department of Real-Time Systems, Bialystok Technical University
Wiejska 45A street, 15-351 Bialystok, Poland
phone: (48 85), fax: (48 85), e-mail: palex@it.org.by

ABSTRACT

This paper presents a new method for decomposing the speech signal into periodic and aperiodic components. The proposed method is based on the Discrete Harmonic Transform (DHT), a transformation that analyses the signal spectrum in the harmonic domain and is able to synchronize its kernel with a time-varying pitch frequency. The system works without a priori knowledge of the pitch track. Unlike most applications, the proposed method estimates the change of the fundamental frequency within a frame before estimating the fundamental frequency itself. The periodic component is modelled as a sum of harmonically related sinusoids, and the DHT is used for accurate estimation of their amplitudes and initial phases. The aperiodic component is defined as the difference between the original speech and the estimated periodic component.

1. INTRODUCTION

The speech signal is generally assumed to be a composition of two major components: periodic (harmonic) and aperiodic (noise). Decomposing speech into these two basic components is a major challenge in many speech processing systems. The task lies in estimating the periodic and aperiodic components accurately so that they can be analysed separately, which plays an important role in speech applications such as synthesis or coding. The periodic component is generated by the vibrations of the vocal folds, while the aperiodic component is generated by the modulation of the air flow.
The modulated air flow is responsible for generating fricative and plosive sounds, but it is present in voiced sounds as well. The basic speech production model assumes that speech is either voiced or unvoiced. In this basic model the unvoiced part of speech is generated by passing white Gaussian noise through a linear filter which represents the vocal tract characteristics, while voiced parts of speech are modelled as a time-varying impulse train shaped by the vocal tract filter. This model assumes that no noise is present in the voiced parts of speech. In fact, real voiced speech contains some noise, and the speech signal can be viewed as a mixed-source signal with both periodic and aperiodic excitation. In sinusoidal and noise speech models this mixed-source speech signal is generally modelled as [1]:

s(n) = \sum_{k=1}^{K} A_k(n) \cos\varphi_k(n) + r(n),   (1)

where A_k is the instantaneous amplitude of the k-th harmonic, K is the number of harmonics present in the speech signal, r(n) is the noise component and \varphi_k is the instantaneous phase of the k-th harmonic, defined as:

\varphi_k(n) = \varphi_k(0) + \sum_{i=0}^{n} \frac{2\pi f_k(i)}{F_s},   (2)

where f_k is the instantaneous frequency of the k-th harmonic, F_s is the sampling frequency and \varphi_k(0) is the initial phase of the k-th harmonic. Sinusoidal speech modelling treats the speech signal as a sum of periodic and aperiodic components, where the periodic signal is defined as a sum of sinusoids with time-varying amplitudes and frequencies. If the f_k obey f_k = k f_0, where f_0 is the fundamental frequency, the sinusoids in the model are harmonically related and the model is called Harmonic+Noise. There are several variations of sinusoidal speech modelling [1, 2]. The sinusoidal speech model presented by McAulay and Quatieri [3] and further developed by George and Smith [4] treats voiced speech as a sum of harmonically related sinusoids with amplitudes and phases obtained directly from the Short-Time Fourier Transform (STFT) spectrum.
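As an illustration, the harmonic-plus-noise model above can be sketched numerically. This is a minimal sketch, not code from the paper; the function name and parameter values are only illustrative.

```python
import numpy as np

def synth_harmonic_noise(f0_track, amps, fs, noise_std=0.01):
    """Synthesize s(n) = sum_k A_k*cos(phi_k(n)) + r(n), accumulating each
    harmonic's phase from the instantaneous frequency track, with
    harmonically related frequencies f_k(i) = k*f0(i)."""
    f0_track = np.asarray(f0_track, dtype=float)
    s = np.zeros(len(f0_track))
    for k, a_k in enumerate(amps, start=1):
        phase = np.cumsum(2.0*np.pi*k*f0_track/fs)   # phi_k(0) taken as 0 here
        s += a_k*np.cos(phase)
    return s + noise_std*np.random.randn(len(f0_track))  # aperiodic part r(n)

fs = 8000
f0 = np.linspace(100.0, 120.0, 256)   # linearly rising pitch within one frame
s = synth_harmonic_noise(f0, [1.0, 0.5, 0.25], fs)
```

A frame like this, with the pitch drifting inside the analysis window, is exactly the kind of signal for which the harmonic-domain analysis developed below is intended.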
Unvoiced speech is modelled as a sum of randomly distributed sinusoids with random initial phases. Stylianou presented a more accurate approach to voiced speech modelling based on the harmonic+noise model [5]. In this approach the maximum voicing frequency is determined from an analysis of the speech spectrum, and it divides the speech band into a lower, voiced band and a higher, unvoiced band. In the Multiband Excitation (MBE) vocoder presented by Griffin and Lim [6] the speech spectrum is divided into a set of bands with respect to the pitch frequency. Each band is analysed and a binary voiced/unvoiced decision is taken; voiced bands are modelled as sinusoids and unvoiced bands as band-limited noise. Periodic-aperiodic speech decomposition in the methods discussed above involves a binary voiced/unvoiced
decision, which is not valid from the speech production point of view. Yegnanarayana et al. [7] proposed a decomposition method which considers the voiced and noise components to be present in the whole speech band. The idea of that work is to use an iterative algorithm based on Discrete Fourier Transform (DFT)/Inverse DFT (IDFT) pairs for the noise component estimation. Another decomposition method, using the Pitch Scaled Harmonic Filter (PSHF), is presented by Jackson and Shadle [8]. The speech signal is windowed with a window length chosen using knowledge of the pitch frequency, so that the analysed segment contains an integer number of pitch cycles. The pitch-scaled frame length aligns the pitch harmonics with the frequency bins of the STFT and thus minimises leakage, but complicates the windowing process. The PSHF algorithm performs the decomposition in the frequency domain by selecting only those STFT bins which are aligned with the pitch harmonics. The assumption most often made about the speech signal is its local stationarity, i.e. that the parameters of the pitch harmonics are slowly varying and that locally these variations can be neglected. With real speech signals these variations can decrease the quality of the component separation, especially if the STFT is used as the spectral analysis tool. The accuracy of the decomposition can be improved if the nonstationarity of the speech signal is taken into account. In this paper we propose a new periodic-aperiodic decomposition method which, similarly to the approach presented in [9], assumes the periodic and aperiodic components to be present in the whole speech band.
The motivation of our approach to the speech separation problem was to develop a system able to separate the speech components accurately while taking the nonstationary nature of speech into account, and without a priori knowledge of the pitch frequency track. In our system we use the speech model defined by (1). The basic concept of our method lies in analysing the speech spectrum in the harmonic domain rather than the frequency domain, in order to provide accurate estimation of the model parameters. For our purposes we have adopted the Harmonic Transform (HT) idea proposed by Zhang et al. [10]. The HT is a spectral analysis tool able to analyse a harmonic signal with time-varying frequency and produce a pulse-train spectrum in the harmonic domain. The first step of the designed system is the estimation of the optimal fundamental frequency change on a frame-by-frame basis using the HT. Once the optimal change of the pitch track is found, the fundamental frequency is estimated by analysing the harmonic domain spectrum. On this basis the periodic component is estimated by selecting the HT local maxima corresponding to the pitch harmonics. The aperiodic component is defined as the difference between the input speech and the estimated periodic component. The paper is organized as follows. In Section 2 we discuss the Harmonic Transform and define the speech model used in our system. In Section 3 the optimal pitch track estimation method is presented. In Section 4 we present the decomposition scheme, and some experimental results are given in Section 5.

2. DISCRETE HARMONIC TRANSFORM

Most speech analysis applications based on sinusoidal speech modelling use the STFT spectrum to estimate the harmonic parameters under the assumption of local stationarity, i.e. that the fundamental frequency is constant within the analysis frame. For real speech signals this is often a coarse assumption.
In fact the fundamental frequency varies in time, and as a result only the first several harmonics are distinguishable in the DFT spectrum (Fig. 1).

Figure 1: Harmonic Transform: harmonic signal with 6 harmonics and the fundamental frequency changing from 100 Hz to 120 Hz (top), DFT (middle) and DHT (bottom) of this signal.

This decreases the performance of the STFT in the harmonic parameter estimation process. The basic concept of harmonic domain spectral analysis is to carry out the analysis along the instantaneous harmonic frequencies rather than the fixed frequencies of the STFT. Two main strategies are possible. One is to time-warp the input signal so that the time-varying frequency becomes constant, and then to use the STFT. The other is to use a spectral analysis tool which transforms the input signal directly into the harmonic domain. Zhang et al. [10] proposed the Harmonic Transform (HT), a transformation with a built-in time-warping function. The HT of a signal s(t) is defined as:

S_{\varphi_u}(\omega) = \int s(t)\,\varphi'_u(t)\,e^{-j\omega\varphi_u(t)}\,dt,   (3)

where \varphi_u(t) is the unit phase function, i.e. the phase of the fundamental divided by its instantaneous frequency [10], and \varphi'_u(t) is the first-order derivative of \varphi_u(t). The Inverse Harmonic Transform is defined as:

s(t) = \frac{1}{2\pi}\int S_{\varphi_u}(\omega)\,e^{j\omega\varphi_u(t)}\,d\omega.   (4)
In real speech the fundamental frequency is slowly time-varying, i.e. it cannot change rapidly in a short time period. On this basis we assume a linear frequency change within a given speech segment. The instantaneous phase \varphi(t) of a sinusoid with a linear frequency change is given by the known formula (for simplicity the initial phase is omitted):

\varphi(t) = 2\pi\left(f_0 t + \frac{\varepsilon t^2}{2}\right),   (5)

where f_0 is the initial frequency and \varepsilon = \Delta f_0/T is the fundamental frequency change divided by the length of the segment (i.e. the time in which this frequency change occurs). For discrete-time signals and a segment length of N samples (T = N/F_s) this formula can be written as:

\varphi(n) = 2\pi\left(\frac{f_0 n}{F_s} + \frac{\Delta f_0 n^2}{2 N F_s}\right).   (6)

The initial fundamental frequency within a given segment can be written as:

f_0 = f_c - \frac{a f_c}{2},   a = \frac{\Delta f_0}{f_c},   (7)

where f_c is the central fundamental frequency within a given segment of length N. Substituting f_0 and \Delta f_0 in (6) with (7) we get:

\varphi(n) = \frac{2\pi f_c}{F_s}\,\alpha_a(n),   \alpha_a(n) = n\left(1 - \frac{a}{2} + \frac{a n}{2N}\right).   (8)

Now, let us consider the Discrete Harmonic Transform for signals with a linearly changing fundamental frequency. The frequencies of the spectral lines of the Discrete Fourier Transform are defined as:

f_c(k) = \frac{k F_s}{N}.   (9)

In the HT the central frequencies of the spectral lines are aligned with the frequencies of the DFT spectral lines. Using (9) in (8) we get:

\varphi_k(n) = \frac{2\pi k}{N}\,\alpha_a(n).   (10)

Finally we can define the Short-Time Discrete Harmonic Transform (STHT) for signals with a linear frequency change:

S_\alpha(k) = \sum_{n=0}^{N-1} s(n)\,\alpha'_a(n)\,e^{-j\frac{2\pi k}{N}\alpha_a(n)},   (11)

where \alpha'_a(n) is defined as:

\alpha'_a(n) = 1 - \frac{a}{2} + \frac{a n}{N}.   (12)

The inverse STHT is defined as:

s(n) = \frac{1}{N}\sum_{k=0}^{N-1} S_\alpha(k)\,e^{j\frac{2\pi k}{N}\alpha_a(n)}.   (13)

An example of the STFT spectrum and the STHT spectrum of a test signal is shown in Fig. 1. The input harmonic signal consists of 6 harmonics; the fundamental frequency changes linearly from 100 Hz to 120 Hz within a segment of 256 samples (F_s = 8000 Hz).
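A minimal numerical sketch of the forward STHT for a linear pitch change is given below; the function name is ours, not from the paper, and the placement of the derivative factor follows the reading of the transform above.

```python
import numpy as np

def stht(s, a):
    """Short-Time Discrete Harmonic Transform for a linear pitch change.

    a is the relative fundamental frequency change over the frame
    (a = delta_f0/fc); alpha_a(n) = n*(1 - a/2 + a*n/(2N)) is the warped
    time axis and alpha'_a(n) = 1 - a/2 + a*n/N its derivative."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    n = np.arange(N)
    alpha = n*(1.0 - a/2.0 + a*n/(2.0*N))
    dalpha = 1.0 - a/2.0 + a*n/N
    kernel = np.exp(-2j*np.pi*n[:, None]*alpha[None, :]/N)  # e^{-j2pi k alpha(n)/N}
    return kernel @ (s*dalpha)

# 6-harmonic test signal with pitch rising linearly from 100 Hz to 120 Hz
fs, N = 8000, 256
f0 = np.linspace(100.0, 120.0, N)
phase = np.cumsum(2.0*np.pi*f0/fs)
sig = sum(np.cos(k*phase) for k in range(1, 7))
S = stht(sig, a=(120.0 - 100.0)/110.0)  # change relative to the 110 Hz centre
```

Note that setting a = 0 makes the warped axis the identity and reduces the transform to the ordinary DFT, which is a convenient sanity check.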
Note that only the first few harmonics are distinguishable in the STFT spectrum, while in the STHT spectrum all of the harmonics are visible. A second example, comparing spectrograms of a speech signal processed by the STFT and the STHT, is shown in Fig. 2.

Figure 2: Example spectrograms of the speech signal using the STFT (top) and the STHT (bottom).

3. PITCH TRACK ESTIMATION

The pair of transforms given by (11) and (13) allows harmonic signals to be analysed in the harmonic domain when the fundamental frequency track is known. In the case of speech, both the central fundamental frequency and its change are unknown. A block diagram of the pitch detection algorithm is shown in Fig. 3. The proposed algorithm starts by searching for the fundamental frequency change, examining the STHT spectrum for different unit phase functions (12), i.e. unit phase functions with different values of the parameter a. The optimal value of a is defined as the one which minimises the Spectral Flatness Measure:

a_{opt} = \arg\min_a SFM(a),   SFM(a) = \frac{\left(\prod_{k=0}^{N-1}\left|STHT(a,k)\right|\right)^{1/N}}{\frac{1}{N}\sum_{k=0}^{N-1}\left|STHT(a,k)\right|},   (14)

where STHT(a,k) is the harmonic spectrum of a given speech segment for a given a, and |.| denotes absolute value. The minimal spectral flatness value indicates the highest spectral concentration, which in our algorithm means an optimal fit between the signal and the STHT kernel, i.e. that the optimal fundamental frequency change has been found for the given speech segment. Once this is done, the pitch frequency is estimated. The first step is the determination of the pitch harmonic candidates f_i by peak picking of the STHT spectrum, based on the algorithm proposed in [11].
Pitch harmonic candidates with the central frequency below 450 Hz are considered as pitch candidates. For each pitch candidate the algorithm tries to find its harmonics; if three of the first four harmonics cannot be found, the candidate is discarded. In order to prevent pitch doubling or halving, the following factor is computed for each candidate:

r = \frac{1}{n_{hmax}}\left(\sum_{n=1}^{n_{hmax}} a_n^2\right)^2,

where a_n is the amplitude of the n-th harmonic of the pitch candidate and n_{hmax} is the number of all possible harmonics for that candidate. This expression can be viewed as the mean energy of the harmonic signal per single harmonic, multiplied by the energy carried by the signal. It prevents pitch halving, because the mean energy per harmonic is smaller for halved pitch candidates, and pitch doubling, because the energy of the harmonic signal is higher for lower pitch candidates. The candidate with the greatest r factor is selected as the pitch for the given frame. Finally, the pitch value is refined using the following formula:

f_r = \frac{1}{n_{hmax}}\sum_{n=1}^{n_{hmax}} \frac{f_n}{n},

where f_n is the frequency of the n-th harmonic candidate.

Figure 3: Pitch detection algorithm.

The described procedure estimates the central pitch frequency for one frame. Further prevention of pitch halving or doubling is provided by a tracking buffer which stores the fundamental frequency estimates from several consecutive frames. The final pitch estimate is produced for the frame in the middle of the tracking buffer, so the resulting pitch estimation is delayed by several frames; in our system we used a buffer length of 5. As the tracking algorithm we use median filtering, which we found simple and robust against gross pitch errors.

4. PERIODIC-APERIODIC DECOMPOSITION

Speech decomposition in our system is performed in the time domain.
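The candidate scoring and pitch refinement of Section 3 can be sketched as follows. This is a hedged reading of the formulas above (mean energy per harmonic times total harmonic energy for the r factor); the helper names are illustrative.

```python
import numpy as np

def r_factor(amps):
    """Score a pitch candidate: mean energy per harmonic slot multiplied by
    the total harmonic energy.  Halved candidates score low (energy spread
    over twice as many slots); doubled candidates score low (less total
    energy captured)."""
    e = float(np.sum(np.asarray(amps, dtype=float)**2))
    return (e/len(amps))*e

def refine_pitch(harm_freqs):
    """Refined pitch f_r = mean of f_n/n: every detected harmonic votes
    with its frequency divided by its harmonic index."""
    f = np.asarray(harm_freqs, dtype=float)
    idx = np.arange(1, len(f) + 1)
    return float(np.mean(f/idx))

# harmonics measured at slightly perturbed multiples of roughly 100 Hz
f_refined = refine_pitch([101.0, 199.0, 302.0])
```

A halved-pitch candidate sees the same energy spread over twice as many harmonic slots, so its r factor drops, which is the intended behaviour.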
First, the periodic component is estimated; the aperiodic component is then defined as the difference between the input speech signal and the estimated periodic component.

Figure 4: Example of the speech decomposition: original speech (top), estimated periodic (middle) and aperiodic (bottom) components.

On the basis of the speech model discussed in Section 2, the periodic component is defined as:

h(n) = \sum_{k=1}^{K} A_k \cos\left(k\varphi(n) + \varphi_k(0)\right),   (15)

where A_k is the amplitude of the k-th harmonic, \varphi(n) is the instantaneous phase defined in (8) with the central frequency f_c given by the pitch frequency, and \varphi_k(0) is the initial phase of the k-th harmonic. Unfortunately the pitch harmonics are not aligned with the spectral lines and thus cannot be estimated directly from the STHT spectrum. One possible solution to this problem is interpolation of the adjacent STHT coefficients. In our system we propose a more accurate way to find the harmonic amplitudes and phases. In order to perform the spectral analysis exactly at the frequencies aligned with the pitch harmonics, we use the same formula (8) as in (15). By doing so we obtain a special case of the HT which we used in our previous work [12]. The DHT variant aligned with the pitch is defined as:

S_h(k) = \sum_{n=0}^{N-1} s(n)\,\alpha'_a(n)\,e^{-j\frac{2\pi k f_r}{F_s}\alpha_a(n)},

where f_r is the refined pitch frequency and k = 1..K, with K the number of pitch harmonics. The amplitudes and phases of the harmonics can be computed directly from the S_h(k) coefficients:

A_k = \sqrt{\mathrm{Re}^2\{S_h(k)\} + \mathrm{Im}^2\{S_h(k)\}},   \varphi_k(0) = \arctan\frac{\mathrm{Im}\{S_h(k)\}}{\mathrm{Re}\{S_h(k)\}},

where Re and Im stand for the real and imaginary parts, respectively. The periodic component is generated using formula (15), and the aperiodic component is defined as:

r(n) = s(n) - h(n).

An example of the speech decomposition is given in Fig. 4.
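A compact sketch of this decomposition step follows: estimate each harmonic's amplitude and initial phase with the pitch-aligned transform, rebuild the periodic part, and take the residual as the aperiodic part. The 2/N normalisation is our addition so that the amplitudes come out in signal units; all names are illustrative.

```python
import numpy as np

def decompose(s, fr, fs, a, K):
    """Split a frame into periodic h(n) and aperiodic r(n) parts.

    fr: refined pitch (Hz), a: relative pitch change over the frame,
    K: number of harmonics.  The analysis runs exactly at k*fr along the
    warped axis alpha_a(n), as in the pitch-aligned DHT above."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    n = np.arange(N)
    alpha = n*(1.0 - a/2.0 + a*n/(2.0*N))
    dalpha = 1.0 - a/2.0 + a*n/N
    h = np.zeros(N)
    for k in range(1, K + 1):
        theta = 2.0*np.pi*k*fr*alpha/fs
        Sk = np.sum(s*dalpha*np.exp(-1j*theta))*2.0/N  # |Sk| ~ A_k, angle ~ phi_k(0)
        h += np.abs(Sk)*np.cos(theta + np.angle(Sk))   # A_k cos(k*phi(n)+phi_k(0))
    return h, s - h  # periodic and aperiodic components

# frame with two harmonics of a ~105 Hz centre pitch rising by 10%
fs, N, fr, a = 8000, 256, 105.0, 0.1
n = np.arange(N)
alpha = n*(1 - a/2 + a*n/(2*N))
s = 1.0*np.cos(2*np.pi*1*fr*alpha/fs + 0.3) + 0.5*np.cos(2*np.pi*2*fr*alpha/fs - 0.5)
h, r = decompose(s, fr, fs, a, K=2)
```

On a frame that matches the model, the residual r(n) carries only the estimation error, which is small when the warp parameter a matches the actual pitch change.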
5. EXPERIMENTAL RESULTS

In order to verify the proposed decomposition algorithm we performed a set of experiments on synthetic speech-like signals. The testing procedure was as follows: two sets of synthetic speech were prepared, one male (central frequency 120 Hz) and one female (central frequency 200 Hz). In order to verify the performance of the Short-Time Harmonic Transform, different fundamental frequency changes were used in both sets. The fundamental frequency change parameters were chosen randomly within boundaries selected so as not to exceed 30% of the central fundamental frequency within a test frame. We tested our algorithm for several Harmonic-to-Noise Ratios (HNR) by adding noise signals of different energies to the input signal. The results of the experiment are shown in Table 1.

Table 1: Results of experiments (columns: central pitch frequency, HNR [dB] of the input signal, measured HNR [dB], SNR [dB] of the estimated periodic component).

In the table, the HNR column is the original HNR of the input signal. After estimating the periodic and aperiodic components, the HNR was measured; its mean value is given in the Measured HNR column. Finally, the quality of the estimated periodic component was assessed by measuring its SNR, defined as the ratio of the estimated periodic component energy to the error signal energy, where the error signal is the difference between the original and estimated periodic components.

6. CONCLUSIONS

In this paper we proposed a new speech decomposition scheme based on the Harmonic Transform. For our purposes we developed two variants of the Short-Time Discrete Harmonic Transform for the case of a linear frequency change within the analysis frame.
The first variant allows spectral analysis in the harmonic domain and is able to synchronize its kernel with the input signal. The second variant allows accurate estimation of the pitch harmonic amplitudes and phases, because its spectral lines are aligned with the pitch frequency. There are two main advantages of the STHT over conventional spectral analysis with the STFT. The first is the ability to estimate the fundamental frequency change without knowledge of the fundamental frequency itself. The second is the prevention of spectrum smearing, especially for higher-order harmonics, which is important when a spectral-domain fundamental frequency estimation algorithm is used. This makes the algorithm more robust for highly intonated speech segments as well as transient segments. The experiments confirm the robustness of the proposed approach.

7. ACKNOWLEDGEMENTS

This work was supported by Bialystok Technical University under the grant W/WI//05.

REFERENCES

[1] A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, New York: John Wiley & Sons, 1996.
[2] A.S. Spanias, "Speech coding: a tutorial review", Proc. IEEE, vol. 82, no. 10, 1994.
[3] R.J. McAulay, T.F. Quatieri, "Sinusoidal Coding", in Speech Coding and Synthesis (W.B. Kleijn and K.K. Paliwal, eds.), Amsterdam: Elsevier Science Publishers, 1995.
[4] E.B. George, M.J.T. Smith, "Speech Analysis/Synthesis and Modification Using an Analysis-by-Synthesis/Overlap-Add Sinusoidal Model", IEEE Trans. on Speech and Audio Processing, vol. 5, no. 5, 1997.
[5] Y. Stylianou, "Applying the Harmonic Plus Noise Model in Concatenative Speech Synthesis", IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, 2001.
[6] D.W. Griffin, J.S. Lim, "Multiband Excitation Vocoder", IEEE Trans. on Acoust., Speech and Signal Processing, vol. ASSP-36, pp. 1223-1235, 1988.
[7] B. Yegnanarayana, C. d'Alessandro, V.
Darsinos, "An Iterative Algorithm for Decomposition of Speech Signals into Voiced and Noise Components", IEEE Trans. on Speech and Audio Processing, vol. 6, no. 1, 1998.
[8] P.J.B. Jackson, C.H. Shadle, "Pitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech", IEEE Trans. on Speech and Audio Processing, vol. 9, no. 7, Oct. 2001.
[9] X. Serra, "Musical Sound Modeling with Sinusoids plus Noise", in Musical Signal Processing (C. Roads, S. Pope, A. Picialli, and G. De Poli, eds.), Swets & Zeitlinger Publishers, 1997, pp. 91-122.
[10] F. Zhang, G. Bi, Y.Q. Chen, "Harmonic Transform", IEE Proc. Vis. Image Signal Processing, vol. 151, no. 4, Aug. 2004.
[11] V. Sercov, A. Petrovsky, "The method of pitch frequency detection on the base of tuning to its harmonics", in Proc. of the 9th European Signal Processing Conference, EUSIPCO 98, vol. II, Sep. 8-11, 1998, Rhodes, Greece.
[12] V. Sercov, A. Petrovsky, "An Improved Speech Model with Allowance for Time-Varying Pitch Harmonic Amplitudes and Frequencies in Low Bit-Rate MBE Coders", in Proc. of the 6th European Conf. on Speech Communication and Technology, EUROSPEECH 99, Budapest, Hungary, 1999.
More informationPitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 7, OCTOBER 2001 713 Pitch-Scaled Estimation of Simultaneous Voiced and Turbulence-Noise Components in Speech Philip J. B. Jackson, Member,
More informationImpact Noise Suppression Using Spectral Phase Estimation
Proceedings of APSIPA Annual Summit and Conference 2015 16-19 December 2015 Impact oise Suppression Using Spectral Phase Estimation Kohei FUJIKURA, Arata KAWAMURA, and Youji IIGUI Graduate School of Engineering
More informationON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1
ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El
More informationApplication of velvet noise and its variants for synthetic speech and singing (Revised and extended version with appendices)
Application of velvet noise and its variants for synthetic speech and singing (Revised and extended version with appendices) (Compiled: 1:3 A.M., February, 18) Hideki Kawahara 1,a) Abstract: The Velvet
More informationBiomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar
Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative
More informationHIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS
ARCHIVES OF ACOUSTICS 29, 1, 1 21 (2004) HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS M. DZIUBIŃSKI and B. KOSTEK Multimedia Systems Department Gdańsk University of Technology Narutowicza
More informationADDITIVE synthesis [1] is the original spectrum modeling
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 851 Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech Laurent Girin, Member, IEEE, Mohammad Firouzmand,
More informationFOURIER analysis is a well-known method for nonparametric
386 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 1, FEBRUARY 2005 Resonator-Based Nonparametric Identification of Linear Systems László Sujbert, Member, IEEE, Gábor Péceli, Fellow,
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationFinal Exam Practice Questions for Music 421, with Solutions
Final Exam Practice Questions for Music 4, with Solutions Elementary Fourier Relationships. For the window w = [/,,/ ], what is (a) the dc magnitude of the window transform? + (b) the magnitude at half
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationReal-time fundamental frequency estimation by least-square fitting. IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p.
Title Real-time fundamental frequency estimation by least-square fitting Author(s) Choi, AKO Citation IEEE Transactions on Speech and Audio Processing, 1997, v. 5 n. 2, p. 201-205 Issued Date 1997 URL
More informationIntroduction of Audio and Music
1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationT a large number of applications, and as a result has
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 8, AUGUST 1988 1223 Multiband Excitation Vocoder DANIEL W. GRIFFIN AND JAE S. LIM, FELLOW, IEEE AbstractIn this paper, we present
More informationSub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech
Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory
More informationA NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER. Axel Röbel. IRCAM, Analysis-Synthesis Team, France
A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER Axel Röbel IRCAM, Analysis-Synthesis Team, France Axel.Roebel@ircam.fr ABSTRACT In this paper we propose a new method to reduce phase vocoder
More informationChapter 4 SPEECH ENHANCEMENT
44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or
More informationSPEECH AND SPECTRAL ANALYSIS
SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs
More informationCarrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm
Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)
More informationAN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH
AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH A. Stráník, R. Čmejla Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague Abstract Acoustic
More informationHIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH. George P. Kafentzis and Yannis Stylianou
HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH George P. Kafentzis and Yannis Stylianou Multimedia Informatics Lab Department of Computer Science University of Crete, Greece ABSTRACT In this paper,
More informationCOMP 546, Winter 2017 lecture 20 - sound 2
Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering
More informationMeasurement of RMS values of non-coherently sampled signals. Martin Novotny 1, Milos Sedlacek 2
Measurement of values of non-coherently sampled signals Martin ovotny, Milos Sedlacek, Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Measurement Technicka, CZ-667 Prague,
More informationINTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006
1. Resonators and Filters INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 Different vibrating objects are tuned to specific frequencies; these frequencies at which a particular
More informationFormant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope
Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope Myeongsu Kang School of Computer Engineering and Information Technology Ulsan, South Korea ilmareboy@ulsan.ac.kr
More informationVOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL
VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL Narsimh Kamath Vishweshwara Rao Preeti Rao NIT Karnataka EE Dept, IIT-Bombay EE Dept, IIT-Bombay narsimh@gmail.com vishu@ee.iitb.ac.in
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationADAPTIVE NOISE LEVEL ESTIMATION
Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France
More informationON BEDROSIAN CONDITION IN APPLICATION TO CHIRP SOUNDS
15th European Signal Processing Conference (EUSIPCO 7), Poznan, Poland, September 3-7, 7, copyright by EURASIP ON BEDROSIAN CONDIION IN APPLICAION O CHIRP SOUNDS E. HERMANOWICZ 1 ) ) and M. ROJEWSKI Faculty
More informationDiscrete Fourier Transform (DFT)
Amplitude Amplitude Discrete Fourier Transform (DFT) DFT transforms the time domain signal samples to the frequency domain components. DFT Signal Spectrum Time Frequency DFT is often used to do frequency
More informationIntroduction to Wavelet Transform. Chapter 7 Instructor: Hossein Pourghassem
Introduction to Wavelet Transform Chapter 7 Instructor: Hossein Pourghassem Introduction Most of the signals in practice, are TIME-DOMAIN signals in their raw format. It means that measured signal is a
More informationPreeti Rao 2 nd CompMusicWorkshop, Istanbul 2012
Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o
More informationVoiced/nonvoiced detection based on robustness of voiced epochs
Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies
More informationSPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester
SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure
More informationSpeech Compression Using Voice Excited Linear Predictive Coding
Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationspeech signal S(n). This involves a transformation of S(n) into another signal or a set of signals
16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract
More informationAM-FM demodulation using zero crossings and local peaks
AM-FM demodulation using zero crossings and local peaks K.V.S. Narayana and T.V. Sreenivas Department of Electrical Communication Engineering Indian Institute of Science, Bangalore, India 52 Phone: +9
More informationWavelet Speech Enhancement based on the Teager Energy Operator
Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose
More informationFrequency Domain Representation of Signals
Frequency Domain Representation of Signals The Discrete Fourier Transform (DFT) of a sampled time domain waveform x n x 0, x 1,..., x 1 is a set of Fourier Coefficients whose samples are 1 n0 X k X0, X
More informationLinear Frequency Modulation (FM) Chirp Signal. Chirp Signal cont. CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis
Linear Frequency Modulation (FM) CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 26, 29 Till now we
More informationSAMPLING THEORY. Representing continuous signals with discrete numbers
SAMPLING THEORY Representing continuous signals with discrete numbers Roger B. Dannenberg Professor of Computer Science, Art, and Music Carnegie Mellon University ICM Week 3 Copyright 2002-2013 by Roger
More informationMETHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS
METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS Jeremy J. Wells Audio Lab, Department of Electronics, University of York, YO10 5DD York, UK jjw100@ohm.york.ac.uk
More informationSpectrum. Additive Synthesis. Additive Synthesis Caveat. Music 270a: Modulation
Spectrum Music 7a: Modulation Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD) October 3, 7 When sinusoids of different frequencies are added together, the
More informationEstimation of Sinusoidally Modulated Signal Parameters Based on the Inverse Radon Transform
Estimation of Sinusoidally Modulated Signal Parameters Based on the Inverse Radon Transform Miloš Daković, Ljubiša Stanković Faculty of Electrical Engineering, University of Montenegro, Podgorica, Montenegro
More informationROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE
- @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu
More informationInstantaneous Higher Order Phase Derivatives
Digital Signal Processing 12, 416 428 (2002) doi:10.1006/dspr.2002.0456 Instantaneous Higher Order Phase Derivatives Douglas J. Nelson National Security Agency, Fort George G. Meade, Maryland 20755 E-mail:
More informationThe Channel Vocoder (analyzer):
Vocoders 1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, Each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear phase FIR filter are used.
More informationChapter 7. Frequency-Domain Representations 语音信号的频域表征
Chapter 7 Frequency-Domain Representations 语音信号的频域表征 1 General Discrete-Time Model of Speech Production Voiced Speech: A V P(z)G(z)V(z)R(z) Unvoiced Speech: A N N(z)V(z)R(z) 2 DTFT and DFT of Speech The
More informationTIME FREQUENCY ANALYSIS OF TRANSIENT NVH PHENOMENA IN VEHICLES
TIME FREQUENCY ANALYSIS OF TRANSIENT NVH PHENOMENA IN VEHICLES K Becker 1, S J Walsh 2, J Niermann 3 1 Institute of Automotive Engineering, University of Applied Sciences Cologne, Germany 2 Dept. of Aeronautical
More informationTopic 2. Signal Processing Review. (Some slides are adapted from Bryan Pardo s course slides on Machine Perception of Music)
Topic 2 Signal Processing Review (Some slides are adapted from Bryan Pardo s course slides on Machine Perception of Music) Recording Sound Mechanical Vibration Pressure Waves Motion->Voltage Transducer
More informationCorrespondence. Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 3, MAY 1999 333 Correspondence Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm Sassan Ahmadi and Andreas
More informationSynthesis Algorithms and Validation
Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided
More informationTHE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES
J. Rauhala, The beating equalizer and its application to the synthesis and modification of piano tones, in Proceedings of the 1th International Conference on Digital Audio Effects, Bordeaux, France, 27,
More informationI-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes
I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol.345, pp.523-528.
More informationStructure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping
Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics
More information