L19: Prosodic modification of speech


Outline:
- Time-domain pitch-synchronous overlap-add (TD-PSOLA)
- Linear-prediction PSOLA (LP-PSOLA)
- Frequency-domain PSOLA (FD-PSOLA)
- Sinusoidal models
- Harmonic + noise models
- STRAIGHT

This lecture is based on [Taylor, 2009, ch. 14; Holmes, 2001, ch. 5; Moulines and Charpentier, 1990].

Motivation

As we saw in the previous lecture, concatenative synthesis with a fixed inventory requires prosodic modification of the diphones to match the specifications produced by the front end. Simple manipulations of the speech waveform do not produce the desired results:
- Speeding up or slowing down a recording changes not only the duration but also the pitch.
- Likewise, over- or under-sampling alters the duration, but also modifies the spectral envelope: formants become compressed or dilated.

The techniques presented in this lecture perform prosodic modification with minimal distortion:
- Time-scale modification changes the duration of the utterance without affecting its pitch.
- Pitch-scale modification changes the pitch of the utterance without affecting its duration.

Pitch-synchronous overlap-add (PSOLA)

PSOLA refers to a family of signal processing techniques used to perform time-scale and pitch-scale modification of speech. These modifications are performed without any explicit source/filter separation. The basis of all PSOLA techniques is to:
- Isolate individual pitch periods in the original signal,
- Perform the required modification, and
- Resynthesize the final waveform through an overlap-add operation.

Time-domain PSOLA (TD-PSOLA) is the most popular PSOLA technique, and indeed the most popular of all time/pitch-scaling techniques. Other variants include linear-prediction PSOLA (LP-PSOLA) and Fourier-domain PSOLA (FD-PSOLA).

Requirements of TD-PSOLA

TD-PSOLA works pitch-synchronously: there is one analysis window per pitch period. A prerequisite, therefore, is the ability to identify the epochs in the speech signal; for PSOLA it is vital that the epochs are determined with great accuracy. Epochs may be the instants of glottal closure or any other instants, as long as they lie at the same relative position within every pitch period.

The signal is then separated into frames with a Hanning window, generally extending over two pitch periods (one before the epoch, one after). These windowed frames can be recombined by placing their centers back at the original epoch positions and adding the overlapping regions. Though the result is not exactly identical, the resulting waveform is perceptually indistinguishable from the original. For unvoiced segments, a default window length of 10 ms is commonly used.

Analysis and reconstruction

1. Original speech waveform with epochs marked.
2. A Hanning window is placed at each epoch.
3. The Hanning windows cut the signal into separate frames, each centered at the point of maximum positive excursion.
4. Overlap-adding the separate frames yields a waveform perceptually identical to the original. [Taylor, 2009]
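To make the analysis/overlap-add step concrete, here is a minimal NumPy sketch, assuming the epoch positions (in samples) are already available from an epoch detector; the function names are ours, not from the references.

```python
import numpy as np

def extract_frames(x, epochs):
    """One Hanning-windowed frame per epoch, spanning two pitch periods
    (previous epoch to next epoch). Returns (frame, center_offset) pairs,
    where center_offset is the epoch's position within the frame."""
    frames = []
    for m in range(1, len(epochs) - 1):
        lo, c, hi = epochs[m - 1], epochs[m], epochs[m + 1]
        frames.append((x[lo:hi] * np.hanning(hi - lo), c - lo))
    return frames

def overlap_add(frames, centers, length):
    """Place each frame so its center lands on the given (synthesis) epoch
    and sum the overlapping regions."""
    y = np.zeros(length)
    for (seg, off), c in zip(frames, centers):
        lo = c - off
        a, b = max(lo, 0), min(lo + len(seg), length)
        y[a:b] += seg[a - lo:b - lo]
    return y
```

Recombining at the original epochs, `overlap_add(extract_frames(x, epochs), epochs[1:-1], len(x))`, reproduces the waveform almost exactly; the reconstruction is exact only where the pitch period is constant, since only then do adjacent Hanning windows sum to one.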

Merging two segments [Holmes, 2001]

Time-scale modification

Lengthening is achieved by duplicating frames: certain frames are duplicated, inserted back into the sequence, and overlap-added, producing a longer waveform. In general, listeners won't detect the operation; they simply perceive a longer segment of natural speech.

Shortening is achieved by removing frames: certain frames are dropped, and the remaining ones are overlap-added, producing a shorter waveform. As before, listeners simply perceive a shorter segment of natural speech.

As a rule of thumb, time-scaling by up to a factor of two (twice as long or twice as short) can be performed without much noticeable degradation.
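A sketch of lengthening/shortening built on the two helpers above; the nearest-neighbor frame selection is one simple way to realize the duplication/removal just described.

```python
def time_scale(frames, epochs, factor, length):
    """Time-scale by `factor` (>1 lengthens): walk a stretched synthesis time
    axis and reuse, at each step, the analysis frame whose warped position is
    nearest -- duplicating frames when stretching, skipping some when
    compressing. Local pitch periods are preserved, so pitch is unchanged."""
    epochs = np.asarray(epochs)
    centers = epochs[1:-1]
    periods = (epochs[2:] - epochs[:-2]) / 2      # local pitch period p_m
    syn_frames, syn_centers = [], []
    t = float(centers[0])
    while t < factor * centers[-1]:
        m = int(np.argmin(np.abs(centers * factor - t)))
        syn_frames.append(frames[m])
        syn_centers.append(int(round(t)))
        t += periods[m]                           # advance by one pitch period
    return overlap_add(syn_frames, syn_centers, int(length * factor))
```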

Time-scaling (lengthening) [Taylor, 2009]

Pitch-scale modification

Pitch-scale modification is performed by recombining frames on epochs that are spaced differently from the original ones. Assume a speech segment with a pitch of 100 Hz (10 ms between epochs), analyzed pitch-synchronously with a Hanning window as before:
- If we place the windowed frames 9 ms apart and overlap-add, we obtain a signal with a pitch of 1/0.009 ≈ 111 Hz.
- Conversely, if we place the frames 11 ms apart, we obtain a signal with a pitch of 1/0.011 ≈ 91 Hz.

The process of pitch lowering explains why the analysis window must be two pitch periods long: it ensures that, down to a scaling factor of 0.5, there is always some speech to add at the frame edges when the frames are moved apart. As with time-scaling, pitch-scaling by up to a factor of two can be performed without much noticeable degradation.
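The corresponding pitch-scaling sketch, again using the helpers above: every analysis frame is kept, but the synthesis epochs are respaced by 1/beta.

```python
def pitch_scale(frames, epochs, beta, length):
    """Pitch-scale by `beta` (>1 raises pitch): keep every analysis frame but
    re-space the synthesis epochs at 1/beta of the original spacing."""
    epochs = np.asarray(epochs, dtype=float)
    centers = epochs[1:-1]
    spacing = np.diff(centers) / beta             # new epoch-to-epoch distances
    syn = centers[0] + np.concatenate(([0.0], np.cumsum(spacing)))
    return overlap_add(frames, np.round(syn).astype(int), length)
```

Note that this changes the overall duration by roughly a factor of 1/beta; holding the duration fixed additionally requires duplicating or dropping frames, as discussed under Interactions below.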

Pitch-scaling [Holmes, 2001]

Pitch-scaling (lowering) [Taylor, 2009]

Epoch manipulation

A critical step in TD-PSOLA is the proper manipulation of epochs:
- A sequence of analysis epochs $T^a = \{t^a_1, t^a_2, \ldots, t^a_M\}$ is found by means of an epoch detection algorithm.
- From this sequence, the local pitch period is found as $p^a_m = (t^a_{m+1} - t^a_{m-1})/2$.
- Given the analysis epochs and pitch periods, a sequence of analysis frames is extracted by windowing: $x^a_m(n) = w_m(n)\,x(n)$.
- Next, a set of synthesis epochs $T^s = \{t^s_1, t^s_2, \ldots, t^s_{M'}\}$ is created from the target F0 and timing values provided by the front end.
- Finally, a mapping function $M(i)$ is created that specifies which analysis frame should be used at each synthesis epoch.
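A sketch of this bookkeeping, under the same assumptions as the earlier helpers: synthesis epochs are laid down by stepping through the target F0 contour, and M(i) picks, for each synthesis epoch, the analysis epoch whose time-warped position is nearest.

```python
def synthesis_epochs(target_f0, fs, duration):
    """Lay down synthesis epochs so consecutive spacing follows the target
    F0 contour; target_f0(t) returns the desired pitch in Hz at time t (s)."""
    t, out = 0.0, []
    while t < duration:
        out.append(int(round(t * fs)))
        t += 1.0 / target_f0(t)                   # one pitch period at time t
    return np.asarray(out)

def mapping(analysis_epochs, syn_epochs, warp=lambda t: t):
    """M(i): index of the analysis frame to use at synthesis epoch i, chosen
    as the analysis epoch nearest after warping (identity warp = no
    time-scaling, as in pure pitch modification)."""
    warped = warp(np.asarray(analysis_epochs, dtype=float))
    return np.array([int(np.argmin(np.abs(warped - ts))) for ts in syn_epochs])
```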

Mapping function M(i) for time-scaling (slowing down). The dashed lines represent the time-scale warping function between the analysis and synthesis time axes corresponding to the desired time-scaling; the resulting pitch-mark mapping in this case duplicates two of the six analysis ST signals.

[Stylianou, 2008, in Benesty et al. (Eds.)]

Interactions

Duration modification can be performed without reference to pitch. Assume five frames of F0 = 100 Hz speech spanning 40 ms. A sequence with the same pitch but a longer (shorter) duration is obtained by adding (removing) synthesis epochs; the mapping function M(i) specifies which analysis frame should be used for each synthesis frame.

Pitch modification is more complex, because it interacts with duration. Consider the same example: 100 Hz speech spanning five frames, i.e., $(5-1) \times 10 = 40$ ms between $t^a_1$ and $t^a_5$. Suppose we wish to raise the pitch to 150 Hz. This is done by creating a set of synthesis epochs $1/150 \approx 6.7$ ms apart. In doing so, however, the overall duration becomes $(5-1) \times 6.7 \approx 27$ ms. To preserve the original duration, we must then duplicate two frames, yielding an overall duration of $(7-1) \times 6.7 \approx 40$ ms.

Simultaneous time- and pitch-scaling [Taylor, 2009]

Performance

Synthesis quality with TD-PSOLA is extremely high, provided that the speech has been accurately epoch-marked (critical) and the modifications do not exceed a factor of two. In terms of speed, it would be difficult to conceive of an algorithm faster than TD-PSOLA.

However, TD-PSOLA can only be used for time- and pitch-scaling; it does not allow any other form of modification (e.g., spectral). In addition, TD-PSOLA performs no compression, and the entire waveform must be kept in memory. This issue is addressed by the variant known as linear-prediction PSOLA.

Other issues:
- When slowing down unvoiced portions by factors approaching two, the regular repetition of unvoiced frames leads to a perceived tonal noise. This can be addressed by reversing the time axis of consecutive frames.
- Similar effects can occur for voiced fricatives; in that case, however, time reversal does not solve the problem, and FD-PSOLA is needed.

Linear-prediction PSOLA (LP-PSOLA)

Approach:
- Decompose the speech signal through an LP filter.
- Process the residual in a manner similar to TD-PSOLA.
- Pass the time/pitch-scaled residual back through the LP filter.

Advantages over TD-PSOLA:
- Data compression: the filter parameters can be compressed (e.g., as reflection coefficients), and the residual can also be compressed as a pulse train, though at the expense of lower synthesis quality.
- Joint modification of pitch and spectral envelope.
- Independent time frames for spectral envelope estimation and for prosodic modification.
- Fewer distortions, since LP-PSOLA operates on a spectrally flat residual rather than on the speech signal itself.
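A per-frame skeleton of the idea, with the LP coefficients estimated by the autocorrelation method (solving the normal equations directly for brevity; Levinson-Durbin is the usual choice). Here `modify` stands for any TD-PSOLA-style operation applied to the residual; the function names are ours.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

def lpc(frame, order=16):
    """LP coefficients of A(z) = 1 - sum_k a_k z^-k, autocorrelation method."""
    r = np.correlate(frame, frame, 'full')[len(frame) - 1:]
    a = np.linalg.solve(toeplitz(r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lp_psola_frame(frame, modify):
    """LP-PSOLA skeleton for one frame: inverse-filter to the residual,
    apply the prosodic modification there, then restore the envelope."""
    a = lpc(frame)
    residual = lfilter(a, [1.0], frame)           # spectrally flat excitation
    return lfilter([1.0], a, modify(residual))    # all-pole filter 1/A(z)
```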

Fourier-domain PSOLA (FD-PSOLA)

FD-PSOLA operates in three stages:
- Analysis: a complex short-time (ST) spectrum is computed at the analysis pitch marks; an ST spectral envelope is estimated via LP analysis, homomorphic analysis, or a peak-picking algorithm (SEEVOC); and a flattened version of the ST spectrum is derived by dividing the complex spectrum by the spectral envelope.
- Frequency modification: the flattened spectrum is modified so that the spacing between pitch harmonics equals the desired pitch. This can be done using either (i) spectral compression/expansion or (ii) harmonic elimination/repetition (see Moulines and Charpentier, 1990).
- Synthesis: multiply the flattened spectrum by the spectral envelope, and obtain the synthesis ST signal by inverse DFT.
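A sketch of the analysis stage, estimating the envelope by homomorphic (cepstral) smoothing, one of the three options listed above; the lifter length is illustrative.

```python
import numpy as np

def flatten_spectrum(frame, lifter=30):
    """Return (flattened ST spectrum, spectral envelope) for one frame."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    ceps = np.fft.irfft(np.log(np.abs(spec) + 1e-9))  # real cepstrum
    ceps[lifter:-lifter] = 0.0                        # keep low quefrencies only
    env = np.exp(np.fft.rfft(ceps).real)              # smooth envelope
    return spec / env, env
```

Frequency modification then respaces the harmonics of the flattened spectrum; multiplying by `env` and taking `np.fft.irfft` completes the synthesis stage.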

Pitch-scaling with FD-PSOLA [Felps and Gutierrez-Osuna, 2009]

Performance

FD-PSOLA solves a major limitation of TD-PSOLA: its inability to perform spectral modification. Such modifications may be used for several purposes:
- Smoothing spectral envelopes across diphone joins in concatenative synthesis,
- Changing voice characteristics (e.g., vocal tract length), and
- Morphing between voices.

However, FD-PSOLA is computationally intensive and has high memory requirements for storage.

Sinusoidal models

As we saw in earlier lectures, the Fourier series can generate any periodic signal from a sum of sinusoids:

$$x(t) = \sum_{l=1}^{L} A_l \cos(l \omega_0 t + \phi_l)$$

A family of techniques known as sinusoidal models uses this as the basic building block for speech modification: find the sinusoidal components $(A_l, \omega_0, \phi_l)$, then alter them to meet the prosodic targets. In theory, Fourier analysis could be used to find the model parameters; for several reasons, however, it is advantageous to use a different procedure that is more geared toward synthesis. If the goal is pitch-scaling, it is also advantageous to perform the analysis pitch-synchronously, although the pitch marks need not be as accurate as for PSOLA.

Finding the sinusoidal parameters

The components $(A_l, \omega_0, \phi_l)$ are found so as to minimize the weighted error

$$E = \sum_n w^2(n) \left( s(n) - \hat{s}(n) \right)^2 = \sum_n w^2(n) \left( s(n) - \sum_{l=1}^{L} A_l \cos(l \omega_0 n + \phi_l) \right)^2$$

which requires a complex linear regression; see Quatieri (2002). Why use this analysis rather than Fourier analysis? First, the window function $w(n)$ concentrates accuracy at the center of the frame; second, the analysis can be performed on relatively short frames.

Given these parameters, an ST waveform can be reconstructed using the synthesis equation

$$\hat{x}(t) = \sum_{l=1}^{L} A_l \cos(l \omega_0 t + \phi_l)$$

and an entire waveform can then be reconstructed by overlap-adding ST segments, just as with TD-PSOLA.
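The minimization becomes an ordinary linear least-squares problem once each harmonic is written in its in-phase/quadrature form (a_l cos + b_l sin); the sketch below uses NumPy's solver as a stand-in for the complex regression in Quatieri (2002).

```python
import numpy as np

def fit_sinusoids(s, w, omega0, L):
    """Minimize sum_n w(n)^2 (s(n) - sum_l A_l cos(l*omega0*n + phi_l))^2.
    With a_l = A_l cos(phi_l), b_l = -A_l sin(phi_l) the problem is linear."""
    n = np.arange(len(s))
    arg = np.outer(n, omega0 * np.arange(1, L + 1))   # (N, L) phase matrix
    basis = np.hstack([np.cos(arg), np.sin(arg)])
    coef, *_ = np.linalg.lstsq(w[:, None] * basis, w * s, rcond=None)
    a, b = coef[:L], coef[L:]
    return np.hypot(a, b), np.arctan2(-b, a)          # A_l, phi_l

def synthesize(A, phi, omega0, length):
    """The synthesis equation: sum of harmonics with the fitted parameters."""
    n = np.arange(length)
    arg = np.outer(n, omega0 * np.arange(1, len(A) + 1))
    return (A * np.cos(arg + phi)).sum(axis=1)
```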

Modification

Modification is performed by separating the harmonics from the spectral envelope, but without explicit source/filter modeling. This can be done in a number of ways, such as peak-picking in the spectrum to determine the spectral envelope. Once the envelope has been found, the harmonics can be moved in the frequency domain and their new amplitudes read off the envelope. Finally, the synthesis equation is used to generate the waveform.

Sinewave modeling results: http://www.ee.columbia.edu/~dpwe/e6820/lectures/l05-speechmodels.pdf

Harmonic + noise models: motivation

Sinusoidal modeling works quite well for perfectly periodic signals, but its performance degrades in practice, since speech is rarely perfectly periodic. In addition, very little periodic source information is generally found at high frequencies, where the signal is significantly noisier. This non-periodicity has several sources, including breath passing through the glottis and turbulence in the vocal tract. [Taylor, 2009]

Overview

To address this issue, a stochastic component can be included:

$$s(t) = s_p(t) + s_r(t) = \sum_{l=1}^{L} A_l \cos(l \omega_0 t + \phi_l) + s_r(t)$$

where the noise component $s_r(t)$ is assumed to be Gaussian noise. A number of models based on this principle have been proposed, including multiband excitation (MBE) (Griffin and Lim, 1988) and the harmonic + noise model (HNM) (Stylianou, 1998). Here we focus on HNM, as it was developed specifically for TTS.

Harmonic + noise model (HNM)

HNM follows the same principle as other harmonic/stochastic models; the main difference is that it also models the temporal pattern of the noise. As an example, the noise component in stops evolves rapidly, so a model with uniform noise across the frame would miss important details. The noise part in HNM is modeled as

$$s_r(t) = e(t) \left[ h(t, \tau) \ast b(t) \right]$$

where $b(t)$ is white Gaussian noise, $h(t, \tau)$ is a spectral filter applied to the noise (generally all-pole), and $e(t)$ is an envelope that gives the filtered noise the correct temporal pattern. [Dutoit, 2008, in Benesty et al. (Eds.)]
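A minimal sketch of this noise term, with an illustrative one-pole low-pass standing in for h(t, τ) and a decaying exponential for e(t); all values are examples, not from the references.

```python
import numpy as np
from scipy.signal import lfilter

def hnm_noise(env, ar, rng=np.random.default_rng(0)):
    """s_r(t) = e(t) [h * b](t): white Gaussian noise b, shaped in frequency
    by an all-pole filter h and in time by the envelope e(t)."""
    b = rng.standard_normal(len(env))
    return env * lfilter([1.0], ar, b)

# e.g., a rapidly decaying envelope mimicking the noise burst of a stop:
t = np.arange(320) / 16000.0                      # 20 ms at 16 kHz
burst = hnm_noise(np.exp(-t / 0.004), [1.0, -0.9])
```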

Analysis steps

First, classify frames as voiced/unvoiced (V/UV):
- Estimate the pitch in order to perform pitch-synchronous (PS) analysis. With HNM, however, there is no need for accurate epoch detection; locating the pitch periods suffices, since phases are adjusted later on.
- Using the estimated pitch, fit a harmonic model to each PS frame.
- From the residual error, classify the frame as V or UV: UV frames will have a higher residual error than V frames.

For V frames, determine the highest harmonic frequency by moving through the frequency range and measuring how well a synthetic model fits the real waveform.

Finally, estimate the model parameters:
- Refine the pitch estimate using only the part of the signal below the cutoff,
- Find the amplitudes and phases by minimizing the error E, and
- Find the components h(t) and e(t) of the noise term.
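The V/UV decision can be expressed directly in terms of the harmonic fit sketched earlier (`fit_sinusoids`/`synthesize`); the threshold below is illustrative, not a value from the references.

```python
def classify_frame(frame, w, omega0, L, threshold=0.1):
    """Fit the harmonic model and call the frame voiced when the relative
    residual energy is small (UV frames fit the harmonic model poorly)."""
    A, phi = fit_sinusoids(frame, w, omega0, L)
    err = frame - synthesize(A, phi, omega0, len(frame))
    return 'V' if np.sum(err ** 2) / np.sum(frame ** 2) < threshold else 'UV'
```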

And finally, adjust the phases

Since the pitch-synchronous analysis was done without reference to a fixed epoch, the frames will not necessarily align. To adjust the phases, a time-domain technique is used to shift the relative positions of the waveforms within their frames.

Synthesis steps

- As in PSOLA, determine the synthesis frames and the mapping M(i).
- To perform time-scaling, proceed as with PSOLA.
- To perform pitch-scaling:
  - Adjust the harmonics in each frame.
  - Generate the noise component by passing white Gaussian noise b(t) through the filter h(t). For V frames, high-pass-filter the noise above the cutoff to remove its low-frequency components.
  - Modulate the noise in the time domain to ensure synchrony with the harmonic component; this step is essential so that a single sound (rather than two) is perceived.
- Finally, synthesize each ST frame by a conventional overlap-add method.

STRAIGHT: overview

STRAIGHT is a high-quality vocoder that decomposes the speech signal into three terms:
- A smooth spectrogram, free from periodicities in time and frequency,
- An F0 contour, and
- A time-frequency periodicity map, which captures the spectral shape of the noise as well as its temporal envelope. [Kawahara, 2007]

During analysis:
- F0 is accurately estimated using a fixed-point algorithm.
- This F0 estimate is used to smooth out the periodicity in the ST spectrum, using an F0-adaptive filter and a surface reconstruction method. The result is a smooth spectrogram that captures the vocal-tract and glottal filters but is free from F0 influences.

During synthesis:
- Pulses or noise with a flat spectrum are generated in accordance with the voicing information and F0.
- Sounds are resynthesized from the smoothed spectrum and the pulse/noise component using an inverse FFT with an overlap-add technique.

Note that STRAIGHT does not extract phase information; instead, it assumes a minimum-phase spectral envelope and applies all-pass filters to reduce the buzzy timbre.

Conventional vs. STRAIGHT spectrogram [Kawahara, 2002]

Performance

Prosodic modification with STRAIGHT is very simple:
- Time-scale modification reduces to duplicating or removing ST slices of the STRAIGHT spectrogram and aperiodicity map.
- Pitch-scale modification reduces to modifying the F0 contour.

Following these modifications, the STRAIGHT synthesis method can be invoked to synthesize the waveform. The three terms can be manipulated independently, which provides maximum flexibility. STRAIGHT allows extreme prosodic modifications (up to 600%) while maintaining the naturalness of the synthesized speech. On the downside, STRAIGHT is computationally intensive.
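STRAIGHT itself is distributed as research (MATLAB) software, but the freely available WORLD vocoder (`pyworld`) exposes the same three-way decomposition (F0 contour, smooth spectrogram, aperiodicity) and makes both manipulations described above one-liners. A sketch, with the scaling factors as examples:

```python
import numpy as np
import pyworld  # WORLD vocoder: F0 / smooth spectrogram / aperiodicity

def modify_prosody(x, fs, pitch_factor=1.5, time_factor=2.0):
    f0, sp, ap = pyworld.wav2world(x.astype(np.float64), fs)  # analysis
    f0 = f0 * pitch_factor              # pitch-scaling: scale the F0 contour
    # time-scaling: duplicate/remove ST slices of the spectrogram and
    # aperiodicity map (nearest-neighbor resampling of the frame index)
    idx = np.minimum((np.arange(int(len(f0) * time_factor)) / time_factor)
                     .astype(int), len(f0) - 1)
    return pyworld.synthesize(np.ascontiguousarray(f0[idx]),
                              np.ascontiguousarray(sp[idx]),
                              np.ascontiguousarray(ap[idx]), fs)
```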