EE 225D LECTURE ON SPEECH SYNTHESIS. University of California Berkeley


1 University of California Berkeley
College of Engineering
Department of Electrical Engineering and Computer Sciences
Professors: N. Morgan / B. Gold
EE225D Speech Synthesis, Spring 1999, Lecture 23
N.MORGAN / B.GOLD LECTURE

2 Synthesis by Rule
text -> graphemes to phonemes -> phoneme string -> phonemes to parameters -> vocal tract parameters, excitation -> speech synthesizer -> synthetic speech
Notes on the diagram: major effort, but we're not getting into this; could be diphones (later); prosodics.
We'll be listening to
* Early synthesizers
* Synthesis by rule: researcher enters phonemes
* Complete text-to-speech system
Evolution: describe how early synthesis was done (page 4?)
- Different configurations
- Connected with vocoders

3 Tape list (one group of entries is bracketed on the slide as "Synthesis by Rule")
Tape Number
1. Voder
2. Pattern Playback
3. PAT
4. OVE
5. PAT2
6. OVE II
7. OVE II (Holmes)
8. Holmes II Synthesis
9. Klatt (Male, Fem.)
10. DECtalk
11. DAVO
12. Flanagan
13. Speak & Spell
14. Multi-pulse LPC
15. Pattern Playback
16. Kelly-Gerstman
Figure column entries: No Fig.; No Fig. & get Fig. from previous; 11.3

4 Synthesizers can be
- Channel vocoder, LPC or homomorphic
- Serial formants [each formant is a two-pole network]
- Parallel formants
- Articulatory models
- Oddball arrangement: Pattern Playback
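
The "serial formants" entry above, where each formant is a two-pole network, can be sketched as a cascade of second-order digital resonators. This is only an illustrative sketch, not the lecture's implementation: the coefficient formulas are one commonly used digital resonator design, and the function names, the 10 kHz sampling rate, and the formant frequency and bandwidth values are all assumptions chosen for the example.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (one pole pair)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Run one two-pole resonator (one formant) over the samples in x."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y0 = a * sample + b * y1 + c * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out


def serial_formants(x, formants, fs_hz):
    """Cascade: the output of each formant resonator feeds the next one."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, fs_hz)
    return x


def impulse_train(n_samples, period):
    """Crude voiced ("buzz") excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


FS = 10000  # assumed sampling rate in Hz
VOWEL = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]  # assumed (F, BW) pairs
speech = serial_formants(impulse_train(1000, 100), VOWEL, FS)
```

Because the resonators are in series, only the formant frequencies and bandwidths need to be specified; the relative formant amplitudes follow from the cascade itself.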

5 Evolution
* Researcher picks an utterance and creates a spectrogram.
* Researcher has a synthesizer model at his/her disposal.
* Researcher enters a sequence of parameter values into the model.
* Synthesizer speaks, and the researcher adjusts the sounds so that the utterance sounds better.
Before this, we had the Voder, where the instrument was played in real time by a skilled performer.

6 Speech Synthesis
text -> graphemes to phonemes -> phoneme string -> phonemes to parameters -> vocal tract information, excitation -> speech synthesizer -> speech
Notes on the diagram: diphone; (major effort); prosodics?
* Early synthesizers
* Synthesis by rule
* Complete text-to-speech

7 Figure 29.7: Channel vocoder synthesizer. Channel signals 1 through n each control one of n bandpass filter channels (bandpass filters 1 through n) driven by a common excitation signal (buzz or hiss); the channel outputs are summed (Σ) to give the synthesized speech.
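
A minimal sketch of the channel vocoder synthesizer in the figure above, under stated assumptions: each channel signal scales the common excitation, the product goes through that channel's bandpass filter, and the channel outputs are summed. The order of scaling and filtering, the simple resonator used as a bandpass filter, and every name and numeric value here are assumptions made for illustration; the figure does not specify them.

```python
import math
import random


def bandpass(x, center_hz, bw_hz, fs_hz):
    """Two-pole, two-zero resonator standing in for one bandpass filter."""
    r = math.exp(-math.pi * bw_hz / fs_hz)  # pole radius set by the bandwidth
    b1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / fs_hz)
    b2 = -r * r
    g = (1.0 - r * r) / 2.0  # roughly normalizes the gain near the center
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y0 = g * (s - x2) + b1 * y1 + b2 * y2  # zeros at 0 Hz and fs/2
        out.append(y0)
        x2, x1 = x1, s
        y2, y1 = y1, y0
    return out


def excitation(n_samples, voiced, pitch_period, rng):
    """Buzz (impulse train) when voiced, hiss (uniform noise) otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def channel_vocoder_synth(channel_signals, centers_hz, bw_hz, fs_hz, exc):
    """Sum over channels of bandpass(channel signal * excitation)."""
    out = [0.0] * len(exc)
    for gains, center_hz in zip(channel_signals, centers_hz):
        modulated = [g * e for g, e in zip(gains, exc)]
        band = bandpass(modulated, center_hz, bw_hz, fs_hz)
        out = [o + b for o, b in zip(out, band)]
    return out


FS = 8000                          # assumed sampling rate in Hz
N = 400                            # number of samples to synthesize
CENTERS = [300.0, 1000.0, 2000.0]  # assumed channel center frequencies
buzz = excitation(N, True, 80, random.Random(0))
channels = [[1.0] * N, [0.5] * N, [0.25] * N]  # constant channel signals
speech = channel_vocoder_synth(channels, CENTERS, 200.0, FS, buzz)
```

In an actual vocoder the channel signals vary slowly over time and come from the analysis side; they are held constant here only to keep the sketch short.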

8 Figure 29.8: Light collector, mirror, tone wheel, spectrogram, etc.


10 Figure 29.8: Parallel formant synthesizer. Each of the k variable bandpass filters (1 through k) receives a formant signal (formant 1 through k) and a channel signal (channel signal 1 through k), is driven by the excitation, and contributes to the synthesized speech.

11 Figure 29.2: OVE II speech synthesizer of Gunnar Fant. From [20]. Labels: pulse generator (F0, A0); noise generator; source filters 1 to 3; vowel network (F1 to F5, FH); nasal network (N0 to N4); fricative and plosive network (K0 to K2); amplitude controls AH, AN, AC; summer (Σ).

12 Figure (no caption). Labels: glottal wave generator (F0, pulse duration); voicing; noise generator; noise shaping filter; individual spectrum shaping filters for formants (FM, F1, F2, F3, F4); gain controllers (AM, A1, A2, A3, ANF); smoothing filters; high-frequency BPF noise filter ( Hz); slopes of 12 dB/octave and 6 dB/octave; level scales in dB; output.

13 Figure 29.4: The Klatt Synthesizer. From [35]. (cont.) Voicing source detail: basic voicing waveform a t^2 - b t^3 plotted against time T (peak labeled Max. Av.), with a, b = f(AV, OQ, T0); low-pass resonator with F, BW = f(TL); breathiness noise source with AMP = f( OQ).

14 Figure 29.4: The Klatt Synthesizer. From [35]. Labels: voicing source (T0, AV, OQ, TL); aspiration source (AH); frication source (AF); vocal tract transfer function for laryngeal sources (formant resonators in cascade; F1, F2, F3, B1, B2, B3, FZ); vocal tract transfer function for frication sources (formant resonators in parallel; A2, A3, A4, A5, A6, AB); radiation characteristic; output speech.

15 Figure 29.9: DAVO (dynamic analog of the vocal tract). From []. Labels: glottal pulse source (buzz amplitude, buzz frequency); noise source (noise amplitude) with a movable point of noise insertion; a chain of one-section units; connection to the nasal circuit; source of section area control voltages; radiation load; output.

16 Figure: Schematic of the vocal cord-vocal tract system.

17 Figure: Circuit of an individual T-section.

18 Figure 29.5: Two configurations for all-pole synthesizers based on LPC analysis. (cont.)
(a) Direct-form digital filter with variable a coefficients: input x(n), output y(n), unit delays z^-1, coefficients a_1, a_2, a_3, ..., a_{p-1}, a_p.
(b) Acoustic tube with variable area functions: input, section areas A_{p-1}, ..., A_1, A_0, output.

19 Figure 29.5: Two configurations for all-pole synthesizers based on LPC analysis.
(c) All-pole lattice network with variable k parameters: the input enters at stage p-1 at the back (glottis) and the output leaves after stage 0 at the front (lips); the stages carry coefficients K_{p-1}, ..., K_1, K_0 and unit delays z^-1.
(a) shows a direct-form implementation of the difference equation, giving the synthesizer output as a weighted sum of its past values plus the excitation input. (b) shows a model of the acoustic tube with variable cross-sectional area that could give rise to such a characteristic. (c) shows an interpretation of this model that suggests a lattice form for the filter.
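
The relation between configurations (a) and (c) can be checked numerically. The sketch below is an illustration under assumed conventions, not the lecture's own code: the direct form is taken as y[n] = x[n] + sum over j of a_j y[n-j], the lattice coefficients are indexed k_1 to k_p (the figure labels them K_0 to K_{p-1}), and the sign convention is the one in which the step-up recursion sets a_i = k_i at order i. All function names and the example coefficient values are hypothetical.

```python
def direct_form_synth(x, a):
    """(a) Direct form: y[n] = x[n] + sum_{j=1..p} a[j-1] * y[n-j]."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc += aj * y[n - j]
        y.append(acc)
    return y


def reflection_to_direct(k):
    """Step-up recursion: lattice k parameters -> direct-form a coefficients."""
    a = []
    for i, ki in enumerate(k, start=1):
        # Order-i update: a_j <- a_j - k_i * a_{i-j} for j < i, and a_i = k_i.
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a


def lattice_synth(x, k):
    """(c) All-pole lattice driven by the same excitation x."""
    p = len(k)
    b_prev = [0.0] * p  # b_prev[i]: backward signal of order i at time n-1
    y = []
    for sample in x:
        e = sample  # forward signal entering the highest-order stage
        b_now = [0.0] * (p + 1)
        for i in range(p, 0, -1):  # stages p down to 1
            e = e + k[i - 1] * b_prev[i - 1]          # forward signal, order i-1
            b_now[i] = b_prev[i - 1] - k[i - 1] * e   # backward signal, order i
        b_now[0] = e  # the order-0 backward signal is the output itself
        y.append(e)
        b_prev = b_now[:p]
    return y


K = [0.5, -0.3, 0.2]             # assumed k parameters, all with |k| < 1
IMPULSE = [1.0] + [0.0] * 49     # unit impulse as the excitation
A = reflection_to_direct(K)
Y_DIRECT = direct_form_synth(IMPULSE, A)
Y_LATTICE = lattice_synth(IMPULSE, K)
```

Under these conventions the two filters produce the same impulse response, which is the sense in which the lattice is another form of the same all-pole synthesizer. Keeping every |k| below 1 keeps the filter stable, which is one practical reason for varying the k parameters rather than the a coefficients.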

20 Figure 11.2: Two-section digital waveguide, with delay lines of length M and L joined by a scattering junction with reflection coefficient r.

21 Figure 29.6: All-zero synthesizer based on cepstral analysis. The excitation passes through a chain of unit delays z^-1; the taps are weighted by h_0, h_1, h_2, ..., h_m and summed (Σ) to give the synthesized speech.
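
The tapped delay line in the figure above amounts to the sum y[n] = h_0 x[n] + h_1 x[n-1] + ... + h_m x[n-m]. A minimal sketch follows; the function name and the tap values are assumptions for illustration, and in the synthesizer the taps h would come from the cepstral analysis, which is not shown here.

```python
def all_zero_synth(excitation, h):
    """y[n] = sum_{m} h[m] * excitation[n - m] (tapped delay line, no feedback)."""
    out = []
    for n in range(len(excitation)):
        acc = 0.0
        for m, hm in enumerate(h):
            if n - m >= 0:  # samples before the start are taken as zero
                acc += hm * excitation[n - m]
        out.append(acc)
    return out


H = [0.5, 0.25, 0.125]                # assumed tap weights h_0, h_1, h_2
PULSE = [1.0, 0.0, 0.0, 0.0, 0.0]     # a single excitation pulse
RESPONSE = all_zero_synth(PULSE, H)   # reproduces the taps, then zeros
```

With a single pulse as excitation the output is just the tap sequence, so for voiced speech each pitch pulse launches one copy of the impulse response h.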

22 Figure: Structure of Klatt synthesizer. Labels: pulse source; serial structure for vowels and nasals; noise source; parallel structure for most consonants; gains A1, A2, A3, A4.
