XII. SPEECH ANALYSIS*
Prof. M. Halle, G. W. Hughes, A. R. Adolph

A. STUDIES OF PITCH PERIODICITY

In the past, a number of devices have been built to extract pitch-period information from speech. These efforts stemmed mainly from research on communication systems, the immediate goal being bandwidth reduction. For this purpose some measure of the invariant, or nonredundant, properties of speech is desired, and the fundamental pitch periodicity, or glottal excitation frequency, is one such invariant.

Since pitch information is only one aspect of one dimension (speech) of a complex multidimensional code (language), the devices that evolved en route to bandwidth reduction were based on rather idealized and simplified representations of the speech waveform. They failed, or made errors, on a significant portion of commonly encountered speech sounds, and the limited dynamic range of physically realizable systems, when improperly utilized, caused errors during transient portions of the speech signal. In tests with a small number of sample inputs, errors of 5 to 20 per cent were reported.

Four distinct approaches to pitch extraction were taken by Dolansky (1), Gruenz and Schott (2), Lerner (3), and Miller and Weibel (4). Their work does not necessarily represent the first or only examples of these approaches, but it illustrates the principles involved; other devices in use are similar to one or more of these four. All of them work reasonably well for an input consisting of a steady-state, decaying waveform, such as results when a speaker says the long vowel /e/, as in "gay" (Fig. XII-1a), or /a/, as in "father." They make errors, however, on perfectly understandable vowels such as /I/ (Fig. XII-2a) or /o/ (Fig. XII-3a).

Dolansky's system, which obtains the envelope of the input speech by asymmetrical detection, would be in error on the /I/ of Fig. XII-2a. Figure XII-2b shows this speech half-wave rectified; integration would enhance the effect of the second peaks. A similar effect would occur on the /o/ of Fig. XII-3a and b.

Gruenz and Schott use low-pass filtering to extract the fundamental, which raises the problem of choosing the filter's cutoff frequency. Figure XII-2d and e show that a cutoff high enough to pass both male and female pitches (i.e., 500 cps) is unsuitable. This point is further illustrated in Fig. XII-3d and e, and in Fig. XII-4d and e.

*This work was supported in part by the National Science Foundation.
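The cutoff dilemma can be made concrete with a little arithmetic. The sketch below is a hypothetical helper written for illustration only: it lists which harmonics of a perfectly periodic voice fall strictly below an ideal low-pass cutoff, using the pitch periods given in the figure captions (T_P = 9 and 7.5 msec for the male examples, 6 msec for the female ones). It assumes a brick-wall filter, which is not how Gruenz and Schott's actual filters behaved.

```python
def harmonics_below(cutoff_cps: float, period_ms: float) -> list[float]:
    """Return the harmonic frequencies (cps) of a voice with pitch period
    `period_ms` that lie strictly below an ideal low-pass cutoff.

    Hypothetical helper: assumes a brick-wall filter and a perfectly
    periodic source, so harmonics sit at k * (1000 / period_ms) cps.
    """
    harmonics = []
    k = 1
    while True:
        # Compute k * 1000 / T directly rather than summing f0 repeatedly,
        # so rounding error does not accumulate.
        freq = 1000.0 * k / period_ms
        if freq >= cutoff_cps:
            return harmonics
        harmonics.append(freq)
        k += 1


if __name__ == "__main__":
    voices = [("male /e/, T_P = 9 msec", 9.0),
              ("male /I/, T_P = 7.5 msec", 7.5),
              ("female /o/, T_P = 6 msec", 6.0)]
    for label, period in voices:
        for cutoff in (200.0, 500.0):
            passed = harmonics_below(cutoff, period)
            print(f"{label}, {cutoff:.0f} cps cutoff: "
                  f"{[round(f, 1) for f in passed]}")
```

With the 500 cps cutoff the male outputs still contain three or four harmonics rather than the fundamental alone, while the 200 cps cutoff passes only the fundamental for these three voices but would reject outright any voice whose pitch period is 5 msec or shorter.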

Fig. XII-1. a. Steady-state /e/ sound, as in the word "gay." Male; T_P = 9 msec. b. Full-wave rectified, 500 cps low-pass filtered. c. Half-wave rectified, 500 cps low-pass filtered. d. Bandpass filtered 600-2500 cps. e. Bandpass filtered 250-700 cps.

Fig. XII-2. a. Steady-state /I/ sound, as in the word "sill." Male; T_P = 7.5 msec. b. Half-wave rectified. c. Full-wave rectified. d. Waveform a, 500 cps low-pass filtered. e. Waveform a, 200 cps low-pass filtered.

Fig. XII-3. a. Steady-state /o/ sound, as in "doze." Female; T_P = 6 msec. b. Half-wave rectified. c. Full-wave rectified. d. 500 cps low-pass filtered. e. 200 cps low-pass filtered.

Fig. XII-4. a. Steady-state /u/, as in "spool." Female; T_P = 6 msec. b. Full-wave rectified. c. Half-wave rectified. d. 500 cps low-pass filtered. e. 200 cps low-pass filtered.

Fig. XII-5. a. Steady-state /o/, as in "doze." Female; T_P = 6 msec. b. Full-wave rectified, 500 cps low-pass filtered. c. Half-wave rectified, 500 cps low-pass filtered. d. Bandpass filtered 600-2500 cps. e. Bandpass filtered 250-700 cps.

Lerner represents the speech signal as a rotating vector and relies on a discontinuity in amplitude to mark the period. Such a system will make errors on vowels that exhibit a high first-formant amplitude or, correspondingly, relatively little decay of their envelopes.

Miller was more cognizant of vowel properties in his approach than were the others. By using a long delay line with a great number of taps, which were scanned and whose outputs were subtracted from the undelayed speech, he added a "shape" correlation dimension to the amplitude dimension and reduced the uncertainty in the pitch determination. Errors caused by transient amplitude variations and formant-frequency shifts can still occur, although there would be fewer of them with a true multiplying-integrating type of correlator. Vowels such as the /u/ in Fig. XII-4a illustrate a difficult case for a correlation method of pitch extraction.

One promising method has been examined by the author: full-wave rectification followed by low-pass filtering. Full-wave rectification shifts the frequencies of energy concentration (formants) upward without changing the pitch frequency. This is easiest to visualize by thinking of the waveform in terms of its zero-crossing frequency and its fundamental pitch. The zero-crossing frequency is a function of the formant positions. When the waveform is full-wave rectified, the effective zero-crossing frequency is doubled while the fundamental pitch remains unchanged, as can readily be seen in Fig. XII-2b and c. Thus a much higher low-pass cutoff frequency can be used, one high enough to be effective for both male and female voices, as illustrated in Figs. XII-1b and XII-5b.

Another result of the current work is that second-formant energy alone is of little value for pitch determination; see Figs. XII-1d and XII-5d.

A. R. Adolph

References

1. L. O. Dolansky, An instantaneous pitch-period indicator, J. Acoust. Soc. Am. 27, 67-72 (1955).
2. O. O. Gruenz and L. O. Schott, Extraction and portrayal of pitch of speech sounds, J. Acoust. Soc. Am. 21, 487-495 (1949).
3. R. Lerner, Pitch synchronous chopping of speech, Quarterly Progress Report, Research Laboratory of Electronics, M.I.T., Oct. 15, 1956, p. 99.
4. R. L. Miller and E. S. Weibel, Measurement of the fundamental period of speech using a delay line, J. Acoust. Soc. Am. 28, 761 (1956).

B. VARIATIONS OF FORMANT INTENSITY WITH PITCH

The energy contained in a vowel formant may vary with the fundamental pitch. Let $x$ be the ratio of formant frequency to pitch frequency. The formant energy will be near maximum when $x$ is an integer, and near minimum when $x$ is an odd multiple of 1/2 greater than one ($x$ = 3/2, 5/2, ...).

Calculations were made with a parallel RLC circuit as a simple model of vocal-tract resonance. The network was assumed to be driven by periodic unit impulses of current with period $2\pi/\omega$, so that the voltage across the elements contains components at $\omega, 2\omega, 3\omega, \ldots$. Taking $10\log_{10}$ of the sum of $|Z(j\omega)|^2$ evaluated at each of these component frequencies gives a measure of the relative power, in decibels, passed by the resonant circuit for various values of $x$. Since

$$|Z(j\omega)|^2 = \frac{L/C}{\left(\dfrac{\omega}{\omega_0} - \dfrac{\omega_0}{\omega}\right)^2 + \dfrac{1}{Q_0^2}} \qquad (1)$$

the relative total power is

$$\frac{L}{C}\left[\frac{1}{\dfrac{1}{Q_0^2} + \left(\dfrac{1}{x} - x\right)^2} + \frac{1}{\dfrac{1}{Q_0^2} + \left(\dfrac{2}{x} - \dfrac{x}{2}\right)^2} + \frac{1}{\dfrac{1}{Q_0^2} + \left(\dfrac{3}{x} - \dfrac{x}{3}\right)^2} + \cdots\right] \qquad (2)$$

where $x = \omega_0/\omega$, $\omega_0 = (LC)^{-1/2}$, and $Q_0 = \omega_0 RC$.

In Fig. XII-6 the quantity in brackets in Eq. 2 is plotted against $x$ for three values of $Q_0$. Some idea of how formant intensity in speech varies with pitch is given by the shapes of these curves, although the ordinate numbers themselves mean little. The effect is important only for the lowest formant-frequency positions (small $x$). A $Q_0$ of 3 to 5 is appropriate for most vocal-tract resonances, and the half-power bandwidth $\omega_0/Q_0$ then spans about $x/Q_0$ pitch harmonics, so as $x$ grows beyond $Q_0$ several harmonics always lie within the formant and the variation with pitch smooths out. Appropriate weighting curves could be applied if the excitation is not a pure impulse.

Fig. XII-6. Output of parallel RLC circuit versus ratio (x) of the resonant frequency to the frequency of impulses.

G. W. Hughes
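Miller and Weibel's delay line, described in Section A, subtracts delayed copies of the speech from the undelayed signal and looks for the delay at which the difference is smallest. A rough digital analogue, swapped in here purely for illustration, is the average magnitude difference function (AMDF). In the sketch below the 10 kHz sampling rate, the resonance frequencies and bandwidths, and all helper names are hypothetical assumptions; the original device's detector and tap spacing are not specified in the text. The test input is the steady-state response of one damped resonance to a periodic impulse train, the kind of "steady-state, decaying waveform" on which all four devices are said to behave reasonably.

```python
import math

FS = 10_000  # sampling rate in samples per second (assumed for illustration)


def steady_state_vowel(period: int, formant_cps: float, bandwidth_cps: float,
                       n_periods: int) -> list[float]:
    """Tile one period of the steady-state response of a damped resonance,
    modeled as r**n * sin(w*n), to unit impulses spaced `period` samples apart."""
    r = math.exp(-math.pi * bandwidth_cps / FS)  # per-sample decay factor
    w = 2.0 * math.pi * formant_cps / FS         # resonance in radians per sample
    one_period = []
    for m in range(period):
        # Add the ringing still left over from earlier impulses.
        total, k = 0.0, 0
        while r ** (m + k * period) > 1e-12:
            n = m + k * period
            total += r ** n * math.sin(w * n)
            k += 1
        one_period.append(total)
    return one_period * n_periods


def amdf(x: list[float], lag: int) -> float:
    """Average |x[n] - x[n + lag]|: the difference seen at one delay-line tap."""
    count = len(x) - lag
    return sum(abs(x[n] - x[n + lag]) for n in range(count)) / count


def estimate_period(x: list[float], min_lag: int, max_lag: int) -> int:
    """Scan the 'taps' and return the first lag with the smallest difference."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(x, lag))


if __name__ == "__main__":
    male = steady_state_vowel(90, 300.0, 60.0, 6)    # T_P = 9 msec at 10 kHz
    female = steady_state_vowel(60, 450.0, 80.0, 8)  # T_P = 6 msec at 10 kHz
    print(estimate_period(male, 25, 150))    # 90
    print(estimate_period(female, 25, 150))  # 60
```

On this idealized, exactly periodic input the difference is exactly zero at the true period (and at its multiples, which is why the search returns the first minimum). On real speech the minimum is only approximate and, as the text notes, transient amplitude changes and formant shifts can move it.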
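The claim in Section A that full-wave rectification doubles the effective zero-crossing frequency while leaving the pitch unchanged can be checked numerically. The sketch below, with hypothetical helper names and an assumed 10 kHz sampling rate, uses a plain DFT to find the strongest non-DC component of a 300 cps tone before and after rectification, and then confirms that rectifying a periodic, decaying waveform leaves its period intact. It illustrates the mechanism only and does not reproduce the author's filters.

```python
import cmath
import math

FS = 10_000  # sampling rate in samples per second (assumed for illustration)


def full_wave(x):
    """Full-wave rectification: |x[n]| sample by sample."""
    return [abs(v) for v in x]


def tone_cps(freq_cps, n_samples):
    """A unit-amplitude sine at freq_cps."""
    return [math.sin(2 * math.pi * freq_cps * n / FS) for n in range(n_samples)]


def ringing_train(period, formant_cps, decay_samples, n_periods):
    """Repeat one period of a decaying sinusoid: a crude steady-state vowel."""
    one = [math.exp(-n / decay_samples) * math.sin(2 * math.pi * formant_cps * n / FS)
           for n in range(period)]
    return one * n_periods


def strongest_component_cps(x, max_cps):
    """Frequency (cps) of the largest DFT magnitude from bin 1 up to max_cps."""
    n_samples = len(x)
    bin_cps = FS / n_samples
    best_bin, best_mag = 0, -1.0
    for k in range(1, int(max_cps / bin_cps) + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, v in enumerate(x))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * bin_cps


def is_periodic(x, period):
    """True if x repeats exactly every `period` samples."""
    return all(x[n] == x[n + period] for n in range(len(x) - period))


if __name__ == "__main__":
    tone = tone_cps(300, 1000)  # 0.1 sec, an integer number of cycles
    print(strongest_component_cps(tone, 1000))             # 300.0
    print(strongest_component_cps(full_wave(tone), 1000))  # 600.0
    vowel_like = ringing_train(100, 300, 20, 10)  # 10 msec pitch period
    print(is_periodic(full_wave(vowel_like), 100))         # True
```

The rectified ripple lands near 600 cps, above the 500 cps cutoff used in Figs. XII-1b and XII-5b, whereas the unrectified 300 cps ringing would pass straight through such a filter; the pitch period survives because rectification acts on each sample independently.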
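Equations 1 and 2 of Section B can be checked and evaluated numerically. The sketch below (helper names hypothetical) computes $|Z(j\omega)|^2$ both directly from the admittance of the parallel RLC network and from Eq. 1, then evaluates the bracketed sum of Eq. 2, truncated to a finite number of harmonics, in decibels. The three $Q_0$ values printed are illustrative assumptions, since the values used for Fig. XII-6 are not recoverable from this copy.

```python
import math


def z_squared_direct(omega, R, L, C):
    """|Z(j omega)|^2 of a parallel RLC network, computed from its admittance."""
    admittance = 1 / R + 1j * omega * C + 1 / (1j * omega * L)
    return 1 / abs(admittance) ** 2


def z_squared_eq1(omega, R, L, C):
    """Eq. (1): (L/C) / [(omega/omega0 - omega0/omega)^2 + 1/Q0^2]."""
    omega0 = 1 / math.sqrt(L * C)
    q0 = omega0 * R * C
    detune = omega / omega0 - omega0 / omega
    return (L / C) / (detune ** 2 + 1 / q0 ** 2)


def bracket_eq2(x, q0, n_harmonics=500):
    """Quantity in brackets in Eq. (2), truncated after n_harmonics terms.

    x is the ratio of resonant frequency to impulse (pitch) frequency, so the
    n-th harmonic sits at n/x times the resonant frequency.
    """
    return sum(1 / (1 / q0 ** 2 + (n / x - x / n) ** 2)
               for n in range(1, n_harmonics + 1))


def relative_power_db(x, q0):
    """10 log10 of the bracketed sum: relative power in decibels."""
    return 10 * math.log10(bracket_eq2(x, q0))


if __name__ == "__main__":
    for q0 in (3.0, 5.0, 10.0):  # hypothetical Q0 values for illustration
        row = [f"{relative_power_db(x, q0):5.1f}" for x in (1.0, 1.5, 2.0, 2.5, 3.0)]
        print(f"Q0 = {q0:4.1f}: " + "  ".join(row))
```

In each printed row the values at integer $x$ exceed those at the neighboring half-integers, as the text predicts, and the terms dropped by truncating at 500 harmonics are negligible for the small $x$ shown.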