INTRODUCTION TO ACOUSTIC PHONETICS 2
Hilary Term, week 6, 22 February 2006


1. Resonators and Filters

Different vibrating objects are tuned to specific frequencies; the frequencies at which a particular object prefers to vibrate are called its natural frequencies. If you couple a vibrating object (a driving force) with another object (a driven system), it will cause forced vibration in the latter. Resonance happens when the frequency of the driving force is close to the natural frequency of the driven system. It has a two-fold effect: 1) amplitudes of vibration are increased at the natural (resonant) frequencies of the driven system, i.e. the latter works as a resonator; 2) amplitudes at other frequencies are absorbed (reduced), i.e. the driven system works as a filter.

Filtering of a complex sound is a process of selective separation: some frequencies are allowed to pass through the filter while others are blocked. There are different kinds of filters. A low pass filter lets frequency components below a specified frequency (the cut-off frequency) pass unattenuated and reduces or completely blocks components above the cut-off frequency. A high pass filter lets only frequency components above a cut-off frequency pass unattenuated. A band pass filter is a combination of low pass and high pass filters: it lets frequency components between two cut-off frequencies pass unattenuated.

Fig. 1. Three types of filters: low pass, high pass, and band pass

Band pass filters are characterized by their centre frequency and bandwidth. The bandwidth is defined as the range of frequencies passed by the filter that are no more than 3 dB down from the maximum amplitude (i.e. the amplitude at the centre frequency). The bandwidth of a filter may be relatively narrow or broad (see Fig. 2 and 3).
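The 3 dB definition of bandwidth can be checked numerically. The sketch below assumes an idealised second-order band pass filter (a textbook model chosen for illustration, not one described in this handout) with centre frequency fc and quality factor q; the function names are hypothetical. For this filter the 3 dB bandwidth comes out close to fc/q, so a larger q gives a narrower filter.

```python
import math

def resonator_gain_db(f, fc, q):
    """Gain in dB of an idealised second-order band pass filter
    (0 dB at the centre frequency fc; q is the quality factor)."""
    x = f / fc - fc / f
    return -10.0 * math.log10(1.0 + (q * x) ** 2)

def bandwidth_3db(freqs, gains_db):
    """Width of the frequency range whose gain is no more than 3 dB below the peak.
    Assumes a single contiguous passband, as for this filter."""
    peak = max(gains_db)
    passed = [f for f, g in zip(freqs, gains_db) if g >= peak - 3.0]
    return max(passed) - min(passed)

freqs = list(range(1, 5001))  # 1 Hz analysis grid from 1 Hz to 5 kHz
narrow = bandwidth_3db(freqs, [resonator_gain_db(f, 1000.0, 10.0) for f in freqs])
broad = bandwidth_3db(freqs, [resonator_gain_db(f, 1000.0, 2.0) for f in freqs])
print("narrow filter bandwidth:", narrow, "Hz")
print("broad filter bandwidth:", broad, "Hz")
```

Both results land close to fc/q (about 100 Hz and 500 Hz); the small shortfall comes from the 1 Hz grid and from 3 dB being a rounded value for the half-power point.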

Fig. 2. Band pass filter and its properties
Fig. 3. Filters with narrow and broad bandwidths

2. Spectrograms

A series (bank) of band pass filters is used to create a visual representation of sound called a spectrogram. Unlike 2-D power spectra, spectrograms also represent the dimension of time, which makes it possible to capture the continuous variation of the acoustic signal. Spectrograms display time on the horizontal axis and frequency on the vertical axis, while amplitude is represented by colour density.

Depending on whether broadband or narrowband filters are used, we get either wide-band (typically 300 Hz bandwidth) or narrow-band (typically 45 Hz bandwidth) spectrograms. There is a trade-off between temporal and frequency resolution. In a wide-band spectrogram (Fig. 4) individual harmonics are smeared together, but temporal resolution is high: individual voicing pulses are visible as vertical striations. Wide-band spectrograms are the most commonly used, because in the majority of cases we are not interested in the behaviour of individual harmonics.

Fig. 4. Wide-band spectrogram of the word "heard"; vertical striations show individual voicing pulses

Narrow-band spectrograms (Fig. 5) resolve individual harmonics, which appear as dark horizontal lines, and the drops in energy between successive harmonics, which appear as white horizontal lines; however, their temporal resolution is poor.
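The time-frequency trade-off can be made concrete with a rough rule of thumb: a filter of bandwidth B needs an analysis window of roughly 1/B seconds (the exact constant depends on the window shape, so treat this as an approximation). The hypothetical helper below compares a filter's bandwidth with the harmonic spacing (f0) and its window length with the pitch period (1/f0), using the 300 Hz and 45 Hz settings above and the 125 Hz adult male f0 quoted in section 4.1.

```python
def window_ms(bandwidth_hz):
    """Approximate analysis-window length in ms, assuming duration ~ 1/bandwidth."""
    return 1000.0 / bandwidth_hz

def resolution(bandwidth_hz, f0_hz):
    """Rough check of what a spectrogram with this filter bandwidth shows
    for a voice with fundamental frequency f0_hz."""
    period_ms = 1000.0 / f0_hz
    return {
        "window_ms": round(window_ms(bandwidth_hz), 1),
        # Harmonics are spaced f0 apart, so a filter narrower than f0 can separate them.
        "harmonics_resolved": bandwidth_hz < f0_hz,
        # A window shorter than one pitch period can separate successive voicing pulses.
        "pulses_resolved": window_ms(bandwidth_hz) < period_ms,
    }

for label, bw in (("wide-band", 300.0), ("narrow-band", 45.0)):
    print(label, resolution(bw, 125.0))
```

With f0 = 125 Hz (an 8 ms period) the 300 Hz setting uses a window of about 3.3 ms, short enough to separate voicing pulses but too wide in frequency to separate harmonics; the 45 Hz setting (about 22 ms) does the reverse.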

Fig. 5. Narrow-band spectrogram of "heard"; horizontal lines are individual harmonics

3. Source-Filter Theory

The acoustic theory of speech production, known as source-filter theory, postulates that sound production consists of two basic components: (1) generation of a sound source at the glottis or at some point along the length of the vocal tract, and (2) filtering of that source by the vocal tract. There are three types of sound sources involved in speech production:
a) a quasi-periodic laryngeal voicing source, produced by vocal fold vibration (present in vowels, nasals and approximants);
b) a continuous aperiodic turbulent source, produced at some point along the vocal tract (most voiceless fricatives);
c) a transient aperiodic noise source, produced at some point along the vocal tract (the release burst of voiceless stop consonants).
These sources may be combined: the production of voiced fricatives, for example, involves both the periodic laryngeal source and aperiodic noise generated in the vocal tract. The production of voiced stops combines all three sources. Alone or in combination, these sources serve as input to the vocal tract filter, which shapes this input by passing different frequency components of the source to different degrees.

4. Source-Filter Theory for Vowels

Vowels are the product of the quasi-periodic glottal source and the filtering effects of the supraglottal vocal tract. Vowels of the same quality have the same gross spectral shape, irrespective of the fundamental frequency of the source (a variable that changes significantly with the age, sex, and emotional state of the speaker).
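The source-filter idea can be sketched in a few lines: on a decibel scale, the level of each output harmonic is the source level plus the filter gain at that frequency. This sketch assumes a source whose harmonics fall off at 12 dB per octave (a commonly quoted approximation for the glottal source) and a filter built from three second-order resonators; the formant frequencies and bandwidths in FORMANTS are hypothetical values for a schwa-like vowel, and all function names are illustrative. Running it for two f0 values shows that the harmonics sample the same filter envelope, so the harmonic that receives the most gain is the one at F1 (500 Hz) in both cases.

```python
import math

# Hypothetical (frequency, bandwidth) pairs in Hz for a schwa-like vowel.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]

def source_db(f, f0, slope_db_per_octave=-12.0):
    """Level (dB) of the source harmonic at f relative to the fundamental,
    assuming a constant roll-off per octave."""
    return slope_db_per_octave * math.log2(f / f0)

def filter_db(f):
    """Vocal tract gain (dB) at f, modelled as a cascade of second-order
    resonators, one per formant, each with 0 dB gain at 0 Hz."""
    total = 0.0
    for fc, bw in FORMANTS:
        r = f / fc   # frequency relative to the formant
        q = fc / bw  # sharper resonance for narrower bandwidth
        total += -10.0 * math.log10((1.0 - r * r) ** 2 + (r / q) ** 2)
    return total

def vowel_spectrum(f0, fmax=4000.0):
    """(frequency, source dB, filter dB, output dB) for every harmonic up to fmax."""
    rows = []
    n = 1
    while n * f0 <= fmax:
        f = n * f0
        s, h = source_db(f, f0), filter_db(f)
        rows.append((f, s, h, s + h))  # levels add in dB: output = source + filter
        n += 1
    return rows

for f0 in (125.0, 250.0):
    best = max(vowel_spectrum(f0), key=lambda row: row[2])
    print(f"f0 = {f0:.0f} Hz: harmonic with most vocal tract gain at {best[0]:.0f} Hz")
```

Adding levels in dB is equivalent to multiplying amplitudes, which is why the filter can be applied harmonic by harmonic.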

4.1. Voiced glottal source

The air flowing out of the lungs supplies the system with the energy needed to produce sounds. When the vocal folds vibrate, the rate of airflow through the glottis rises and falls, generating a complex periodic wave. Fourier analysis of the glottal source waveform gives us a power spectrum showing its component frequencies (see Fig. 6). An important characteristic of the glottal source spectrum is that the amplitudes of nearly all harmonics above the second decrease rapidly as frequency increases.

Fig. 6. Spectrum of the glottal source waveform

The glottal source waveform and spectrum vary with the type of phonation: modal, creaky or breathy. The differences in the waveform are due to differences in the amount of time that the vocal folds are open during each glottal cycle. The relative amplitudes of the first two harmonics and the general slope of the spectrum are the two main spectral indicators of phonation type (Fig. 7-8).

Fig. 7. Glottal source waveform. From Johnson (2003)
Fig. 8. Power spectra of glottal waveforms

The fundamental frequency (f0) of vocal fold vibration depends on several factors, such as the mass, length and tension of the folds, which are interrelated in a fairly complicated way. Typical average values for f0 are as follows:

adult male voice: 125 Hz
adult female voice: 220 Hz
child's voice: 300 Hz

During normal speech production, the frequency of voicing varies over an octave or more (e.g., for an adult male voice the range will be from 80 Hz to 160 Hz).
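The two spectral indicators of phonation type mentioned above, the level difference between the first two harmonics (often written H1-H2) and the overall spectral slope, can be computed from a list of harmonic amplitudes. This is a sketch under stated assumptions: the amplitudes are linear values for harmonics 1, 2, 3, ..., the slope is a least-squares fit of level in dB against log2 of the harmonic number (so it comes out in dB per octave), and the function names are hypothetical. In general, breathier phonation tends to show a larger H1-H2 and a steeper slope than modal or creaky phonation. The last helper checks the octave arithmetic in the text: 80 Hz to 160 Hz is exactly one octave.

```python
import math

def h1_minus_h2_db(amplitudes):
    """Level difference (dB) between the first and second harmonics."""
    return 20.0 * math.log10(amplitudes[0] / amplitudes[1])

def spectral_slope_db_per_octave(amplitudes):
    """Least-squares slope of harmonic level (dB) against log2(harmonic number)."""
    xs = [math.log2(n) for n in range(1, len(amplitudes) + 1)]
    ys = [20.0 * math.log10(a) for a in amplitudes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def octave_span(f_low, f_high):
    """Number of octaves between two frequencies."""
    return math.log2(f_high / f_low)

# Synthetic (hypothetical) source: amplitude of harmonic n proportional to 1/n**2,
# i.e. a roll-off of about 12 dB per octave.
amps = [1.0 / n ** 2 for n in range(1, 21)]
print(round(h1_minus_h2_db(amps), 2), round(spectral_slope_db_per_octave(amps), 2))
print(octave_span(80.0, 160.0))
```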

4.2. Vocal tract filter

The vocal tract filter selectively passes energy in the harmonics of the source: the size and shape of the vocal tract determine, for each harmonic of the source, the relative amount of energy that is passed. The characteristic resonances of the vocal tract are called formants (F1, F2, F3, etc.). The vocal tract transfer function for a particular vowel is defined by the centre frequencies and bandwidths of these formants. In the study of speech sounds we are mostly interested in the first three or four formants.

We can model the acoustic properties of the vocal tract as a tube open at one end (the mouth) and closed at the other (the glottis). Assuming that this tube has a uniform cross-section, as in the production of the mid central vowel schwa, we can calculate the resonant frequencies of the tube (the frequencies that will produce standing waves) using the following formula:

$$F_n = \frac{(2n - 1)\,c}{4L}$$

where n is the number of the formant, c is the speed of sound and L is the length of the tube.

This formula gives formants only for tubes with a uniform cross-sectional area, but we also need to analyse the acoustic properties of vowels other than schwa, whose production involves constrictions in the vocal tract. One way of modelling the acoustic properties of such vowels is to represent the vocal tract as a concatenation of tubes (Fant, 1960). An alternative approach, known as perturbation theory, models vowel acoustics in terms of the relationship between air pressure and velocity (Chiba and Kajiyama, 1941).

4.3. Formant frequencies of the vowels

The first formant frequency (F1) is traditionally referred to as the resonance of the pharyngeal/back cavity, but in fact it is influenced by the shape of the entire vocal tract. F1 is inversely related to tongue height: low vowels have high F1 and high vowels have low F1. The size and length of the oral/front cavity are the main factors determining the second formant frequency (F2). F2 is associated with the front-back dimension: front vowels have high F2 while back vowels have low F2, and F2 falls progressively through the series of cardinal vowels. However, the relationship is not straightforward because of the effects of lip rounding, which lowers all formant frequencies; thus [ɑ] is further back than [u] but has a higher F2.

The third formant frequency (F3) varies less than the first two formant frequencies. Lip rounding and retroflexion of the tongue have the biggest effect on this formant, both causing considerable lowering.
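The uniform-tube formula from section 4.2 is easy to evaluate. The sketch assumes c = 35,000 cm/s (a typical approximation for the speed of sound in warm, humid air) and a vocal tract length of 17.5 cm, a figure often used for an adult male; with these values the tube has formants at 500, 1500 and 2500 Hz. The second helper, which is hypothetical, uses those neutral values as a crude reference point for the F1/F2 relations in section 4.3 (higher F1 for lower vowels, higher F2 for fronter vowels). It ignores lip rounding, which lowers all formants, so it is only a rough illustration.

```python
def uniform_tube_formants(length_cm, count=3, c_cm_per_s=35000.0):
    """First `count` resonances (Hz) of a uniform tube closed at one end:
    F_n = (2n - 1) c / 4L."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm) for n in range(1, count + 1)]

def _compare(value, reference, above, below):
    """Return `above`, `below` or 'same as schwa' depending on value vs reference."""
    if value > reference:
        return above
    if value < reference:
        return below
    return "same as schwa"

def describe_vowel(f1, f2, neutral_f1=500.0, neutral_f2=1500.0):
    """Crude placement relative to a schwa-like reference; ignores lip rounding."""
    height = _compare(f1, neutral_f1, "lower than schwa", "higher than schwa")
    backness = _compare(f2, neutral_f2, "fronter than schwa", "backer than schwa")
    return height, backness

print(uniform_tube_formants(17.5))    # 17.5 cm tube
print(uniform_tube_formants(15.0))    # shorter, hypothetical tube
print(describe_vowel(300.0, 2300.0))  # hypothetical [i]-like values
```

Because every formant is inversely proportional to L, the shorter 15 cm tube gives proportionally higher formants (about 583, 1750 and 2917 Hz).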

Fig. 9. Spectrograms of 8 British English vowels. From Ladefoged, P. (2001)

Articulatory properties of vowels can be related to the first two formant frequencies by means of formant charts that plot F1 against F2 (Fig. 10). Because of the inverse relations between articulatory parameters and formant frequencies, the axes are drawn so that zero frequency would be at the top right corner; this puts high vowels (low F1) at the top and front vowels (high F2) on the left, as on a traditional vowel chart.

Fig. 10. A formant chart plotting first and second formants for 8 English vowels. From Ladefoged, P. (2001)

Reading:
Chiba, T. and Kajiyama, M. (1941) The Vowel: Its Nature and Structure. Tokyo: Kaiseikan.
Fant, G. (1960) Acoustic Theory of Speech Production. The Hague: Mouton.
Fry, D. B. (1979) The Physics of Speech. Cambridge: CUP (chapters 4-7 and 9).
Johnson, K. (2003) Acoustic and Auditory Phonetics. 2nd edition. Oxford: Blackwell (chapters 5-6).
Kent, R. D., Dembowski, J. and Lass, N. J. (1996) The acoustic characteristics of American English. In N. J. Lass (ed.), Principles of Experimental Phonetics, pp. 185-225.