Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping


Structure of Speech

Physical acoustics
  Time-domain representation
  Frequency-domain representation
  Sound shaping
Speech acoustics
  Source-Filter Theory
  Speech Source characteristics
  Speech Filter characteristics
Acoustic Phonetics
  Classification and Features
  Segmental structure
  Coarticulation
  Suprasegmentals

Physical acoustics pertains to what sound is and how it is described. Most simply, sound is the propagation of pressure variations through a medium such as air, water, or train rails. We cover the representation of sound in the time domain as the time course of pressure variations, and in the frequency domain as the amplitude and phase of one or more sinusoids. We draw a distinction between sound sources (things that generate sound by vibrating) and the way sound is altered by the acoustic properties of objects it interacts with, such as resonating tubes.

Physical acoustics provides the background to understand how speech sounds are generated and controlled by the vocal tract and articulators. The discussion of speech acoustics is primarily about speech production.

Finally, we will examine the relationship between the linguistic characteristics of speech and the structure of the acoustic speech signal. This will include discussion of segments or phonemes (corresponding roughly to vowels and consonants), sub-phonemic acoustic features of speech (such as voicing, manner, and place) that can be used to characterize phonemes, and properties that extend over several phonemes or even an entire utterance.

Physical Acoustics: Time domain

Simple waveforms
  Amplitude, Frequency, and Phase
  Physical versus perceptual characteristics
Complex waveforms
  Periodic and aperiodic waveforms

Simple waveforms are sinusoidal signals that are completely described by their amplitude, frequency, and phase. Amplitude describes the degree of fluctuation in pressure, frequency is the number of pressure fluctuations per unit time, and phase is the relative alignment of the pressure fluctuations with respect to a specific instant.

Perceptually, amplitude corresponds to the loudness of a sound, and frequency to its pitch. Under most conditions, phase is not perceptually significant. The relationship between changes in loudness/pitch and changes in amplitude/frequency is roughly logarithmic. That is, at low amplitudes/frequencies, small changes result in large perceived changes in loudness or pitch, while at high amplitudes/frequencies much larger amplitude/frequency changes are needed to produce the same perceived change in loudness or pitch.

Complex waveforms contain multiple sinusoidal (simple) waveforms. Sinusoids are periodic: each cycle of a sinusoid is repeated exactly in the next cycle. Complex waveforms may also be periodic if they contain cyclic patterns that repeat exactly. Aperiodic waveforms are complex waveforms that do not contain cyclic patterns that repeat exactly. Many biological and other real-world signals, including the speech signal, may contain sequences of very similar patterns that are nearly but not exactly the same. Such sounds are similar to periodic sounds and are called quasiperiodic.

Simple Waveform

For all integer i:

$$y_i = a \sin\left(\theta + \frac{2\pi i f}{f_s}\right)$$

a = Amplitude
θ = Phase
f = Frequency
f_s = Sampling Rate

1) By definition, simple waveforms are sinusoids. Any sinusoid can be completely specified by three parameters:
   Amplitude (a) - The extent of pressure variation.
   Frequency (f) - The rate of pressure variation in terms of the number of complete cycles of the sinusoid per second (Hertz, abbreviated Hz).
   Phase (θ) - An offset of the function with respect to a specific time.
2) Although we are technically dealing with analog signals, this is obviously a discrete approximation to a sine function with an additional parameter, the sampling rate (f_s). For frequency specified in Hz, f_s is the number of equally spaced instants every second at which the function is evaluated.
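The sampled sinusoid defined above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_sinusoid` and the chosen values (a 200 Hz tone at an 8000 Hz sampling rate) are assumptions, not part of the slides.

```python
import math

def sample_sinusoid(a, f, theta, fs, n):
    """Return n samples of y_i = a * sin(theta + 2*pi*i*f/fs)."""
    return [a * math.sin(theta + 2.0 * math.pi * i * f / fs) for i in range(n)]

# Illustrative values (assumed, not from the slides):
# a 200 Hz tone of unit amplitude and zero phase, sampled at 8000 Hz.
samples = sample_sinusoid(a=1.0, f=200.0, theta=0.0, fs=8000.0, n=80)

# With these values one cycle spans fs / f = 40 samples,
# so the 80 samples cover two complete cycles.
samples_per_cycle = 8000.0 / 200.0
```

With zero phase the waveform starts at zero, reaches its positive peak a quarter cycle later (sample 10), and returns to zero at sample 40.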

Simple Waveforms

[Sound buttons: 175 Hz, 225 Hz, 275 Hz, 325 Hz, 375 Hz, 425 Hz, 475 Hz]

1) Here are several sinusoids illustrating variations in frequency, amplitude, and phase. These parameters are orthogonal (each can be varied independently of the others), although in these diagrams the top right frame is both lower in amplitude and higher in frequency than the top left.
2) The three graphs are:
   Top left - 200 Hz tone.
   Top right - 600 Hz tone at lower amplitude.
   Bottom left - two 200 Hz tones of the same amplitude differing in phase.
3) It is worth noting that the physical features frequency and amplitude correspond to the perceptual features of pitch and loudness respectively, but the relationships between physical and perceptual features are not linear. The relationship is roughly logarithmic: small physical changes at low frequencies/amplitudes are perceptually much larger than equal physical changes at high frequencies/amplitudes. This is illustrated in the sounds linked to the buttons on the bottom right. All steps are 50 Hz, but the pitch difference between 175 and 225 Hz is greater than the pitch difference between 425 and 475 Hz. [I don't know how to make this link work in a PDF file - HTB]
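One way to put rough numbers on the 50 Hz steps described above is to express each step on a logarithmic scale. The sketch below uses musical semitones (12 per octave) as that scale. This is an assumption made for illustration: the slides say only that the relationship is "roughly logarithmic", and the helper name `semitones` is hypothetical.

```python
import math

def semitones(f_low, f_high):
    """Size of a frequency step on a logarithmic (semitone) scale."""
    return 12.0 * math.log2(f_high / f_low)

# Two equal 50 Hz steps taken from the button frequencies on the slide.
low_step = semitones(175.0, 225.0)   # about 4.35 semitones
high_step = semitones(425.0, 475.0)  # about 1.93 semitones
```

On this scale the same 50 Hz step is more than twice as large at the low end of the range as at the high end, which is consistent with the perceived difference in pitch steps.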

Complex waveforms

For all integer i, and for all integer k (0 ≤ k ≤ K):

$$y_i = \sum_{k=0}^{K} a_k \sin\left(\theta_k + \frac{2\pi i f_k}{f_s}\right)$$

a_k = Amplitude of kth component
θ_k = Phase of kth component
f_k = Frequency of kth component
f_s = Sampling Rate

1) Complex waveforms can be described as the summation of a series of two or more simple waveforms.
2) In the discrete but unbounded case, K = ∞. Later, we will explore the consequences of using finite i and K.

Complex Waveforms

1) This figure shows three complex waveforms and one simple waveform. It shows how non-sinusoidal functions of time can be approximated by summing simple waveforms of the correct frequencies, amplitudes, and phases.
2) Graphs in the figure are:
   Top left - 200 Hz square wave.
   Top right - 200 Hz sine wave. This is the first component of the square wave (NOT a complex wave, but the first in the series of simple waves that sum to form a square wave). Its frequency is called the fundamental frequency and is typically written as F0.
   Bottom left - 200 Hz sine wave + 600 Hz sine wave. The 600 Hz sine wave (at three times F0) is 1/3 the amplitude of the F0 component and has the same phase. Components at integer multiples of F0 are called harmonics.
   Bottom right - Sum of sine waves at 200, 600, 1000, and 1400 Hz. These frequencies correspond to F0 and its 3rd, 5th, and 7th harmonics. The amplitudes of the harmonics are 1/3, 1/5, and 1/7 the amplitude of the F0 component. All have the same phase.
3) Perceptually, all of these signals would have the same pitch, corresponding to that of a 200 Hz sine wave, but their timbre differs, becoming brighter as more components are combined.
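The construction described in these notes (F0 plus odd harmonics at 1/3, 1/5, 1/7 of its amplitude, all in the same phase) can be sketched as follows. The function name `odd_harmonic_sum` and the 8000 Hz sampling rate are illustrative assumptions. The code is meant to show the summation, not to reproduce the exact figure.

```python
import math

def odd_harmonic_sum(f0, n_components, fs, n_samples):
    """Sum the first n_components odd harmonics of f0.

    Harmonic number k (1, 3, 5, ...) has amplitude 1/k and zero phase.
    """
    out = []
    for i in range(n_samples):
        y = 0.0
        for m in range(n_components):
            k = 2 * m + 1  # odd harmonic numbers: 1, 3, 5, 7, ...
            y += math.sin(2.0 * math.pi * i * k * f0 / fs) / k
        out.append(y)
    return out

FS = 8000.0  # assumed sampling rate; one 200 Hz cycle is 40 samples

sine_only = odd_harmonic_sum(200.0, 1, FS, 80)        # top right panel
two_components = odd_harmonic_sum(200.0, 2, FS, 80)   # bottom left panel
four_components = odd_harmonic_sum(200.0, 4, FS, 80)  # bottom right panel
```

As more odd harmonics are added, the waveform flattens toward the plateau of a square wave. At the quarter-cycle point (sample 10) the four-component sum is 1 - 1/3 + 1/5 - 1/7, moving toward the limiting value of π/4 for the infinite series.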

Aperiodic Waveforms

Impulse - only non-zero for one instant.
Random - Amplitude variations are without temporal structure.

1) All the previous waveforms we've examined are called periodic because they consist of a basic pattern (however complex) that repeats over time. The length of the pattern (called its period) is inversely related to the fundamental frequency of the periodic waveform, and all the sinusoidal components needed to describe the structure of the waveform fall at integer multiples of F0. The period (P) of a waveform is thus specified by the relation P = 1/F0.
2) These examples of aperiodic waveforms differ from periodic waveforms in that they do not have a repeating temporal pattern.
3) The impulse is (theoretically) non-zero at only one instant and zero at all other instants.
4) A waveform with random amplitude also lacks systematic repetition of amplitude values over time (if the random number generator is any good).
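The relation P = 1/F0 can be checked numerically. The sketch below assumes a 200 Hz fundamental with a 600 Hz component at one third of its amplitude and an 8000 Hz sampling rate. These values and the helper name are chosen for illustration only.

```python
import math

FS = 8000.0  # assumed sampling rate in Hz
F0 = 200.0   # fundamental frequency in Hz

def two_component_wave(i):
    """200 Hz sine plus a 600 Hz sine at one third of its amplitude."""
    return (math.sin(2.0 * math.pi * i * F0 / FS)
            + math.sin(2.0 * math.pi * i * 3.0 * F0 / FS) / 3.0)

period_seconds = 1.0 / F0                    # P = 1/F0 = 0.005 s
period_samples = round(period_seconds * FS)  # 40 samples

# The waveform repeats exactly after one period...
repeats_at_period = all(
    abs(two_component_wave(i) - two_component_wave(i + period_samples)) < 1e-9
    for i in range(200)
)
# ...but not after half a period.
repeats_at_half_period = all(
    abs(two_component_wave(i) - two_component_wave(i + period_samples // 2)) < 1e-9
    for i in range(200)
)
```

Even though the 600 Hz component on its own repeats three times as often, the summed waveform repeats only at the period of the fundamental.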

Physical Acoustics: Frequency domain

Line spectra
  Represent periodic signals
  One or more sinusoids that are harmonically related
  Harmonics can appear only at frequencies that are integer multiples of the fundamental (usually lowest) frequency.
Continuous spectra
  Represent aperiodic signals
  Energy present (potentially) at any frequency, not just harmonically related frequencies.

1) The frequency domain and time domain are alternative representations of exactly the same signals. Conversion between the two is information preserving.

Line Spectra

1) These figures make it clear why we refer to the spectra of periodic signals as line spectra. There can be information in the spectrum only at the fundamental frequency (F0) and its integer multiples.
2) We have left something out of these diagrams: there is no phase information displayed. It could be displayed if we added a Y-Z plane as a 3rd dimension to the figures. We could then show phase as a rotation angle of the line around the X axis in the Y-Z plane. However, to a first approximation, we are unable to perceive phase relations in complex signals, and it is thus normal practice to display only the magnitude of components (lines), disregarding their phase.
3) The figure at the top left is from a simple waveform (i.e., one sinusoid). It has a frequency of 0.20 kHz (that is, kilohertz), or in other words, 200 Hz. Its amplitude is 100 on a scale of unspecified units.
4) The top right figure is the spectrum of a 200 Hz square wave. Note that it has lines at frequencies and amplitudes that correspond to the components we previously added to approximate the square wave.
5) The bottom left figure is the same spectrum shown in the top right, but with logarithmic units of amplitude instead of linear units. Specifically, amplitude is shown in decibels (dB). Conversion from linear amplitude units to dB follows the relation:

   dB = 20.0 * log10(amp/ref)

   where amp is the linear amplitude and ref is a reference amplitude. For sound in air, the reference is 0.0002 dyne/cm² (20 µPa). For digitized speech, the ref is commonly a unit amplitude step.
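The decibel relation above is easy to try out. In this sketch the function name `amplitude_to_db` is hypothetical, and the reference is set to the amplitude of the F0 component (100 in the figure's unspecified units) so that levels are expressed relative to the fundamental. That choice of reference is an assumption made for illustration.

```python
import math

def amplitude_to_db(amp, ref=1.0):
    """Convert a linear amplitude to decibels relative to ref."""
    return 20.0 * math.log10(amp / ref)

# Square-wave components: harmonic k has 1/k of the F0 amplitude.
f0_amplitude = 100.0
levels_db = {
    k: amplitude_to_db(f0_amplitude / k, ref=f0_amplitude)
    for k in (1, 3, 5, 7)
}
# levels_db is approximately {1: 0.0, 3: -9.54, 5: -13.98, 7: -16.90}
```

On the dB scale the harmonics are spaced much more evenly than on the linear scale, which is one reason log magnitude spectra are easier to read when components differ widely in amplitude.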

Continuous Spectra

Spectrum of impulse is all frequencies at equal amplitude.
Spectrum of random signal is all frequencies at random amplitudes.

1) Unlike line spectra, continuous spectra may have energy at any representable frequency. Of course, for discrete waveforms and spectra, continuous spectra are not really continuous: there are a discrete number of frequencies. Generally, you should read "all frequencies" to mean "all the frequencies that we can actually talk about here", to cover both the discrete and truly continuous cases.
2) The top figure is the spectrum of an impulse. While the impulse has energy at only one instant in time, it has equal energy at all frequencies. As we will see, this makes the impulse an especially useful notion for characterizing the response of objects to being hit by sound, because it effectively probes the response equally at all frequencies. More about that later.
3) The bottom figure is the spectrum of a random process. Since this is really a discrete spectrum, it was based on a discrete sequence of samples from the random process, and within that specific sequence some frequencies carried more energy than others. Of course, the expectation is for a new sequence from the same process to have an entirely different set of spectral peaks and valleys.
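The contrast between a line spectrum and the flat spectrum of an impulse can be seen with a direct discrete Fourier transform. The sketch below is a plain O(N²) DFT written for clarity rather than speed. The 16-sample length and the position of the impulse are arbitrary choices made for illustration.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of each DFT bin of the real sequence x (direct O(N^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

N = 16

# Impulse: non-zero at a single instant (sample 3 here, chosen arbitrarily).
impulse = [0.0] * N
impulse[3] = 1.0
impulse_mags = dft_magnitudes(impulse)  # every bin has magnitude 1

# Sinusoid completing exactly 2 cycles in N samples.
sine = [math.sin(2.0 * math.pi * 2 * t / N) for t in range(N)]
sine_mags = dft_magnitudes(sine)  # energy only in bin 2 and its mirror, bin 14
```

The impulse has the same magnitude in every bin, while the sinusoid has energy in a single bin (plus its mirror image, which is a property of the DFT of real signals). Moving the impulse to a different sample changes only the phase of each bin, not its magnitude.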

Quasiperiodicity

[Figure panels: Log magnitude of a square wave (top); Log magnitude of a jittery square wave (bottom)]

1) The top figure is the log magnitude spectrum of the square wave we were just looking at.
2) The bottom figure is also the spectrum of a square wave, but with one significant difference. In this case, the square wave was windowed over a large number of cycles to begin at zero amplitude, grow to a maximum amplitude, and then decrease back to zero amplitude again. This is now an amplitude-modulated square wave, and because every cycle is not exactly like every other cycle, it is no longer periodic, but it is nearly so. This is called quasiperiodic.
3) Quasiperiodic signals give rise to spectra that are similar to line spectra, except that the lines have measurable width. Such spectra are technically continuous spectra, not line spectra.
4) Because quasiperiodic signals are the rule rather than the exception in the real world, we refer to spectra produced by such signals as harmonic spectra.
5) The individual wide lines in harmonic spectra are called harmonics of F0, just as lines are in the pure line spectrum case.

Physical Acoustics: Sound shaping

Sources
  Generators of sound energy
  Periodic or aperiodic
  Possibly complex temporal/spectral structure
Filters
  Low pass
  High pass
  Band pass
Resonance
  Pendulums
  Tubes
  Frequency domain properties

1) All sound must originate with an energy source driving an object and making it vibrate. The structure of the sound emanating from the vibrating object (its waveform or spectrum) will be a function of two things: the structure (movement pattern) of the driving source, and the way that the object being driven responds to being vibrated.
2) Sound itself may be the energy source that is striking an object and making it vibrate. In such instances, the final output sound of the object will be due to both the structure of the original driving sound and the manner in which the object responds to being driven.
3) If we know the response characteristics of the object being driven and the structure of the driving force (whatever it is), we can determine what the output of the system comprising the driving force and driven object will be. This is the case for a particular class of driven objects, filters and resonators, that we'll discuss next.
4) Filters respond to a source function by attenuating the energy of the source at some frequencies. Typical classes of filters will be discussed in the next slide.
5) Unlike filters, resonators have a specific frequency at which they prefer to vibrate, called their resonant frequency. It takes very little energy to produce a response at the resonant frequency, and the response will continue for some time after the driving source is removed.

Filters

[Figure: frequency responses of low pass, high pass, and band pass filters. X axis: Frequency; Y axis: Amplitude (dB), with 0 dB and -3 dB marked; the bandwidth is indicated between the -3 dB points.]

NOTES
1) These are three fundamental types of filters, shown in the frequency domain. Other more complicated filters could be constructed by combining the effects of simple filters. The axes for the three diagrams are arbitrary frequency (X axis) and arbitrary amplitude (Y axis) scales.
2) Filters are most simply characterized by their bandwidth and slope. Bandwidth is the range of frequencies that pass through the filter with little or no attenuation. The bandwidth of a filter is delimited by the filter cutoff frequency(ies), the point at which attenuation becomes greater than 3 dB. Filter slope is how rapidly attenuation increases after the cutoff frequency (commonly specified in dB per octave). Band pass filters are additionally characterized by their center frequency.
3) Low pass filters are smoothing filters. They remove rapid fluctuations in a time series while retaining larger scale features.
4) High pass filters are differentiators. They remove slow changes in a time series while retaining rapid changes.
5) Band pass filters are tuned to a specific frequency or range of frequencies and reject variations outside that range.
6) The impulse response of a filter shows how it responds over time to being excited by an impulse. The frequency response of a filter is the Fourier transform of its impulse response, and vice versa.
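The smoothing and differentiating behavior described in notes 3 and 4 can be sketched with two very simple digital filters: a moving average (a crude low pass filter) and a first difference (a crude high pass filter). These particular filters, the function names, and the test signals are assumptions chosen for illustration. The slides do not specify any filter design.

```python
def moving_average(x, width=4):
    """Crude low pass filter: average of the current and previous samples."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def first_difference(x):
    """Crude high pass filter: change from one sample to the next."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]

slow = [1.0] * 8        # no change at all (lowest possible frequency)
fast = [1.0, -1.0] * 4  # alternates every sample (highest possible frequency)

smoothed_slow = moving_average(slow)   # unchanged: all 1.0
smoothed_fast = moving_average(fast)   # settles to 0.0 once the window fills
diff_slow = first_difference(slow)     # all 0.0: the slow signal is removed
diff_fast = first_difference(fast)     # alternates -2.0, +2.0: retained
```

The moving average passes the constant signal and cancels the rapidly alternating one, while the first difference does the opposite.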

Pendulum

[Figure: pendulum position traced over time. Axes: Time and Amplitude (-1 to +1).]

NOTES
1) Pendulums have a natural frequency that depends upon the length of the pendulum.
2) As long as a pendulum is in free motion, it will swing at its natural frequency.
3) A hard push makes the pendulum swing further (greater amplitude), but will not change the period of the swinging. That is, to a good approximation for small swings, the time required to complete one cycle of swinging is a constant no matter how great the amplitude.
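The dependence of natural frequency on length can be made concrete with the standard small-angle formula for an ideal pendulum, T = 2π√(L/g). This formula is not given in the slides. It is added here as an assumption for illustration, and it holds only for small swings of an idealized pendulum.

```python
import math

def pendulum_period(length_m, g=9.81):
    """Small-angle period (seconds) of an ideal pendulum of the given length."""
    return 2.0 * math.pi * math.sqrt(length_m / g)

period_1m = pendulum_period(1.0)  # about 2.0 s
period_4m = pendulum_period(4.0)  # twice as long as for 1 m
natural_frequency_1m = 1.0 / period_1m  # about 0.5 Hz
```

Amplitude does not appear in the formula at all, which is the point of note 3: under these assumptions, pushing harder changes how far the pendulum swings but not how long each cycle takes.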

Resonating Tube

[Figure: uniform tube 17 cm long, with one open end and one closed end.]

R1 = s/4L = 34000/68 = 500 Hz
R2 = 3s/4L = 1500 Hz
R3 = 5s/4L = 2500 Hz
etc.

(s = speed of sound in cm/sec, L = tube length in cm)

NOTES
1) A uniform tube, open at one end and closed at the other, resonates at frequencies given by a quarter wavelength law: the length of the tube is 1/4 the wavelength of the tube's first resonant frequency.
2) The speed of sound is roughly 34 cm/msec, or 34000 cm/sec. So a 17 cm tube would have its first resonance at about 500 Hz.
3) Subsequent resonances fall at odd multiples of the first resonance.
4) For tubes that are not of uniform cross sectional area, resonances differ from those of a uniform tube. For instance, if the front half of the tube (toward the open end) is wider than the back half of the tube, R1 will be higher in frequency, R2 lower in frequency, R3 higher in frequency, R4 lower, and so forth. The greater the difference in area, the greater the effect on resonant frequencies.
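The quarter wavelength law above can be written as R_n = (2n - 1)s/4L. A minimal sketch follows, with the function name `tube_resonances` chosen for illustration and the speed of sound taken from note 2.

```python
def tube_resonances(length_cm, count=3, speed_cm_per_s=34000.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Uses the quarter wavelength law: R_n = (2n - 1) * s / (4 * L).
    """
    return [
        (2 * n - 1) * speed_cm_per_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]

resonances_17cm = tube_resonances(17.0)  # 500, 1500, 2500 Hz
resonances_14cm = tube_resonances(14.0)  # a shorter tube resonates higher
```

Because the length appears in the denominator, shortening the tube raises every resonance by the same factor. This sketch applies only to the uniform tube; it does not model the non-uniform case described in note 4.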

Resonators

[Figure: frequency responses of three resonators with resonant frequencies of 500, 1500, and 3500 Hz. X axis: Frequency (Hz), 0 to 9000; Y axis: Amplitude (dB), -60 to 40.]

NOTES
1) This figure shows the frequency response of three resonators, each with a bandwidth of 100 Hz but with resonant frequencies of 500, 1500, and 3500 Hz. The magnitude of their response (in dB) is plotted relative to energy input to the resonator.
2) Resonators share some properties of band pass filters in that they can be characterized by their center frequency (more accurately, their resonant or natural frequency) and their bandwidth.
3) Unlike filters, resonators have greater than 0 dB response in the neighborhood of their natural frequency. This means that the energy out of the resonator is actually greater than the driving energy into the resonator in this region.
4) For resonators, the narrower the bandwidth, the greater the response is at the resonant frequency.
5) For resonators of equal bandwidth, the greater the resonant frequency, the greater the response level at the resonant frequency and at all frequencies above the resonant frequency.
6) This last fact has implications for speech, where we see that the higher the frequency of the first resonance of the vocal tract, the greater the overall amplitude of the speech signal.
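Notes 4 and 5 can be explored with a simple resonator model. The sketch below assumes a standard second-order resonator with 0 dB response at zero frequency, whose magnitude response is F² / √((F² - f²)² + (Bf)²) for resonant frequency F and bandwidth B (both in Hz). The slides do not say which model produced the figure, so this formula and the function name are assumptions made for illustration.

```python
import math

def resonator_gain_db(f, resonant_hz, bandwidth_hz):
    """Response (dB) at frequency f of an assumed second-order resonator.

    The model has 0 dB response at f = 0 and a peak near resonant_hz.
    """
    numerator = resonant_hz ** 2
    denominator = math.sqrt(
        (resonant_hz ** 2 - f ** 2) ** 2 + (bandwidth_hz * f) ** 2
    )
    return 20.0 * math.log10(numerator / denominator)

# Response at the resonant frequency for the three resonators in the figure
# (equal 100 Hz bandwidth): roughly 14, 23.5, and 30.9 dB.
peaks_db = {
    f_res: resonator_gain_db(f_res, f_res, 100.0)
    for f_res in (500.0, 1500.0, 3500.0)
}

# Halving the bandwidth of the 500 Hz resonator raises its peak by about 6 dB.
narrow_peak_db = resonator_gain_db(500.0, 500.0, 50.0)

# Well above all three resonances, the higher resonator still responds more.
above_db = [resonator_gain_db(5000.0, f_res, 100.0)
            for f_res in (500.0, 1500.0, 3500.0)]
```

In this model the response at the resonant frequency is 20·log10(F/B), so it rises both when the bandwidth narrows and when the resonant frequency increases, in line with notes 4 and 5.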