SPEECH AND SPECTRAL ANALYSIS

Sound waves: production
In general: acoustic interference; vibration (carried by some propagation medium); variations in air pressure.
In speech: actions of the articulatory organs -> vibrations; propagation medium -> airstream.
Figure: Representation of fluctuations in air pressure caused by a vibrating tuning fork (from P. Ladefoged, Elements of acoustic phonetics).

Sound waves: perception
Figure: A schematic diagram of the mechanism of the ear (from P. Ladefoged, Elements of acoustic phonetics).

Distinctive features of sound waves
Frequency: measured in cycles per second (Hz). A sound wave whose frequency is 100 Hz completes 100 cycles in a second.
Cycle: the stretch of the wave between two successive peaks (C) or two corresponding rest points (B), i.e. one complete repetition of the wave's movement.
Period: the time required to complete one cycle of vibration, e.g. if 20 cycles are completed in 1 second, the period is 1/20 of a second, or 0.05 s.
Amplitude: the maximum distance between the peak (C) and the trough (A), i.e. the peak-to-peak amplitude.
Fundamental frequency (of a voiced speech sound): 1/fundamental period (i.e. the time required to complete one cycle of the pattern as a whole). It is the frequency of vocal fold vibration and depends on the size of the vocal apparatus. The human voice produces fundamental frequencies within these approximate ranges: 80-220 Hz (men), 120-300 Hz (women), 200-500 Hz (children).
Figure: A wave with a frequency of 20 Hz (from Davenport & Hannahs, Introducing phonetics and phonology).
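The relation between frequency and period described above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example.

```python
def period_from_frequency(freq_hz: float) -> float:
    """Return the period (in seconds) of a wave with the given frequency (in Hz)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / freq_hz


def frequency_from_period(period_s: float) -> float:
    """Return the frequency (in Hz) of a wave with the given period (in seconds)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# The 20 Hz wave from the slide: 20 cycles per second, so each cycle lasts 0.05 s.
print(period_from_frequency(20))    # 0.05
# A wave whose cycle lasts 0.01 s completes 100 cycles per second.
print(frequency_from_period(0.01))  # 100.0
```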

Simple and complex waves
Figure: Two simple waves (pure tones, harmonics) with frequencies of 100 and 500 cps (cycles per second, i.e. Hz), and the complex wave resulting from the superposition of these two simple waves (from P. Ladefoged, Elements of acoustic phonetics).
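A minimal sketch of this superposition in Python, using only the standard library. The sample rate, the duration and the relative amplitude of the 500 Hz component are assumptions made for the example; they are not taken from Ladefoged's figure.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this example)


def sine_wave(freq_hz, duration_s, amplitude=1.0, sample_rate=SAMPLE_RATE):
    """Return the samples of a simple wave (pure tone) as a list of floats."""
    n_samples = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def superpose(*waves):
    """Add several waves sample by sample to obtain a complex wave."""
    return [sum(samples) for samples in zip(*waves)]


low = sine_wave(100, 0.02)                  # 100 Hz component, 20 ms long
high = sine_wave(500, 0.02, amplitude=0.5)  # 500 Hz component (amplitude assumed)
complex_wave = superpose(low, high)

# The complex wave repeats with the period of its lowest component:
# 1/100 s = 0.01 s, i.e. 80 samples at 8000 samples per second.
print(len(complex_wave))
```

Because 500 Hz is a whole-number multiple of 100 Hz, the sum is still periodic, and its pattern repeats every 0.01 s.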

Distinctive features of sounds (1)
Two sounds of the same duration (length) can differ with respect to:
Pitch: the subjective impression of the height of the sound. It is related to the fundamental frequency of the vibration, which is an acoustic (objective) measure. Two sounds with different fundamental frequencies (f0) can be perceived as having the same pitch.
Loudness: related to the amplitude of the sound: the higher the amplitude, the louder the sound is perceived. It is also affected by the efficiency of the propagation medium and by distance: the greater the distance, the less audible the sound becomes; some materials, e.g. wood, carry sound more efficiently than air.

Distinctive features of sounds (2)
Quality (or colouring): results from differences in the shape of the propagation medium (hence differences in the perception of the same phoneme produced by different speakers, as well as differences in vowel quality resulting from different shapes of the vocal tract) and in the material enclosing that medium (in the case of musical instruments, e.g. a metal flute vs. a wooden violin). Depending on the features (shape, size and material) of the propagation medium, some harmonics of the sound are emphasized and others are damped.

Source-filter theory (1)
Speech production is a two-stage process:
1) the generation of a sound source;
2) the shaping/filtering of the sound source by the resonant properties of the vocal tract.
The input (source of sound): the glottis or the supralaryngeal vocal tract.
The output: the lips or the nose (or both).
The vocal tract filters the sound source. The vocal tract's acoustic response depends on its length and shape.

Source-filter theory (2)
The effect of the vocal tract shape on the characteristics of the output sound:
it determines whether there is a supralaryngeal sound source;
it determines the resonance frequencies (formant frequencies) of the vocal tract.
Figure: Examples of different types of source and vocal tract shape.

Source-filter theory (3)
A resonator acts as a filter on the original source of sound: it redistributes the input energy so that frequencies at or near the resonance frequencies are amplified, at the expense of frequencies that are not near the resonance frequencies (these are reduced).
We can calculate the first resonance given the length of the vocal tract (assume 17.5 cm for now) and the speed of sound (assume 35,000 cm/s): F1 = c/(4L), where c = the speed of sound and L = the length of the tube.
For example, for a 17.5 cm tube, F1 = c/(4L) = 35000/(4 x 17.5) = 35000/70 = 500 Hz.
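The calculation can be sketched in Python as follows. The slide gives only F1; the higher resonances in this sketch rely on the standard quarter-wavelength formula for a uniform tube closed at one end and open at the other, Fn = (2n - 1) c / (4L), which is an addition for illustration, and the function name is hypothetical.

```python
def tube_resonances(length_cm=17.5, speed_cm_per_s=35000.0, count=3):
    """Resonance frequencies (Hz) of a uniform tube closed at one end.

    Assumes the quarter-wavelength model Fn = (2n - 1) * c / (4 * L);
    a real vocal tract is not a uniform tube, so these are idealized values.
    """
    if length_cm <= 0 or speed_cm_per_s <= 0:
        raise ValueError("length and speed must be positive")
    return [(2 * n - 1) * speed_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]


# 17.5 cm tube, speed of sound 35,000 cm/s, as assumed on the slide.
print(tube_resonances())  # [500.0, 1500.0, 2500.0]
```

The first value reproduces the 500 Hz worked out on the slide. Under the same idealization, a shorter tube gives proportionally higher resonances.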

Periodic and aperiodic waves
Complex waves can be:
periodic: a regularly repeating pattern; each complete cycle, or period, is like the last one;
aperiodic: irregular; no regularly repeating pattern, thus no clear cycles or periods.
The type of the complex waveform is determined by the sound source (excitation source):
periodic: when the vocal folds vibrate regularly;
aperiodic: every other sound source, laryngeal and supralaryngeal.

Periodic sound source in speech
1. Regular vibration of the vocal folds produces many different frequencies in a single glottal cycle, which results in a complex periodic waveform -> a periodic (= regularly repeating) sound source.
2. All periodic speech sounds are phonated, i.e. phonetically voiced. The source of periodic sound is always in the larynx, at the glottis.
3. The period is the duration of one cycle of the pattern of a periodic wave (one glottal cycle).
4. The fundamental frequency (f0) is the reciprocal of the period: f0 = 1/period.
5. The percept of pitch is closely related to f0. A higher pitch corresponds to a higher f0, and hence to faster glottal pulses. (Periodic sounds have pitch; aperiodic sounds do not.)
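Points 3 and 4 can be illustrated by estimating f0 from the times of successive glottal pulses. This is a minimal sketch: the pulse times are invented for the example, and averaging the periods is just one simple choice, not a method prescribed by the slides.

```python
def f0_from_pulse_times(pulse_times_s):
    """Estimate f0 (Hz) as the reciprocal of the mean glottal period.

    pulse_times_s: times (in seconds) of successive glottal pulses, ascending.
    """
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses to measure a period")
    # Each period is the time between one glottal pulse and the next.
    periods = [later - earlier
               for earlier, later in zip(pulse_times_s, pulse_times_s[1:])]
    if any(p <= 0 for p in periods):
        raise ValueError("pulse times must be strictly increasing")
    mean_period = sum(periods) / len(periods)
    return 1.0 / mean_period


# Hypothetical pulse times, 8 ms apart: period 0.008 s, so f0 = 125 Hz.
pulses = [0.000, 0.008, 0.016, 0.024, 0.032]
print(round(f0_from_pulse_times(pulses), 1))  # 125.0
```

Pulses that are closer together give a shorter period and therefore a higher f0, in line with point 5.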

Aperiodic sound sources in speech
1. An aperiodic sound source results in turbulence noise or implosion noise (random noise = many frequencies, but forming irregular patterns). The vocal folds do not vibrate: such sounds are phonetically voiceless.
2. The aperiodic source may be laryngeal (located at the glottis) or supralaryngeal (located higher in the vocal tract):
when the glottis is narrowed enough to produce aperiodic noise (but is too wide to let the vocal folds vibrate), the result is whisper, [h] (= a voiceless vowel) or breathy voice;
for other aperiodic speech sounds, the source of sound is at a constriction in the oral cavity that is narrow enough to cause air to rush through it. These supralaryngeal constrictions result in voiceless stops, fricatives and affricates, e.g. [f s t ʧ].

Mixed voiced and aperiodic sound source
Periodic and aperiodic sources can be generated simultaneously to produce mixed voiced and aperiodic speech, typical of sounds such as voiced fricatives.

Acoustic representations of sounds: spectrogram, waveform, spectrum (1)
Waveform: variations in the air pressure associated with speech sounds; changes in amplitude through time; pulses corresponding to the vibrations of the vocal folds.
Figure: Waveform of a Polish utterance: Ostatnie przygody Korowiowa i Behemota ("The last adventures of Korowiow and Behemot"; male speaker).

Acoustic representations of sounds (2): waveforms
What kind of information can we derive from a waveform? Amplitude, F0 and, to some extent, the manner of articulation:
vowels, approximants and nasals: pulses (voicing), high amplitude and energy (in decreasing order: vowels, approximants and finally nasals);
voiced obstruents (plosives, fricatives and affricates): pulses with low energy and amplitude (fricative segments, plosives);
voiceless obstruents: empty spaces in the case of stops; aperiodic variation in amplitude in the case of fricatives and the fricative component of an affricate.

Acoustic representations of sounds (3): spectrograms
Spectrogram: variation in the frequency domain over time.
Vertical lines -> pulsations of the vocal folds.
Frequency domain: certain frequencies are emphasized (dark marks) -> formants.
The frequency of a formant depends on the size and shape of the vocal tract, so in a spectrographic analysis formants provide information on the place and manner of articulation.
Figure: Spectrogram of a Polish utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).

Acoustic representations of sounds (4): spectrograms
In the analysis of speech the first four formants are taken into account; they are labelled F1, F2, F3 and F4 (from the lowest to the highest on the frequency scale). F1 and F2 are the most important indicators of vowel quality, whereas the higher formants reflect the speaker's characteristics (voice quality).
Transitions are the changes in formant frequencies that occur in the flow of articulation when the setting of the vocal tract changes from one sound to another.
Spectrograms are optimal for the analysis of duration, F0 and phonetic features (e.g. aspiration), and for the identification of different speech sounds (-> formant frequencies, transitions and vocal fold pulsations).

Acoustic representations of sounds (5): spectra
A spectrum (pl. spectra) is static: it shows the amplitude of each frequency present in the sound, usually during a single short section of the signal, e.g. 25 or 50 ms. You can obtain a spectrogram by arranging together a series of spectra.
Types of spectral analysis: Fourier analysis (FFT [fast Fourier transform] or DFT [discrete Fourier transform]); Linear Predictive Coding (LPC).
Harmonics: each component frequency in a periodic wave: H1, H2 (= 2 x H1), H3 (= 3 x H1), etc. The frequency of the lowest harmonic (the first harmonic) is equivalent to the fundamental frequency of the voice -> f0 = H1.
Harmonics should not be confused with formants: harmonics are the component frequencies of the periodic source, whereas formants are the resonance frequencies of the vocal tract.
Figure: DFT (jagged line) and LPC (smooth line) spectra of [uː] in "It's too much".
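As a rough illustration of a DFT spectrum and of the harmonic relation H2 = 2 x H1, H3 = 3 x H1, the sketch below computes the magnitude spectrum of one 25 ms frame of a synthetic periodic wave. Everything about the signal is an assumption made for the example (sample rate, f0 = 120 Hz, three harmonics with halving amplitudes). The DFT is written out directly for clarity; in practice an FFT routine would be used.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_LEN = 200      # 200 samples = 25 ms at 8000 samples per second
F0 = 120.0           # fundamental frequency of the synthetic wave (assumed)


def dft_magnitudes(frame):
    """Magnitude of each DFT bin from 0 Hz up to half the sample rate."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


# Periodic wave built from harmonics H1, H2, H3 with amplitudes 1, 0.5, 0.25.
frame = [sum(0.5 ** (h - 1) * math.sin(2 * math.pi * h * F0 * i / SAMPLE_RATE)
             for h in (1, 2, 3))
         for i in range(FRAME_LEN)]

magnitudes = dft_magnitudes(frame)
bin_hz = SAMPLE_RATE / FRAME_LEN  # frequency spacing between bins: 40 Hz

# The three strongest bins are the three harmonics of the wave.
strongest = sorted(range(len(magnitudes)),
                   key=lambda k: magnitudes[k], reverse=True)[:3]
harmonic_freqs = sorted(k * bin_hz for k in strongest)
print(harmonic_freqs)  # [120.0, 240.0, 360.0]
```

The lowest peak lies at f0 = H1 and the others at whole-number multiples of it. Running `dft_magnitudes` on successive frames of a longer signal and placing the results side by side gives the series of spectra that makes up a spectrogram.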