Speech Perception: Speech Analysis Project

Map your vowel space. Record tokens of the 15 vowels of American English. Using LPC and measurements on the waveform and spectrum, determine F0, F1, F2, F3, and F4 at 3 points in each token, plus the voiced duration. Use Praat.

1. Recording

Record 3 tokens of each of the 15 vowels of American English in bVd or hVd context. These are:

/bid/  /bɪd/  /bed/  /bɛd/  /bæd/
/bʌd/  /bɝd/  /bud/  /bʊd/  /bod/
/bɔd/  /bɑd/  /bɔɪd/  /baɪd/  /baʊd/

The sampling rate should be in the range of about 10 kHz to 44.1 kHz. The tokens should be recorded in a sentence context: "Yesterday Alice wrote to me." Try to say each sentence at the same speaking rate and with the same emphasis. See the word list at the end for assistance in pronouncing the vowels.

2. Analysis

Generally, it is easier to spot the formants in a wide-band spectrogram. For a male speaker this means a window length of about 0.004 to 0.007 s, while for a female speaker it is about 0.002 to 0.004 s. However, don't use the shortest window if formants (e.g., F2 and F3 in /i/) appear to merge.
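If you prefer to script the spectrogram settings rather than set them in the Praat GUI, the following sketch computes a wide-band spectrogram with a window length chosen by speaker. It assumes the praat-parselmouth Python package and a hypothetical file name; treat it as an illustration of the window-length choice, not as part of the assignment.

```python
# Sketch: wide-band spectrogram via praat-parselmouth (assumed installed).
# The file name and window lengths are illustrative; adjust to your recordings.
import parselmouth

snd = parselmouth.Sound("bead_token1.wav")   # hypothetical recording

# Wide-band analysis: roughly 0.004-0.007 s window for a male speaker,
# 0.002-0.004 s for a female speaker (shorter than one pitch period).
window_length = 0.005   # seconds; lengthen it if F2/F3 appear to merge

spectrogram = snd.to_spectrogram(window_length=window_length,
                                 maximum_frequency=5000.0)
print(spectrogram)      # a parselmouth.Spectrogram object
```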

a. Locations

The analysis of each token is to be done at three points in the voiced portion: near the beginning, at the middle, and near the end. Ideally, we want to make our measurements in the vowel itself. Due to coarticulation, however, there is an influence of the initial /b/ and final /d/. Also, since both /b/ and /d/ are voiced, they cannot be separated from the vowel. (If you are using hVd, the initial /h/ is voiceless and easier to separate from the vowel.) Determine the beginning and end of the voiced portion (from the release and any aspiration of the /b/ to the closure of the /d/) and record this as the voiced duration. The first measurement point is 30 ms after the beginning (the release for /b/ or the end of aspiration for /h/). The second is halfway from the beginning to the end of the voiced portion. The third is 40 ms before the end (the /d/ closure) of the voiced portion.
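As a concrete illustration of the arithmetic, the sketch below computes the three measurement times and the voiced duration from the voicing onset and offset. The onset and offset values are hypothetical placeholders that you would read off the waveform or spectrogram yourself.

```python
# Sketch: the three measurement points, given hand-measured voicing
# boundaries (placeholder values, in seconds).
voicing_onset = 0.112    # /b/ release or end of /h/ aspiration (measured by you)
voicing_offset = 0.398   # /d/ closure (measured by you)

voiced_duration = voicing_offset - voicing_onset

t1 = voicing_onset + 0.030                 # 30 ms after the beginning
t2 = voicing_onset + voiced_duration / 2   # halfway through the voiced portion
t3 = voicing_offset - 0.040                # 40 ms before the end

print(f"voiced duration = {voiced_duration:.3f} s")
print(f"measurement points: {t1:.3f}, {t2:.3f}, {t3:.3f} s")
```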

b. F0

The fundamental frequency can be determined in either of two ways. Using the waveform, measure the interval (time in seconds) from one vocal-pulse peak to the corresponding peak in the next pulse; F0 is 1/time. Alternatively, using a narrow-band spectrum (window length of about 0.040 s for males, 0.025 s for females), measure the distance (in frequency) between two adjacent peaks in the spectrum. These adjacent peaks are harmonics, and the spacing between them is F0. Note that the software you are using (e.g., Praat) has a built-in means of determining F0 that uses autocorrelation. Use one of the two methods above to cross-check it.
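The sketch below shows both the autocorrelation-based estimate (Praat's built-in pitch tracker, called through the assumed praat-parselmouth package) and the period-based cross-check. The file name, the measurement time, and the two pulse-peak times are hypothetical placeholders.

```python
# Sketch: F0 by autocorrelation (Praat's tracker) and by measuring one period.
import parselmouth

snd = parselmouth.Sound("bead_token1.wav")        # hypothetical recording

# Built-in autocorrelation method (what Praat's pitch track uses).
pitch = snd.to_pitch(pitch_floor=75.0, pitch_ceiling=500.0)
t2 = 0.255                                        # a measurement point (s), placeholder
print("F0 (autocorrelation):", pitch.get_value_at_time(t2), "Hz")

# Cross-check: 1 / period between two corresponding vocal-pulse peaks,
# read by hand from the waveform (placeholder times).
peak_a, peak_b = 0.2531, 0.2619
print("F0 (waveform period):", 1.0 / (peak_b - peak_a), "Hz")
```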

c. F1, F2, F3, and F4

The formants will be measured using LPC. Praat will set the parameters of this for you, and the tracked formants will show up as red dots on the spectrogram. Note that this automated formant tracking does make mistakes: if a value is out of the possible range for one of the formants, check the spectrogram to find the correct value. There are formant-tracking settings in Praat. For males, look for 5 formants below 5000 Hz; for females, look for 4 formants below 4500 to 5000 Hz. This depends upon vocal tract length: a longer vocal tract means more formants at lower frequencies. In some cases you will not be able to find a formant. If, after using the tricks described below, you cannot find a reasonable value, note it as "nm" or "-" (not measurable).
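The sketch below pulls F1 through F4 at the three measurement points with Praat's Burg LPC formant analysis, again through the assumed praat-parselmouth package. The file name, speaker setting, and time values are placeholders, and the ceiling follows the guideline above (5 formants below 5000 Hz for a male, 4 formants below about 4500 Hz for a female).

```python
# Sketch: F1-F4 at the three measurement points via Burg LPC (Praat's method).
import parselmouth

snd = parselmouth.Sound("bead_token1.wav")        # hypothetical recording

# Guideline from the handout: male -> 5 formants below 5000 Hz,
# female -> 4 formants below roughly 4500 Hz.
male = True
if male:
    formant = snd.to_formant_burg(max_number_of_formants=5, maximum_formant=5000.0)
else:
    formant = snd.to_formant_burg(max_number_of_formants=4, maximum_formant=4500.0)

times = [0.142, 0.255, 0.358]                     # t1, t2, t3 (placeholders)
for t in times:
    values = [formant.get_value_at_time(n, t) for n in (1, 2, 3, 4)]
    # NaN (undefined) values are reported as "nm", as the handout suggests.
    print(t, ["%.0f" % v if v == v else "nm" for v in values])
```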

d. Tricks

Occasionally you will have trouble finding one of the formants at a point in time. Try moving your analysis window left or right by a few milliseconds, or to a neighboring vocal pulse. Try changing the length (duration) of the analysis window. Inspect the spectrum for a peak that the LPC is missing. In spite of your best efforts, this may fail: you may not be able to find a particular formant in a particular token at a particular point in time. The formant (vocal tract resonance) may fall between two harmonics, and when two formants are close together, you may not be able to find both.
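If you are scripting the analysis, the same tricks amount to re-running the formant analysis at a slightly shifted time and with a different window length, as in this sketch (same parselmouth assumption as above; the offsets and window lengths are arbitrary examples).

```python
# Sketch: retry a missing formant by nudging the analysis time and
# varying the LPC window length (values are illustrative).
import parselmouth

snd = parselmouth.Sound("bead_token1.wav")        # hypothetical recording
t = 0.142                                         # the troublesome point (s)

for window in (0.025, 0.015, 0.035):              # try different window lengths
    formant = snd.to_formant_burg(max_number_of_formants=5,
                                  maximum_formant=5000.0,
                                  window_length=window)
    for dt in (0.0, -0.005, 0.005):               # nudge left/right by a few ms
        f3 = formant.get_value_at_time(3, t + dt)
        if f3 == f3:                              # not NaN: a candidate value
            print(f"window={window}, dt={dt:+.3f}: F3 ~ {f3:.0f} Hz")
```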

3. Reporting

For each vowel: for each token, report F0, F1, F2, F3, and F4 at each of the 3 locations. Then determine the average (mean) F0, F1, F2, F3, and F4 at each of the 3 locations across the 3 tokens. Also report the voiced duration for each token and the average voiced duration across the 3 tokens for each vowel.
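A minimal aggregation sketch, assuming you have already collected the per-token measurements into Python lists (the numbers shown are made-up placeholders): it averages each measure across the 3 tokens at each location.

```python
# Sketch: averaging measurements across the 3 tokens of one vowel.
# The nested lists are placeholder values: tokens x locations, in Hz.
from statistics import mean

f1 = [
    [280, 300, 320],   # token 1: F1 at locations 1, 2, 3
    [290, 310, 330],   # token 2
    [285, 305, 325],   # token 3
]
voiced_durations = [0.286, 0.301, 0.279]          # seconds, one per token

f1_means = [mean(token[loc] for token in f1) for loc in range(3)]
print("mean F1 per location:", f1_means)
print("mean voiced duration:", mean(voiced_durations))
```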

Word list:

/bid/ - bead
/bɪd/ - bid
/bed/ - bade (rhymes with maid)
/bɛd/ - bed
/bæd/ - bad
/bʌd/ - bud
/bɝd/ - bird
/bud/ - boo'd
/bʊd/ - bood (rhymes with wood)
/bod/ - bode (rhymes with road)
/bɔd/ - baud
/bɑd/ - bod (rhymes with rod)
/bɔɪd/ - boid (rhymes with void)
/baɪd/ - bide (rhymes with hide)
/baʊd/ - bowed
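Purely as a bookkeeping aid (not part of the handout), this sketch pairs each vowel keyword from the word list above with an assumed file-naming pattern for the 3 tokens, so you can check that all 45 recordings are present. The naming convention is hypothetical.

```python
# Sketch: expected recording files, 3 tokens x 15 vowels (naming is assumed).
import os

keywords = ["bead", "bid", "bade", "bed", "bad", "bud", "bird", "booed",
            "bood", "bode", "baud", "bod", "boid", "bide", "bowed"]

expected = [f"{word}_token{i}.wav" for word in keywords for i in (1, 2, 3)]
missing = [name for name in expected if not os.path.exists(name)]
print(f"{len(expected) - len(missing)} of {len(expected)} recordings found")
```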