Quarterly Progress and Status Report. Mimicking and perception of synthetic vowels, part II

Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Mimicking and perception of synthetic vowels, part II
Chistovich, L., Fant, G., and de Serpa-Leitao, A.
journal: STL-QPSR; volume: 7; number: 3; year: 1966; pages: 001-003
http://www.speech.kth.se/qpsr

I. SPEECH PERCEPTION

A. MIMICKING AND PERCEPTION OF SYNTHETIC VOWELS, PART II
L. Chistovich, G. Fant, and A. de Serpa-Leitao

The following report continues the work reported in the Speech Transmission Laboratory QPSR No. 2/1966. Two sets of experiments have been made. The aim of the first experiment was to check the categorical nature of mimicking. In the second experiment an attempt was made to gain some insight into the decision rules used by subjects in vowel identification.

The stimulus vowels were produced with the new miniaturized version of the manually controlled vowel synthesizer, OVE Ib, constructed by Johan Liljencrants (see Fig. I-A-1). A noise generator was used as the excitation source instead of the standard pulse generator for voiced sounds. A noise source was chosen to avoid interaction between responses to the formant pattern and responses to a harmonic pattern.

Experiment 1

The function generator for deriving the F1 and F2 signals was equipped with a mechanical linkage for selecting a prescribed path of variation, a "trajectory", in the F1-F2 plane. The subject was instructed to move the control in small steps along a trajectory and to mimic the vowels produced by the synthesizer. The subject's response vowels were recorded on magnetic tape and afterwards presented to a group of two listeners, who judged whether each mimicked vowel was identical to the previous one. By this method the number of different vowels mimicked by the subject in response to vowels sampled along a given trajectory was determined.

Each of the nine subjects performed the mimicking experiment along fourteen selected trajectories. In 120 of the 126 trajectory tracings, the number of responses labelled different was smaller than the number of mimicked vowels. These results suggest that the separate members of a certain class of vowels evoked one and the same reaction in the mimicking subject.
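The counting procedure of Experiment 1 can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a straight-line trajectory between two hypothetical (F1, F2) end points in c/s (the report does not specify the shape of the linkage paths), and it assumes the listeners' verdicts are available as a list of "same as previous" booleans. The number of different vowels is then one plus the number of "different" verdicts; because each response is compared only with the previous one, a return to an earlier vowel would count as a new change.

```python
def trajectory_steps(start, end, n_steps):
    """Return n_steps evenly spaced (F1, F2) settings, in c/s, on a straight path.

    Assumption: a straight trajectory; the report's linkage paths are not specified.
    """
    if n_steps < 2:
        raise ValueError("need at least two steps")
    (f1_a, f2_a), (f1_b, f2_b) = start, end
    return [
        (f1_a + (f1_b - f1_a) * i / (n_steps - 1),
         f2_a + (f2_b - f2_a) * i / (n_steps - 1))
        for i in range(n_steps)
    ]


def count_response_categories(same_as_previous):
    """Count distinct mimicked vowels from listener verdicts.

    same_as_previous[i] is True when response i + 1 was judged identical
    to response i, so a sequence of n responses yields n - 1 verdicts.
    """
    return 1 + sum(1 for same in same_as_previous if not same)


def is_categorical(n_responses, n_categories):
    """True when fewer different vowels than responses were heard."""
    return n_categories < n_responses


# Hypothetical trajectory and verdicts, for illustration only.
steps = trajectory_steps((300, 2000), (700, 1000), 5)
verdicts = [True, False, True, True]   # 5 responses -> 4 verdicts
n_cat = count_response_categories(verdicts)
print(steps)
print(n_cat, is_categorical(len(steps), n_cat))
```

With nine subjects and fourteen trajectories each, this test is applied to 9 × 14 = 126 tracings, which matches the total given in the report.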

Fig. I-A-1. The new portable OVE Ib with electronics unit (including power supply, formant circuits, voice source, and output amplifier) and function generator for control of F1, F2, Fn, and voice on/off.

The results of spectral analysis of F1 and F2 of the response vowels support this conclusion, as seen in Fig. I-A-2, a, b, and c, where the trajectories are shown together with the measured F1-F2 response data. The listener group's categorization of the response data is indicated by the parentheses at the top of each diagram. The responses are clearly not distributed evenly along the stimulus trajectories. A number of steps along a trajectory are accompanied only by small, random changes in the response parameters, followed by occasional large jumps to new areas of rather limited variation.

Experiment 2

Another set of experiments was concerned with the boundaries between two adjacent vowel allophones in the F1-F2 field of the function generator. A number of trajectories passing through adjacent allophone areas were selected, and the subject was instructed to generate sequences of sounds along these pathways and to find the points corresponding to a perceived shift from one vowel of the pair to the other. The manual control of F1 and F2 was arranged so that the subject could not observe the position of the mechanical F1-F2 linkage. Only after a decision had been made could the subject turn his attention to the setting; he was then asked to mark the particular F1-F2 point. After 10-20 different pathways through the vowel pair had been investigated, the subject was asked to draw a line through the boundary points. This boundary line was then calibrated by spectrographic measurements of vowels generated with the control unit moved through the line. The corrected data were redrawn together with the subject's other boundaries on an F1-F2 diagram. In all, 102 boundaries from four subjects were determined in this way. Data on subject J. M. (a Hungarian-born Swedish citizen) are shown in Fig. I-A-3. Most of the boundaries follow lines of constant F1 or constant F2, and one and the same line often serves to differentiate two or three different vowel pairs.
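The classification of boundary lines as constant-F1 or constant-F2 can be sketched by fitting a line through a subject's marked boundary points and checking its orientation. This is a hypothetical reconstruction, not the authors' method (in the report each subject drew the line by hand): it uses a total-least-squares (principal-axis) fit so that vertical lines are handled, assumes F1 on the horizontal axis and F2 on the vertical axis, both in c/s, and uses an arbitrary 10-degree tolerance. The fitted orientation depends on axis scaling, so a diagram with differently scaled axes would give different angles.

```python
import math


def principal_angle_deg(points):
    """Orientation in degrees of the total-least-squares line through (F1, F2) points.

    Uses the principal axis of the point scatter, so a vertical line
    (constant F1) is fitted as well as a horizontal one (constant F2).
    """
    n = len(points)
    m1 = sum(f1 for f1, _ in points) / n
    m2 = sum(f2 for _, f2 in points) / n
    s11 = sum((f1 - m1) ** 2 for f1, _ in points)
    s22 = sum((f2 - m2) ** 2 for _, f2 in points)
    s12 = sum((f1 - m1) * (f2 - m2) for f1, f2 in points)
    return math.degrees(0.5 * math.atan2(2 * s12, s11 - s22))


def angle_between_lines(a, b):
    """Smallest angle (0 to 90 degrees) between two undirected lines."""
    d = (a - b) % 180.0
    return min(d, 180.0 - d)


# Ideal orientation of each boundary type, F1 on the x axis and F2 on the y axis.
BOUNDARY_TYPES = {
    "constant F2": 0.0,         # horizontal line
    "constant F1": 90.0,        # vertical line
    "constant F1 + F2": 135.0,  # direction (1, -1)
}


def classify_boundary(points, tol_deg=10.0):
    """Label a boundary by its nearest ideal orientation, or 'other' (tolerance is hypothetical)."""
    angle = principal_angle_deg(points)
    label, ref = min(BOUNDARY_TYPES.items(),
                     key=lambda item: angle_between_lines(angle, item[1]))
    return label if angle_between_lines(angle, ref) <= tol_deg else "other"


# Hypothetical, slightly scattered boundary points near F1 = 300 c/s.
print(classify_boundary([(300, 1000), (310, 1500), (305, 2000)]))
```

Applied to all boundaries, a count of the first two labels would correspond to the report's finding that 80 of the 102 boundaries (about 78 %) could be approximated by lines of constant F1 or F2.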

[Figure panel: subject A.S.L.; response categories (1)(2)(3)(4)(5)(6)(7)(8,9,10,11)]
Fig. I-A-2. a. F1-F2 extent of stimulus trajectories (broken lines) and spectrographic measurements of F1 and F2 of the subject's mimicking responses (solid points). The parentheses at the top of each diagram enclose mimicking responses judged to belong to the same category (phonetic identity being the criterion). Noise source excitation.

Fig. I-A-2. b. See legend, Fig. I-A-2.a.

[Figure panels: subjects L.Ch. and B.L.; axes F1 (kc/s, 0 to 1.0) vs. F2 (kc/s, 0 to 1.6)]
Fig. I-A-2. c. See legend, Fig. I-A-2. a.

Fig. I-A-3. Perceptual boundaries in the F1-F2 plane of synthetic vowels, subject J. M. The two parallel boundaries at F1 = 300 c/s pertain to the same subject on two different occasions; the difference may be an instrumental artifact. Observe the tendency of boundaries to follow lines of constant F1, constant F2, or constant F1 + F2.

Of the whole material of 102 boundaries, 80 could be approximated by lines of constant F1 or F2. This suggests that extremely simple rules employing critical boundary values of formant frequencies operate in vowel perception. Such a principle conforms with the general idea of one and the same distinctive feature operating in several vowel pairs. Our limited data suggest that some of these critical boundaries do not differ much between languages.

The pilot character of this study must be stressed. The material is limited, and the results should be considered preliminary only. Data extraction could be speeded up if the mechanical control unit were more stable, so that spectrographic calibration would be unnecessary. The stability requirement will be fulfilled in the new version of the OVE Ib function generator. Our OVE II-type computer-controlled synthesizer, now under construction, will provide an even more flexible and reliable tool for generating and recording stimuli, including not only F1 and F2 but also other synthesis parameters that need to be varied in an experiment.
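The idea of simple rules based on critical boundary values can be illustrated with a small threshold classifier in which each rule is a single line of constant F1 or constant F2 and yields one binary feature. This is a hypothetical sketch: apart from the F1 = 300 c/s value seen in Fig. I-A-3, the threshold values, the feature names, and the test points are illustrative assumptions, not results of the report. The helper `pairs_split_by` (also hypothetical) counts how many vowel pairs a single boundary separates, mirroring the observation that one line often differentiates two or three vowel pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryRule:
    """One critical boundary: a line of constant F1 or constant F2 in the F1-F2 plane."""
    formant: str     # "F1" or "F2"
    critical: float  # boundary value in c/s
    above: str       # feature value when the formant exceeds the boundary
    below: str       # feature value otherwise


# Hypothetical rule set. Only the 300 c/s F1 value echoes Fig. I-A-3;
# the other values and all feature names are assumptions for illustration.
RULES = (
    BoundaryRule("F1", 300.0, "non-close", "close"),
    BoundaryRule("F1", 550.0, "open", "non-open"),
    BoundaryRule("F2", 1300.0, "front", "back"),
)


def vowel_features(f1, f2, rules=RULES):
    """Map an (F1, F2) point in c/s to a tuple of binary feature values."""
    values = {"F1": f1, "F2": f2}
    return tuple(r.above if values[r.formant] > r.critical else r.below
                 for r in rules)


def pairs_split_by(rule, points, rules=RULES):
    """Count pairs of points whose feature tuples differ only in `rule`."""
    idx = rules.index(rule)
    feats = [vowel_features(f1, f2, rules) for f1, f2 in points]
    count = 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            differing = [k for k in range(len(rules)) if feats[i][k] != feats[j][k]]
            if differing == [idx]:
                count += 1
    return count


# Six hypothetical vowel points: close, mid, and open, each front and back.
points = [(250, 2200), (250, 800), (450, 1900), (450, 900), (700, 1600), (700, 1100)]
print(vowel_features(250, 2200))
print(pairs_split_by(RULES[2], points))
```

With these assumed points, the single F2 boundary separates all three front/back pairs, while each F1 boundary separates two pairs.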