SPEECH AND SPECTRAL ANALYSIS
Sound waves: production
In general:
- acoustic interference
- vibration (carried by some propagation medium)
- variations in air pressure
In speech:
- actions of the articulatory organs -> vibrations
- propagation medium -> airstream
Representation of fluctuations in air pressure caused by a vibrating tuning fork (from P. Ladefoged, Elements of acoustic phonetics).
Sound waves: perception
A schematic diagram of the mechanism of the ear (from P. Ladefoged, Elements of acoustic phonetics).
Distinctive features of sound waves
Frequency
- measured in cycles per second (Hz): a sound wave whose frequency is 100 Hz completes 100 cycles in one second
- cycle: the distance between two peaks (C) or rests (B) in the movement of the wave (i.e. it describes how close together the two points are)
- period: the time required to complete one cycle of vibration, e.g. if 20 cycles are completed in 1 second, the period is 1/20 of a second, or 0.05 s
Amplitude
- the maximum distance between the peak (C) and the trough (A) (peak-to-peak amplitude)
Fundamental frequency (of a voiced speech sound)
- 1/fundamental period (the fundamental period is the time required to complete one cycle of the pattern as a whole)
- the frequency of vocal fold vibration
- depends on the size of the vocal apparatus
- the human voice produces sounds within the ranges: 80-220 Hz (male), 120-300 Hz (female), 200-500 Hz (children)
A wave with a frequency of 20 Hz (from Davenport & Hannahs, Introducing phonetics and phonology).
Simple and complex waves
Two simple waves (pure tones, harmonics) of frequency 100 and 500 cps (cycles per second).
The complex wave resulting from superposition of two simple waves of 100 and 500 cps (from P. Ladefoged, Elements of acoustic phonetics).
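Superposition can be sketched numerically: at every instant the complex wave is simply the sum of its simple waves. The following is a minimal illustration, not taken from Ladefoged; the sampling rate, the amplitudes and the function names are assumptions chosen for the example.

```python
import math

def simple_wave(freq_hz, amplitude, t):
    """Value of a sine wave (pure tone) at time t, given in seconds."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t)

def complex_wave(components, t):
    """Sum of simple waves; components is a list of (freq_hz, amplitude) pairs."""
    return sum(simple_wave(f, a, t) for f, a in components)

SAMPLE_RATE = 10_000                    # samples per second (assumed)
components = [(100, 1.0), (500, 0.5)]   # amplitudes are illustrative only

# 100 samples = 10 ms: one cycle of the 100 Hz wave, five cycles of the 500 Hz wave
samples = [complex_wave(components, n / SAMPLE_RATE) for n in range(100)]
```

Because 500 is a whole multiple of 100, the summed pattern repeats every 10 ms, so the complex wave is periodic with a fundamental frequency of 100 Hz.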
Distinctive features of sounds (1)
Two sounds of the same duration (length) can differ with respect to:
Pitch
- the subjective impression of the height of the sound
- related to the fundamental frequency of the vibration, which is an acoustic (objective) measure indicating the height of the sound
- two sounds with different fundamental frequencies (f0) can be perceived as having the same pitch
Loudness
- related to the amplitude of the sound: the higher the amplitude, the louder the sound is perceived to be
- affected by the efficiency of the propagation medium and by distance: the larger the distance, the less audible the sound becomes
- some materials, e.g. wood, carry sound more efficiently than air
Distinctive features of sounds (2)
Quality (or colouring)
- results from differences in the shape of the propagation medium (hence differences in the perception of the same phoneme produced by different speakers, as well as differences in vowel quality resulting from different shapes of the vocal tract) and in the material enclosing that medium (in the case of musical instruments, e.g. a flute made of metal vs. a wooden violin)
- depending on the features (shape, size and material) of the propagation medium, some harmonics of the sound will be emphasized and others will be damped
Source-filter theory (1)
Speech production is a two-stage process:
1) the generation of a sound source
2) shaping/filtering of the sound source by the resonant properties of the vocal tract
- the input (source of sound): the glottis or the supralaryngeal vocal tract
- the output: the lips or the nose (or both)
The vocal tract filters the sound source. The vocal tract's acoustic response depends on its length and shape.
Source-filter theory (2)
The effect of the vocal tract shape on the characteristics of the output sound:
- it determines whether there is a supralaryngeal sound source
- it determines the resonance frequencies (formant frequencies) of the vocal tract
Examples of different types of source and vocal tract shape.
Source-filter theory (3)
A resonator acts as a filter on the original source of sound: it redistributes the input energy so that frequencies at or near the resonance frequencies are amplified, at the expense of frequencies that are not near the resonance frequencies (these are reduced).
We can calculate the resonances given the length of the vocal tract (assume 17.5 cm for now) and the speed of sound (assume 35,000 cm/s):
F1 = c / (4L), where c = the speed of sound and L = the length of the tube
For example, for a 17.5 cm tube, F1 = c / (4L) = 35,000 / 70 = 500 Hz.
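The calculation can be checked with a few lines of code. The slide gives only the first resonance; the sketch below also lists higher ones using the standard quarter-wavelength model of a uniform tube closed at one end, F_n = (2n - 1) * c / (4L), which is an assumption added here rather than something stated on the slide. The function name is hypothetical.

```python
def tube_resonances(length_cm, speed_cm_per_s=35_000, count=3):
    """Resonances (Hz) of a uniform tube closed at one end.

    Assumes the quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    For n = 1 this reduces to F1 = c / (4L), the formula on the slide.
    """
    return [(2 * n - 1) * speed_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]

print(tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

Under the same model a shorter tube gives higher resonances, e.g. `tube_resonances(14.0, count=1)` returns 625 Hz for a 14 cm tube.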
Periodic and aperiodic waves
Complex waves can be:
- periodic: a regularly repeating pattern; each complete cycle, or period, is like the last one
- aperiodic: irregular; no regularly repeating pattern, thus no clear cycles or periods
The type of the complex waveform is determined by the sound source (excitation source):
- periodic: when the vocal folds vibrate regularly
- aperiodic: every other sound source, laryngeal or supralaryngeal
Periodic sound source in speech
1. Regular vibration of the vocal folds produces many different frequencies in a single glottal cycle, which results in a complex periodic waveform -> a periodic (= regularly repeating) sound source.
2. All periodic speech sounds are phonated, i.e. phonetically voiced. The source of periodic sound is always in the larynx, at the glottis.
3. The period is the duration of one cycle of the pattern of a periodic wave (one glottal cycle).
4. The fundamental frequency (f0) is the reciprocal of the period: 1/period.
5. The percept of pitch is closely related to f0. A higher pitch corresponds to a higher f0, and hence to faster glottal pulses. (Periodic sounds have pitch; aperiodic sounds do not.)
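Points 3 and 4 can be illustrated with a small calculation: measure the intervals between successive glottal pulses, average them to get the period, and take the reciprocal. This is a minimal sketch; the pulse times are invented for the example and the function name is hypothetical.

```python
def f0_from_pulses(pulse_times_s):
    """Estimate f0 (Hz) as the reciprocal of the mean interval between pulses."""
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses to measure a period")
    # Each period is the time between one pulse and the next
    periods = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    mean_period = sum(periods) / len(periods)
    return 1.0 / mean_period

pulses = [0.000, 0.008, 0.016, 0.024, 0.032]  # hypothetical pulse times, 8 ms apart
print(round(f0_from_pulses(pulses)))  # 125
```

A period of 8 ms gives an f0 of 125 Hz, which falls inside the 80-220 Hz range quoted earlier for male voices.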
Aperiodic sound sources in speech
1. An aperiodic sound source results in turbulence noise or implosion noise (random noise = many frequencies, but forming irregular patterns). The vocal folds do not vibrate: such sounds are phonetically voiceless.
2. The aperiodic source may be laryngeal (located at the glottis) or supralaryngeal (located higher in the vocal tract):
- when the glottis is narrowed enough to produce aperiodic noise (but is too wide to let the vocal folds vibrate), the result is whisper, [h] (= a voiceless vowel) or breathy voice
- for other aperiodic speech sounds, the source of sound is at a constriction in the oral cavity that is narrow enough to make the air rushing through it turbulent. These supralaryngeal constrictions result in voiceless stops, fricatives and affricates, e.g. [f s t ʧ].
Mixed voiced and aperiodic sound source
Periodic and aperiodic sources can be generated simultaneously to produce mixed voiced and aperiodic speech, typical of sounds such as voiced fricatives.
Acoustic representations of sounds: spectrogram, waveform, spectrum (1)
Waveform
- variations in the air pressure associated with speech sounds
- changes in amplitude through time
- pulses corresponding to the vibrations of the vocal folds
Waveform of a Polish utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).
Acoustic representations of sounds (2): waveforms
What kind of information can we derive from a waveform?
Amplitude, F0 and, to some extent, the manner of articulation:
- vowels, approximants and nasals: pulses (voicing), high amplitude and energy (vowels, then approximants, and lastly nasals)
- voiced obstruents (plosives, fricatives and affricates): pulses, low energy and amplitude (fricative segments, plosives)
- voiceless obstruents: empty spaces in the case of stops, aperiodic variation in amplitude in the case of fricatives and the fricative component of an affricate
Acoustic representations of sounds (3): spectrograms
Spectrogram
- variation in the frequency domain over time
- vertical lines -> pulsations of the vocal folds
- frequency domain: certain frequencies are emphasized (dark marks) -> formants
The frequency of a formant depends on the size and shape of the vocal tract, so in a spectrographic analysis it provides information on the place and manner of articulation.
Spectrogram of a Polish utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).
Acoustic representations of sounds (4): spectrograms
In the analysis of speech, the first four formants are taken into account; they are labelled F1, F2, F3 and F4 (from the lowest to the highest on the frequency scale). F1 and F2 are the most important indicators of vowel quality, whereas the higher formants reflect the speaker's characteristics (voice quality).
In the flow of articulation, the changes in formant frequencies that occur when the setting of the vocal tract changes from one sound to another are called transitions.
Spectrograms: optimal for the analysis of duration, F0 and phonetic features (e.g. aspiration), and for the identification of different speech sounds (-> formant frequencies, transitions and vocal fold pulsations).
Acoustic representations of sounds (5): spectra
Spectrum (pl. spectra)
- is static: it shows the amplitude of each frequency present in the sound, usually during a single short section of the signal, e.g. 25 or 50 ms
- a spectrogram can be obtained by arranging a series of spectra together
Types of spectral analysis
- Fourier analysis (FFT [fast Fourier transform] or DFT [discrete Fourier transform])
- Linear Predictive Coding (LPC)
Harmonics
- each component frequency in a periodic wave: H1, H2 (= 2 x H1), H3 (= 3 x H1), etc.
- the frequency of the lowest harmonic (the first harmonic) is equivalent to the fundamental frequency of the voice -> f0 = H1
DFT (jagged line) and LPC (smooth line) spectra of [uː] in It's too much, with harmonics and formants labelled.
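A spectrum of one short section of a periodic signal can be sketched with a naive DFT. The example below is only an illustration under stated assumptions: the signal is synthetic (three harmonics of an assumed 120 Hz fundamental), the 8000 Hz sampling rate and 25 ms frame are chosen so that each harmonic falls exactly on a frequency bin, and the function name is hypothetical. Real analysis software would normally use an FFT and a window function instead.

```python
import cmath
import math

def dft_amplitudes(frame):
    """Amplitude per frequency bin, from 0 Hz up to half the sampling rate.

    Naive DFT, scaled so that a sine of amplitude A lying exactly on a bin
    reads A (the 0 Hz and top bins would need a factor of 1 instead of 2).
    """
    n = len(frame)
    amps = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        amps.append(2 * abs(total) / n)
    return amps

SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 200      # 25 ms at 8000 Hz, so bins are 40 Hz apart
F0 = 120             # Hz, assumed fundamental (= H1)

# Synthetic periodic frame: H1, H2 and H3 with amplitudes 1, 1/2 and 1/3
frame = [sum((1.0 / h) * math.sin(2 * math.pi * h * F0 * i / SAMPLE_RATE)
             for h in (1, 2, 3))
         for i in range(FRAME_LEN)]

bin_hz = SAMPLE_RATE / FRAME_LEN
peaks = [round(k * bin_hz)
         for k, a in enumerate(dft_amplitudes(frame)) if a > 0.1]
print(peaks)  # [120, 240, 360]
```

The peaks sit at whole multiples of the lowest one, matching H2 = 2 x H1 and H3 = 3 x H1. Computing such a spectrum for successive frames and stacking the results over time gives the data behind a spectrogram.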