PSYCH 150 / LIN 155
UCI COGNITIVE SCIENCES, syn lab
Psychology of Language
Prof. Jon Sprouse
01.10.13: The Mental Representation of Speech Sounds
A logical organization
For clarity's sake, we'll organize our investigations around the direction of comprehension, from sensory to conceptual:
1. sounds
2. words
3. sentences
We'll also organize our investigation around the logical order of questions we saw last time:
1. What are the properties of the mental representations that must be constructed?
2. What processes does the brain deploy to construct those mental representations?
3. How do the physical parts of the brain implement those processes?
What is the mental representation of speech sounds?
Physical signals and Percepts
When a person speaks, they emit a physical signal. You perceive this physical signal -- the mental object that you perceive is often called a percept.
[Figure: a spoken "ah" (the signal) and a perceived "ah" (the percept)]
What are the physical properties of the signal?
What is the mental representation of the percept?
The simplest hypothesis
The simplest possible hypothesis is that the mental representation of speech sounds is the physical properties of the signal.
[Figure: signal "ah" -> physical properties -> percept "ah"]
This suggests a very straightforward workflow:
1. Identify all of the speech sounds that make up a language (or all languages)
2. Identify the physical properties of the speech sounds
3. Use those physical properties as the definition of the mental representations
Identifying the speech sounds of a language (or all languages)
Identifying the set of speech sounds of a language (or all languages)
Speech is a continuous stream of sound. Although there are breaks, they probably don't correspond with your intuition about what counts as speech sounds.
What we need, then, is a principled way to identify the set of speech sounds in a language.
The phoneme approach
The phoneme approach says that the units of speech are called phonemes, and are defined as follows:
phoneme: The smallest segment of speech that leads to meaningful contrasts between words.
In other words, two speech sounds are distinct phonemes if swapping one for the other is the smallest change you can make that results in a different word:
pat / bat: p and b are distinct phonemes
sat / sad: t and d are distinct phonemes
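The minimal-pair test can be sketched in a few lines of Python. This is only an illustration of the definition, not a procedure from the lecture: the toy `LEXICON`, its segment spellings, and the function names `contrast` and `minimal_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word is a tuple of segments.
# The segment symbols are illustrative, not a full IPA transcription.
LEXICON = {
    "pat": ("p", "a", "t"),
    "bat": ("b", "a", "t"),
    "sat": ("s", "a", "t"),
    "sad": ("s", "a", "d"),
}


def contrast(word_a, word_b, lexicon=LEXICON):
    """Return the single pair of segments that distinguishes two words,
    or None if the words are not a minimal pair."""
    a, b = lexicon[word_a], lexicon[word_b]
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    # A minimal pair differs in exactly one segment position.
    return diffs[0] if len(diffs) == 1 else None


def minimal_pairs(lexicon=LEXICON):
    """Map every minimal pair in the lexicon to its contrasting segments."""
    pairs = {}
    for w1, w2 in combinations(sorted(lexicon), 2):
        c = contrast(w1, w2, lexicon)
        if c is not None:
            pairs[(w1, w2)] = c
    return pairs
```

Under these assumptions, `contrast("pat", "bat")` picks out p and b, while `contrast("pat", "sad")` returns `None` because the two words differ in more than one segment.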
The phonemes of English
If you follow this procedure, you will end up with the following set of phonemes:
[Figure: chart of the phonemes of English]
Note that these symbols are from the International Phonetic Alphabet (IPA). The IPA was created because many of the symbols of the Roman alphabet are ambiguous. In the IPA, each symbol represents a single phoneme.
Identifying the physical properties of phonemes
The vocal tract: source and filter
Much like a musical instrument, the vocal tract consists of a source of the sound and a filter that alters the properties of the source sound.
Instrument: source = reed, filter = body
Vocal tract: source = vocal folds, filter = pharynx and oral cavity
Basic properties of the source
The basics of sound: longitudinal waves
Sound is a distortion in pressure that travels through a gas.
Gas molecules fill whatever size space they are given: in twice the volume, the same molecules have half the density.
You can push certain air molecules closer together for a time (increasing the air pressure); this is called compression.
The basics of sound: longitudinal waves
When you push air molecules closer together (compression), you also create a space behind the compression that is less dense, called rarefaction.
Compression and rarefaction are two opposing forces: compression causes rarefaction behind it.
And because gas molecules want to equalize their density in a given space, compression of one set of molecules will cause a wave of compression to occur throughout the space as the compressed molecules try to get away from each other.
The basics of sound: longitudinal waves
The best way to visualize the way that a compression wave travels through space is with a slinky.
If you push one end of a stretched slinky, you can see the first set of coils compress, and watch the compression wave spread across the slinky as the coils try to equalize their density.
www.youtube.com/watch?v=f66syh8b9d8
The basics of sound: visualizing waves
To visualize the temporal properties of sound, we focus on a single point in space (i.e., there is no spatial information at all). We then draw the compressions and rarefactions that occur over time at that single point in space.
So the x-axis represents time, and the y-axis represents the changes in pressure (measured in the force applied to the air) over time.
crest = higher air pressure, coincides with compression
trough = lower air pressure, coincides with rarefaction
The horizontal midline marks normal pressure.
The basics of sound: amplitude
Once we have a representation of a wave, we can describe its physical properties.
Amplitude is a measure of the force applied to an area of air during compression and rarefaction. It is a way of measuring the energy that a wave has.
Amplitude is represented by the height of the wave between the normal pressure line and a peak (or trough).
The basics of sound: frequency
Frequency is a measure of the number of cycles that a wave completes in a given unit of time. A complete cycle consists of compression, rarefaction, and return to normal.
[Figure: two waves, one completing 3 cycles per second and one completing 12 cycles per second]
Frequency is measured in Hertz (Hz): cycles per second.
Humans can hear from roughly 20 Hz to 20,000 Hz... but we lose upper frequencies as we age:
http://www.noiseaddicts.com/2009/03/can-you-hear-this-hearing-test/
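To make "cycles per second" concrete, here is a minimal sketch that samples a simple tone at a single point in space and counts its crests. The sine-wave model, the sample rate, and the names `pressure` and `count_crests` are assumptions for illustration, not part of the lecture.

```python
import math


def pressure(t, amplitude, frequency_hz):
    """Pressure deviation from normal at time t (seconds) for a simple tone.
    Assumes the tone is a pure sine wave that starts at normal pressure."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)


def count_crests(frequency_hz, duration_s=1.0, sample_rate=10_000):
    """Count the crests (local pressure maxima) within the duration.
    Each complete cycle contributes exactly one crest."""
    n = int(duration_s * sample_rate)
    samples = [pressure(i / sample_rate, 1.0, frequency_hz) for i in range(n + 1)]
    return sum(
        1
        for i in range(1, n)
        if samples[i - 1] < samples[i] >= samples[i + 1]
    )
```

With these assumptions, a 3 Hz tone shows 3 crests in one second and a 12 Hz tone shows 12, matching the two waves on the slide. Changing `amplitude` changes the height of the crests but not how many there are.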
The basics of sound: fundamental frequency
The fundamental frequency (F0) is the frequency at which sound sources (oscillators) vibrate:
Guitar strings: 82.4 Hz, 110 Hz, 146.8 Hz, 196 Hz, 246.9 Hz, 329.6 Hz
Human vocal folds: around 130 Hz (C) for males, around 220 Hz (A) for females
The basics of sound: harmonics
In addition to the fundamental frequency (F0), vibrating sound sources (oscillators) also produce a series of additional frequencies called harmonics. Harmonics are always integer multiples of the F0:
F0 100 Hz: 2nd 200 Hz, 3rd 300 Hz, 4th 400 Hz, 5th 500 Hz
F0 200 Hz: 2nd 400 Hz, 3rd 600 Hz, 4th 800 Hz, 5th 1000 Hz
F0 400 Hz: 2nd 800 Hz, 3rd 1200 Hz, 4th 1600 Hz, 5th 2000 Hz
We call a tone that consists of an F0 and harmonics a complex tone.
The physics behind the existence of harmonics is complex, so let's wait a bit before talking about it.
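The integer-multiple rule is easy to check with a short sketch. The function name `harmonic_series` and the default of five components are illustrative choices, not something defined in the lecture.

```python
def harmonic_series(f0_hz, count=5):
    """Return the first `count` components of a complex tone:
    the fundamental (n = 1) followed by its harmonics (n = 2, 3, ...)."""
    return [n * f0_hz for n in range(1, count + 1)]
```

For example, `harmonic_series(100)` gives 100, 200, 300, 400, and 500 Hz, and `harmonic_series(400)` ends at 2000 Hz, as in the three series listed on the slide.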
The basics of sound: wavelength
Wavelength is a measure of the distance between identical locations in the cycle of a wave.
The wavelength of a sound is related to the frequency and velocity of the wave:
wavelength = velocity / frequency
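As a worked example of the formula, the sketch below computes wavelengths in air. The speed of sound used here (about 343 m/s, for dry air near 20 °C) is an assumption brought in for illustration; the slides do not give a value, and the name `wavelength_m` is hypothetical.

```python
# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_AIR_M_S = 343.0


def wavelength_m(frequency_hz, velocity_m_s=SPEED_OF_SOUND_AIR_M_S):
    """wavelength = velocity / frequency, in meters."""
    return velocity_m_s / frequency_hz
```

Under that assumption, a 130 Hz tone has a wavelength of about 2.6 m and a 220 Hz tone about 1.6 m: at a fixed velocity, raising the frequency shortens the wavelength.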
Are the properties of the source critical to the representation of phonemes?
Experiment 1: Amplitude and Phonemes
Here is a simple experiment to determine if amplitude of the source is critical to the difference between speech sounds.
Step 1: say "ah"
Step 2: say "ah" with high amplitude
Step 3: say "ah" with low amplitude
Remember, amplitude is a measure of the size of the distortion -- it is the force applied to the air to cause the disturbance.
Question: Did varying the amplitude result in a different phoneme (e.g., "ee")?
Alternative experiment: say "ah" and "ee" with the same amplitude...
Conclusion: Varying the amplitude does not result in changes in the phonemes, only changes in volume, so amplitude is not critical to the representation of speech sounds.
Experiment 2: F0 and Phonemes
Here is a simple experiment to determine if F0 of the source is critical to the difference between speech sounds.
Step 1: say "ah"
Step 2: say "ah" with high frequency
Step 3: say "ah" with low frequency
Remember, F0 is a measure of the number of cycles the wave completes in a given time.
Question: Did varying the frequency result in a different phoneme (e.g., "ee")?
Alternative experiment: say "ah" and "ee" with the same frequency...
Conclusion: Varying the F0 does not result in changes in the phonemes, only changes in pitch, so F0 is not critical to the representation of speech sounds.
Experiment 3: Harmonics and Phonemes
Here is a simple experiment to determine if harmonics of the source are critical to the difference between speech sounds.
Step 1: say "ah"
Step 2: say "ah" with high harmonics
Step 3: say "ah" with low harmonics
Remember, harmonics are integer multiples of F0.
Problem: you can't vary harmonics independently. You can only vary the harmonics by varying F0 (which we already tried).
Conclusion: Varying harmonics is the same as varying F0, which only changes the pitch, not the phonemes.
Experiment 4: Wavelength and Phonemes
Here is a simple experiment to determine if wavelength of the source is critical to the difference between speech sounds.
Step 1: say "ah"
Step 2: say "ah" with short wavelength
Step 3: say "ah" with long wavelength
Remember, wavelength is a measure of the distance between identical parts of the wave cycle.
Problem: you can't vary wavelength independently. You can vary frequency (which we already tried), or you can vary the velocity of the wave, but the latter is dependent on the medium that the wave travels through...
Conclusion: Because we can't control the velocity of sound waves with our body (it is dependent on the gas they travel through), we can only control wavelength through frequency. And frequency is not critical to the representation of speech sounds.
So what is critical? The filter!
Much like a musical instrument, the vocal tract consists of a source of the sound and a filter that alters the properties of the source sound.
Instrument: source = reed, filter = body
Vocal tract: source = vocal folds, filter = pharynx and oral cavity
So what is critical? The filter!
[Figure: a duck call as the source and plastic tubes as the filter, standing in for the pharynx and oral cavity; different tubes produce "ah", "ee", "eh", and "oh"]
http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html
To understand how the filter works, we need to learn more about Harmonics...
Reflection and Interference
When a wave is reflected back on itself, the crests and troughs interact in a process known as interference.
Constructive interference occurs when the crests align with other crests (and troughs with other troughs), which is also called being in phase.
[Figure: two reflected waves in phase, with crests and troughs aligned with other crests and troughs]
For two waves of equal amplitude, this doubles the amplitude of the wave.
Reflection and Interference
When a wave is reflected back on itself, the crests and troughs interact in a process known as interference.
Destructive interference occurs when the crests align with troughs (and troughs with crests), which is also called being out of phase.
[Figure: two reflected waves out of phase, with the crests of one aligned with the troughs of the other]
If two waves of equal amplitude are perfectly out of phase, the amplitude will reduce to 0.
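Both kinds of interference can be checked numerically by adding two waves point by point. This is a minimal sketch assuming two unit-amplitude sine waves of the same frequency; the function name `peak_of_sum` and the sampling settings are illustrative.

```python
import math


def peak_of_sum(phase_shift, frequency_hz=1.0, samples=1000):
    """Add two unit-amplitude sine waves that differ only by `phase_shift`
    (in radians) and return the largest absolute value of their sum
    over one second."""
    peak = 0.0
    for i in range(samples):
        t = i / samples
        first = math.sin(2 * math.pi * frequency_hz * t)
        second = math.sin(2 * math.pi * frequency_hz * t + phase_shift)
        peak = max(peak, abs(first + second))
    return peak
```

With a phase shift of 0 (in phase) the peak of the sum is 2, double the amplitude of either wave. With a phase shift of `math.pi` (perfectly out of phase) the sum stays at 0 apart from floating-point rounding.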
Reflection and Interference
A wave traveling through a string of finite length will be reflected back after it hits the end. Crucially, the reflected wave will have the same speed and amplitude, but be completely out of phase with the original wave.
Reflection and Interference
An oscillator is an object, like a string, that is continually vibrating. This means that waves are continually traveling down the string.
If there is a wave traveling in one direction on a string, and a reflected wave traveling in the other direction, some form of interference will occur.
Reflection and Interference
The interference pattern of oscillators creates standing waves: even though the original wave and reflected wave are traveling, the resulting interference creates a wave that doesn't appear to be moving.
The animation shown on this webpage is a much better illustration than a static picture:
www.phys.unsw.edu.au/jw/strings.html
Reflection and Interference
Standing waves are interesting because there are certain points along the string that define the boundaries of the standing waves.
These are the points where the traveling waves are completely out of phase, resulting in zero amplitude.
Crucially, these points will always be an integer fraction of the full length of the string.
Harmonics are the result of standing waves
Harmonics are always integer multiples of the F0:
F0: 200 Hz
H2: 400 Hz
H3: 600 Hz
H4: 800 Hz
H5: 1000 Hz
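The link between standing waves and integer multiples can be sketched as follows. The sketch assumes an idealized string fixed at both ends, where the nth standing wave fits n half-wavelengths into the string, and it reuses wavelength = velocity / frequency. The string length of 1.0 and wave speed of 400.0 are hypothetical values chosen so that F0 comes out at 200 Hz; the function names are illustrative.

```python
def node_positions(string_length, mode):
    """Points of zero amplitude for the nth standing wave:
    integer fractions k/n of the string length, ends included."""
    return [k * string_length / mode for k in range(mode + 1)]


def mode_wavelength(string_length, mode):
    """The nth standing wave fits n half-wavelengths into the string,
    so its wavelength is 2 * length / n (idealized string assumption)."""
    return 2 * string_length / mode


def mode_frequency(string_length, wave_speed, mode):
    """frequency = velocity / wavelength, which works out to n times F0."""
    return wave_speed / mode_wavelength(string_length, mode)
```

With the assumed values, modes 1 through 5 come out at approximately 200, 400, 600, 800, and 1000 Hz, and `node_positions(1.0, 2)` places the extra zero-amplitude point at the midpoint of the string.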
Vocal folds produce a broad spectrum of frequencies simultaneously
Because of harmonics, the vocal folds (the source) produce a broad spectrum of frequencies simultaneously.
We can represent this with a graph like this: frequencies are on the x-axis and the amplitude of the frequencies is on the y-axis.
[Figure: line spectrum, amplitude (y) against frequency (x, ticks at 1000, 2000, 3000)]
Harmonics are represented in a line spectrum graph by vertical lines (a spectrum of frequencies). Note that the amplitude of harmonics decreases as their frequency increases. The solid horizontal line is called the envelope of the spectrum.
The Fundamental is Special
Recall that vibrating objects have a fundamental frequency and an associated set of harmonics that are integer multiples of the fundamental frequency:
F0: 200 Hz, H2: 400 Hz, H3: 600 Hz, H4: 800 Hz, H5: 1000 Hz, H6: 1200 Hz
We call a tone that contains multiple frequencies a complex tone, and a tone that contains a single frequency a simple tone.
In the case of complex tones that have a harmonic structure, we perceive the pitch of the tone as being equal to the pitch of the fundamental frequency. We perceive the harmonics as overtones, which lead to a richer sound.
Now, let's ask ourselves why we treat the fundamental differently than the harmonics (i.e., we perceive the pitch of the complex tone as equal to the fundamental, and not equal to the harmonics)?
Why is the fundamental special?
Hypothesis 1: It is simply because the fundamental has the highest amplitude (i.e., it is the loudest):
F0: 200 Hz, 100 dB
H2: 400 Hz, 50 dB
H3: 600 Hz, 25 dB
H4: 800 Hz, 12 dB
H5: 1000 Hz, 6 dB
H6: 1200 Hz, 3 dB
This hypothesis makes an interesting prediction:
If the crucial property is amplitude, then taking away the fundamental should change the pitch of the tone: the pitch should now be the frequency of H2! Similarly, if we take away both the F0 and H2, the pitch should be based on H3.
Here is a schematic of this test: each successive complex tone has the lowest frequency removed.
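The prediction of Hypothesis 1 can be written out as a small sketch. The levels in `TONE` are the example values from the slide; the function names are hypothetical, and the code only encodes what the hypothesis predicts, not what listeners actually hear.

```python
# Example complex tone from the slide: frequency in Hz -> level in dB.
TONE = {200: 100, 400: 50, 600: 25, 800: 12, 1000: 6, 1200: 3}


def predicted_pitch_h1(components):
    """Hypothesis 1: perceived pitch is the frequency of the loudest component."""
    return max(components, key=components.get)


def remove_lowest(components):
    """Return a copy of the tone with its lowest-frequency component removed."""
    lowest = min(components)
    return {freq: level for freq, level in components.items() if freq != lowest}
```

Applying `remove_lowest` once moves the predicted pitch from 200 Hz to 400 Hz, and applying it twice moves it to 600 Hz. That step-by-step rise is the pattern the test looks for.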
Why is the fundamental special?
Surprisingly, removing the lowest tone in these complexes does not change the pitch that we perceive. How can this be?
The answer seems to be that the brain restores the missing fundamental from a complex tone if that tone appears to have harmonic structure.
If this is just an illusion, it isn't very helpful. But if the auditory cortex can actually reconstruct the fundamental from the harmonics, then it tells us something about the abilities of the auditory cortex:
1. The auditory cortex may be able to perform calculations on the incoming signal in order to create new information that is not transparently available in the signal. So we can go beyond the simplest hypothesis for the representation of speech sounds!
2. The auditory cortex may be able to do some sort of mathematical factoring (or perhaps division) to figure out the common denominator in the tone complexes.
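One way to picture the "common denominator" idea is the greatest common divisor of the harmonic frequencies. The sketch below is only an arithmetic illustration under the assumption that frequencies are whole numbers of Hz; it is not a claim about how the auditory cortex actually does it, and `restore_fundamental` is a hypothetical name.

```python
from functools import reduce
from math import gcd


def restore_fundamental(harmonics_hz):
    """Largest frequency (in whole Hz) that divides every component evenly."""
    return reduce(gcd, harmonics_hz)
```

For the tone complexes above, `restore_fundamental([400, 600, 800, 1000, 1200])` and `restore_fundamental([600, 800, 1000, 1200])` both give 200, even though 200 Hz is absent from the input. The method is not foolproof: with only 400, 800, and 1200 Hz present it gives 400, since nothing in that set distinguishes a 200 Hz fundamental from a 400 Hz one.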
Telephone companies are cheap!
Telephones only transmit a narrow band of frequencies, 300 Hz to 4000 Hz.
This is partly because small speakers can't reproduce low frequencies well. But telephone companies seized on this limitation as a way to save money on data transfer (both landlines and cell networks).
Recall that the F0 of the human voice is below 300 Hz: the male average is 130 Hz, the female average is 220 Hz. This means that telephones do not transmit the F0 of our voices!
We can still discriminate the gender of the people we are talking to because our brains can restore the fundamental from the harmonics!
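The telephone case can be sketched by keeping only the components that fall inside the 300 Hz to 4000 Hz band from the slide and then looking for their common divisor. As before, this is an arithmetic illustration assuming whole-number frequencies and an idealized sharp-edged band; the names `transmitted` and `inferred_f0` are hypothetical.

```python
from functools import reduce
from math import gcd

# Band stated on the slide, treated here as an idealized sharp cutoff.
TELEPHONE_BAND_HZ = (300, 4000)


def transmitted(f0_hz, band=TELEPHONE_BAND_HZ):
    """Components of a voice (F0 and its harmonics) that fall inside the band."""
    low, high = band
    return [
        n * f0_hz
        for n in range(1, high // f0_hz + 1)
        if low <= n * f0_hz <= high
    ]


def inferred_f0(components_hz):
    """Greatest common divisor of the components that got through."""
    return reduce(gcd, components_hz)
```

For a 130 Hz voice the lowest transmitted component is 390 Hz, and for a 220 Hz voice it is 440 Hz, yet `inferred_f0` returns 130 and 220 respectively from what remains.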