CMPT 468: Sound Localization
Tamara Smyth, tamaras@cs.sfu.ca
School of Computing Science, Simon Fraser University
November 15, 2013
CMPT 468: Computer Music Theory and Sound Synthesis: Lecture 13

Auditory Localization

Auditory localization is the human perception of the placement of a sound source.

A listener receives cues for sound location from the placement of the actual sound sources in an acoustic space such as a room or concert hall.

Sound from a loudspeaker generally sounds like it is coming from a loudspeaker.

If imagery is neglected in an audio production, the reproduction will not capture the difference between a violinist playing in a room and a loudspeaker.

Reproducing Location Imagery

Sound source location can often be reproduced successfully using multichannel recordings (at minimum, stereo).

Therefore, in addition to adding reverb, adding locality to your synthesis can add dimension to the sound.

Direction

In defining the location of a sound source, the listener is typically trying to obtain the direction and distance of the source.

The direction is usually expressed in terms of two angles:

1. azimuth angle φ, which is measured in the horizontal plane passing through the center of the listener's head.

Figure 1: The azimuth determines the position of the apparent source in the four quadrants surrounding the listener. (Diagram labels: 0° front, 90° left, 180° rear, 270° right; quadrants Left Front, Right Front, Left Rear, Right Rear.)
An azimuth angle of 0° is situated directly in front of the listener, whereas an angle of 180° is situated directly behind.

2. angle of elevation θ, which is measured in a vertical plane bisecting the listener. The elevation angle θ ranges from -90° (directly below the listener) to +90° (directly above the listener).

Primary Localization Cues

The direction is usually determined by differences, as received by the two ears, in

1. Time: interaural time difference (ITD)
2. Intensity: interaural intensity difference (IID)

Differences in the spectrum can also offer clues, since the filtering by the pinnae is directionally dependent.

Interaural time difference (ITD)

The ITD is the delay that a listener perceives between the time that a sound reaches one ear and the time that it reaches the other.

The ITD cues give information regarding the angular direction of a sound source.

If the source is directly in front of or behind the listener, the sound will reach both ears at the same time and the ITD will be zero.

Once the angle changes enough that the ITD exceeds 20 µs, a difference in direction can be perceived, up to a maximum ITD of 0.6 ms.

A typical listener can resolve the location of a sound in front to about 2° and from behind to about 10°.

Interaural intensity difference (IID)

When the sound source is not centered, the listener's head partially shadows the ear opposite the source, diminishing the intensity of the sound in that ear (particularly at higher frequencies).

The pinnae filter the sound in a way that is directionally dependent. This is particularly useful in determining whether a sound comes from above, below, in front, or behind.
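The ITD figures above can be explored with a small sketch. The sinusoidal dependence on azimuth is an assumption made for illustration: the text only fixes a zero ITD directly in front and behind, a 20 µs threshold, and a 0.6 ms maximum. The function and constant names are hypothetical.

```python
import math

MAX_ITD_S = 0.6e-3   # maximum ITD quoted in the text (0.6 ms)
JND_ITD_S = 20e-6    # smallest ITD quoted as perceptible (20 microseconds)


def itd_seconds(azimuth_deg: float) -> float:
    """Approximate ITD for a source at the given azimuth.

    Assumption: the ITD varies as the sine of the azimuth, so it is zero
    directly in front (0) and behind (180) and largest in magnitude at the
    sides (90 and 270). A positive value means the left ear leads.
    """
    return MAX_ITD_S * math.sin(math.radians(azimuth_deg))


def is_direction_change_audible(azimuth_deg: float) -> bool:
    """True when the modelled ITD magnitude exceeds the 20 microsecond threshold."""
    return abs(itd_seconds(azimuth_deg)) > JND_ITD_S
```

Under this assumed model the threshold is crossed roughly 2° away from straight ahead, which is in line with the frontal resolution quoted above. The model does not account for the coarser resolution behind the listener.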
ITD and IID Cues

The degree to which these cues are effective depends on the frequency content of the sound.

Both ITD and IID are ineffective at low frequencies (below 270 Hz), and thus the direction of such sounds is more difficult to determine.

ITD Cues

Interaural time difference is less precise behind a listener because the change in the ITD per degree of location change is much smaller.

ITD cues are most effective between 270 and 500 Hz.

ITD cues contribute little for sounds above 1400 Hz.

IID Cues

IID cues are very small for frequencies below 500 Hz.

IID cues contribute more at higher frequencies and are found to dominate at (and above) about 1400 Hz.

IID cues are less sensitive to sound sources behind the head.

Other localization cues

IID and ITD alone are insufficient localization cues.

When there is sufficient high-frequency content (above 4 kHz or so), the pinnae filter the incoming sound with a frequency response that depends on the direction from which the sound reaches the ear. Pinnae filtering is most pronounced above 4 kHz.

Reverberation can also provide a localization cue, but it works best on impulsive sounds. Longer tones are more difficult because listeners estimate distance almost entirely during the attack portion of the sound (where ITDs are most effective).

Reflections off the torso and shoulders also serve as cues.

Mistakes in localizing the sound occur mostly at low frequencies.
Head Related Transfer Functions

To increase the fidelity of localization, researchers have measured head-related transfer functions (HRTFs) over a wide range of incidence angles.

HRTFs express the frequency response imparted to a sound by the pinnae for a particular angle.

HRTFs are also dependent on distance, though little variation is observed for source locations more than 2 m away (making this the common distance for HRTF measurements).

The principal cues for judging distance

1. Intensity of the sound: this cue depends on the listener's familiarity with the sound. When hearing a sound for the first time, intensity tends to be a more influential cue than when an already established sound is moving around the listening environment. The amplitude diminishes inversely with the distance (1/D).

2. The ratio of reverberated to direct sound (R/D): the sound reaching the listener is composed of direct and reflected sound, and their relative intensity is often the dominant cue. When the source is close to the listener, the R/D ratio is low. As the distance increases, the proportion of reflected energy also increases. At very large distances, an audio horizon is reached, beyond which the source distance cannot be discerned.

3. Amount of high-frequency energy in the sound: attenuation of a sound wave propagating through the atmosphere is greater at high frequencies. Therefore, at very long distances there is sometimes a perceivable absence of high-frequency components in the sound.
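As a worked illustration of the first cue, the 1/D amplitude law can be written as a gain and converted to decibels. This is only a sketch: the reference distance at which the gain equals 1 is an assumption (the text does not fix one), and the function names are hypothetical.

```python
import math


def amplitude_gain(distance: float, reference: float = 1.0) -> float:
    """Amplitude scale factor for a source at `distance`, following 1/D.

    `reference` is the distance at which the gain is 1 (an assumption;
    the text does not specify a reference distance).
    """
    if distance <= 0 or reference <= 0:
        raise ValueError("distances must be positive")
    return reference / distance


def gain_db(distance: float, reference: float = 1.0) -> float:
    """The same gain in decibels (20 * log10 of the amplitude ratio)."""
    return 20.0 * math.log10(amplitude_gain(distance, reference))
```

With these definitions, each doubling of the distance lowers the level by about 6 dB.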
Simulation of Directional Cues

Directional cues are provided either by using several loudspeakers around the performance space or by creating the illusion of direction and distance.

Simulated ITD cues can be accurate only under very contrived conditions. The listener must either wear headphones or sit in a known position (within 10 cm) with the head fixed.

Precedence effect: A listener receiving the same sound from multiple sources localizes it at the closest source, not at a point between them, unless the separation in time is less than about a millisecond.

Positioning Loudspeakers

If a listener is positioned equidistant from two loudspeakers, L and R, and receives equal signal in each ear, the illusion is created that the source is centered at position image 1 (I1).

Figure 2: Listener positioned equidistant from two loudspeakers. (Diagram labels, left to right: L, I2, I1, R.)

Moving the loudspeaker R to the right is equivalent to slightly delaying the signal applied to it. Because the sound now reaches the left ear before the right one, it is as though the source location shifted to position I2.

Cross Talk

The sound that reaches an ear from the opposite loudspeaker is called cross-talk. It effectively limits the placement of auditory images to the area between the speakers.

It is possible to compensate for cross-talk (and place the sound outside the speaker area) using filters, but again the listener must be positioned with great accuracy.
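The idea of shifting the image by slightly delaying the signal sent to one loudspeaker can be sketched as follows. This is a minimal illustration, not a complete panner: the 44100 Hz sample rate, the whole-sample rounding, and the function name are all assumptions.

```python
from typing import List, Tuple


def delay_right_channel(
    mono: List[float], delay_s: float, sample_rate: int = 44100
) -> Tuple[List[float], List[float]]:
    """Return (left, right), where right is `mono` delayed by `delay_s` seconds.

    The delay is rounded to a whole number of samples, and both channels
    are zero-padded to the same length.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    n = round(delay_s * sample_rate)
    left = list(mono) + [0.0] * n    # undelayed copy, padded at the end
    right = [0.0] * n + list(mono)   # delayed copy, padded at the start
    return left, right
```

Following the precedence effect described above, delays well under a millisecond are the ones expected to move the image between the speakers; a finer-grained version would need fractional-sample interpolation.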
Stereo Method for Simulating Localization

The image may also be placed at I2 by increasing the intensity in the left speaker, that is, by using only IID cues.

When the amplitude of the left signal is increased, the image is displaced toward L at an angle that is determined by the ratio of the intensities of the signals delivered by the two loudspeakers.

Figure 3: Stereo method for simulating localization cues. (Diagram labels: INPUT, X, LEFT at X=1, RIGHT at X=0.)

The location is specified by the parameter x, which has a value between 0 and 1. When x is 0, all the sound emanates from the right speaker; when x is 1, all the sound emanates from the left speaker.

In placing the source between two loudspeakers, the power in the signal is allocated according to the value of x. That is, the power in the left speaker is given by x and the power in the right speaker is given by 1-x.

The square root is taken because x represents power but the scaling is applied to the amplitude, so the left and right amplitudes are scaled by $\sqrt{x}$ and $\sqrt{1-x}$, respectively.

Quadraphonic Listening Space

If the sound is located in the front quadrant (in between speakers LF and RF), the amplitudes of the signals at the loudspeakers will be apportioned by

\[
S_{LF} = \frac{1}{2}\left(1 + \frac{\tan\theta}{\tan\theta_{\max}}\right),
\qquad
S_{RF} = \frac{1}{2}\left(1 - \frac{\tan\theta}{\tan\theta_{\max}}\right)
\]
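The two panning rules above can be sketched in a few lines. The front-quadrant formulas are read here as one half of (1 plus or minus tan θ / tan θ_max), with θ measured from straight ahead and positive toward LF; the default θ_max of 45° (speakers at the corners of a square) and the function names are assumptions.

```python
import math
from typing import Tuple


def stereo_gains(x: float) -> Tuple[float, float]:
    """Amplitude gains (left, right) for pan position x in [0, 1].

    x is the fraction of the power sent to the left speaker, so the
    amplitudes are the square roots of x and 1 - x.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return math.sqrt(x), math.sqrt(1.0 - x)


def front_quadrant_gains(
    theta_deg: float, theta_max_deg: float = 45.0
) -> Tuple[float, float]:
    """Scale factors (S_LF, S_RF) for a source between the front speakers.

    theta_deg is measured from straight ahead, positive toward LF.
    theta_max_deg = 45 is an assumption (speakers at the corners of a square).
    """
    if not 0.0 < theta_max_deg < 90.0:
        raise ValueError("theta_max must lie strictly between 0 and 90 degrees")
    if abs(theta_deg) > theta_max_deg:
        raise ValueError("source lies outside the front quadrant")
    ratio = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta_max_deg))
    return 0.5 * (1.0 + ratio), 0.5 * (1.0 - ratio)
```

In the stereo case the squared gains always sum to 1, so the total power stays constant as x moves. In the quadraphonic case the two scale factors sum to 1, and a source at θ = θ_max is sent entirely to LF.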
Figure 4: Geometry of a quadraphonic listening space. (Diagram labels: loudspeakers LF, RF, LR and RR around the listener; azimuth marks 0°, 90°, 180° and 270°; angles θ and θ_max.)

Simulation of Distance Cues

To simulate distance cues, the intensity of the sound can be attenuated with distance as $1/D^2$.

As the distance increases, so does the ratio of reverberated to direct sound. That is, the direct sound drops off faster than the reverberated sound as the distance increases.

Reverberant characteristics vary depending on the environment, so there is no absolute ratio of direct to reverberated sound that corresponds to a specific distance.

It is therefore helpful to specify a critical distance at which R/D = 1.

In a model by John Chowning, the direct sound is attenuated by $1/D$, whereas the reverberant sound is attenuated by $1/\sqrt{D}$.

Global vs. Local Reverberation

It is useful to apply both global and local reverberation to the sound.

Global reverberation returns equally from all directions around the listener.

Local reverberation comes from the same direction as the direct signal and derives from reflectors relatively near the source.

When the sound is located close to the listener, most of the reverberation is global.

When the sound is located at a greater distance from the listener, most of the reverberation is local.

Interaural Coherence

Each acoustic space has its own variety of reflectors, causing the reverberant sound arriving at the listener to differ from each direction.

Interaural coherence is a measurement that indicates the similarity between the reverberation received by each of the two ears.

A low interaural coherence generally results in a more pleasing sound and a greater feeling of immersion.

Low interaural coherence can be implemented by giving each channel its own reverberator with slightly different parameters.
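The distance scaling described above (direct sound falling as 1/D, reverberant sound as 1/√D) can be sketched as a set of gains. Several things here are assumptions for illustration only: the factors are treated as amplitude gains, distance is expressed in units of the critical distance (so R/D = 1 at D = 1), and the reverberant signal is split into a global share of 1/D and a local share of 1 - 1/D, which is one simple way to make the reverberation mostly global up close and mostly local far away. The function name and dictionary keys are hypothetical.

```python
import math
from typing import Dict


def distance_gains(distance: float) -> Dict[str, float]:
    """Scale factors for direct and reverberant sound at `distance`.

    Distance is in units of the critical distance, so R/D = 1 at distance 1.
    Assumptions: the factors are applied to amplitude, distance >= 1, and
    the reverberant signal is split into a global share 1/D and a local
    share 1 - 1/D.
    """
    if distance < 1.0:
        raise ValueError("this sketch only covers distance >= 1")
    direct = 1.0 / distance
    reverb = 1.0 / math.sqrt(distance)
    global_share = 1.0 / distance
    return {
        "direct": direct,
        "reverb_global": reverb * global_share,
        "reverb_local": reverb * (1.0 - global_share),
        "r_to_d": reverb / direct,
    }
```

For example, `distance_gains(4.0)` gives a direct gain of 0.25 and a total reverberant gain of 0.5, so the reverberant-to-direct ratio has doubled relative to the critical distance, and three quarters of the reverberant signal is routed locally.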