Binaural Hearing - Human Ability of Sound Source Localization


MEE09:07
Binaural Hearing - Human Ability of Sound Source Localization
Parvaneh Parhizkari
Master of Science in Electrical Engineering
Blekinge Institute of Technology, December 2008
School of Engineering, Department of Signal Processing
Supervisors: Dr. Nedelko Grbic, Erik Loxbo
Examiner: Dr. Nedelko Grbic
Blekinge Tekniska Högskola, SE-371 79 Karlskrona
Tel. vx 0455-38 50 00, Fax 0455-38 50 57

Abstract

The purpose of this project is to design a systematic method for measuring human directionality ability in the horizontal plane with a single sound source. A completely virtual auditory model has been created in Matlab. The project consists of modeling the binaural cues, designing digital filters, designing a test workbench, measuring listeners' directionality, and analyzing the data. The head related transfer function (HRTF) is computed by calculating the two most important binaural cues, the interaural level difference (ILD) and the interaural time difference (ITD). The platform is built in Matlab and all results are shown in plots produced from Matlab code. The directionality test was carried out with real human subjects, and the results have been analyzed and presented.


Table of Contents

Abbreviations
Introduction
Background
1. Binaural Perception
   1.1 Binaural Cues
       1.1.1 Interaural Time Differences
       1.1.2 Interaural Level Differences
   1.2 Head Related Transfer Function
   1.3 Minimum Audible Angle
   1.4 Cone of Confusion
2. The Spherical Head Model
   2.1 Modeling ITD
   2.2 Modeling ILD
       2.2.1 ILD Approximation in the Spherical Head Model
   2.3 The HRTF in the SHM
3. The Virtual Auditory Model
   3.1 Calculating ITD
       3.1.1 Time Delay Filtering
       3.1.2 The FD-MF All Pass Filter
   3.2 Calculating ILD
   3.3 The Generated HRTF
4. The Directionality Test Workbench and Test Equipment
   4.1 The GUI Interface
   4.2 Test Requirements
       4.2.1 The ASIO Sound Card
       4.2.2 The Matlab Audio Processing Framework
       4.2.3 The Calibration
       4.2.4 The Test Environment
5. The Directionality Test and the Error Calculation
   5.1 The Measurement Method
   5.2 The Test Procedure
       5.2.1 The Test Signals
       5.2.2 The Subjects
   5.3 The Experiment
       5.3.1 Average Directionality Error
       5.3.2 The Audiogram
   5.4 Data Analysis
   5.5 Improvement
Conclusion
Future Work
Appendix A
References

Abbreviations

ASIO: Audio Stream Input/Output
FD: Fractional Delay
GUI: Graphical User Interface
HRIR: Head Related Impulse Response
HRTF: Head Related Transfer Function
IID: Interaural Intensity Difference
ILD: Interaural Level Difference
IPD: Interaural Phase Difference
ITD: Interaural Time Difference
MAA: Minimum Audible Angle
MF: Maximally Flat
SHM: Spherical Head Model


Introduction

Binaural hearing is the ability of humans and other animals to judge the direction of a sound source. For as long as humans have existed, they have localized sound sources using two ears. Extensive research on binaural hearing has been carried out in advanced laboratories during the last century; much of it has used dummy heads, and some has used human subjects. This thesis builds on some of the recent research and uses one of the existing models to develop a method for measuring human directionality. The scope of the thesis is the horizontal plane, and the binaural cues (ITD and ILD) are simulated in azimuth. The spherical head model is one of the oldest and simplest, yet most powerful, models, and it is the one used here to create the virtual auditory model. The thesis does not discuss the physiology of hearing or the hearing organs; the investigated region lies between the sound source and the entrance of the pinna.

The assumptions are a single sound source and the frontal semicircle of the horizontal plane. The convention is that 0° is at the right ear, 180° at the left ear, and 90° in front of the head. The details of the work are discussed in the following sections. The background section reviews recent research. Binaural perception, the binaural cues, and head related transfer functions (HRTF) are discussed in chapter 1. Chapter 2 explains the spherical head model. The virtual auditory model, the digital filter design, and the related calculations are described in chapter 3. The test workbench and the test equipment are presented in chapter 4, and chapter 5 covers the binaural measurements and the analysis of the results.

Background

Lord Rayleigh (John William Strutt) investigated the localization process in 1877-1878. He noted that when a sound source lies toward one (ipsilateral) ear, the head casts a shadow toward the other (contralateral) ear, so the signal at the contralateral ear is more attenuated than at the ipsilateral one. He also noted that different parameters affect localization at low and at high frequencies. His theory was named the "duplex theory" and, with some extensions, it is still valid today. Many models of binaural processing were created over the last century: the spherical head model (Lord Rayleigh, 1907, and Woodworth/Schlosberg, 1954), direct cross-correlation of the stimuli (Sayers and Cherry, 1957), the binaural cross-correlation model (Jeffress, 1956), direct comparison of the left-sided and right-sided internal responses to stimuli (Bergeijk, 1962), interaural comparison of auditory nerve activity (Colburn, 1973, 1977), and many others [12]. Other researchers have studied further aspects of binaural hearing, such as multi-channel sound sources, moving sound sources, and noise reduction. The spherical head model (SHM) presented in this project is the first binaural model, originating at the beginning of the last century. Rayleigh's SHM (1907) was very simple. Woodworth and Schlosberg (1954) calculated the binaural cues in a polar coordinate system [5]. Joel David Miller (2001) modeled the spherical head in a Cartesian coordinate system [10].

1. Binaural Perception

1.1 Binaural Cues

There are two important binaural physical cues in the horizontal plane: 1. interaural time differences (delays), ITD, and 2. interaural level (intensity) differences, ILD or IID.

1.1.1 Interaural Time Differences

The difference in arrival times of a sound at the ipsilateral and contralateral ear is called the ITD. It arises because the sound wave reaches one ear earlier than the other. The ITD is the dominant cue at frequencies below about 1500 Hz, whose wavelengths are comparable to the size of the human head. The minimum ITD is zero and the maximum perceptible ITD is about 600-800 µs. Figure 1.1 shows a simple single-source spherical head model with head radius a and azimuth θ. In Rayleigh's spherical head model with the sound source at infinity, the ITD has a simple expression:

ITD = (a/c) · (θ + sin θ)     (1)

[Figure 1.1 - Rayleigh's spherical head model in the horizontal plane, with head radius a and azimuth θ]

Here c is the speed of sound (approximately 343 m/s) and θ is the angle, in radians, between the median plane and the line connecting the sound source to the head center. With this formula the ITD is zero when the sound source is in front of the head, and 2.57a/c when the sound source is at the side, facing one of the two ears. The ITD is more sensitive in the near field (source distance less than 1 meter) than in the far field. The formula shows that the ITD is frequency independent, although in some other binaural models it does depend on frequency.
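As a quick numerical check of equation (1), a minimal Matlab sketch (not part of the thesis code; the 9 cm head radius is an assumption taken from the examples later in the text):

% Woodworth/Rayleigh far-field ITD, equation (1)
a     = 0.09;                    % head radius in meters (assumed)
c     = 343;                     % speed of sound in m/s
theta = deg2rad(0:90);           % angle from the median plane, in radians
itd   = (a/c) * (theta + sin(theta));
fprintf('maximum ITD = %.0f us\n', 1e6 * max(itd));  % about 2.57*a/c = 674 us

The maximum, about 674 µs, falls inside the 600-800 µs range of perceptible ITDs quoted above.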

[Figure 1.2 - A sound source at distance dis from the center of the head in the spherical head model, horizontal plane]

The position of a sound source at distance dis from the center of the head in the SHM is shown in Figure 1.2.

1.1.2 Interaural Level Differences

The difference in sound pressure level (or intensity) between the ipsilateral and contralateral ear is called the ILD (or IID). The ILD is the dominant cue at frequencies above about 1500 Hz, but it generally affects the contralateral signal at all frequencies. The ILD arises because the head casts a shadow toward the contralateral ear. The dependence of the ILD on frequency is illustrated in Figure 1.3. The ILD is nonlinear in frequency and strongly frequency dependent over the audible spectrum, because sound waves are scattered when the head diameter is larger than the wavelength, and diffraction increases rapidly with increasing frequency.

[Figure 1.3 - The head-shadow effect at high frequencies and the dependence of the ILD on frequency and position (250 Hz vs. 6 kHz)]

The smallest detectable ILD is about 0.5 dB, regardless of frequency. The far-field ILD does not exceed 5-6 dB, whereas the near-field ILD, for example at 500 Hz, can exceed 15 dB [2].

1.2 Head Related Transfer Function

The transformation of a sound signal from a sound source to a listener's ears is called the head related transfer function (HRTF) or anatomical transfer function (ATF). The HRTF characterizes and captures the binaural cues for sound localization. It is an individual function for every person and every sound source location, and in two-dimensional space it depends on frequency and azimuth.

Using a non-individual HRTF gives a high measurement error and is not as accurate as the individual type. In other words, the HRTF describes the filtering of a sound source before it is received by the ears. The far-field HRTF is attenuated inversely with range, whereas in the near field the HRTF follows the ILD changes.

[Figure 1.4 - The HRTF for the left and right ear: X(ω) filtered by H_L(ω) and H_R(ω) gives X_L(ω) and X_R(ω)]

As shown in Figure 1.4, the signals received at the two ears are:

X_L(ω) = H_L(ω) · X(ω)
X_R(ω) = H_R(ω) · X(ω)

H_L(ω) and H_R(ω) are the frequency responses of the transformations to the left and right ear respectively. The HRTF is the frequency domain expression of the head related impulse response (HRIR). Knowing the HRTF, it is always possible to create binaural signals from monaural sound sources. The HRTF is usually measured in the far field. The HRTF in a free field is a very complicated function, whereas in a virtual auditory model it is simpler.

[Figure 1.5 - Head related impulse response of the KEMAR dummy head (The MIT Media Lab, May 1994)]

Some HRTF measurements with dummy heads have been performed in laboratories such as the CIPIC Interface Laboratory and the MIT Media Lab [17].

By placing sound sources at different positions in the laboratory and recording the results with microphones, a series of HRIRs has been obtained. In this project, with the SHM, the HRTF is neither completely individual nor non-individual; the simulated HRTF should be regarded as an average type, meaning that people with the same head radius use the same HRTF for the synthesis of binaural signals. It can be called an "average HRTF".

1.3 Minimum Audible Angle

In 1958, Mills measured the MAA (minimum audible angle) as a function of frequency and azimuth.

[Figure 1.6 - The minimum audible angle versus frequency, Mills (1958)]

As shown in Figure 1.6, the MAA measured with headphones is about 1 degree when the sound source is in front of the head, in the frequency range of about 500-750 Hz. A 1 degree MAA corresponds to the smallest detectable ITD, about 10 µs. With increasing frequency the MAA increases. The MAA is symmetric around 90° in the spherical head model.

1.4 Cone of Confusion

[Figure 1.7 - Cone of confusion of azimuth]

The cone of confusion consists of the points that have identical ITDs and ILDs in three-dimensional hearing space. Using only one of the cues for synthesizing binaural signals in a virtual auditory model causes this confusion. It does not usually happen in nature, because many other parameters help in localizing sound sources, such as sound waves reflected from the environment and vision.

2. The Spherical Head Model

In the spherical head model (SHM) the head is treated as a sphere, and all calculations of the binaural cues are done under this assumption. Audio signals scattered by the torso, shoulders, outer ear and eardrum are ignored in binaural measurements with headphones. The SHM captures the sound wave diffraction caused by the head and is a useful model for synthesizing the binaural cues, ITD and ILD. The behavior of the ILD and ITD in the SHM is developed in this chapter.

2.1 Modeling ITD

Figure 2.1 shows the spherical head model in the horizontal plane. If the sound source is located at (x_ss, y_ss) in Cartesian coordinates, the ITD is obtained from the following equations [10]:

[Figure 2.1 - The spherical head model in the horizontal plane (Joel D. Miller, 2001)]

With HR the head radius, AZ the azimuth (0° at the right ear, 90° in front) and the source at (x_ss, y_ss):

θ = 90° − AZ,   x_ss = D · sin(θ),   y_ss = D · cos(θ)

D = sqrt(x_ss² + y_ss²)       (distance from the source to the head center)
L = sqrt(D² − HR²)            (length of the tangent line from the source to the sphere)
Γ = cos⁻¹(HR / D)             (angle of the tangent point, seen from the head center)

Θ_L = 180° − AZ,   Θ_R = AZ   (angles between each ear direction and the source direction)

The arc (shadow-path) lengths along the sphere from the tangent point to each ear are

DLA = HR · (Θ_L − Γ) · π/180
DRA = HR · (Θ_R − Γ) · π/180

and the direct (line-of-sight) distances to each ear are

DLD = sqrt((x_ss + HR)² + y_ss²)
DRD = sqrt((x_ss − HR)² + y_ss²)

An ear whose line of sight is blocked by the head receives the sound along the tangent line plus the arc; otherwise it receives it directly. In particular, if the sound source is at the right side, DL = L + DLA, otherwise DL = DLD; if the sound source is at the left side, DR = L + DRA, otherwise DR = DRD. Then

ITD = |DL − DR| / c           (2)

Woodworth and Schlosberg (1954) calculated the ITD in a polar coordinate system, and Joel D. Miller (2001) solved the ITD equations in a Cartesian system [10]. Taking the absolute value in (2) makes the ITD a symmetric function of the two sides, ipsilateral and contralateral. In the SHM the ITD depends strongly on the head radius, depends only weakly on the source distance in the far field, and is frequency independent.
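A compact Matlab sketch of the path-length computation behind (2). This is a re-derivation matching the reconstruction above, not the thesis' own itd() function (whose pseudo-code is in Appendix A); the shadowing test uses the geometric condition Θ > Γ rather than the source-side rule:

function [tl, tr] = shm_itd(az, dis, hr)
% Arrival times at the left and right ear for a spherical head of radius
% hr (m), source distance dis (m), azimuth az (degrees, 0 = right ear,
% 90 = front).  ITD = abs(tl - tr), cf. equation (2).
c     = 343;                           % speed of sound, m/s
theta = deg2rad(90 - az);              % angle from the median plane
xs    = dis * sin(theta);              % source position in Cartesian
ys    = dis * cos(theta);
L     = sqrt(dis^2 - hr^2);            % tangent-line length
G     = acos(hr / dis);                % tangent-point angle (Gamma)
thL   = deg2rad(180 - az);             % ear-to-source angles
thR   = deg2rad(az);
if thL > G
    dl = L + hr * (thL - G);           % left ear shadowed: tangent + arc
else
    dl = sqrt((xs + hr)^2 + ys^2);     % left ear visible: direct path
end
if thR > G
    dr = L + hr * (thR - G);           % right ear shadowed: tangent + arc
else
    dr = sqrt((xs - hr)^2 + ys^2);     % right ear visible: direct path
end
tl = dl / c;  tr = dr / c;
end

For az = 0 and a large dis, the path difference tends to hr · (1 + π/2) ≈ 2.57 · hr, which is the far-field limit of equation (1).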

Figure 2.2 shows the dependence of the ITD on the head radius in the SHM; the source distance is 2 m and the azimuth varies between 0 and 180 degrees. A 1 cm change in head radius changes the ITD by about 80 µs at 0 or 180 degrees (the two sides, facing the ears). Figure 2.3 shows the dependence of the ITD on the source distance in the SHM; the head radius is 9 cm. As these two figures show, the ITD is sensitive to the head radius but changes little with the source distance. Figure 2.4 describes the distance dependence in another way: it shows the ITD versus source distance for constant azimuth angles.

[Figure 2.2 - The ITD (µs) versus azimuth (degrees) for source distance 2 m and different head radii (hr = 0.08, 0.09, 0.1 m)]

[Figure 2.3 - The ITD (µs) versus azimuth (degrees) for head radius 9 cm and different source distances (1, 2, 5 m)]

[Figure 2.4 - The ITD (µs) versus sound source distance (m) for different azimuth angles (15, 30, 60, 75, 90 degrees)]

2.2 Modeling ILD

Figure 2.5 shows the frequency response of Rayleigh's spherical head model. The strongest attenuation occurs at angles around 150°-165°, and the response rises again at 180°. This simple model fulfills the ILD requirement. For a head radius of 9 cm, the normalized frequency μ = 1 corresponds to a frequency of about 607 Hz and μ = 20 to about 12131 Hz. For angles beyond about 100° the model behaves as a low pass filter.

[Figure 2.5 - The frequency response of the Rayleigh head model; μ is the normalized frequency, a the head radius, c the speed of sound and f the frequency. The right ear is at θ = 0°, the left ear at θ = 180°, and θ = 90° is in front of the head.]

2.2.1 ILD Approximation in the Spherical Head Model

It is possible to model the ILD with a first order transfer function: a simple linear filter can provide a frequency response like the one in Figure 2.5. What is needed is a transfer function that changes with azimuth and frequency. One suggested transfer function is a single-pole, single-zero head shadow filter [1]:

H(ω, θ) = (1 + j · α(θ) · ω / (2ω₀)) / (1 + j · ω / (2ω₀)),   ω₀ = c/a     (3)

The normalized frequency corresponding to ω₀ is μ = 1. The coefficient α is a function of θ and follows this formula:

α(θ) = 1 + cos(θ)
or
α(θ) = (1 + α_min/2) + (1 − α_min/2) · cos(θ/θ_min · 180°)     (4)

In the second form, the values α_min = 0.1 and θ_min = 150° give a good approximation of Figure 2.5 [1]. The model created from (3) and (4) is shown in Figure 2.6. In this model the response drops off toward the angle 180°. The case α = 0 corresponds to maximum head shadow, and α = 2 creates a 6 dB amplification at high frequencies. The magnitude in Figure 2.5 generally increases with frequency at the ipsilateral ear, because at high frequencies the sound wave is reflected off the surface of the sphere back toward the source.

[Figure 2.6 - ILD approximation of the spherical head model for azimuths from 0° to 180°; μ = 2πfa/c is the normalized frequency, source distance 2 m]

The reflected and direct sound waves combine to produce the 6 dB boost at the ear location; as the sound source moves toward the front of the head, the gain decreases [2]. az = 0° corresponds to the location of the ipsilateral ear and az = 180° to the location of the contralateral ear.
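To make (3)-(4) concrete, a short Matlab sketch (an illustration based on the reconstruction above, with α_min = 0.1 and θ_min = 150° assumed) that plots the shadow-filter magnitude for one angle:

% Magnitude of the single-pole, single-zero head-shadow filter, eq. (3)-(4)
a     = 0.09;  c = 343;  w0 = c / a;         % w0 = c/a, i.e. mu = 1
amin  = 0.1;   thmin = 150;                  % values suggested in [1]
th    = 120;                                 % example angle in degrees
alpha = (1 + amin/2) + (1 - amin/2) * cosd(th / thmin * 180);
mu    = logspace(-1, 2, 400);                % normalized frequency 2*pi*f*a/c
w     = mu * w0;                             % angular frequency in rad/s
H     = (1 + 1i*alpha*w/(2*w0)) ./ (1 + 1i*w/(2*w0));
semilogx(mu, 20*log10(abs(H)));              % compare with Figure 2.6
xlabel('mu = 2 pi f a / c');  ylabel('Magnitude (dB)');

Sweeping th from 0 to 180 degrees reproduces the family of curves in Figure 2.6, from the 6 dB high-frequency boost at the ipsilateral ear to the low pass behavior on the shadowed side.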

2.3 The HRTF in the SHM

The properties of the HRTF in the horizontal plane are described by the interaural cues in the spherical head model. As the source distance decreases, the amplitude of the HRTF increases at the ipsilateral ear and decreases at the contralateral ear. With frequency, the HRTF increases at the ipsilateral ear and decreases at the contralateral ear. In the near field (< 1 m) the amplitude of the HRTF varies rapidly with distance, whereas it changes slowly in the far field (> 1 m). The HRTF is obtained by convolving the ITD and ILD impulse responses.


3. The Virtual Auditory Model

The ITD and ILD models of (2) and (3) are implemented in this chapter. The cues are calculated in Matlab for a sound source at a specified distance and azimuth angle and for a given head radius; two Matlab functions compute the ITDs and ILDs. The block diagram of the SHM model used in this project is shown in Figure 3.1. The diagram consists of three main blocks: applying the ITD, applying the ILD, and sending out the binaural signals. This project focuses on the first two blocks; the third block is the Matlab Audio Processing (MAP) framework, which is explained later in this chapter. The ITD and ILD are applied to the input digital audio signal by digital filters. The time delays from the sound source to each ear and the corresponding gains are calculated in Matlab by two functions, whose pseudo-code is given in Appendix A.

[Figure 3.1 - Block diagram of the SHM implementation: a monaural sound source is given a per-ear time delay (applying ITD) and a per-ear gain (applying ILD) before the binaural signals are played back to the left and right ear]

3.1 Calculating ITD

The equations leading to (2) are evaluated with the source distance, head radius and azimuth angle as input arguments. The source distance has a constant value of 2 m in the auditory model. The head radius and the angles are entered by the test operator, who works with the test workbench described later. The Matlab function that calculates the ITD returns the arrival times from the sound source at both ears. Table 3.1 shows some time delays and ITDs for different azimuth angles. The next step is applying these time delays to the input audio signal.

Azimuth (degrees) | Ipsilateral delay (ms) | Contralateral delay (ms) | ITD in SHM (ms)
 0                | 2.6531                 | 3.3394                   | 0.6863
15                | 2.6629                 | 3.2707                   | 0.6078
30                | 2.6914                 | 3.2020                   | 0.4658
45                | 2.7362                 | 3.1333                   | 0.3971
60                | 2.7935                 | 3.0647                   | 0.2712
75                | 2.8588                 | 2.9960                   | 0.1372
90                | 2.9273                 | 2.9273                   | 0.0000

Table 3.1 - The ITDs in the SHM for different azimuth angles, source distance 1 m and head radius 9 cm

3.1.1 Time Delay Filtering

Since we have a virtual auditory model, the input audio signal is a digital audio file. As observed in section 2.1, the ITD is frequency independent in the SHM, so creating the time delays for a digital audio signal does not depend on frequency. We therefore need a digital filter that applies the calculated delay to all frequencies. One of the best solutions for this goal is a fractional delay (FD) all pass filter.

Such a filter can apply a group delay, measured in samples, over the whole audio spectrum. Among the different types of FD filters, the maximally flat one satisfies the requirements. A discrete time all pass filter has a transfer function of the form

A(z) = z^(−N) · D(z^(−1)) / D(z)
     = (a_N + a_(N−1) · z^(−1) + ... + a_1 · z^(−(N−1)) + z^(−N)) / (1 + a_1 · z^(−1) + ... + a_N · z^(−N))     (5)

where N is the order of the filter and the filter coefficients a_k (k = 1, 2, ..., N) are real. The coefficients a_k can be designed for a maximally flat group delay D with the formula

a_k = (−1)^k · C(N, k) · ∏_{n=0}^{N} (D − N + n) / (D − N + k + n),   k = 0, 1, 2, ..., N     (6)

where C(N, k) = N! / (k!(N − k)!) is the k-th binomial coefficient. The coefficient a_0 is always 1, so there is no need to normalize the coefficient vector [14].

Thiran (1971) showed that if D ≥ N, the roots of the denominator (the poles) lie within the unit circle in the complex plane, so the filter is stable. The filter is also stable when N − 1 < D < N. Since the poles are inside the unit circle and the numerator is a mirrored version of the denominator, the zeros are outside the unit circle: the angles of the zeros and poles are the same, but their radii are reciprocals of each other. For this reason the amplitude response of the filter is flat:

|A(e^(jω))| = |e^(−jωN) · D(e^(−jω)) / D(e^(jω))| = 1

[Figure 3.2 - The group delay of the N = 44 Thiran FD-MF all pass filter (phase delay in samples versus normalized frequency)]

The group delay response of the Thiran all pass filter of order N = 44 is shown in Figure 3.2. The group delay in samples starts at D = N − 0.5 and stops at D = N + 0.5, so the response in Figure 3.2 covers delays between 43.5 and 44.5 samples. At a 44100 Hz sample rate this corresponds to delays between 0.986 ms and 1.009 ms.

3.1.2 The FD-MF All Pass Filter

To design a filter with the transfer function in (5), we have to calculate the coefficients in (6); Matlab code has been written for this, and its pseudo-code is given in Appendix A. The order of the filter depends on the needed time delay and the sampling rate, since the group delay is measured in samples:

N = round(time delay × sample rate)     (7)

For instance, to create a time delay of 2.6629 ms at a sampling rate of 44100 Hz, the order of the filter is N = 117. With this order we can delay an audio signal sampled at 44100 Hz by between 116.5 and 117.5 samples. The accuracy of the delay depends on the number of steps this range is divided into.
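As an illustration of (6) and (7), a minimal Matlab sketch of a Thiran FD all pass design (a stand-in for the thesis' FD_MF_FB function, which is only given as pseudo-code in Appendix A; the recursion is an algebraically equivalent, numerically safer form of (6)):

function [b, a] = thiran_fd(delay, sample_rate)
% Maximally flat (Thiran) fractional delay all pass filter.
% delay in seconds, sample_rate in Hz; y = filter(b, a, x) applies it.
D = delay * sample_rate;              % desired delay in samples
N = round(D);                         % filter order, equation (7)
a = zeros(1, N + 1);
a(1) = 1;                             % a0 = 1, no normalization needed
for k = 1:N                           % recursive evaluation of eq. (6)
    a(k + 1) = a(k) * (-(N - k + 1) / k) * (D - N + k - 1) / (D + k);
end
b = fliplr(a);                        % numerator mirrors the denominator
end

For example, thiran_fd(2.6629e-3, 44100) gives N = 117, matching the 15 degree entry of Table 3.1.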

The ITD in our SHM is a symmetric function with respect to the right and left ear. For each region, 0°-90° and 90°-180°, there are 45 divisions, equivalent to 46 taps; hence the delay step in each region is about 11 µs. This accuracy has two advantages: it avoids built-in errors and it fulfills the MAA. We do not need this accuracy at all frequencies and all azimuth angles to achieve the MAA, but it may be useful for future work.

3.2 Calculating ILD

As shown in (3), a one-pole, one-zero transfer function with an angle-dependent coefficient can provide the amplitude gains of the SHM. Matlab code generates the amplitude response of Figure 2.6; the result is a vector that applies the needed gain at every frequency. The transfer function is minimum phase by itself, but to avoid changing the time delays obtained with (5), only its amplitude response is used. The amplitude response is a vector for every azimuth angle and head radius at the far-field source distance. The pseudo-code of the Matlab function realizing the transfer function in (3) is given in Appendix A.

3.3 The Generated HRTF

As mentioned in section 1.2, the HRTF describes the filtering of a sound source before it is received by the ears. Here the HRTF is achieved by cascading the transfer functions given in (3) and (5). The HRTF is a unique function for every azimuth angle, head radius and source distance; it also varies with frequency because of the frequency dependence of the ILD.

[Figure 3.3 - HRTF amplitude responses in the SHM at the ipsilateral and contralateral ear for az = 0°, 45°, 75° and 90°, all calculated at source distance 1 m and head radius 9 cm]

Convolving the impulse response of the FD-MF all pass filter with the impulse response of the single-pole, single-zero head shadow filter produces the HRTF. As mentioned before, the achieved HRTF is an average type. Figure 3.3 shows four sample HRTFs at four different azimuth angles in the SHM: 0, 45, 75 and 90 degrees. At az = 0 degrees there is more than 20 dB level difference between the two ears; at az = 90 degrees there is no level difference.
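Putting the pieces together, a hedged end-to-end sketch of this cascade, reusing the illustrative shm_itd and thiran_fd functions sketched above (both are reconstructions of this rewrite, not the thesis code; fir2 is used here as one possible way to apply a magnitude-only response, cf. "only the amplitude response is used" in section 3.2):

% Synthesize a binaural pair from a mono signal x at sample rate fs.
fs = 44100;  az = 45;  dis = 2;  hr = 0.09;   % example parameters
x  = randn(fs, 1);                            % 1 s of test noise (placeholder)
[tl, tr] = shm_itd(az, dis, hr);              % per-ear arrival times, eq. (2)
[bl, al] = thiran_fd(tl, fs);                 % FD all pass per ear, eq. (5)
[br, ar] = thiran_fd(tr, fs);
yl = filter(bl, al, x);                       % apply the ITD
yr = filter(br, ar, x);
% ILD: match the eq. (3) magnitude with a linear-phase FIR so that the
% time delays set above are left untouched (amplitude response only)
f  = linspace(0, 1, 128);                     % normalized frequency grid
w  = f * pi * fs;  w0 = 343 / hr;
mag = @(av) abs((1 + 1i*av*w/(2*w0)) ./ (1 + 1i*w/(2*w0)));
alphaL = 1.05 + 0.95 * cosd((180 - az) / 150 * 180);   % eq. (4), left ear
alphaR = 1.05 + 0.95 * cosd(az / 150 * 180);           % eq. (4), right ear
yl = filter(fir2(64, f, mag(alphaL)), 1, yl);
yr = filter(fir2(64, f, mag(alphaR)), 1, yr);
y  = [yl, yr];                                % binaural output pair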


4. The Directionality Test Workbench and Test Equipment

The next step after realizing the SHM is measuring human directionality. A Matlab graphical user interface (GUI) program has been created for this purpose, and some additional equipment is used for the test.

4.1 The GUI Interface

The program designed for the directionality test is called "Azimuth Directionality Test"; it is a workbench that helps the operator play binaural signals for a listener. The workbench is built with the Matlab GUI interface. The operator creates a "New Test" for every new listener and then chooses a test method between two choices, 10 stages or 20 stages; the number of stages determines how many binaural signals are played for the listener.

[Figure 4.1 - The "Azimuth Directionality Test" window]

Entering the listener's head diameter is the next step; the operator measures it before the test since, as seen in the previous chapters, all calculations depend on the head radius. "Start test" begins the measurement with the first monaural digital audio signal at a specified azimuth angle; all audio signals with predetermined azimuth angles are stored in a database file. The binaural signals are played for the listener through the headphones by pressing the play button. If the pink noise checkbox is active, the binaural signal is played in the presence of noise. Finally, it is possible to view plots of the HRTF corresponding to the head radius and azimuth angle. Figure 4.1 shows the main window of the azimuth directionality test workbench. If the operator forgets to enter any data, an error message is shown.

4.2 Test Requirements

The directionality test needs some equipment: a computer, an ASIO compatible sound card and its driver software, Matlab, calibration equipment, the test workbench, and a test room.

4.2.1 The ASIO Sound Card

The sound card used to send out the binaural signals is the EDIROL UA-1EX, an ASIO compatible USB audio interface. It can be configured to run at 32000, 44100, 48000 or 96000 Hz sample rate, is designed to carry component-quality audio signals in and out of the computer, and has A/D and D/A converters. Figure 4.2 shows the UA-1EX [15].

[Figure 4.2 - EDIROL UA-1EX]

4.2.2 The Matlab Audio Processing Framework

The framework used in the thesis, MAP (Matlab Audio Processing), is a tool for low latency real-time audio signal processing within the Matlab environment. It has been developed by the acoustic research group at Blekinge Institute of Technology and consists of a thin layer between any Audio Stream Input/Output (ASIO) compatible sound card and user defined scripts in Matlab.

The framework presents sampled audio data from the sound card to the user in blocks, and the user processes the input signal to produce the output signal, which is handed back to the sound card for playback. The framework is limited only by what the sound card in use can supply (for example the number of input and output channels, the block size and the sample rate), and it allows development, evaluation and demonstration of algorithms in real time within Matlab.

4.2.3 The Calibration

The headphone output has been calibrated with the Acoustilyzer AL1, an audio test device with a wide range of acoustical measurement functions, among them sound pressure level, speech intelligibility and reverberation time [16].

[Figure 4.3 - The Acoustilyzer AL1 with its microphone]

Figure 4.3 shows the Acoustilyzer AL1 with the related microphone.

To calibrate the whole path from binaural signal generation to the headphone outputs, a 10 second, 1 kHz pure tone was generated. The signal had to be clearly audible, so the level at the output of each channel was set to 70 dBA. The method was to measure the output level of each channel with the AL1 microphone, through a chamber, while the signal was played at 0° and 180°. This calibration has to be repeated for every new test session and after any change to a part of the test path (headphones, cables, sound card, ...).

4.2.4 The Test Environment

The environment was an acoustics classroom in a music house with 2-layer windows and a 2-layer door; the room was sound isolated. Next to the classroom was a control room: the listeners sat in the classroom and the equipment was in the control room. An interface box installed on the wall connected the two rooms; we communicated through it, and the binaural signals were also played for the listeners through it. The room was kept approximately dark during the measurements. Listeners sat in the middle of the classroom, on a chair next to a table.

[Figure 4.4 - The test environment. Top left: the interface through the wall; bottom left: a listener with headphones and the window between the classroom and the control room; right: the 2-layer door.]


5. The Directionality Test and the Error Calculation

The last step of the project was the binaural hearing measurement. The judged angles were compared with the target angles and the judgment errors were calculated.

5.1 The Measurement Method

A measurement method was designed around the tools described in chapter 4. Listeners who were to discern the sound source direction had the guide semicircle shown in Figure 5.1, which divides the frontal semicircle into 18 partitions; the angle 0° is at the right ear and 180° at the left ear. The listeners guessed the direction of the sound played through the headphones.

5.2 The Test Procedure

The binaural hearing test comes in two variants, differing in the number of stages; these defaults can always be changed. Each stage plays a specified audio signal from a specified direction. The number of stages is chosen by the operator in the "Azimuth Directionality Test" window of section 4.1.

[Figure 5.1 - A semicircle of the horizontal plane with 18 partitions]

5.2.1 The Test Signals

The signals were a combination of impulses, pure tones and speech, in some stages in the presence of pink noise. The impulses were 100 ms pure tones, and the continuous pure tones were 2 seconds long. The pink noise was generated in Matlab as two uncorrelated random signals, one per channel. The lowest frequency was 250 Hz and the highest 16000 Hz; the sample rate was 44100 Hz.
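For illustration, one simple way to generate such band-limited pink noise (a sketch, not necessarily the thesis' generator): shape white Gaussian noise with a 1/sqrt(f) magnitude in the frequency domain, which gives the 1/f power spectrum of pink noise, and band-limit it to 250 Hz - 16 kHz.

% Two uncorrelated channels of pink noise, band-limited to 250 Hz - 16 kHz
fs = 44100;  n = 2 * fs;                     % 2 seconds at 44100 Hz
f  = (0:n/2)' * fs / n;                      % bin frequencies up to Nyquist
g  = 1 ./ sqrt(max(f, 1));                   % 1/sqrt(f) amplitude -> 1/f power
g(f < 250 | f > 16000) = 0;                  % band limits used in the tests
g  = [g; flipud(g(2:end-1))];                % make the gain Hermitian-symmetric
x  = real(ifft(fft(randn(n, 2)) .* g));      % spectrally shaped white noise
x  = x / max(abs(x(:)));                     % normalize to full scale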

5.2.2 The Subjects

The binaural measurement test was done in two different conditions. In the first condition 12 listeners participated: 9 men and 3 women, aged 21 to 62. Most were students between 20 and 30 years old, and some had experience in mixing music. One subject, 29 years old, had had an eardrum replaced 14 years earlier; another was left handed, and three wore eyeglasses. In the second condition 7 subjects participated: 2 women and 5 men, aged 21 to 63 (cf. Table 5.2). All subjects in both experiments filled in a form with questions about their history of ear problems and their profession; they also stated whether they were left or right handed and whether they wore eyeglasses.

5.3 The Experiment

5.3.1 Average Directionality Error

The first measurement was a simple test with only linear level differences at the two ears, without any frequency dependence and without any arrival time differences. The results are shown in Table 5.1.

Subject | Age | Overall Average Error (deg) | Impulses Error (deg) | Continuous Pure Tones Error (deg) | Speech Error (deg)
1 (m)   | 23  | 12.52 | 11.45 | 16.4  | 9.71
2 (m)*  | 29  | 19.95 | 19.65 | 21    | 19.2
3 (m)   | 28  | 9.95  | 11.65 | 10    | 8.2
4 (m)   | 27  | 9.55  | 8.35  | 12.4  | 7.9
5 (m)   | 25  | 13.7  | 12.55 | 15.4  | 13.15
6 (m)   | 27  | 14.23 | 14.25 | 15.2  | 13.23
7 (m)   | 26  | 10.15 | 9.95  | 12.3  | 8.2
8 (f)   | 25  | 9.05  | 8.65  | 10.7  | 7.8
9 (f)   | 57  | 15.45 | 14.45 | 17.5  | 14.4
10 (f)  | 25  | 12.58 | 11.95 | 14.4  | 11.39
11 (m)  | 30  | 18.35 | 17.55 | 21.5  | 16
12 (m)  | 63  | 17.75 | 17.95 | 18.2  | 17.13
Average | 32  | 13.60 | 13.2  | 15.42 | 12.19

Table 5.1 - Results of the binaural hearing measurement with only simple level differences, without frequency dependence and without calculated ITD; (m) = male, (f) = female. * This subject had an eardrum replaced at age 15.

The average errors and the errors in every category of signals have been calculated.
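A minimal Matlab sketch of the bookkeeping behind these numbers (the vector contents are made-up examples, not measured data):

% Average directionality error for one subject, angles in degrees
target = [ 0 30 60 90 120 150 180  45 135  75];   % played azimuths (example)
judged = [10 20 70 80 130 140 170  50 120  90];   % subject's answers (example)
err         = abs(judged - target);               % per-stage absolute error
overall_err = mean(err);                          % "Overall Average Error"
is_speech   = logical([0 0 1 0 1 0 0 1 0 1]);     % per-stage category flags
speech_err  = mean(err(is_speech));               % per-category error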

Some listeners heard signals on top of or behind the head through the headphones; the reason is the cone of confusion illustrated in Figure 1.7.

[Figure 5.2 - Judged angles versus target angles in test condition 1, pure impulses only]

[Figure 5.3 - Judged angles versus target angles in test condition 1, speech signals]

Figure 5.2 and Figure 5.3 show the impulse and speech signal errors for 7 listeners. The test condition was different in the second run: both the ILD and the ITD of chapter 3 were used to create the binaural signals. The results are given in Table 5.2.

Subject | Age | Overall Average Error (deg) | Impulses Error (deg) | Continuous Pure Tones Error (deg) | Speech Error (deg)
1 (f)   | 43  | 12.65 | 12.1  | 14.65 | 11.2
2 (f)   | 21  | 11.65 | 11.3  | 12.75 | 10.9
3 (m)   | 50  | 11.25 | 11.45 | 12.55 | 9.75
4 (m)   | 29  | 11.35 | 11.6  | 12.15 | 10.3
5 (m)   | 63  | 15.7  | 15.15 | 17.75 | 14.2
6 (m)   | 23  | 9.5   | 9.5   | 10.65 | 8.35
7 (m)   | 28  | 10.15 | 9.95  | 11.35 | 9.15
Average | 36  | 11.75 | 11.57 | 13.12 | 10.55

Table 5.2 - Results of the binaural hearing measurement with both ILD and ITD applied to the input digital audio signal; (m) = male, (f) = female.

Subjects 5 and 6 participated in both tests; their errors were lower in the second test condition.

5.3.2 The Audiogram

A hearing loss test was run for each listener with the software "Home Audiometer", and the listeners' audiograms were obtained.

The audiograms helped to relate hearing loss to directionality in practice. A few of the listeners had some degree of hearing loss at high frequencies. One of them (subject 6 in Table 5.1) had an unusual audiogram and could hear all frequencies at an average level of -18 dBA.

5.4 Data Analysis

The analysis of the data shows that directionality at low frequencies is more accurate than at high frequencies. Directionality for the speech signals is also much more accurate than for the impulses and the pure tones: the error for the impulses is close to the overall average error, while the pure tone error is clearly worse than the overall average. The low error for speech signals is reasonable. The speech frequency band extends up to about 4 kHz, so speech can be regarded as a low frequency signal. Another reason may be the silent intervals in the speech signals: the listener gets new ITD information after every silence, so the brain is updated with the level and arrival time differences while listening to speech. The results also show that a good audiogram is necessary for good directionality, but it is not sufficient; the older subjects show noticeably larger directionality errors.

The Telecommunications Software and Multimedia Laboratory at Helsinki University of Technology has achieved a 9.7 degree average azimuth error [6]; our results, obtained with some basic equipment, are comparable with those from an advanced laboratory.

5.5 Improvement

The test procedure can be improved by changing some of its parts. Directionality could be measured in the presence of different kinds of noise or with other stimuli, and the test could be run in an anechoic or absorbent room. The way listeners report the sound source direction can also be changed; two kinds of guide semicircles were used in the two tests, but some problems that cause built-in errors remain.

Conclusion

This thesis has developed a method for measuring the human ability of directionality, and the directionality test was carried out with real human subjects. All parts of the thesis (studying and investigating recent models, choosing a model, implementing it, designing the GUI interface, and running the directionality test) served a single goal: a systematic method for measuring human directionality.

Future Work

Besides the improvements mentioned in section 5.5, the thesis can be continued by adding further capabilities. One is simulating a virtual room so that results can be obtained online and the error plots drawn at the same time. Another is adding further models to the SHM, such as a pinna model, a room model and an inner ear model, to obtain more accurate results. Elevation directionality can also be added to the azimuth directionality.


Appendix A

Some important functions used in the virtual auditory model are described in this section.

****************************************************************************
function [h, phasdelay] = fd_mf_fb(delay, sample_rate)

Function FD_MF_FB designs a fractional delay, maximally flat all pass filter.
Inputs:
- delay = the delay in seconds to be created for the input audio signal
- sample_rate = the sample rate in Hz
  (delay*sample_rate must not be more than 200)
Outputs:
- h = filter impulse response
- phasdelay = group delay in samples

****************************************************************************
function [direct_gain, lateral_gain] = ild(az, hr)

Function ILD calculates the interaural level differences in the horizontal plane.
Inputs:
- az = azimuth angle in the horizontal plane, frontal semicircle, in degrees
  (az = 0 at the right ear, az = 180 at the left ear, az = 90 in front of the head)
- hr = head radius in meters
Outputs:
- direct_gain = filter amplitude vector at the ipsilateral ear
- lateral_gain = filter amplitude vector at the contralateral ear

****************************************************************************
function [direct_delay, lateral_delay] = itd(az, dis, hr)

Function ITD calculates the arrival times of an audio signal from a sound source at both ears.
Inputs:
- az = angle between the median plane and the line from the sound source to the center of the head (degrees)
- dis = distance between the sound source and the center of the head (m)
- hr = head radius (m) (the head is treated as a sphere)
Outputs:
- direct_delay = arrival time at the ipsilateral ear in seconds
- lateral_delay = arrival time at the contralateral ear in seconds

References:

[1] Brown, C. P. and Duda, R. O., 1998. "A structural model for binaural sound synthesis", IEEE Transactions on Speech and Audio Processing, vol. 6, no. 5.
[2] Brungart, D. S. and Rabinowitz, W. M., 1999. "Auditory localization of nearby sources. Head-related transfer functions", Acoustical Society of America.
[3] Cheng, C. A., 2001. "Visualization, measurement, and interpolation of head-related transfer functions (HRTFs) with applications in electro-acoustic music", University of Michigan.
[4] Daniel, J., 2003. "Spatial sound encoding including near field effect: introducing distance coding filters and a viable, new ambisonic format", AES 23rd International Conference, Copenhagen.
[5] Duda, R. O. and Martens, W. L., 1998. "Range dependence of the response of a spherical head model".
[6] Gröhn, M., Lokki, T. and Savioja, L., 2001. "Using binaural hearing for localization in multimodal virtual environments", 17th International Congress on Acoustics, Rome.
[7] Hartmann, W. M., 1999. "How we localize sound", American Institute of Physics.
[8] Hartmann, W. M., 1983. "Localization of sound in rooms", Department of Physics, Michigan State University.
[9] Hasegawa, H. and Matsumoto, S., 1999. "Binaural sound reproduction using head-related transfer functions (HRTFs) approximated by IIR filters", IEEE TENCON.
[10] Miller, J. D., 2001. "Modeling interaural time difference assuming a spherical head", Musical Acoustics, Stanford University.
[11] Pulkki, V., Karjalainen, M. and Huopaniemi, J., 1999. "Analyzing virtual sound source attributes using a binaural auditory model", Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing.
[12] Stern, R. M. and Trahiotis, C., 1995. "Models of binaural perception", Conference on Binaural and Spatial Hearing.
[13] Viste, H. and Evangelista, G., 2004. "Binaural source localization", Conference on Digital Audio Effects (DAFx), Naples, October 5-8.
[14] Välimäki, V., 1994. "Simple design of fractional delay allpass filters", Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing.
[15] http://www.rolandus.com/products/productlist.aspx?parentid=114
[16] http://www.nti-audio.com/
[17] http://interface.cipic.ucdavis.edu
[18] http://www.pa.msu.edu/acoustics/loc.htm