Audio Engineering Society. Convention Paper. Presented at the 131st Convention 2011 October New York, NY, USA


Audio Engineering Society
Convention Paper
Presented at the 131st Convention
2011 October 20-23, New York, NY, USA

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

The Effects of Headphones on Listener HRTF Preference

Braxton Boren (1), Agnieszka Roginska (2)
Music and Audio Research Lab, New York University, New York, NY, 10012, USA
(1) bbb259@nyu.edu, (2) roginska@nyu.edu

ABSTRACT

Listener-selected HRTFs have the potential to provide the accuracy of an individualized HRTF without the time and resources required for HRTF measurements. This study tests listeners' HRTF preference for three different sets of headphones. HRTF datasets heard over the noise-cancelling Bose Aviation headset were selected as having good externalization more often than those heard over Sennheiser HD650 open headphones or Sony MDR-7506 closed headphones. It is thought that the Bose headset's frequency response is responsible for its superior externalization. This suggests that in systems where high-quality headphones are not available, post-processing equalization should be applied to account for the effect of the headphones on HRTF reproduction.

1. INTRODUCTION

The application of spatial audio techniques to virtual reality, augmented reality, and auditory displays has increased the desirability of understanding how the human auditory system localizes sound sources in three-dimensional space. While a physical acoustic perspective might focus on merely reproducing the sound field accurately through a technique such as wavefield synthesis, spaces with finite numbers of loudspeakers will be very limited in their ability to synthesize an entire sound field. For most reproduction situations, it is simpler and more effective to focus on the psychoacoustic perspective by reproducing the head-related transfer function (HRTF).

A listener's HRTF encompasses the three spatial cues that the auditory system processes: interaural time difference (ITD), interaural intensity difference (IID), and the spectral filtering of incident sounds due to the physical characteristics of the torso, head, and pinnae. The two interaural cues give the strongest localization information along the horizontal plane, and at many frequency bands these cues can be well predicted based on simple head measurements [1]. The spectral component of the HRTF is essential for elevation perception [9] and for differentiating between source positions along the cone of confusion, which create identical interaural intensity and time differences. In addition, in headphone reproduction situations good spectral cues help a listener externalize sound sources, which may otherwise appear to be located inside the listener's own head [2].

Since the filtering effects of HRTFs are extremely sensitive to slight differences in body shape, everyone's HRTF is unique. As such, the best virtual sound spatialization is achieved using individually measured HRTFs [4]. Measuring each person's individualized HRTF is not feasible for the broader population, as the measurement process can last over an hour and requires specialized equipment and facilities. Although some listeners obtain good localization cues from someone else's HRTF, in many cases this leads to distortions in elevation perception and reversals of virtual sources from the front to the back, and vice versa [10].

To disseminate 3D audio technology to the broader public, HRTFs need to be quickly available without a cumbersome measurement process while still yielding convincing spatial effects. Current attempts to correlate the spectral components of HRTFs to measured anthropometric data yield very low localization accuracy [3]. In many cases, however, absolute virtual source localization is less important than obtaining a good spatial impression of an immersive environment. Because of this, subjective listener preference can be an important tool for finding effective HRTFs for widespread use. While a preference-based system may not provide exactly the same localization cues as a listener's own individual HRTF, it could provide a fast and effective means to avoid a complete collapse of the spatial image along any of the crucial criteria of good HRTF reproduction, such as externalization, elevation perception, and front/back discrimination.
Since many such systems will be implemented over headphones, it is also necessary to study the effects of different headphones on a listener's spatial sound perception. Such an approach is adopted in this study, focusing only on the listener's stated preference for a given HRTF based on these same three criteria. 3D audio systems based on listener selection could take mere seconds to calibrate, as opposed to measuring as many as 27 distinct anthropometric values for the torso, head, and pinnae [1].

2. BACKGROUND

The concept of user-selected HRTF preference is fairly recent, and has been examined in two main studies [5, 6]. The first study investigated the existence of generic categories of HRTFs that were preferred by different subpopulations of listeners. Anecdotal evidence from audio professionals had suggested the existence of such broad categories of HRTF preference, but no prior work had investigated the phenomenon rigorously. The study tested ten listeners' preference for HRTF datasets from the CIPIC and IRCAM databases, as well as each individual's own HRTF and the HRTF of the KEMAR mannequin. HRTFs were used to spatialize an infrapitch signal of repeated pink noise. The criteria tested were externalization, elevation, and front/back discrimination.

While most participants preferred the CIPIC HRTFs to those from IRCAM, two of the subjects consistently preferred the datasets from IRCAM and never selected those from the CIPIC database. Subjects usually, but not always, selected their own individualized HRTF for the specified criteria. In addition, some HRTF datasets were selected nearly as often as the listener's own individualized HRTF. The study concluded that there may be different types of listeners, with preferences for distinct groups of HRTF datasets [5].

The second study investigated the role of the source signal on listener preference for a larger number of subjects (n=20). The signals used were infrapitch, speech, and music.
The study narrowed the HRTF datasets used to the ten from the previous study that had the most spectral contrast between front and back positions. The study found that the infrapitch signal yielded the highest rates of listener preference, presumably because it had the widest frequency spectrum. Interestingly, the doubled number of subjects also yielded a doubled subpopulation (4 subjects) that preferred the IRCAM HRTF datasets [6].

3. METHODS

3.1. Overview

An experiment was designed to test whether the phenomenon observed in the prior studies was dependent on the headphones used to present the stimuli. Subjects were instructed to select all HRTF datasets that fulfilled the following criteria:

Externalization: the source appeared to be located outside the listener's head.
Elevation: the source could be perceived at a high and low elevation, given a fixed azimuth.
Front/back discrimination: the source could be perceived at distinct front and back positions along the cone of confusion.

A MATLAB graphical user interface was used to administer the listening test (Figure 1). The same ten HRTF datasets with high spectral contrast for front and back locations from the previous study were used for each listener. Because individualized ITDs were not available for each listener, the ITDs from the KEMAR HRTF dataset were applied to the HRTFs; KEMAR's features represent average human head size and therefore average ITD.

Figure 1 HRTF preference GUI used during testing

3.2. Stimulus

The stimulus used in all trials was an infrapitch signal consisting of a sample of pink noise repeated at a rate of 5 Hz for 500 milliseconds [8]. This was found to be the most effective stimulus in the previous study, so other stimulus signals were not tested in this study. Subjects listened to the stimuli in the Spatial Audio Research Lab at New York University, a hemi-anechoic acoustical space.

The stimuli were heard over three sets of headphones: Sennheiser HD650 open headphones, which were used in the previous two studies, Sony MDR-7506 closed headphones, and a Bose Aviation headset with noise-cancelling enabled. The Sony and Bose headphones were selected as low-end and high-end closed headphones respectively, in contrast to the Sennheiser open headphones for which previous data were available. Frequency responses were measured for all three sets of headphones on a Neumann KU100 dummy head using a 2-second swept sinusoid from 20 Hz to 20 kHz (Figures 2-4).

Figure 2 Frequency response of Sennheiser HD650s
Figure 3 Frequency response of Sony MDR-7506s
Figure 4 Frequency response of Bose Aviation headset
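As an illustration of the infrapitch stimulus described at the start of this section, the following minimal sketch builds one segment of approximately pink noise and repeats it at 5 Hz for 500 ms. The paper does not state the sampling rate or how the pink noise was generated, so the 44.1 kHz rate, the FFT-based spectral shaping, and the function names `pink_noise` and `infrapitch_stimulus` are assumptions made only for this example.

```python
import numpy as np


def pink_noise(n_samples, rng):
    """Approximate pink noise by giving white noise a 1/sqrt(f) magnitude slope."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.zeros_like(freqs)          # scale[0] = 0 removes the DC component
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # amplitude ~ f^-1/2, so power ~ 1/f
    noise = np.fft.irfft(spectrum * scale, n=n_samples)
    return noise / np.max(np.abs(noise))  # normalize to a peak of 1


def infrapitch_stimulus(fs=44100, repetition_rate_hz=5.0, duration_s=0.5, seed=0):
    """Repeat one pink-noise segment at the given rate for the given duration.

    fs is an assumed sampling rate; the source does not specify one.
    """
    rng = np.random.default_rng(seed)
    period = int(round(fs / repetition_rate_hz))  # samples per repetition
    total = int(round(fs * duration_s))           # samples in the whole stimulus
    segment = pink_noise(period, rng)
    repeats = int(np.ceil(total / period))
    return np.tile(segment, repeats)[:total]


stimulus = infrapitch_stimulus()
```

With these assumed values each repetition lasts 200 ms, so a 500 ms stimulus contains two and a half repetitions of the same noise segment.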

No equalization was applied to the dummy head measurements, but the relative differences between the headsets are informative. The Bose headset shows the flattest spectrum from 100 Hz to 10 kHz. The Sennheiser headphones' bass roll-off effectively produces a peak at 1.5 kHz, followed by a gradual roll-off with increasing frequency and a more severe notch around 10 kHz. The Sony headphones have a moderate peak at 6 kHz as well as a sharper peak at 10 kHz.

3.3. HRTFs

The ten HRTF datasets used were the ten most frequently selected datasets in an earlier experiment [5]. Three of these were taken from the database obtained at IRCAM and AKG from the LISTEN project [11], and seven were taken from the CIPIC database measured at UC Davis [1]. Table 1 shows the list of HRTFs used during the experiment.

HRTF ID   HRTF Database   Subject Number
1         LISTEN          1014
2         LISTEN          1022
3         LISTEN          1028
4         CIPIC           12
5         CIPIC           15
6         CIPIC           27
7         CIPIC           58
8         CIPIC           119
9         CIPIC           131
10        CIPIC           154

Table 1 HRTF datasets used in experiment

3.4. Participants

Fifteen volunteers, seven female and eight male, participated in the experiment. Most were students or faculty in the Music Technology department at New York University. All subjects had normal hearing. Subjects rated their level of musical experience from 1 to 5, with 5 representing the level of a professional musician. The mean reported musical experience was 4.0. The mean age of the subjects was 26.9 years.

3.5. Procedure

Over a series of trials, the listener was presented with five randomly selected HRTFs convolved with the stimulus signal. Each HRTF's button was highlighted as the corresponding dataset was played. Subjects heard all five datasets in order, and then had the option to hear them again as many times as needed. Subjects selected datasets by using checkboxes under each HRTF's button. They had the option to pick any, all, or none of the datasets presented in each trial. Subjects heard the same dataset in three separate trials for each criterion.
A dataset was considered to be selected if it was chosen in at least two of its three trials. Only datasets selected for each criterion were passed forward to the next stage.

For the first criterion, externalization, subjects first heard a monophonic infrapitch signal constructed using the HRTF for 0° azimuth and 0° elevation. After this reference signal, the subject heard five processed stimuli, all spatialized using a single HRTF at a random position on the horizontal plane at one of the following azimuths: ±150°, ±120°, ±90°, ±60°, and ±30°. This sequence of the monophonic reference signal followed by five spatialized stimuli was repeated for all five HRTF datasets within a single trial. The sequence of source positions was identical for all datasets within the same trial.

During testing for the second criterion, elevation, subjects were presented with five pairs of processed infrapitch signals. Each pair was spatialized at a random azimuth drawn from the same values as for the first criterion. One signal of each pair was located at +36° with respect to the horizontal plane and the other at -36°, with the positive and negative elevations presented in random order. The sequence of azimuths and elevations was identical for all datasets within the same trial.

Tests for the third criterion, front/back discrimination, also consisted of five pairs of stimuli. Each pair included two processed infrapitch signals spatialized along the cone of confusion, both virtual sources being equidistant from the listener's ears. All were on the horizontal plane at random azimuths selected from the following: ±150°, ±120°, ±60°, and ±30°. The sequence of locations was identical for all datasets within the same trial.

This procedure was repeated for all three sets of headphones for each subject in the study. The headphones were presented in randomized order to prevent any sequential bias effects.
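The selection rule in this procedure can be sketched in a few lines. The example below assumes each dataset's three trials are recorded as booleans; the function names `selected_datasets` and `pass_forward` and the sample data are hypothetical. The fallback in `pass_forward` follows the rule reported in Section 4.2, where all datasets were passed forward if a subject selected none for a criterion.

```python
def selected_datasets(trial_choices, min_hits=2):
    """Return the IDs of datasets chosen in at least `min_hits` of their trials.

    trial_choices maps a dataset ID to a list of booleans, one per trial.
    """
    return {
        dataset_id
        for dataset_id, picks in trial_choices.items()
        if sum(picks) >= min_hits
    }


def pass_forward(candidates, selected):
    """Datasets to test in the next criterion.

    Normally only selected datasets move on; if nothing was selected,
    every candidate is passed forward instead (see Section 4.2).
    """
    return set(selected) if selected else set(candidates)


# Hypothetical responses from one subject for one criterion.
choices = {
    4: [True, True, False],    # chosen in 2 of 3 trials: selected
    5: [True, False, False],   # chosen in 1 of 3 trials: not selected
    10: [True, True, True],    # chosen in 3 of 3 trials: selected
}
chosen = selected_datasets(choices)
next_stage = pass_forward(choices.keys(), chosen)
```

Here `chosen` and `next_stage` both contain datasets 4 and 10; had `chosen` been empty, `next_stage` would contain all three candidates.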

4. RESULTS

The results of the study are presented below, focusing first on the headphone-dependent HRTF selections, and second on the subject-dependent HRTF selections.

4.1. Headphone-dependent results

4.1.1. Externalization

For the externalization criterion (Figure 5), as in previous studies, the IRCAM HRTF datasets (#1-3) were selected much less often than the CIPIC datasets (#4-10). Out of a possible 15 selections, dataset #10 was selected 13 times when heard over the Bose headset, the highest selection rate for any HRTF/headphone combination. Dataset #10 was selected much less often over the Sennheiser and Sony headphones (7 and 9 times, respectively). Over all datasets, the Bose headset received 68 selections, the Sennheiser headphones received 56, and the Sony headphones received 58. However, datasets #1-3 were not selected often enough to yield significant data for the effects of different headphones. For the CIPIC HRTFs alone, the Bose headset had 66 selections while the Sennheiser and Sony headphones had 54 and 55 selections, respectively. It should be noted, however, that this effect may be stronger or weaker depending on the HRTF dataset used; the strong preference for the Bose headset for HRTF #10 is in contrast to subjects' equal preference for all three headphones when listening to HRTF #9.

Figure 5 Number of externalization selections by HRTF database, for each set of headphones

4.1.2. Elevation

For the elevation criterion (Figure 6), results of the CIPIC and IRCAM HRTF datasets again differed greatly. All the IRCAM datasets (#1-3) received a total of just 4 elevation selections on all of the headphones. Over datasets #4-10, the Bose headset received the most elevation selections with 48, while the Sennheiser and Sony headphones received 39 and 34 selections, respectively. Again, HRTF #10 showed the highest preference for the Bose headset, while the preference was reduced or unclear for the remaining CIPIC datasets.
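To put the CIPIC selection counts reported above on a common scale, they can be expressed as a fraction of the possible selections. This is a minimal sketch rather than an analysis from the paper: it assumes 15 subjects and 7 CIPIC datasets, giving 105 possible selections per headphone, and the function name `selection_rates` is hypothetical. For elevation, 105 is only an upper bound, because datasets not passed forward from the externalization stage could not be selected.

```python
N_SUBJECTS = 15
N_CIPIC_DATASETS = 7
POSSIBLE = N_SUBJECTS * N_CIPIC_DATASETS  # 105 possible selections per headphone

# Selection counts over the CIPIC datasets (#4-10), as reported in the text.
counts = {
    "externalization": {"Bose": 66, "Sennheiser": 54, "Sony": 55},
    # Upper-bound denominator: not every dataset reached this stage.
    "elevation": {"Bose": 48, "Sennheiser": 39, "Sony": 34},
}


def selection_rates(counts_by_headphone, possible):
    """Convert raw selection counts into fractions of the possible selections."""
    return {hp: n / possible for hp, n in counts_by_headphone.items()}


for criterion, by_headphone in counts.items():
    rates = selection_rates(by_headphone, POSSIBLE)
    print(criterion, {hp: round(rate, 3) for hp, rate in rates.items()})
```

Under these assumptions the externalization rate is roughly 63% for the Bose headset and 51% to 52% for the other two headphones.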
Figure 6 Number of elevation selections by HRTF database, for each set of headphones

4.1.3. Front/back discrimination

Fewer HRTFs were selected for the third criterion, front/back discrimination. This is partly due to the design of the experiment, since only selected HRTFs are passed forward to the next criterion. The task of front/back discrimination also tends to be the most difficult of the three criteria, since resolving front/back reversals relies not only on accurate HRTFs but also on head movements [2], which were not included in this study. The Bose headset again outperformed the other headphones, receiving 28 selections over the CIPIC HRTFs, compared to 22 and 20 for the Sennheiser and Sony headphones, respectively. The extreme preference for the Bose headset for HRTF #10 remained, as it was selected 7 times compared to 3 for both the Sennheiser and Sony headphones. Interestingly, the Bose headset was not preferred for front/back discrimination with HRTF #5, where both the Sennheiser and Sony headphones received more selections (6 and 5, respectively) than the Bose headset's 3 selections.

Figure 7 Number of front/back selections by HRTF database, for each set of headphones

4.2. Subject-dependent results

Tables 2-4 show the selections per subject for each criterion. Each table represents the results for one set of headphones. For each subject/HRTF combination, 1, 2, and 3 represent the criteria for which each dataset was selected (1=externalization, 2=elevation, 3=front/back discrimination). Although the experiment was designed to pass forward only the datasets selected for the preceding criterion, if a subject selected no datasets for a given criterion, all datasets were passed forward to the next criterion. Because of this, there are some cases in which a dataset was selected for a later criterion but not the criterion preceding it. For instance, there are three cases of subjects selecting a dataset for criteria 1 and 3 but not 2, suggesting that front/back discrimination does not imply good elevation perception in all cases.

Surprisingly, subject 6 did not select any HRTFs as providing good externalization over the Sennheiser headphones (Table 2) but selected six of the CIPIC datasets as providing good elevation cues. The subject selected only one dataset for externalization with the Sony headphones and two datasets with the Bose headset. It is therefore possible that the subject would have selected other datasets for the elevation criterion had they been passed forward from the previous selection. This supports the conclusion of an earlier study, which found that different HRTFs may be preferred for different criteria [6].

Other subject-specific findings are more curious. For instance, subject 5 was remarkably consistent in selecting nearly all the CIPIC HRTFs (#4-10) for all three sets of headphones. However, the subject also selected HRTF #1 from the IRCAM database for all three criteria only when listening over the Sony headphones.
Similarly, subject 13 selected only HRTF #5 for all three criteria for both the Sennheiser and Sony headphones, while selecting no other HRTFs. When listening over the Bose headset, however, the same subject chose HRTF #6 for the first two criteria and did not select HRTF #5 at all.

HRTF: 1 2 3 4 5 6 7 8 9 10
Subject: 1 1 1 1,2 2 1,2,3 1,2 1,2 1,2 1,2 1,2,3 3 1,2 4 1,2 1,2,3 1,2,3 1,2,3 1,2,3 1 1 5 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 6 2 2 2,3 2 2 2 7 1,2 1 1 1 8 1 1,2 1 9 1 1 1 1,2 1 1 1 10 1 1,3 11 1 1,2,3 12 1,2 1,2 1,2,3 13 1,2,3 14 1,2,3 1,2,3 1,2 1,2 1 1,2,3 15 1 1,3 1 1

Table 2 Listener HRTF Selection by Criterion, Sennheiser HD650 Headphones

HRTF: 1 2 3 4 5 6 7 8 9 10
Subject: 1 1,2 1 2 1 1 1 1,2 1 3 2 2,3 2 4 1,2 1,2,3 1,2,3 1,2,3 1 1 1,2,3 5 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 6 1 7 1 1 8 1 1 1 1,2 1,2 1 9 1,2 1,2 1,2 1,2 10 1 1,2 1,2,3 11 1,2 1 1 12 1 1,2,3 1 1,2 1 1 13 1,2,3 14 1,2,3 1,2,3 1,2 1,2,3 1,2,3 1,2 1,2,3 15 1,3 1 1 1

Table 3 Listener HRTF Selection by Criterion, Sony MDR-7506 Headphones

HRTF: 1 2 3 4 5 6 7 8 9 10
Subject: 1 1 2 1,2,3 1,2 1,2,3 1 1,2,3 3 1,2 1 4 1,2,3 1,2,3 1,2,3 1,2 1,2,3 1,2,3 1,2,3 5 1,2,3 1,2 1,2,3 1,2,3 1,2,3 1,2,3 1,2,3 6 1 2 1,2,3 7 1 1 1,2 1,2 1 1,2 1 8 1,2 1 1 1,2 1,2 1,2 1 9 1,2,3 1,2,3 1,2 1,2 1,2,3 10 1,2 1 1,2 1 1,2 1 1,2,3 11 1 1,2 1,2,3 1,2,3 1,2,3 1 12 1 13 1,2 14 1,2,3 1,2 1,2,3 1 1,2,3 1,2 1,2,3 15 1 1 1,2,3 1

Table 4 Listener HRTF Selection by Criterion, Bose Aviation Headset

5. DISCUSSION

This paper presents the results of listener preference for HRTF datasets heard over different sets of headphones. While previous studies on this issue had shown evidence for a minority who consistently preferred HRTFs from the IRCAM database, none of the subjects in this study showed such a preference. If such a subpopulation exists, a larger pool of subjects will be needed to show this effect conclusively.

Among the results for the CIPIC database's HRTFs, the Bose headset provided significantly better externalization than the Sennheiser or Sony headphones. The Bose headset was selected more often for the elevation and front/back discrimination criteria as well, but this may be a result of the structure of the experiment, since more HRTFs were passed forward to those criteria as a result of the higher externalization rate. In general, the Bose headset's superior performance was somewhat surprising: although it is a high-quality headset, it is closed, meaning that sound from outside cannot get in, and sound reproduced through the headphones cannot be heard outside.
Open headphones are often thought to have better externalization than closed headphones because of open headphones' lower acoustic impedance at the ear, which provides a more natural sound [7]. Since the Spatial Audio Research Lab is already isolated from outside noise, it seems doubtful that the isolation provided by the Bose headset is responsible for its increased performance. In addition, the Sony headphones, which are also closed, performed approximately as well as the Sennheiser open headphones. Thus it seems more likely that the differences between headphones are related to their frequency responses. The Bose headset's response had the flattest spectrum, which may have yielded better reproduction of the original HRTF. Though a flat frequency response is not always ideal for general headphone reproduction, flatter may be better for the volatile spatial images provided by the ears' filtering effects. This suggests that if an HRTF database uses listener preference, it should contain some post-processing equalization component to account for the frequency spectrum of the listener's headphones.

This study also supports the conclusion that listeners may prefer different HRTFs for different perceptual criteria. Subjects' selections showed that good front/back discrimination did not necessarily imply good elevation perception, and in some cases good elevation perception did not require good externalization as a prerequisite. This experiment was designed to exclude bad HRTFs in early stages to limit the time of the overall listening test. However, future work on this question should consider allowing listeners to hear all HRTF datasets for every criterion.

6. REFERENCES

[1] Algazi, V.R., Duda, R.O., Thompson, D.M., & Avendano, C. (2001) The CIPIC HRTF Database, Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, New Paltz, NY, Oct. 21-24.
[2] Durlach, N.I., Rigopulos, A., Pang, X.D., Woods, W.S., Kulkarni, A., Colburn, H.S., & Wenzel, E.M. (1992) On the Externalization of Auditory Images, Presence, 1(2).
[3] Hugeng, Wahab, W., & Gunawan, D. (2010) Enhanced Individualization of Head-Related Impulse Response Model in Horizontal Plane Based on Multiple Regression Analysis, Second International Conference on Computer Engineering and Applications.
[4] Kendall, G.S. (1995) A 3-D Sound Primer: Directional Hearing and Stereo Reproduction, Computer Music Journal, 19(4), 23-46.
[5] Roginska, A., Wakefield, G., & Santoro, T.S. (2010) User Selected HRTFs: Reduced Complexity and Improved Perception, Proceedings of the Undersea Human Systems Integration Symposium 2010, Providence, RI.
[6] Roginska, A., Wakefield, G., & Santoro, T.S. (2010) Stimulus-dependent HRTF preference, Proceedings of the 129th Audio Engineering Society Convention, San Francisco, CA, USA.
[7] Vorländer, M. (2000) Acoustic load on the ear caused by headphones, Journal of the Acoustical Society of America, 107(4), 2082-2088.
[8] Warren, R.M., & Bashford, J.A., Jr. (1981) Perception of acoustic iterance: Pitch and infrapitch, Perception & Psychophysics, 29, 395-402.
[9] Wenzel, E.M., Wightman, F.L., Kistler, D.J., & Foster, S. (1988) Acoustic origins of individual differences in sound localization behavior, Journal of the Acoustical Society of America, 84(1), S79.
[10] Wenzel, E.M., Arruda, M., Kistler, D.J., & Wightman, F.L. (1993) Localization using nonindividualized head-related transfer functions, Journal of the Acoustical Society of America, 94(1), 111-123.
[11] http://recherche.ircam.fr/equipes/salles/listen/