IMPLEMENTATION AND APPLICATION OF A BINAURAL HEARING MODEL TO THE OBJECTIVE EVALUATION OF SPATIAL IMPRESSION

RUSSELL MASON
Institute of Sound Recording, University of Surrey, Guildford, UK
r.mason@surrey.ac.uk

AES 28th International Conference, Piteå, Sweden, 2006 June 30 to July 2

A binaural hearing model has been developed over a number of years that predicts the perceived width and position of sounds, over frequency and over time. The most appropriate methods for applying this model to evaluations of spatial impression are considered, including suitable test signals. Examples of a range of measurements are shown for a variety of situations.

INTRODUCTION

A measurement that can predict the perceived spatial impression of sounds and provide the results in an intuitive manner would be a useful tool to aid the development of sound recording and reproduction systems. In this paper the term "spatial impression" is intended as an overall descriptor which refers to all perceivable attributes that can be described in terms of three physical dimensions, which includes the majority of the attributes laid out in Rumsey's spatial scene analysis paradigm [1].

Currently, it is necessary to conduct expensive and time-consuming listening tests every time the spatial impression of a sound system needs to be evaluated. Whilst it may never be possible to entirely replace subjective tests with an objective measure, an objective metric can be used to support subjective testing, for example by pre-selecting algorithms for more rigorous subjective evaluation, or by helping to bridge the gap between the physical and perceptual domains. In this way, a reliable objective measurement that predicts aspects of perceived spatial impression can save time and money when used in combination with subjective evaluation.

The majority of previous work that has been undertaken to develop such objective metrics has focused on perceived location (or localisation). This is a logical place to start when developing such measures, as it is a relatively straightforward phenomenon that can be studied relatively easily, and it can be easily implemented as a measurement. However, its perceptual importance, and whether it matches the way that humans perceive sound, are questionable. For instance, a sound recording and reproduction system can be evaluated by placing sound sources at given positions around the recording system, and then using a measurement to predict the localisation accuracy of each of these when the sound is reproduced. This should give an accurate prediction of the perceived localisation of sound sources in these positions, but will not necessarily give an indication of the spatial impression of the acoustical environment. Whilst a completely accurate reproduction of the sound source localisation may indicate that the acoustical environment will similarly be reproduced accurately, it is difficult to interpret how any errors in the reproduced localisation will affect the perceived spatial impression. For instance, the direct sound and reflections of an acoustical environment combine to give rise to a single perceptual component with a modified spatial impression [2]; it would be complex to determine this interaction based on the intended and actual localisation of each of the components (i.e. the direct sound and each reflection).

Therefore, a measurement system that can evaluate the attributes that contribute to spatial impression as perceived by a listener would be more useful. To use Rumsey's scene-based description structure [1], this can be interpreted as the location (azimuth, elevation, distance) and size (width, height, depth) of various elements of a sound scene such as individual sources, groups of sources or the acoustical environment. There is still a great deal of work to be done to develop measurements that predict the perceived effect of all of these attributes for a wide range of signals.

The author, together with colleagues, has developed a binaural hearing model that can predict the perceived location and width of stimuli. This is described in this paper, together with related information about how this can be implemented in the analysis of sound recording and reproduction systems. Finally, example measurements of a range of systems are shown, giving an overview of the capability of such a measurement system.

1 BINAURAL HEARING MODEL

The binaural hearing model was developed to predict aspects of the perceived spatial impression of a wide range of signals. The main factors that are predicted are the location (in terms of an azimuth) and width (in terms of a subtended angle) of the signals, in a number of frequency bands, dynamically over time. In addition, there is a simple prediction of the perceived loudness (in phons) when a calibrated input signal is used. Depending on the content of the measured sound, the predicted parameters are either of the perceived source or the perceived environment (or reverberation). In the future, the aim is to develop the model to be able to automatically separate these two components.

The measurement was designed to be as widely applicable as possible, in that it should be able to accurately predict the perceived source or environment width of any binaural signal that is fed into it. This means that it should work for any situation where a human listener can perceive the source or environment width of a sound, be it concert hall acoustics, reproduced sound, virtual reality, or any other form of listening. In addition, it should give comparable results for any input signal rather than having to rely on a single test signal or audio extract.
The central calculation on which the measurement model is based is the interaural cross-correlation coefficient (IACC). This was shown to be inversely related to the perceived width of auditory stimuli as early as the 1960s [3, 4]. Although recent research has suggested that it is not an accurate representation of the physiology of the binaural hearing process [5], the predictions of models of binaural perception based on the IACC have shown remarkable similarity to experimental data relating to lateralisation (e.g. [6, 7, 8]), binaural detection thresholds (e.g. [9, 10, 11]), and the perceived source width of an auditory stimulus (e.g. [12, 13, 14]).

Measurements based on this calculation that attempt to predict aspects of the perceived spatial impression have been developed previously by a number of researchers. However, there were problems with the implementation of these, as the results were not directly comparable across a wide range of situations. The model described in this paper was developed to solve a number of these problems, such as the dependence of the perceived width on the loudness and frequency of stimuli. The basic block diagram of the binaural model is shown in Figure 2.
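As an illustration of the central calculation only, the following is a minimal sketch of an IACC computed as the peak of the normalised cross-correlation between the two ear signals. It is not the model described in this paper: the filterbank, rectification, windowing and compensation stages are all omitted. The function name `iacc`, the use of the absolute value, and the lag range of ±1 ms (a conventional choice in room acoustics) are assumptions made for the example.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Return (peak normalised cross-correlation, lag in samples).

    A positive lag means the right-ear signal is delayed relative to the left.
    The +/-1 ms lag range is an assumed default, not a value from the paper.
    """
    n = min(len(left), len(right))
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    energy = math.sqrt(sum(x * x for x in left[:n]) * sum(y * y for y in right[:n]))
    if energy == 0.0:
        return 0.0, 0
    best_value, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        # Only sum over indices where both left[i] and right[i + lag] exist.
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        value = abs(acc) / energy
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag


# Example: the same noise at both ears, with the right ear delayed by 4 samples,
# compared with statistically independent noise at the two ears.
rng = random.Random(1)
fs = 8000
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 4 + noise[:-4]
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]

coherent_value, coherent_lag = iacc(noise, delayed, fs)
diffuse_value, _ = iacc(noise, independent, fs)
```

In this toy case the delayed copy gives a coefficient close to 1 at a lag of 4 samples, while the independent noise gives a coefficient close to 0, which corresponds to the inverse relationship between IACC and perceived width noted above.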
The main features of this binaural hearing model are as follows:
- inclusion of half-wave rectification and low-pass filtering so that the correlation of high frequency signals is affected by the envelope of the signal rather than the fine temporal detail [15]
- the use of a running measurement to uncover variations in the perceived parameters over time [16]
- the effect of the frequency and loudness of the input signal is taken into account when predicting the perceived width [15, 17]
- predictions of the perceived location and width are integrated into a single output [18]
- the results are output in terms of an angle rather than an unintuitive value such as a coefficient [16]
- the effect of the IACC on the perceived localisation is taken into account [18]
- the effect of double peaks in the IACC calculation is taken into account in the width and localisation prediction [18]

The results of this analysis can be output as an animation or as an interactive plot. The animated display (an example screenshot is shown in Figure 1) shows the width and location as an angle against loudness and frequency, which is animated over time. The interactive plot (an example is shown in Figure 3) shows either the width or location or a combination as an angle, together with loudness against time, for selected frequency bands and time segments.
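The first feature in the list, half-wave rectification followed by low-pass filtering, can be sketched as below. This is an illustration under stated assumptions rather than the filter used in the model: the first-order filter, the 1 kHz cut-off and the function names are all hypothetical choices for the example, since this section does not give the filter design.

```python
import math


def half_wave_rectify(signal):
    """Keep positive samples and set negative samples to zero."""
    return [x if x > 0.0 else 0.0 for x in signal]


def one_pole_lowpass(signal, sample_rate, cutoff_hz):
    """First-order low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += a * (x - state)
        out.append(state)
    return out


def envelope(signal, sample_rate, cutoff_hz=1000.0):
    """Rectify then smooth, so that fine temporal detail is reduced.

    The 1 kHz cut-off is an assumed value for illustration only.
    """
    return one_pole_lowpass(half_wave_rectify(signal), sample_rate, cutoff_hz)


# Example: a steady 4 kHz tone sampled at 48 kHz.
fs = 48000
tone = [math.sin(2.0 * math.pi * 4000.0 * i / fs) for i in range(9600)]
env = envelope(tone, fs)

steady = env[4800:]  # discard the start-up transient
mean_level = sum(steady) / len(steady)
ripple = max(steady) - min(steady)
```

For the steady tone, the output settles around a positive mean (near 1/π of the tone amplitude, the average of a half-wave rectified sine) with a ripple much smaller than the original 2.0 peak-to-peak swing. A correlation stage fed with such signals therefore responds mainly to the envelope at high frequencies, which is the behaviour the list describes.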

Figure 1: Example output of the animation display, showing the combined width and location in degrees on the x-axis, over frequency on the y-axis, with the loudness indicated by the brightness of the plot. The red and green lines indicate the peak hold of the extreme left and right values respectively.

Figure 2: Block diagram of the main processing stages of the binaural model. Stages shown: binaural input; filterbank; rectification and low-pass filtering; windowing; loudness measurement; detection of interaural level difference; cross-correlation calculation; detection of interaural time difference; frequency and loudness compensation; combination of localisation cues; localisation detection; temporal smoothing; width detection; integration of localisation and width; conversion to angle; display.

Figure 3: Example output of the interactive plot display showing the combined width and location of the sound on the y-axis over time on the x-axis in the upper plot, and the loudness of the sound on the y-axis over time on the x-axis in the lower plot, for the frequency bands selected on the left hand side.

A number of evaluations have been conducted on the resulting binaural model, including those reported in [16]. These have demonstrated that the model accurately predicts the perceived spatial impression of most simple stimuli (for example consisting of a single source in a reverberant environment), and with much greater accuracy than simple IACC measurement techniques. There are a number of ways in which the model can be improved, including integration of results across frequency, automated separation of sources and reverberation, and further development of the temporal smoothing algorithm.

2 MEASUREMENT PROCESSES

Now that a suitable binaural model has been developed to predict some aspects of the perceived spatial impression of stimuli, the optimum method for applying this needs to be developed. The model could be used for either comparative or absolute evaluations of a reproduced sound field. This sound field may be affected by any and all components in the signal chain which, depending on the individual situation, may include the original source signal, an original acoustical environment (including the source, environment and capturing system), additional signal processing, some form of storage or distribution medium, reproduction signal processing, the reproduction system (including the loudspeakers and the acoustical environment), and finally the binaural capture system. An absolute measurement will include the effect of each of these components, and to evaluate the effect of one of these requires the quantification and standardisation of all the other factors in the signal path.

Comparative evaluations are somewhat simpler, and are the basis of other perceptual quality meters, such as PEAQ [19]. As an example, if the aim of the recording and reproduction was to accurately recreate the perception of the original acoustical environment, then comparable measurements could be undertaken of both the original sound field and the reproduced version.
Such an approach was taken by Furlong [20], and it is a useful and straightforward technique for evaluating a complete system. However, it is rare that all the parameters of a complete recording and reproduction system are in the control of one individual or organisation.

A more common type of test would be to evaluate the effect of one component in the recording and reproduction chain, and this can be done by exchanging or altering the given component, which will enable the evaluation of differences caused by this quantifiable change. This is also a relatively simple task; however, it is complicated by the possibility that the other components in the signal chain will act to mask changes. For instance, a change to a spatial processing algorithm may have one effect with a certain reproduction system, but may produce a different effect with a different reproduction system. For this reason, the comparisons need to be conducted with care and, where possible, in as many situations as is practical.

Another important consideration in this type of test is the source signal that is used. Ideally, this would be representative of a wide range of stimuli that will be reproduced through the system under test. It should also include the extreme range of values of all the relevant physical parameters that may be present in typical stimuli in order to fully test the capabilities of the reproduction system. This is considered in more detail in the next section.

In order to limit complexity, this paper will focus on the problem of evaluating some aspect of a sound reproduction system, be it some form of processing algorithm, the loudspeakers and their setup, the acoustical environment, or the listening position. In this context, it is necessary to consider what subjective attributes are to be evaluated, as this will help to structure the application of the binaural model. The overall characteristic that will be investigated could be described as the quality of the spatial impression.
However, as discussed above, this is a multidimensional term, made up of many individual attributes. In addition, quality is recognised to be a higher level cognitive attribute, which includes the detection of lower level quantitative judgments together with an evaluation of the suitability of these towards a certain aim or expectation. The binaural model predicts the perception of lower level sensory attributes, so it is logical to limit the application to these attributes, at least until further work has been undertaken to map the relationship between these and quality for a wide range of situations and stimulus types.

Therefore, it would be simpler to measure low level attributes, based on the factors that the binaural model currently predicts. The predictions of the perceived width can be used to determine the accuracy and range of widths that can be reproduced by a system. Likewise, the predictions of perceived location can be used to determine the accuracy and range of localisation that can be reproduced by a system. Measurements can be made of the consistency of the results across time, position, frequency, or other variables. Finally, the prediction of the perceived loudness can be used to evaluate whether there is any emphasis or suppression of the reverberation, which may affect the perceived envelopment [21] (see footnote 1).

3 TEST SIGNALS

As discussed in the previous section, one of the most important and challenging issues in testing sound reproduction systems is the selection of suitable test signals. These should be representative of the type of stimuli that will be reproduced through the system under test, and should cover a wide range of each relevant physical parameter in a relatively short time.

The potential experiment questions posed in the previous section dictate the properties of the test signals to a certain extent. To evaluate the range of locations and widths that can be reproduced, the test signal will have to contain a range of interchannel correlation values and level / time differences (see footnote 2). To evaluate the accuracy of these factors, the correct perceived effect of these parameters will have to be pre-determined. To evaluate the suppression or enhancement of reverberation, the test signal will have to include a reverberation-like decay in order to monitor any level compression or expansion effects that may affect how the reverberation is perceived relative to the direct sound.

3.1 Signal spectrum

One important factor in the choice of test signal is the effect of the signal spectrum on the result. The interaural cross-correlation coefficient (IACC) has been shown to be related to the perceived width of a sound, and is essentially a measure of the similarity of the signals arriving at each ear. This can be affected by differing notches in the transfer response from the source to each ear when combined with the reflections from the acoustical environment. For instance, the plots in Figure 4 show a fast Fourier transform of a binaural sound field consisting of a direct sound from directly in front and a single lateral reflection. It can be seen that this causes comb filters that are different at the two ears due to the differing path length of the reflection to each ear.
The signal overlaid in the first plot will have a relatively high IACC, as the spectral components are at frequencies where the response of the sound field is relatively similar at the two ears. However, the signal overlaid in the second plot contains spectral components at a frequency where there is a large difference between the sound field responses at the two ears; this will therefore cause a relatively low IACC.

Figure 4: Plot of the fast Fourier transform of a binaural sound field consisting of a direct sound and a single lateral reflection, overlaid with two signals of differing spectral content. (Two plots, each showing the left and right channels; x-axis: frequency, 1970 Hz to 2030 Hz; y-axis: FFT magnitude, 0 dBFS to -40 dBFS.)

Footnote 1: This is not necessarily an attribute of spatial impression as defined in the introduction, though it is regularly considered to be similar and could be included in a measurement as the relevant information is required as part of the binaural hearing model.

Footnote 2: This assumes that the system under test uses fairly simple time or level based panning. If the recording / reproduction system involves a more complex panning system then the localisation performance can be tested using that method of sound source positioning.

Therefore, it is apparent that the spectral content of a signal can have a large effect on the resulting interaural cross-correlation. Based on this, if a test signal is to successfully detect any such constructive and destructive interferences, it needs to be spectrally complex in order to excite these differences. However, if the signal is spectrally too complex, any differences that occur will be a small proportion of the overall signal.
For example, a significant destructive interference across a narrow frequency range would have a large effect on the IACC if a sound consisted of two tonal components, one at the interference frequency and one elsewhere. However, if there were a large number of tonal components across a wide frequency range, the fact that the destructive interference is only a small proportion of the signal would mean that the IACC would remain close to 1.
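This effect can be illustrated with an idealised numerical sketch. It assumes that the destructive interference removes one component completely at one ear only, that all components have equal amplitude, and that the zero-lag correlation coefficient stands in for the full IACC; the frequencies, the sample rate and the function names are hypothetical choices for the example.

```python
import math


def correlation_coefficient(a, b):
    """Zero-lag normalised correlation between two equal-length signals."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def tone_sum(freqs_hz, sample_rate, n):
    """Sum of equal-amplitude sine tones."""
    return [
        sum(math.sin(2.0 * math.pi * f * i / sample_rate) for f in freqs_hz)
        for i in range(n)
    ]


def coefficient_with_one_notch(freqs_hz, notched_hz, sample_rate=8000, n=8000):
    """Left ear hears every component; right ear loses the notched one."""
    left = tone_sum(freqs_hz, sample_rate, n)
    right = tone_sum([f for f in freqs_hz if f != notched_hz], sample_rate, n)
    return correlation_coefficient(left, right)


# Two components, one of which sits at the interference frequency.
two_tone = coefficient_with_one_notch([500, 2000], 2000)
# about 0.707, the square root of 1/2

# Twenty components (100 Hz to 2000 Hz), one at the interference frequency.
many_tone = coefficient_with_one_notch([100 * k for k in range(1, 21)], 2000)
# about 0.975, the square root of 19/20
```

With N equal-amplitude components and one removed at one ear, the coefficient is the square root of (N - 1)/N, so the same interference that lowers the two-component result to about 0.71 leaves the twenty-component result within a few percent of 1.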

A wide-band test signal, such as noise, has the advantage that all components and potential interactions can be evaluated at once, and there is less risk of missing an interaction as would happen with the signal in the upper plot of Figure 4. It is also simple to vary the interchannel cross-correlation (ICCC) of a wide-band test signal, and amplitude envelopes that are representative of a reverberant decay can be introduced in a consistent manner. On the other hand, the disadvantage of a wide-band signal is that these interactions could be a small proportion of the overall result as discussed above, and noise is dissimilar to the music or speech signals that are most likely to be played through a sound reproduction system.

A tonal or harmonic signal would be more musically representative, and therefore the results of any measurements made of these would be likely to be more externally valid, especially when considering the effect of the spectral complexity on the result as discussed above. However, in order to detect all significant interactions, either a number of these signals would be required with differing pitches, or the signal will need to include a frequency sweep of some form; this will mean that the signal (and therefore the measurement and analysis time) will be longer. In addition, it would be difficult to vary the ICCC of such a signal in a meaningful way, and any amplitude envelope that is introduced will have to be repeated for each pitch, resulting in a longer signal duration.

As there is no way to compromise between these two signal types and still retain the requirements set out at the start of this section, it was decided to use two signals, one wide-band and one tonal.

3.2 Wide-band signal

The wide-band signal that was chosen is a noise-like signal made up of a large number of sine tones; a spectrogram of this is shown in Figure 5.
The sine tones are spaced 5 Hz apart, based on the fact that the bandwidth of strong interactions and resonances such as modes in most listening environments is equal to or greater than this [22], meaning that this signal should excite any significant irregularities in the response. The starting phase of the sine tones is pseudo-random, in order to minimise the crest factor of the noise. The ICCC can be swept from 0 to 1 or vice versa, and is varied by altering the arrangement of the tonal components in each channel. The number of output channels is theoretically unlimited, though a smaller spacing of sine tones may be required if there are a large number of channels.

The amplitude envelope of this signal is designed to test the dynamic response of the reproduction system. It has a transient attack, followed by a sustained segment that is 12 dB lower in level than the attack. The sustain segment lasts for 3 seconds, during which the ICCC sweeps from 0 to 1 or 1 to 0, as specified. Following this, there is an exponential decay, which simulates a reverberant decay with a reverberation time (RT60) of 2 seconds. The perceived location of the sound can be manipulated by introducing interchannel time or level differences, based on previous experimental data such as those listed in [23]. Finally, the signal is filtered to give a frequency response similar to pink noise.

Figure 5: A spectrogram of the wide-band test signal showing the spectral content of the signal on the y-axis against time on the x-axis, with the magnitude shown on the colour scale in dBFS.

This signal can be used to evaluate the following factors, for wide-band signals:
- the range of widths (which may be otherwise described as the focus and spaciousness of the reproduced sound) that can be reproduced by a system
- any enhancement or suppression of the reverberation
- the accuracy of the reproduced localisation.
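A single-channel version of such a signal can be sketched as follows, under several assumptions. The 5 Hz spacing, pseudo-random phases, 12 dB drop, 3 second sustain and 2 second RT60 are taken from the description above. The attack duration, the frequency range, the sample rate and all function names are hypothetical, and the ICCC sweep and pink filtering are omitted.

```python
import math
import random


def envelope_gain_db(t, attack_s=0.05, sustain_s=3.0, rt60_s=2.0):
    """Level in dB relative to the attack peak at time t (seconds).

    The 0.05 s attack duration is an assumed value for illustration.
    """
    if t < attack_s:
        return 0.0
    if t < attack_s + sustain_s:
        return -12.0
    # RT60 is the time taken to fall by 60 dB, so the slope is 60 / rt60_s dB/s.
    return -12.0 - 60.0 * (t - attack_s - sustain_s) / rt60_s


def tone_frequencies(f_low_hz, f_high_hz, spacing_hz=5.0):
    """Frequencies from f_low_hz to f_high_hz inclusive at a fixed spacing."""
    count = int(math.floor((f_high_hz - f_low_hz) / spacing_hz)) + 1
    return [f_low_hz + k * spacing_hz for k in range(count)]


def wideband_signal(sample_rate, duration_s, freqs_hz, seed=0):
    """Sum of sine tones with pseudo-random phases, shaped by the envelope."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs_hz]
    samples = []
    for i in range(int(sample_rate * duration_s)):
        t = i / sample_rate
        gain = 10.0 ** (envelope_gain_db(t) / 20.0)
        value = sum(
            math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs_hz, phases)
        )
        # Dividing by the tone count keeps every sample within [-1, 1].
        samples.append(gain * value / len(freqs_hz))
    return samples


# A short, narrow-band excerpt keeps this pure-Python example quick to run.
freqs = tone_frequencies(100.0, 200.0)
excerpt = wideband_signal(4000, 0.25, freqs)
```

An RT60 of 2 seconds corresponds to a decay slope of 60 / 2 = 30 dB per second, which is the rate that a reproduction system with a linear amplitude response would be expected to preserve.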
3.3 Tonal signal

The tonal signal that was chosen is based on the research of Neher et al. [24], who investigated the properties of musical signals and their effect on the IACC when reproduced in a room. In order to ensure that there are at least two tonal components in each of the frequency bands in the binaural hearing model (so that there is an interaction to measure), two harmonic complexes are used, which are harmonically related to each other. The first has a fundamental frequency of 55 Hz with 34 harmonics, the second a fundamental frequency of 220 Hz with 42 harmonics. This results in a signal with a frequency range of 55 Hz to 9240 Hz. The frequency components are held constant for half a second, sweep up an octave over approximately 3.5 seconds, then rest at the final pitch for half a second. Therefore, the frequency range at the end of the signal is 110 Hz to 18480 Hz. The sweep rate was derived from experimentation: it was chosen as a compromise between being able to see all significant deviations in the result and keeping the stimulus duration as short as possible. A spectrogram of the resulting signal is shown in Figure 6.

Figure 6: A spectrogram of the tonal test signal, showing the spectral content of the signal on the y-axis against time on the x-axis, with the magnitude shown on the colour scale in dBFS.

The level of each harmonic in the tone complex is equal, though the signal is filtered after generation to give a frequency response similar to pink noise. It is not possible to vary the ICCC in a meaningful way using this stimulus, so this is kept at a value of 1. As for the wide-band stimulus, the perceived location of the sound can be manipulated by introducing interchannel time or level differences. This signal can be used to evaluate the following factors, for tonal signals:

• the narrowest width (which may otherwise be described as the 'focus') that can be reproduced by a system
• the accuracy of the reproduced localisation
• the variation or consistency of the reproduced localisation and width over frequency.

4 EXAMPLE MEASUREMENTS

A number of 2-channel stereo reproduction systems in a range of acoustical environments were tested by reproducing both test signals through each and capturing the results with a binaural head and torso simulator (HATS). The resulting binaural signals were then passed through the binaural hearing model. A subset of these results is discussed below.

4.1 Listening room

The first reproduction system consisted of two monitor loudspeakers positioned at a height so that the tweeters were at ear level, at a distance of 2 m from the HATS, and at ±30° from the median plane, in an ITU-R BS.1116 standard listening room. It was expected that this system would be representative of a fairly high quality reference system. The results of the measurements of this system are shown in Figure 7 to Figure 11 below.

Figure 7: Output of the binaural hearing model for all frequency bands, for the wide-band stimulus reproduced in the listening room.

Figure 7 shows the combined width and location results for the wide-band stimulus, averaged over all the measured frequency bands, together with the loudness in each frequency band. An ideal width and location result for this test signal would be the reproduction of a wide range of widths with a smooth transition in between, with a location centred on 0° throughout. In other words, the stimulus should start narrow, and then should gradually increase in width to at least ±30° (the loudspeaker spacing) as the ICCC of the signal gradually moves from 1 to 0 between 0.5 and 3 seconds. An ideal loudness result for this test signal would be a short transient attack that is approximately 12 phons above the loudness of the sustain section, followed by a decay of approximately 30 phons per second.

It can be seen from Figure 7 that the overall trend follows the expectation that the sound should increase in width over the duration of the signal, but the area of paler colour at the start of the stimulus indicates that not all frequency bands are in agreement. Further analysis indicated that some of the high frequency bands had a significant interaural level difference, as shown in Figure 8, which caused these frequency bands to be localised to one side or the other. This was found to be because the loudspeakers used in the system under test had non-matching tweeters, with differing level and phase responses.

Figure 8: Output of the binaural hearing model interactive plot display, showing the interaural level difference of the sound on the y-axis over time on the x-axis in the upper plot, and the loudness of the sound on the y-axis over time on the x-axis in the lower plot, for the frequency bands above 1 kHz, for the wide-band stimulus reproduced in the listening room.

Figure 9 shows a similar measurement to Figure 7, though showing only the results below 1 kHz. In this plot the variation in the prediction of the perceived width can be seen to be as expected. Analysis of the loudness measurements indicates that this reproduction system has a relatively linear amplitude response. This means that this system will not emphasise or suppress the perceived level of the reverberation with respect to the direct sound.

Figure 9: Output of the binaural hearing model for the frequency bands up to 1 kHz, for the wide-band stimulus reproduced in the listening room.

The results for the tonal stimulus are shown in Figure 10 and Figure 11. Again, there is a large amount of variation at higher frequencies caused by the mismatched tweeters.
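For reference, the tonal stimulus of Section 3.3 could be synthesised roughly as follows. This is a minimal sketch rather than the authors' implementation: the function name `tonal_stimulus`, the 48 kHz sample rate and the exponential (constant octaves per second) glide are assumptions, and the pink-noise-like filtering applied after generation is omitted.

```python
import numpy as np

def tonal_stimulus(fs=48000, hold_s=0.5, sweep_s=3.5):
    """Two harmonic complexes: held, swept up one octave, then held again.

    ASSUMPTIONS (not from the paper): sample rate, exponential glide shape,
    sine phase for every partial, no pink-noise-like post-filtering.
    """
    n_hold = int(round(hold_s * fs))
    n_sweep = int(round(sweep_s * fs))

    # Pitch multiplier over time: 1 during the first hold, rising to 2.
    glide = 2.0 ** (np.arange(n_sweep) / n_sweep)
    ratio = np.concatenate([np.ones(n_hold), glide, np.full(n_hold, 2.0)])

    # Integrate the pitch multiplier so that the phase of every partial is
    # continuous through the sweep (cycles per 1 Hz of nominal frequency).
    cycles = np.cumsum(ratio) / fs

    signal = np.zeros_like(cycles)
    # (fundamental in Hz, number of harmonics) as stated in Section 3.3.
    for f0, n_harm in ((55.0, 34), (220.0, 42)):
        for k in range(1, n_harm + 1):
            # Harmonics of 220 Hz that coincide with harmonics of 55 Hz
            # simply add here.
            signal += np.sin(2.0 * np.pi * f0 * k * cycles)

    return signal / np.max(np.abs(signal))  # normalise to a peak of 1
```

The highest partial starts at 220 Hz × 42 = 9240 Hz and ends one octave higher at 18480 Hz, so any sample rate above about 37 kHz avoids aliasing. Because the ICCC is kept at 1 for this stimulus, the same signal would be fed to both loudspeaker channels.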
Figure 10: Output of the binaural hearing model for all frequency bands, for the tonal stimulus reproduced in the listening room.

Figure 11: Output of the binaural hearing model for the frequency bands up to 1 kHz, for the tonal stimulus reproduced in the listening room.

The measured results in Figure 11 show that below 1 kHz there is little variation in width and location as the tonal signal sweeps across frequency. This is to be expected, as the reproduction room is acoustically treated to reduce modal and reflection-based artefacts, therefore minimising frequency-dependent interactions.

This analysis has indicated that the spatial response of this reproduction system is relatively neutral across part of the frequency range, with consistent localisation and width cues, a wide range of reproduced widths, and no suppression or enhancement of the reverberation. However, it has also indicated problems at higher frequencies, which were subsequently found to be due to mismatched high-frequency drivers in the loudspeakers.

4.2 Automobile

As a comparison, measurements were made of an in-car audio system in which the 2-channel stereo signal is reproduced over multiple loudspeaker drivers. This is a very different situation to the listening room measurements above, both in terms of the loudspeaker arrangement and the acoustical environment. The HATS was set up in the driver's seat, and binaural signals were recorded for both test stimulus types. The results of the measurements of this system are shown in Figure 12 to Figure 15 below.

Figure 12: Output of the binaural hearing model for all frequency bands, for the wide-band stimulus reproduced in the automobile.

Figure 12 shows the combined width and location results for the wide-band stimulus, averaged over all the measured frequency bands. As before, an ideal result for this test signal would be to start narrow and centred on 0°, then increase in width. It can be seen in Figure 12 that the perceived width prediction does increase as the ICCC reduces, though less clearly, and to a lesser extent, than in the listening room measurement.
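To illustrate what a falling ICCC means for a 2-channel signal, the sketch below builds a pair of noise channels whose correlation moves from 1 to 0 between 0.5 and 3 seconds, as stated for the wide-band stimulus. It is an illustration under assumptions, not the stimulus used in the paper: the names `iccc_ramp_noise` and `corrcoef_zero_lag`, the linear ramp, the total duration, the use of white Gaussian noise and the omission of the amplitude envelope are all hypothetical choices.

```python
import numpy as np

def iccc_ramp_noise(fs=48000, dur_s=3.5, t_start=0.5, t_end=3.0, seed=0):
    """Two noise channels whose correlation falls from 1 to 0.

    ASSUMPTIONS: linear ramp, white Gaussian noise, 3.5 s duration,
    no attack/decay envelope.
    """
    rng = np.random.default_rng(seed)
    n = int(round(dur_s * fs))
    a = rng.standard_normal(n)  # component shared by both channels
    b = rng.standard_normal(n)  # component unique to the right channel
    t = np.arange(n) / fs

    # Target correlation: 1 before t_start, linear fall to 0 at t_end.
    c = np.clip((t_end - t) / (t_end - t_start), 0.0, 1.0)

    left = a
    # Mixing gains chosen so the right channel keeps unit variance.
    right = c * a + np.sqrt(1.0 - c ** 2) * b
    return left, right, c

def corrcoef_zero_lag(x, y):
    """Normalised correlation at zero lag (a stand-in for the ICCC here)."""
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))
```

Measuring `corrcoef_zero_lag` over the first and last half second of the two channels should give values close to 1 and close to 0 respectively. In the automobile measurement it is the interaural correlation at the HATS, not this interchannel value, that changes less than expected.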
In the previous set of measurements, the results at low frequencies showed a much clearer trend. A similar analysis (covering only the frequency range below 1 kHz) was made of the automobile measurements, and the results are shown in Figure 13. It can be seen in Figure 13 that the variation in width is still not clear even in this frequency range. Further examination of individual frequency bands indicates that the variation in IACC (and therefore the width) is relatively small as the ICCC is varied, especially compared to the results in the listening room. In addition, in a number of frequency bands the location is displaced to one side or the other. These factors are likely to be a result of the off-centre listening position and the use of multiple drivers in different locations for each channel.
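The two interaural quantities referred to here can be computed from a single band of a binaural recording along the following lines. This is a hedged sketch, not the binaural hearing model itself: the function names `iacc` and `ild_db` are hypothetical, the IACC follows the common room-acoustics definition (maximum absolute normalised cross-correlation within ±1 ms), the sign convention of the level difference is an assumption, and the model's filterbank and temporal smoothing are omitted.

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms.

    ASSUMPTION: the common +/- 1 ms definition; the paper's model may
    use a different lag range or normalisation.
    """
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    full = np.correlate(left, right, mode="full")
    mid = len(right) - 1  # index of zero lag in the 'full' output
    window = full[mid - max_lag: mid + max_lag + 1]
    return float(np.max(np.abs(window)) / norm)

def ild_db(left, right):
    """Interaural level difference in dB.

    ASSUMPTION: positive values mean the left ear signal is stronger.
    Both inputs are assumed to contain some energy.
    """
    return float(10.0 * np.log10(np.dot(left, left) / np.dot(right, right)))
```

Identical ear signals give an IACC of 1 and a level difference of 0 dB, whereas uncorrelated ear signals give an IACC near 0. A small change in IACC as the ICCC is varied, as reported for the automobile, therefore corresponds to a small change in predicted width. `np.correlate` computes every lag directly, so for long recordings an FFT-based correlation restricted to the required lags would be faster.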

Figure 13: Output of the binaural hearing model for the frequency bands up to 1 kHz, for the wide-band stimulus reproduced in the automobile.

The results for the loudness measurement show that, as for the listening room system, the automobile audio system reproduces the amplitude envelope of the wide-band stimulus relatively accurately. This indicates that the level of the reverberation will not be enhanced or suppressed compared to the direct sound. It can be seen from the end of the decay that the dynamic range of this system is less than that of the listening room system, but this is caused by the increased background noise of this measurement situation.

The measurements of the tonal stimulus (shown in Figure 14 and Figure 15) show a similar trend to those of the wide-band stimulus. The tonal stimulus has an ICCC of 1, and yet the prediction of the perceived width is still relatively wide. In addition, the width and location of the sound in each frequency band vary as the pitch of the stimulus changes. This indicates that the imaging of this system is inconsistent across frequency.

Figure 14: Output of the binaural hearing model for all frequency bands, for the tonal stimulus reproduced in the automobile.

Figure 15: Output of the binaural hearing model for the frequency bands up to 1 kHz, for the tonal stimulus reproduced in the automobile.

The measurements of this system indicate that its spatial performance is less neutral than that of the listening room system. This is to be expected due to the inherent problems of an automobile system, such as the off-centre listening position, the use of multiple drivers in a range of positions, and the non-optimal acoustical environment that is imposed. Specific problems that were found using this analysis were inconsistent location and width over frequency, and a limited range of widths that can be reproduced. Further use of this analysis technique could help to suggest solutions to these problems, and to test updated systems.
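The combined plots discussed in Section 4 average the per-band predictions over frequency, and the paper describes this combination as a loudness-weighted average. A minimal sketch of such an average is given below. The function name `loudness_weighted_mean`, the array layout (bands × time frames) and the use of the loudness values directly as non-negative weights are assumptions; the paper does not state which loudness scale is used for the weighting.

```python
import numpy as np

def loudness_weighted_mean(values, loudness):
    """Combine per-band values into one value per time frame.

    values, loudness: arrays of shape (n_bands, n_frames).
    ASSUMPTION: loudness holds non-negative weights on a linear scale.
    Frames with zero total loudness are returned as NaN.
    """
    values = np.asarray(values, dtype=float)
    loudness = np.asarray(loudness, dtype=float)
    total = loudness.sum(axis=0)
    safe_total = np.where(total > 0.0, total, 1.0)  # avoid division by zero
    mean = (values * loudness).sum(axis=0) / safe_total
    return np.where(total > 0.0, mean, np.nan)
```

With this weighting a loud band dominates the combined result. The mismatched tweeters in the listening room illustrate the cost: a few bands with displaced locations can be hard to identify in the all-band plot, and are clearer when frequency ranges are plotted separately.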
5 FURTHER WORK

There are a number of ways in which this work can be developed further. Firstly, the binaural hearing model and its display could be improved by revising the temporal smoothing algorithms that are employed, by developing an automated method for separating source-related and environment-related aspects, and by investigating the optimum method of integrating the results across the frequency bands, rather than taking a loudness-weighted average. Secondly, the measurement method could be developed further by taking into account head movements made when listening, and by developing a database of evaluations so that the predictions of the low-level perceptual attributes can be converted to a quality or preference scale.

6 CONCLUSIONS

This paper has briefly summarised a binaural hearing model that has been developed to predict the perceived localisation and width of a binaural recording and display the results in an intuitive manner. The potential problems in applying such a binaural hearing model to the testing of sound recording and reproduction systems have been considered. Based on this, a method for testing the spatial response of a sound reproduction system has been proposed, together with the type of attributes that can be tested.

It was argued that one of the most important factors in testing sound reproduction systems using this method is the selection of appropriate test signals. Two test signals were proposed, one wide-band and one tonal, which aim to uncover the maximum information about the spatial performance of a reproduction system whilst keeping the measurement time to a minimum.

Measurements were made of a range of sound reproduction systems using this technique. Examples were shown of two systems, and it was found that the measurement technique gave useful information about the spatial performance of each. Finally, a number of methods for further developing the measurement technique were suggested.

ACKNOWLEDGEMENTS

This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, grants GR/R55528/01 and EP/D049253.

REFERENCES

[1] Rumsey, F. 2002: Spatial quality evaluation for reproduced sound: terminology, meaning, and a scene-based paradigm, Journal of the Audio Engineering Society, vol. 50, no. 9 (September), pp. 651-666.
[2] Barron, M. and Marshall, A. H. 1981: Spatial impression due to early lateral reflections in concert halls: the derivation of a physical measure, Journal of Sound and Vibration, vol. 77, no. 2, pp. 211-232.
[3] Chernyak, R. I. and Dubrovsky, N. A. 1968: Pattern of the noise images and the binaural summation of loudness for the different interaural correlation of noise, Proceedings of the 6th International Congress on Acoustics, Tokyo, pp. A53-A56.
[4] Keet, W. de V. 1968: The influence of early lateral reflections on the spatial impression, Proceedings of the 6th International Congress on Acoustics, Tokyo, pp. E53-E56.
[5] Boehnke, S. E., Hall, S. E. and Marquardt, T. 2002: Detection of static and dynamic changes in interaural correlation, Journal of the Acoustical Society of America, vol. 112, pp. 1617-1626.
[6] Sayers, B. M. 1964: Acoustic-image lateralization judgments with binaural tones, Journal of the Acoustical Society of America, vol. 36, pp. 923-926.
[7] Okano, T. 2000: Image shift caused by strong lateral reflections, and its relation to inter aural cross correlation, Journal of the Acoustical Society of America, vol. 108, pp. 2219-2230.
[8] Constan, Z. A. and Hartmann, W. M. 2001: Sound localization by interaural time differences at high frequencies, Journal of the Acoustical Society of America, vol. 109, p. 2485.
[9] Sayers, B. M. and Cherry, E. C. 1957: Mechanism of binaural fusion in the hearing of speech, Journal of the Acoustical Society of America, vol. 29, pp. 973-987.
[10] Robinson, D. E. and Jeffress, L. A. 1963: Effect of varying the interaural noise correlation on the detectability of tonal signals, Journal of the Acoustical Society of America, vol. 35, pp. 1947-1952.
[11] Bernstein, L. R. and Trahiotis, C. 1992: Discrimination of interaural envelope correlation and its relation to binaural unmasking at high frequencies, Journal of the Acoustical Society of America, vol. 91, pp. 306-316.

[12] Plenge, G. 1972: Über das Problem der Im-Kopf-Lokalisation (On the problem of in-head localisation), Acustica, vol. 26, pp. 241-252.
[13] Kurozumi, K. and Ohgushi, K. 1983: The relationship between the cross-correlation coefficient of two-channel acoustic signals and sound image quality, Journal of the Acoustical Society of America, vol. 74, pp. 1726-1733.
[14] Blauert, J. and Lindemann, W. 1986: Spatial mapping of intracranial auditory events for various degrees of interaural coherence, Journal of the Acoustical Society of America, vol. 79, pp. 806-813.
[15] Mason, R., Brookes, T. and Rumsey, F. 2005: Frequency dependency of the relationship between perceived auditory source width and the interaural cross-correlation coefficient for time-invariant stimuli, Journal of the Acoustical Society of America, vol. 117, no. 3 (March), pp. 1337-1350.
[16] Mason, R., Brookes, T. and Rumsey, F. 2004: Evaluation of an auditory source width prediction model based on the interaural cross-correlation coefficient, 148th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, vol. 116, no. 4 (October), p. 2475.
[17] Mason, R. and Brookes, T. 2006: Loudness dependency of the relationship between perceived auditory source width and the interaural cross-correlation coefficient for time-invariant stimuli, in preparation.
[18] Mason, R., Brookes, T. and Rumsey, F. 2004: Integration of measurements of interaural cross-correlation coefficient and interaural time difference within a single model of perceived source width, Audio Engineering Society Preprint, 117th Convention, preprint no. 6317.
[19] ITU-R BS.1387 2001: Method for objective measurements of perceived audio quality, International Telecommunication Union recommendation.
[20] Furlong, D. J. 1989: Comparative study of effective soundfield reconstruction, Audio Engineering Society Preprint, 87th Convention, preprint no. 2842.
[21] Soulodre, G. A., Lavoie, M. C. and Norcross, S. G. 2003: Objective measures of listener envelopment in multichannel surround systems, Journal of the Audio Engineering Society, vol. 51, no. 9 (September), pp. 826-840.
[22] Everest, F. A. 2001: Master handbook of acoustics, 4th edition, McGraw Hill.
[23] Rumsey, F. 2001: Spatial audio, Focal Press.
[24] Neher, T., Brookes, T. and Mason, R. 2006: Musically representative test signals for interaural cross-correlation coefficient measurement, in preparation.