Presented at the 102nd Convention 1997 March Munich,Germany


Coincident Microphone Techniques for Three Channel Stereophonic Reproduction
Preprint 4429 (E2)
Douglas McKinnie, Francis Rumsey
University of Surrey, Guildford, Great Britain
Presented at the 102nd Convention, 1997 March 22-25, Munich, Germany

This preprint has been reproduced from the author's advance manuscript, without editing, corrections or consideration by the Review Board. The AES takes no responsibility for the contents. Additional preprints may be obtained by sending request and remittance to the Audio Engineering Society, 60 East 42nd St., New York, New York 10165-2520, USA. All rights reserved. Reproduction of this preprint, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT

Coincident microphone techniques for three channel stereophonic reproduction
Douglas McKinnie and Francis Rumsey
Music Department, University of Surrey, Guildford GU2 5XH, UK

Abstract

Coincident stereo microphone techniques which could be used for three-channel frontal playback are analysed and described, based on a three-channel psychoacoustic panning law devised by Gerzon. For each configuration giving rise to appropriate gain and phase relationships, the recording angle, image distortion and reverberation distribution are analysed.

1 Three-channel stereophony in a 5.1 channel context

This paper presents coincident stereophonic microphone technique for playback on the three front channels of 5.1 loudspeaker surround systems. Our aim is certainly not to suggest that the two side loudspeakers be ignored. Most program production situations do not lend themselves to full-circle surround recording, and in these situations it is appropriate to consider the signals for the front loudspeakers separately from those for the side speakers. This is perhaps most obvious in location recording for video where there are numerous sources of interfering sound and the microphone technique that is used must have higher overall sensitivity in the direction of the desired sounds and less sensitivity in the directions of the unwanted noise. Directional coincident mic arrays for the three front loudspeakers allow boom-mounting of mics, simplifying the creation of multi-channel surround program. The surround speaker signals can be a combination of well-placed room mics, reverberation returns, and created ambient effects.

Even recording sessions of classical music sometimes occur in non-ideal locations where some deleterious aspect of the room sound must be reduced. The use of a microphone array for full-circle surround recordings of live events with audience presents a very difficult challenge. The placement must simultaneously encode reverberation at the right level from the right directions, provide a good ensemble balance with good imaging, and keep the level of audience sound from becoming distracting during the program or overpowering during applause.

Many surround playback systems have small band-limited speakers for the side channels. Some surround recording techniques, such as Ambisonics, depend upon the contributions of all of the loudspeakers to achieve correct localization, even in the front. If a program may be played on systems where there is a difference in frequency response between the various loudspeakers, then such recording techniques are not a good choice.

The use of three loudspeakers can provide more accurate imaging than two speakers subtending the same total angle, but five speakers are not enough to allow creation of phantom sources all the way around a listener. When the left front and left rear speakers are equidistant from the left ear, they are also equidistant from the right ear of a forward-facing listener. Amplitude and time-of-arrival differences between these loudspeakers will therefore do little to create the interaural differences that would allow the listener to locate the sound as occurring between the loudspeakers. A field of diffuse sound can be created with only five loudspeakers [1]; therefore it may be most practical to make recordings where the ambient sounds that provide a sense of the performance environment come from all around, while the important direct sounds come from a highly accurate soundstage in the front. There seems to be some aesthetic justification for this pragmatic necessity in that performances of music and drama generally take place in front of, rather than around, the audience. Even popular electronically created music conforms to this performance paradigm, with many important sounds being confined to the exact centre of the front.

Early stereophonic systems used three transmission channels and three loudspeakers [2]. Two-channel stereophony became standard because of the available channel capacity of delivery media, not because of any inherent strength over stereophony with more transmission channels. Three-speaker stereophony improves upon two-speaker stereo in imaging accuracy and consistency of image focus [3], or it can improve the accuracy of frequency response across the sound field and give much greater 'clarity' [4]. In two channel stereophony the distance of the phantom source is always close to the distance of the loudspeakers. An extra front loudspeaker increases the size of the listening area by a factor of three [5], which allows more people to be seated in the optimum listening area, lets practical furniture placements fall within the listening 'sweet-spot', and accommodates the movements one makes even when listening attentively. Five loudspeaker systems with discrete side channels can greatly enhance perceived spaciousness and envelopment [6], but many of the necessary lateral low frequency images can also be produced by three frontal loudspeakers only.

Michael Gerzon has described several techniques for recording three channel stereo using spaced single and stereo microphones, and spaced and coincident matrix recording techniques. He notes that frequency-dependent signal processing is required for most techniques [7]. Panning of multiple mono sources into a three-channel mix requires pan-pots which are psychoacoustically appropriate for the three-channel case [8]. Optimum microphone arrays for three-channel stereophony are easier to create with second-order microphone patterns; however, there are no commercial producers of such microphones (which achieve a very directional pattern from the difference between two closely-spaced pressure-gradient mics [9]). The recordist wishing to make high-quality stereophonic three-channel recordings has very few tools from which to choose.
A high quality known mic with a non-optimum basic polar pattern might be a better choice than an unknown 2nd-order mic. The reduced horizontal directivity of a second order mic is complementary with the addition of a centre loudspeaker, as the recording angle of three approximates that of two conventional mics. The reduced vertical directivity must be considered also, as the attenuation of reflected sound from above and below might adversely affect the direct-to-reverberant balance of recordings.

There are many reference guides available to anyone wishing to begin two-channel stereophonic recording, describing the available choices of microphone pattern and direction and the consequences of each [10, 11]. The number of choices in three channel stereophony is considerably larger, as is the proportion of 'wrong' choices. It is quite possible (perhaps likely) that randomly selected three-channel recording techniques will deliver performance that is worse than that of a two-channel recording matrixed to three speakers. What follows is a first step in the evaluation of all of the available techniques with conventional microphones against such desirable and quantifiable criteria as imaging accuracy, image focus, size of listening area, and distribution of reverberant energy.

2 Evaluation of stereophonic systems: what should be measured?

In recording and reproduction systems there is a transmission path between the microphone and any processing used, and another transmission path in each channel between the final mix and the loudspeaker in the listening environment. The performance of transmission paths that do not use lossy data reduction is evaluated by quantification of the differences between input and output. Any change that is within the bounds of audibility in the reproduced signal is an undesirable distortion. Evaluation of these paths is not necessarily a trivial problem, but it is extremely simple in comparison to the evaluation of a whole stereophonic system from the source of an acoustic sound in one environment to its perception in another. The sound field in one space will be extremely different from the sound field in the other if only five audio-bandwidth channels are used to convey the acoustic 'information' between them.
The design of a recording technique is conceptually similar to the design of an audio bit-rate reduction system for transmission on a very low bandwidth channel: it is an effort to transmit as much of the useful information as possible in the limited capacity that is available. Instead of seeking accurate reproduction of a soundfield, one may seek to reproduce for the listener a more or less accurate impression of that soundfield. Often one seeks to produce the impression of a soundfield that never existed in actuality, as mentioned above in relation to dealing with undesirable factors in the recording environment. The ability to manipulate the perception of the soundfield is an important element in the recording art. The product must have good musical balance, good acoustical impression and a pleasing tone on each instrument, even if the musical event which it represents did not.

To evaluate such a system one must know what to measure. The important criteria for enjoyment of a reproduced performance in the home are probably similar to those for its enjoyment in the original performance space, so the quality dimensions used in evaluating auditoria may be useful tools. Important subjective attributes such as envelopment, intimacy, and loudness can all be at least roughly correlated to some objective physical measure [12]. Unfortunately, the subjective parameters do not all vary independently, the mapping between subjective attributes and physical parameters is not always one-to-one, and where the relative importance of two parameters can be measured it may vary between persons. An overall method of scoring 'sound quality' is very far away and perhaps not possible at all. In the absence of a more global measure one must resort to assessment of the physical parameters that are known to contribute to the subjective parameters that make up 'sound quality'. Humble acknowledgment that one may be ignorant of very important parameters is also important.

3 Subjective and practical characteristics of good stereophonic techniques

The aphorism that ideal reproduction should sound 'just like the original' is a good starting point. The ear is selective: in natural hearing it will focus upon sounds of interest, and the listener is less conscious of interfering sounds.
When sound is reproduced through a limited number of loudspeakers much of this focusing ability is lost, and therefore sound that was innocuous or pleasant in the original environment, such as reverberation, can become distracting or interfering. The development of directional microphones was provoked partly by the need to make the sound recorded on film sets less 'distant'; good stereophonic techniques should also allow the ability to reject unwanted sound, both for control of reverberation and for help in isolation from other undesired sound.

Accurate sound-image localisation is an obvious attribute of hearing in natural spaces. The inventors and pioneers of stereophonic sound were well aware of some of the mechanisms behind localisation ability, and their stereophonic techniques were designed to exploit them as accurately as possible [13, 14]. Although additional mechanisms of localisation are known (and used in transaural stereo and spectral stereo processes [15]), the fundamental theory behind loudspeaker stereophony is not much different than it was in the 1930s.

Much attention has been paid to the distribution of phantom images in two channel stereo systems with microphones at various distances and angles of orientation. Little attention, however, has been given to the distribution of the images of the sounds that occur outside of the recording angle, except by Michael Williams [16]. The bulk of the reverberant energy will often come from behind the microphone array, and except for the portion that is below audibility due to the off-axis attenuation of the microphones, all of this energy will be reproduced as phantom images somewhere in the soundstage or as spurious real images at the loudspeaker locations. A good microphone technique will have good imaging accuracy within the stereophonic recording angle, and sound from outside the stereophonic recording angle should be evenly distributed in the reproduced sound stage. The real images produced at the loudspeakers in two channel stereophony are a problem when the microphone technique has excessive sensitivity to reverberant energy. All of this energy 'stacks up' in one clearly defined location, and this location is quite different both timbrally and in image size from the bulk of the rest of the soundstage.
Sounds produced by more than one loudspeaker present multiple sound arrivals at the listener's ears. These are not normally perceived as comb filtering [17], but the sound is distinctly different from the same sound reproduced in the same position by a single loudspeaker. Engineers have learned to avoid hard loudspeaker panning of signals when this effect on image focus is undesired, and with three-channel stereo this knowledge will become more important, as there is a loudspeaker in the very important centre position. The very tight image focus at the centre may prove to be a very useful effect in mixing, but for natural sounding reproduction of sound the recording technique should give a consistent image size and timbre regardless of its position in the soundstage.

One essential attribute of concert hall quality is 'spatial impression' or 'listener envelopment', which relates to lateral energy - energy which arrives at the listener from the sides [18]. The strongest effect is from sounds arriving from near plus and minus ninety degrees, but there is also an effect from reflections that occur within the reproduced angle [19]. With three loudspeakers the sounds important for the impression of envelopment can be reproduced at apparent positions beyond ninety degrees, so a good stereophonic system should convey this attribute of the original soundfield.

One advantage of home multi-media is the link between sound and video. Many people, in order to take advantage of the sound effects within video programs, have set up their sound reproducing equipment symmetrically around the television set. They have then listened from a central position, and thus experienced stereophonic imaging for the first time although the necessary equipment may have been in their home for several decades. Enthusiasts may wish to restrict themselves to a central seat, but even experienced listeners doing critical listening move beyond the limits where low-frequency localisation works. In a good stereophonic system, the imaging accuracy should not change greatly for off-centre listeners, and the sound should not collapse into the nearest loudspeaker.

A good stereophonic system should not create program that is inherently flawed if it is summed for playback on two-channel or monaural systems, and should be usable in everyday practice. Small movements should not cause excessive image shift on dialog, and an ideal system is compact enough and light enough for use on a boom.

4 Meeting the criteria: auditory localisation and stereo imaging

The human auditory system is remarkable in its ability to discriminate between sounds in extremely complex soundfields. The sound is evaluated at two points in space, and solely from the information provided by the ears can be separated into different 'streams', each corresponding to a particular auditory 'object' or source of sound. The spatial location of the source of a sound contributes to the stream segregation, as all of the spectral components coming from a particular point in space are likely to originate from the same source, especially if they have harmonic relationships with one another. On the other hand, two sounds arriving from different directions can be 'grouped' by the auditory system into one perceptual object if they are similar enough in timbre or onset time. Once grouped, the stream takes on only one spatial location. Often only a temporal or spectral portion of a sound is heard, because of masking effects, intervening objects, or the acoustical filtering of the environment [20]. (Concert halls can exhibit 'seat-dip', where the spacing of the seats causes attenuation of the sound from the direction of the stage to the extent that the first audible arrival at the listener is a reflection from another direction.) The auditory grouping of sounds contributes to their spatial localisation, but the spatial location of sounds also contributes to their grouping or segregation.

Several effects that are used by the auditory system to determine the location of a sound are of particular interest, and these do not always provide the same information, so there is 'competition' between them. Simple rules do not adequately explain which will 'win' in any given situation, although some seem generally more robust than others.
The pinnae, or outer flaps of the ears, provide a direction dependent filtering of the sound, which is not perceived as timbral difference although it is extreme in its effects upon frequency response. A much broader spectral effect results from shadowing of the head and reflection from the torso; this is a binaural effect where intensity differences between the ears have a dominant role. These are the strongest systems in natural hearing, and it is interesting that accuracy of localisation depends upon familiarity with the timbre of the sound. If the timbre of a source is altered enough to make it sufficiently different from any known sound, listeners will localise it very poorly at first but will then improve with repeated presentations [21]. Unfortunately pinna cues are highly individualised, and even localisation using broader spectral effects is difficult to apply to the reproduction of natural sound sources with loudspeakers.

Centred forward-facing listeners will, with appropriate amplitude differences between the loudspeakers, localise low frequency sounds very accurately because the sum, at each ear, of the two loudspeaker signals creates inter-aural phase and time differences that closely replicate the phase and time differences of a single source at that position in natural hearing. At higher frequencies summing localisation takes place, where the early-enough multiple arrivals of a sound are integrated into a common magnitude and direction of arrival. (The crossover between these mechanisms is often given as about 700 Hz, but is a broad, program-dependent region.) Gerzon states that the equation for the energy-vector direction governing this higher-frequency localisation also applies to low frequency inter-aural phase difference localisation when the loudspeakers are not phase coherent, such as when the listener is off centre [22].

Localisation in loudspeaker stereophony is certainly weakened by the reliance upon summing amplitude cues and inter-aural time difference cues to the exclusion of other stronger methods. Performance will be best if the available cues agree and if the characteristics of the program material do not happen to make other localisation cues more plausible.

5 Stereo imaging model

This work follows Gerzon's position that the spatial images created by stereophonic systems should be close to accurate by both hearing mechanisms, and that the localisation should remain as steady as possible as the listener moves in a wide listening area. Rather than attempt to create a perfect stereophonic recording technique for three channels, effort is made to evaluate the coincident techniques with conventional microphones that are likely to be used, in order to narrow the selection to those which will give the best performance according to the psychoacoustics of three-loudspeaker reproduction.

In a stereophonic system each loudspeaker emits a sound of some magnitude, which arrives at the listener from some direction. In this case, three loudspeakers are assumed, all equally distant from a central listener, subtending a total angle of 60 degrees. The left loudspeaker is at minus 30 degrees relative to the centre loudspeaker, and the right loudspeaker at plus 30 degrees. If the vector magnitude and direction of individual sources are calculated from the gain of the signal in each loudspeaker, the result is the velocity magnitude (rv) and velocity direction (θv). Energy vector magnitude (re) and energy vector direction (θe) result from using the squares of the loudspeaker signal gains. The classic low frequency ITD image position for centred listeners is described by (θv). That dictated by summing localisation for 700 Hz to 3.5 kHz, as well as by phase localisation for off-centre listeners, is described by (θe). Image movement as the listener moves off-centre is proportional to (1 - re).

The loudspeaker gains for a three-channel microphone system correspond one-to-one with the gains of the microphones. With knowledge of the polar pattern of the microphones used and the direction in which they are pointing, the corresponding loudspeaker signal gains are easy to determine. For this particular work, the left and right microphones were calculated as an M-S pair, so the resulting equations are:

\[
\begin{aligned}
L &= K\,\bigl(C_{f1} + (1 - C_{f1})\cos\theta\bigr) + (1 - K)\cos(\theta - 90^\circ) \\
C &= C_{f2} + (1 - C_{f2})\cos\theta \\
R &= K\,\bigl(C_{f1} + (1 - C_{f1})\cos\theta\bigr) - (1 - K)\cos(\theta - 90^\circ)
\end{aligned}
\]

where
K = mid-to-side ratio
Cf1 = pressure to pressure-gradient ratio of the mid mic
Cf2 = pressure to pressure-gradient ratio of the centre mic
θ = angle of incidence of the sound source
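The gain equations and the two localisation vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' own calculation: the function names are invented for the example, the vectors are divided by the sum of the weights (the usual form of Gerzon's velocity and energy vectors, which the text does not spell out), and angles are taken as positive toward the left so that image angle and source angle share a sign (the text labels the left loudspeaker as minus 30 degrees, so signs must be flipped to match that labelling). The gains are used exactly as in the equations, with no further normalisation.

```python
import math

# Loudspeaker azimuths (left, centre, right) in the sketch's own frame:
# positive angles point toward the LEFT, matching the sign of theta in the
# microphone equations. This sign convention is an assumption of the sketch.
SPEAKER_AZIMUTHS_DEG = (30.0, 0.0, -30.0)


def mic_gains(theta_deg, k, cf1, cf2):
    """Return (L, C, R) gains for a source at theta_deg, per the equations above."""
    t = math.radians(theta_deg)
    mid = cf1 + (1.0 - cf1) * math.cos(t)   # first-order mid microphone
    side = math.cos(t - math.pi / 2.0)      # sideways figure-8, cos(theta - 90 deg)
    left = k * mid + (1.0 - k) * side
    centre = cf2 + (1.0 - cf2) * math.cos(t)
    right = k * mid - (1.0 - k) * side
    return left, centre, right


def gerzon_vector(weights, azimuths_deg=SPEAKER_AZIMUTHS_DEG):
    """Weighted mean of the loudspeaker unit vectors: (magnitude, direction in degrees)."""
    total = sum(weights)
    if abs(total) < 1e-12:
        raise ValueError("weights sum to zero; the vector is undefined")
    x = sum(w * math.cos(math.radians(a)) for w, a in zip(weights, azimuths_deg)) / total
    y = sum(w * math.sin(math.radians(a)) for w, a in zip(weights, azimuths_deg)) / total
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def velocity_vector(gains):
    """(rv, theta_v): weights are the loudspeaker gains themselves."""
    return gerzon_vector(gains)


def energy_vector(gains):
    """(re, theta_e): weights are the squared loudspeaker gains."""
    return gerzon_vector([g * g for g in gains])
```

For example, `velocity_vector(mic_gains(15.0, 0.377, 0.25, 0.375))` gives the low-frequency image estimate for a source 15 degrees off axis; the two Cf values here are placeholders, not one of the arrays selected in the paper.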

The values of Cf were chosen to be zero and integers to 8, divided by 8, in order to limit the selection to commercially available microphone polar patterns and the 'in-betweens' that are available on some multipattern mics. Cf = 0 = Figure-8 C f = 2/8 = Hyper-cardioid C f = 3/8 = Super-cardioid C f = 4/8 = Cardioid C f = 6/8 = Sub-cardioid Cf = I = Omnidirectional The gains of the left and right microphones were normalised to one on axis. Perfectly maintained polar patterns are assumed. The performance of real microphones may vary, and will tend to be worst in the frequency range where amplitude differences are most important for creating ITD cues. The value of K was found to be 0.377 for all combinations that replicate the gain relationships for O = 0 in Gerzon's optimal three-channel panpot law. With this starting point, gains in each channel were determined for all of the possible combinations of Cf1 and Cf2. Three degree increments were found to be adequate for this purpose. For each increment of 0, (rv), (Ov), (re) and (Oe) were calculated. The ten examples which best conformed to the requirement that (Ov) -- (Oe) across the majority of the reproduced sound stage were selected for further evaluation, which the knowledge of the energy and velocity angles and energy vector magnitude aids. Illustrations of the selected arrays and their characteristics are given at the end of this paper in Figures 1-10. 6 Attributes of the different arrays 6.1 Rejection of unwanted sound As all three microphones in these arrays tend towards hypercardioid and one is always facing front, the example with best rear attenuation has less than 9 db of difference between its highest and lowest 11

sensitivities. This is still enough to be useful in some situations, although in noisy environments more forward-dominant arrays could be desirable. 6.2 Image accuracy The equality of (Ov) and (Oe) through the bulk of the stereophonic recording angle for most of these arrays and their linear relationship to O in the same range indicate that phantom image location will be accurate. All of these arrays can be described as widely aimed left and right mics with the centre mic and loudspeaker filling the resulting 'hole-in-the-middle'. The sixty degree soundstage reproduced between the loudspeakers is a very accurate reduction of a much wider soundstage in the recording. The stereophonic recording angle should be described as the intersection of (Ov) and the _+30degree angle contained between the loudspeakers. (ov) is linearly related to _ far outside of the loudspeakers in these stereophonic arrays. 6.3 Image size and timbre The value of (re) remains consistent for the entire stereophonic recording angle and only varies far from the ideal value of i toward the side loudspeakers. The worst value recorded is 0.86. This is in contrast to the model panning law for placing sources into three loudspeakers, and is a consequence of the anti-phase portion of the microphone pattern in the opposite channel having an effect. For every angle of O the sound is reproduced by at least two loudspeakers, therefore there are no problems of excessive clarity. 6.4 Conveyance of sense of intended performance environment Listener envelopment is related to reflections coming from directions near + and -90 degrees. Work of Griesinger and of Marimoto indicates that low frequency energy below 500 Hz is especially important to the impression of envelopment. A very interesting effect of the threechannel recording techniques presented here is that the low frequency localisation extends to beyond 90 degrees. 
These techniques should therefore theoretically be adequate for producing the impression of envelopment even without the use of additional surround loudspeakers. θe does not continue past the loudspeakers, but in all cases it does spread the rear images evenly into the front soundstage.
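The localisation quantities used throughout this section can be sketched in a few lines. The example below is a minimal illustration, not the procedure actually used for the paper: it takes Gerzon's velocity vector as the gain-weighted mean of the loudspeaker unit vectors and the energy vector as the squared-gain-weighted mean, and sweeps the source angle in three-degree steps. The loudspeaker layout (+30, 0, -30 degrees), the unity on-axis gain of the centre microphone, the direct feed of each microphone to its own loudspeaker, and all function names are assumptions made for illustration.

```python
import math

# Assumed loudspeaker azimuths in degrees: left +30, centre 0, right -30.
SPEAKERS_DEG = (30.0, 0.0, -30.0)


def first_order_gain(cf, axis_deg, source_deg):
    """Ideal first-order pattern: cf + (1 - cf) * cos(source - axis)."""
    return cf + (1.0 - cf) * math.cos(math.radians(source_deg - axis_deg))


def _angle_and_magnitude(weights, speakers_deg):
    """Direction (degrees) and length of the weighted mean of speaker unit vectors."""
    total = sum(weights)
    if abs(total) < 1e-12:  # weights cancel, so the vector is undefined
        return float("nan"), float("nan")
    x = sum(w * math.cos(math.radians(a)) for w, a in zip(weights, speakers_deg)) / total
    y = sum(w * math.sin(math.radians(a)) for w, a in zip(weights, speakers_deg)) / total
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)


def gerzon_vectors(gains, speakers_deg=SPEAKERS_DEG):
    """Return (theta_v, r_v, theta_e, r_e) for one set of loudspeaker gains."""
    theta_v, r_v = _angle_and_magnitude(list(gains), speakers_deg)            # weights g
    theta_e, r_e = _angle_and_magnitude([g * g for g in gains], speakers_deg)  # weights g^2
    return theta_v, r_v, theta_e, r_e


def sweep(cf_left, axis_left_deg, cf_centre, step_deg=3):
    """Evaluate the vectors around the full circle in step_deg increments.

    Assumes the right microphone mirrors the left one and that each
    microphone drives its own loudspeaker at unity gain (hypothetical).
    """
    rows = []
    for theta in range(-180, 181, step_deg):
        gains = (
            first_order_gain(cf_left, axis_left_deg, theta),   # left
            first_order_gain(cf_centre, 0.0, theta),           # centre
            first_order_gain(cf_left, -axis_left_deg, theta),  # right
        )
        rows.append((theta, *gerzon_vectors(gains)))
    return rows


if __name__ == "__main__":
    # Parameters as printed with Figure 4a: CfL = 0.121, left axis 66 degrees, Cf2 = 0.125.
    for theta, tv, rv, te, re_ in sweep(0.121, 66.0, 0.125):
        if -30 <= theta <= 30:
            print(f"{theta:5d}  theta_v={tv:7.2f}  r_v={rv:5.3f}  theta_e={te:7.2f}  r_e={re_:5.3f}")
```

Under these assumptions, the stereophonic recording angle would be read off as the source angle at which theta_v crosses ±30 degrees, and 1 - r_e gives an indication of image stability. The printed numbers should not be expected to reproduce the figures exactly, since the channel gains actually used may differ from the unity gains assumed here.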

6.5 Size of listening area

The high value of re, especially as images are nearer to the centre of the soundstage, indicates that the listening area should not be limited to a single sweet spot.

6.6 Compatibility with systems of fewer channels

Summing of the three channels into a single channel yields a single microphone of a polar pattern that will tend more toward Cf1 than Cf2, due to the gain normalisation of the matrixed L and R signals. Summing of the three channels into two will produce left and right signals that approximate conventional two-channel coincident technique.

6.7 Usability on dialog and boom

Unfortunately, none of the psychoacoustically optimal combinations of K, Cf1, and Cf2 proved to have Cf1 = Cf2. Had this been the case, conventional stereo mics could have been used. A combination of a stereo microphone and a single-channel microphone will perhaps create a boom combination that is not unwieldy. The very wide stereophonic recording angle and the calculations above indicate that small source movements will not cause large image shifts, so aside from the comparatively low attenuation of sound from behind the array, these arrays should be quite compatible with use on dialog.

7 Conclusion

Selection of stereophonic microphone techniques for three-channel reproduction has been described as a problem, and several subjective and practical criteria have been presented as being important. The subjective criteria are equated with specific predictable and measurable phenomena, and these are used to select and evaluate the many possible choices of microphone pattern and direction. Several three-channel microphone arrays are shown to be 'good', and these are presented with accompanying performance charts.

References

1 T. Holman, 'The Number of Audio Channels', 100th AES Convention (May 1996)
2 W. Snow, 'Basic Principles of Stereophonic Sound', J. Society of Motion Picture and Television Engineers, vol. 61 (Nov. 1953). Reprinted in: Stereophonic Techniques, an anthology published by the Audio Engineering Society, New York (1986)
3 M. Gerzon, 'Three channels: the Future of Stereo?', Studio Sound, vol. 32, No. 6, pp. 112-125 (June 1990)
4 Holman, ibid. (1)
5 G. Theile, 'On the Performance of Two Channel and Multi-Channel Stereophony', 88th AES Convention, preprint 2887 (March 1990)
6 D. Griesinger, 'Spaciousness and Envelopment in Musical Acoustics', 101st AES Convention, preprint 4401 (November 1996)
7 M. Gerzon, 'Microphone Techniques for Multichannel Stereo', 93rd AES Convention, preprint 3450 (October 1992)
8 M. Gerzon, 'Panpot Laws for Multispeaker Stereo', 92nd AES Convention, preprint 3309 (March 1992)
9 H. Olson, 'Gradient Microphones', Journal of the Acoustical Society of America, vol. 17, No. 3, pp. 192-198 (January 1946)
10 Stereophonic Techniques, an anthology published by the Audio Engineering Society, New York (1986)
11 M. Williams, 'Unified Theory of Microphone Systems for Stereophonic Sound Recording', 82nd AES Convention (March 1987)
12 J. S. Bradley, 'Contemporary Approaches to Evaluating Auditorium Acoustics', AES 8th International Conference (May 1990)
13 A. Blumlein, 'British Patent Specification 394,325 (Directional Effect in Sound Systems)'. Reprinted in: Stereophonic Techniques, an anthology published by the Audio Engineering Society, New York (1986)
14 Snow, ibid. (2)
15 D. Begault, 3-D Sound for Virtual Reality and Multimedia, Academic Press, Boston (1994)
16 Williams, ibid. (11)
17 G. Theile, 'On the Naturalness of Two-Channel Sound', Proceedings of a Symposium on Perception of Reproduced Sound (Gammel Avernæs, Denmark 1987)
18 J. Bradley and G. Soulodre, 'Listener Envelopment: An essential part of good concert hall acoustics', Journal of the Acoustical Society of America, vol. 99(1), p. 22 (January 1996)
19 M. Barron and A. Marshall, 'Spatial Impression Due to Early Lateral Reflections in Concert Halls: the Derivation of a Physical Measure', Journal of Sound and Vibration, vol. 77, pp. 211-232 (1981)
20 A. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound, MIT Press, Cambridge, USA (1990)
21 Begault, ibid. (15)
22 Gerzon, preprint 3309, ibid.

Figure 1a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.125
mid/side ratio K = 0.377
left pattern CfL = 0.0625
left axis θL = 63°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0

Figure 1b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 1c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 1d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.
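The left-microphone parameters listed with each array can be related to the mid pattern and the mid/side ratio with a short calculation. The sketch below assumes a matrix of the form L = K*M + (1 - K)*S, where M is the forward-facing mid pattern and S is a sideways-facing figure-8, followed by normalisation to unity gain on axis. This matrix form is an assumption inferred from the printed parameters, not a quotation from the paper, and the function names are hypothetical. Under that assumption the calculation returns values that agree with the printed CfL of Figures 4a and 8a to within rounding, and with the printed θL to within about one degree.

```python
import math


def left_mic_from_mid_side(cf_mid, k):
    """Return (cf_left, axis_left_deg) for L = k*M + (1 - k)*S, normalised on axis.

    ASSUMPTION: the matrix form is inferred for illustration only.
    M = cf_mid + (1 - cf_mid) * cos(theta)   forward-facing mid pattern
    S = sin(theta)                           sideways-facing figure-8
    """
    omni = k * cf_mid                       # non-directional part of L
    front = k * (1.0 - cf_mid)              # coefficient of cos(theta)
    side = 1.0 - k                          # coefficient of sin(theta)
    directional = math.hypot(front, side)   # amplitude of the combined cosine term
    axis_deg = math.degrees(math.atan2(side, front))
    on_axis = omni + directional            # gain on axis before normalisation
    return omni / on_axis, axis_deg


def left_gain(cf_left, axis_left_deg, theta_deg):
    """Gain formula printed with the figures: CfL + (1 - CfL) * cos(theta - thetaL)."""
    return cf_left + (1.0 - cf_left) * math.cos(math.radians(theta_deg - axis_left_deg))


if __name__ == "__main__":
    for cf_mid in (0.125, 0.25, 0.375, 0.5, 0.75, 1.0):
        cf_left, axis = left_mic_from_mid_side(cf_mid, 0.377)
        print(f"Cf1={cf_mid:5.3f}  CfL={cf_left:6.4f}  thetaL={axis:5.1f}")
```

The normalisation step is what makes the left gain equal to one on its own axis, as stated in the text. The right microphone would follow by mirroring the axis angle.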

Figure 2a. Three-channel coincident microphone array:
mid pattern Cf1 = [not legible]
mid/side ratio K = 0.377
left pattern CfL = 0.175
left axis θL = 69°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0

Figure 2b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 2c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 2d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 3a. Three-channel coincident microphone array:
mid pattern Cf1 = 1
mid/side ratio K = 0.377
left pattern CfL = 0.377
left axis θL = 90°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0

Figure 3b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 3c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 3d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 4a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.25
mid/side ratio K = 0.377
left pattern CfL = 0.121
left axis θL = 66°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.125

Figure 4b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 4c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 4d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 5a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.5
mid/side ratio K = 0.377
left pattern CfL = 0.225
left axis θL = 72°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.125

Figure 5b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 5c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 5d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 6a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.125
mid/side ratio K = 0.377
left pattern CfL = 0.063
left axis θL = 63°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.25

Figure 6b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 6c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 6d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 7a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.375
mid/side ratio K = 0.377
left pattern CfL = 0.175
left axis θL = 69°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.25

Figure 7b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 7c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 7d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 8a. Three-channel coincident microphone array:
mid pattern Cf1 = 0.75
mid/side ratio K = 0.377
left pattern CfL = 0.310
left axis θL = 81°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.25

Figure 8b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 8c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 8d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 9a. Three-channel coincident microphone array:
mid pattern Cf1 = [not legible]
mid/side ratio K = 0.377
left pattern CfL = 0.310
left axis θL = 81°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.375

Figure 9b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 9c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 9d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.

Figure 10a. Three-channel coincident microphone array:
mid pattern Cf1 = 0. [remainder not legible]
mid/side ratio K = 0.377
left pattern CfL = 0.377
left axis θL = 90°
(gain of left mic = CfL + (1 - CfL)*cos(θ - θL))
centre pattern Cf2 = 0.375

Figure 10b. Three-channel microphone technique amplitudes:
Horizontal axis indicates sound source position in the full circle around the microphone array. Vertical axis indicates microphone gain. Left microphone is solid line, centre is dashed, right is dotted.

Figure 10c. Image Localisation Accuracy: Real sound source location in the recording space (HORIZONTAL) mapped against Perceived Auditory Location (VERTICAL)
Solid line: θv, ITD (IPD) localisation. Dotted line: θe, summing amplitude localisation.
Only areas between ±30° on the vertical scale will be reproduced between the loudspeakers. The intersection of the solid line with ±30° P.A.L. defines the stereophonic recording angle. Perfect 360° location would be represented by a diagonal line from lower left to upper right.

Figure 10d. Image focus and stability with lateral movement:
Dotted line: re, energy vector magnitude. Dashed line: rv, velocity vector magnitude.
The energy vector magnitude is a good indication of image focus; the maximum value of (1) corresponds to the focus of sound coming from a single loudspeaker. Image focus should be uniform across the S.R.A. 1 - re also describes the amount of image movement that can be expected for off-centre listening. 1 - re should be as close as possible to 0.