1 Introduction

This chapter introduces the project. First, a brief description of surround sound is presented. A problem statement is then defined, which leads to the goal of the project. Finally, the scope of the project is defined.

1.1 Surround sound

The origins of surround systems lie in the movie industry, where the reproduced sound is tightly connected to the action on the screen. In this respect, the goal of any surround system is to place the listener at the scene with sound arriving from all around, hence the term surround. To produce such an impression for a listener, a surround recording needs to convey two main aspects:

- Sound source localization: render the impression of spatially distributed sound sources.
- Ambience impression: render the impression of presence in an acoustical environment, giving a feeling of spaciousness and envelopment.

It is therefore not only necessary to position sound sources correctly or to depict their movement, but also to render a particular environment (ambience) with all of its acoustical properties (early reflections, reverberation and the existence of echoes, to name a few). For example, if a movie scene takes place in a cathedral, the listener needs to feel the ambience, in this case the long reverberation typical of a cathedral. This is as important as being able to correctly determine the positions of a choir and, say, a priest in such a scene. To sum up, if the mentioned aspects are captured and, with proper mixing, recorded on a soundtrack, the listener perceives a convincing spatial image with the correct impression of ambience.

Nowadays it is also common to find music produced as a multichannel surround track. In the case of music, the sound sources (instruments) do not change their positions, but proper localization of the parts of an orchestra is important in order to perceive the ensemble width of the orchestra.
Furthermore, with more than two channels it is possible to create a more convincing impression of the environment than with a conventional stereo setup. A multichannel setup enables room reflections to be perceived from all directions, as in a real concert hall.

However, a surround track is intended to be reproduced in a room (a living room, for example) with a certain number of loudspeakers. The question therefore arises how to reproduce a surround track in full quality, with spatial and ambient information preserved, when a surround setup is not available or is unsuitable. Examples are watching a movie with a surround track in an airplane, or situations where proper enjoyment requires sound levels that may annoy others.

Group 963, October 12, 2005 22.59
In such situations, the most obvious approach is to use a convenient sound reproduction device, namely headphones. One such solution for surround sound via headphones is developed and evaluated in this project and presented in this report.

1.2 Problem statement

The problem addressed in this project arises when it is either impossible or inconvenient to use a regular surround sound setup to reproduce a surround track. The surround track then needs to be reproduced in a different way that preserves the spatial and ambience information encoded in it. Headphones are considered as an approach to solving this problem.

1.3 Goal of the project

Based on the problem statement, a hypothesis for solving the problem using headphones can be defined:

It is possible, by separately controlling the signals sent to each ear via headphones, to recreate in two headphone channels the spatial and ambience information encoded in a multichannel surround track.

The goal of the project is therefore to construct a system as shown in Fig. 1.1 and to determine to what degree the sign of equivalence in this figure holds.

[Figure 1.1: Block diagram representation of the goal of this project. Block labels: Surround, Source, Our system.]
1.4 Scope of the project

This project addresses methods for preserving both the spatial and the ambience aspects of a surround recording when listening via headphones. The data necessary for capturing all of the surround information will be identified in an analysis of the problem. These data will either be obtained by experimental measurements or taken from recommendations or standards in the applicable field. They will be the input to the core technology of the project, binaural technology. The implemented system will be evaluated by listening tests.

The scope of the project does not include combined loudspeaker/room equalization. Moreover, it is assumed that the surround track is recorded properly, with all of the spatial and ambience information. If this is not the case, no attempt will be made to correct the recorded spatial/ambience image.

1.5 Binaural reproduction and recording

This section describes the idea behind binaural technology. The aim is to justify that the binaural principle can give an impression of surround sound in headphones.

1.5.1 General principle

As described in section ??, the direction of a sound is determined by analyzing the pressure signals at the left and right eardrums. This directional perception is a result of the head-related transfer functions (HRTFs) from the source to the left and right ear, respectively. The auditory event in a real environment can therefore be duplicated if the exact same pressure signals can be recreated at the eardrums. The task of binaural recording and reproduction is thus the one shown in figure 1.2. This setup produces ITD, ILD and spectral cues as one would have with direct listening.

[Figure 1.2: General principle of the binaural reproduction technique. Labels: Listener, Dummy head.]

In order to make the spatial perception correct, several conditions need to be fulfilled.
The artificial head has to be equal to the head of the listener, since the shape of the head and the pinnae determines a specific transfer function to which the signal processing in the brain has adapted. Since there are individual differences between people, a binaural signal is strictly correct for one person only. Fortunately, it has been shown that credible binaural signals can be obtained using an average ear [Minnaar et al, 2001].
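
The interaural cues mentioned above can be illustrated with a minimal sketch. It assumes a toy pair of ear responses in which the far ear simply receives a delayed and attenuated copy of the source; measured HRTFs would also add spectral cues, which are left out here. All function names and the values of 20 samples and 0.5 are hypothetical and chosen only for illustration.

```python
import math
import random


def apply_toy_ear_responses(signal, delay_samples, far_ear_gain):
    """Return (near_ear, far_ear) signals for a toy pair of ear responses.

    Assumption: the near ear receives the source unchanged, the far ear
    receives it delayed and attenuated. Spectral cues are not modelled.
    """
    near = list(signal) + [0.0] * delay_samples
    far = [0.0] * delay_samples + [far_ear_gain * x for x in signal]
    return near, far


def estimate_itd_samples(near, far, max_lag):
    """Lag in samples that maximizes the cross-correlation of the ear signals."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(max_lag + 1):
        value = sum(near[n] * far[n + lag] for n in range(len(near) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def estimate_ild_db(near, far):
    """Level difference between the two ear signals in dB."""
    return 20.0 * math.log10(rms(near) / rms(far))


random.seed(0)
source = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # noise burst
near_ear, far_ear = apply_toy_ear_responses(source, delay_samples=20, far_ear_gain=0.5)

itd = estimate_itd_samples(near_ear, far_ear, max_lag=40)
ild = estimate_ild_db(near_ear, far_ear)
print("ITD:", itd, "samples; ILD:", round(ild, 2), "dB")
```

With the toy values, the estimated ITD equals the imposed delay and the ILD equals the imposed gain expressed in dB, which is the sense in which a pair of ear signals carries the directional cues.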
To record the pressure in the ear canal, the ear canal and the eardrum of the artificial head would have to be identical to those of the listener. However, it has been shown that the pressure at the entrance of a blocked ear canal contains the full directional information [Møller, 1992]. It is therefore possible to record binaural signals at the entrance of a blocked ear canal, and the ear canal need not be taken into account. Hence binaural playback is possible without taking the individual listener into consideration. A further possibility is to use an artificial head, with which recordings can be made with a totally steady head and without human noise such as heartbeats and swallowing. Regarding the use of artificial heads, Aalborg University has the artificial head Valdemar, which proved to be among the best with respect to localization [Christensen, 2001]. In this context it should be noted that a well-selected human head still gives better performance.

By using headphones for the reproduction, crosstalk is avoided, since each ear signal is sent only to its own ear. However, it is important to note that the headphone transfer function should be flat in order to ensure that the correct pressure signal is present at the eardrum. This is especially important for the median plane (front and back), where there is neither ITD nor ILD. Since headphone transfer functions are rarely flat, equalization of the headphones is necessary in order to achieve the best spatial impression.

1.5.2 Electrical considerations

[Figure 1.3: Block diagram of the binaural recording/reproduction scenario. Block labels: Source, HRTF, Microphone, Electrical system, PTF, Possible storage.]

The binaural reproduction scenario shown in figure 1.2 can hence be described by the block diagram in figure 1.3. The sound source is filtered through a specific HRTF to each ear.
These two pressure signals are picked up by the microphones and thereby converted into electrical signals. These signals can either be stored or played back directly in the headphones. The headphone transfer function (PTF) describes the relation between the electrical input and the pressure at the entrance of the ear canal. The transfer function from the source to the ears of the listener, $H_{STE}(s)$, is thus:

$$H_{STE}(s) = H_{HRTF}(s)\, H_{Mic}(s)\, H_{El}(s)\, H_{PTF}(s) \qquad (1.1)$$

To attain ideal binaural reproduction, the source-to-ears transfer function should equal the HRTF, which requires the product $H_{Mic}(s)\, H_{El}(s)\, H_{PTF}(s)$ in (1.1) to equal one. The ideal electrical system should therefore equalize the microphone and the headphones. The electrical transfer function is thus ideally:

$$H_{El}(s) = \frac{1}{H_{Mic}(s)\, H_{PTF}(s)} \qquad (1.2)$$
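
As a numerical check of equations (1.1) and (1.2), the sketch below evaluates the chain on the imaginary axis, s = jω, at a few frequencies. The microphone, headphone and HRTF responses are hypothetical first-order and pure-delay stand-ins, not measured data, and the sketch says nothing about whether the resulting equalizer can be realized as a stable, causal filter.

```python
import cmath
import math


def first_order_lowpass(s, cutoff_hz):
    """Hypothetical transducer model: 1 / (1 + s / wc)."""
    return 1.0 / (1.0 + s / (2.0 * math.pi * cutoff_hz))


def h_mic(s):
    """Assumed microphone response (illustrative only)."""
    return first_order_lowpass(s, 18000.0)


def h_ptf(s):
    """Assumed headphone transfer function (illustrative only)."""
    return 0.8 * first_order_lowpass(s, 12000.0)


def h_hrtf(s):
    """Toy HRTF: attenuation plus a 0.3 ms delay (illustrative only)."""
    return 0.7 * cmath.exp(-s * 0.0003)


def h_el(s):
    """Ideal electrical system of equation (1.2)."""
    return 1.0 / (h_mic(s) * h_ptf(s))


def h_ste(s):
    """Source-to-ears chain of equation (1.1)."""
    return h_hrtf(s) * h_mic(s) * h_el(s) * h_ptf(s)


frequencies_hz = [100.0, 1000.0, 10000.0, 20000.0]
max_error = 0.0
equalizer_gain_db = []
for f in frequencies_hz:
    s = 1j * 2.0 * math.pi * f
    # With the ideal equalizer, the chain should reduce to the HRTF alone.
    max_error = max(max_error, abs(h_ste(s) - h_hrtf(s)))
    equalizer_gain_db.append(20.0 * math.log10(abs(h_el(s))))

print("largest deviation from the HRTF:", max_error)
print("equalizer gain in dB:", [round(g, 1) for g in equalizer_gain_db])
```

In this toy case the chain matches the HRTF to within rounding error, and the equalizer gain rises toward high frequencies, where the assumed microphone and headphone responses roll off.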
In this context it is important to mention that it might not be possible to realize this transfer function, since the required inverse may be an unstable system. With an electrical system like this, it is possible to reproduce the spatial information of the recording situation.

1.5.3 Recording of binaural surround sound

Binaural recording using Valdemar makes it possible to record ambient sound and to reproduce the same situation using headphones. It is therefore possible to create surround sound in headphones by placing Valdemar as the listener in a surround sound setup.

1.6 Summary and conclusion

This chapter discussed the two most important aspects of surround sound: localization and environment impression. These aspects are responsible for creating a convincing illusion of being at the scene. Furthermore, different surround sound reproduction setups were presented.

It was shown that any incoming direction of a sound can be characterized by the following directional cues: interaural time differences (ITD), interaural level differences (ILD) and the spectral content of the sound entering the ear canal (head and pinna filtering, HRTF). Moreover, it was concluded that all of the directional cues are combined in the pairs of HRTFs. It was also shown that, by creating time and/or level differences between different sound sources (loudspeakers), it is possible to create virtual sources, where the physical ones are no longer perceived.

The impression of an environment is created by the direct sound, early reflections and reverberation. Assuming that these are correctly recorded in a surround soundtrack, it is essential that they are retained during listening. Since the listening room is also a closed environment, it can alter the total perception of the recorded environments.
For reverberation, the case of coupled rooms was discussed, and for the relation between direct and reflected sound it was concluded that it depends on the position of the listener.

In a surround sound reproduction system, positional information lies mainly in the front speakers, whereas ambience is placed in both front and rear speakers. Dialog is placed in the center channel. Recommendations for a 3/1 and a 3/2 surround setup were presented. It was found that the LFE channel is essentially non-directional.

The ambient sound of a surround sound setup can be recreated using headphones by recording the sound in the ears of an artificial or real head. If the recorded signals are equalized correctly, the pressure at the eardrums is the same and hence the ambient information is the same. Surround sound is thereby created using a regular pair of headphones.

1.6.1 Project scope

To create these binaural signals, a surround setup and an artificial or real head are needed. However, it would be expedient to create the binaural surround signals without the need for the surround setup and the recording equipment. Therefore the scope of the project is to develop a system able
to convert the channels of a surround setup into two binaural signals. To test the performance, the reference is the signals of a real setup recorded in the ears of the artificial head Valdemar [Christensen, 2001]. This situation is shown in Fig. 1.4. Valdemar is chosen even though better performance could be achieved using a well-selected real head. The reason is the uncertainty in the choice of a real head and the problems accompanying a real head: the use of miniature microphones, human noise in the measurements and the difficulty of keeping a human subject stationary.

[Figure 1.4: General scope of the project. The Valdemar recording is meant to serve as a reference for the developed system. Block labels: Equalization, Surround, Source, Our system.]

In the development of the system, several issues need to be taken into account:

- Human perception of spatial information
- The influence of reverberation in the playback scenario
- Different surround systems
- The problems occurring when using the binaural reproduction technique
- Calculation complexity of the system

When the system has been developed, it is desirable to compare it to the reference recording of Valdemar in a listening test. This should be a subjective test showing whether a representative selection of subjects prefers the recordings or the processed solution.
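
The conversion of surround channels into two binaural signals, as stated in the project scope, can be sketched as follows. This is only one plausible structure and is not claimed to be the system developed in the project: each channel is filtered with a left-ear and a right-ear impulse response for its loudspeaker position, and the results are summed per ear. The channel names, the three-tap impulse responses and the function names are hypothetical placeholders; measured responses would take their place in practice.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binauralize(channels, ear_responses):
    """Mix surround channels down to a (left_ear, right_ear) signal pair.

    channels:      dict mapping channel name -> list of samples
    ear_responses: dict mapping channel name -> (left_ir, right_ir)
    """
    length = max(
        len(samples) + max(len(ear_responses[name][0]), len(ear_responses[name][1])) - 1
        for name, samples in channels.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        left_ir, right_ir = ear_responses[name]
        for i, value in enumerate(convolve(samples, left_ir)):
            left[i] += value
        for i, value in enumerate(convolve(samples, right_ir)):
            right[i] += value
    return left, right


# Toy data: the far ear gets a delayed, attenuated copy; the center
# loudspeaker reaches both ears equally. These are not measured responses.
toy_responses = {
    "L": ([1.0, 0.0, 0.0], [0.0, 0.0, 0.5]),
    "R": ([0.0, 0.0, 0.5], [1.0, 0.0, 0.0]),
    "C": ([0.7, 0.0, 0.0], [0.7, 0.0, 0.0]),
}
toy_channels = {
    "L": [1.0, 0.0],  # impulse at sample 0 on the left loudspeaker
    "R": [0.0, 0.0],  # silent right loudspeaker
    "C": [0.0, 1.0],  # impulse at sample 1 on the center loudspeaker
}

left_ear, right_ear = binauralize(toy_channels, toy_responses)
print("left ear: ", left_ear)
print("right ear:", right_ear)
```

In the toy example the left-loudspeaker impulse reaches the left ear immediately and the right ear two samples later at half the level, while the center impulse arrives identically at both ears.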
Bibliography

[Christensen, 2001] Flemming Christensen. Binaural Technique with special emphasis on recording and playback. AAU, 2001.

[Minnaar et al, 2001] Pauli Minnaar, Søren Krarup Olesen, Flemming Christensen and Henrik Møller. Localization with binaural recordings from artificial and human heads. J. Audio Eng. Soc., number 5, May 2001, pages 323-336.

[Møller, 1992] Henrik Møller. Fundamentals of binaural technology. Applied Acoustics, 1992, pages 171-218.