Active control of sound for improved music experience in vehicles


Michael Vanhoecke

Promotor: prof. dr. ir. Dick Botteldooren
Supervisors: Mirjana Adnadevic, ir. Pieter Thomas

Master's thesis submitted to obtain the academic degree of Master of Science in Engineering: Electrical Engineering

Department of Information Technology
Chair: prof. dr. ir. Daniël De Zutter
Faculty of Engineering and Architecture
Academic year

Preface

Sound is a fascinating medium. During my stay at the Technical University of Denmark in the first semester, I had the privilege of being introduced to their elaborate research facilities, covering a broad range of sound-related applications. This master thesis was a next step in employing my skills as an electronics engineer to tackle a problem in the field of audio. My stay abroad also implied that this thesis had to be executed within a strict time schedule. During the past 5 months all my time and energy were consumed by this project, but the interesting topic left me with a feeling of satisfaction at the end.

This work would not have been realized without the help of various people at Intec-Acoustics, whom I would like to thank. Professor Botteldooren, for providing this interesting topic and the flexibility in my approach and schedule. Moreover, he could be counted on to tackle problems and to give inspiration for new solutions. Mima, for guiding me through my thesis and always keeping me motivated. Pieter Thomas, for the help in building the amplifiers and proofreading my thesis. Peter Guns, for the technical assistance in building the cabin and mounting the loudspeakers.

Finally, I would like to express my appreciation to my family and the friends surrounding me. Special thanks to Fran for supporting me during all those evenings and weekends when I was working instead of doing something more relaxing together.

Michael Vanhoecke, 10/06/2013

Permissions

The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to explicitly state the source when quoting results from this master dissertation.

Michael Vanhoecke, 10/06/2013

Active control of sound for improved music experience in vehicles
by Michael Vanhoecke

Master thesis submitted for obtaining the academic degree of Master of Science in Electrical Engineering, Electronic Circuits and Systems
Academic year
Promotor: prof. dr. ir. D. Botteldooren
Supervisors: M. Adnadevic, ir. P. Thomas
Faculty of Engineering and Architecture, Ghent University
Department of Information Technology - Acoustics
Head of Department: prof. dr. ir. D. De Zutter

Summary

In this master thesis, the possibility of creating virtual 3D audio in a vehicle environment is investigated. Based on the mechanisms of human sound source localization, binaural signals can be used to present spatial information to the listener. For this, it is necessary to be able to control the sound at the ears of a listener. However, when using loudspeakers, there is no passive channel separation present. Furthermore, multiple sound reflections in the cabin give rise to spectral deformation. To overcome this, an extended version of the crosstalk cancellation technique is introduced to actively control the sound field at both ears independently. Transfer functions from the speakers to the ears are measured in a cabin and are used to design an inverse filter matrix. Different loudspeaker topologies are tested to improve performance. A four-channel system, which adds a stereo dipole to a basic two-channel setup, is shown to outperform that setup. A channel separation higher than 20 dB is achieved in a frequency range of 200 Hz to 8 kHz at the optimal listening position, while the rotational sweet spot is increased. ITD and ILD cues are tested to validate the quality of the 3D reproduction. The spatial information is preserved for the optimal listening position using both setups. For a head rotation of 30°, the four-channel system can also reproduce the correct ITD, but the increased sweet spot is not sufficient to reproduce the ILD.

Keywords: Virtual 3D audio, Crosstalk cancellation, Vehicle environment

Active control of sound for improved music experience in vehicles

Michael Vanhoecke
Supervisor(s): Mirjana Adnadevic, Pieter Thomas, Dick Botteldooren

Abstract: This article investigates the possibility of creating virtual 3D audio in a vehicle environment. Based on the mechanisms of human sound source localization, binaural signals can be used to present spatial information to the listener. However, when using loudspeakers, there is no passive channel separation present. Furthermore, multiple sound reflections in the cabin give rise to spectral deformation. The crosstalk cancellation technique is implemented to actively control the sound field at both ears independently. Transfer functions from the speakers to the ears are measured in a cabin and are used to design an inverse filter matrix. Different loudspeaker topologies are tested to improve the performance. A four-channel system, which adds a stereo dipole to a basic two-channel setup, is shown to outperform that setup. An improvement is noted in channel separation, sweet spot size, distortion and the ability to reproduce virtual sources.

Keywords: Virtual 3D audio, Crosstalk cancellation, Vehicle environment

I. INTRODUCTION

Audio in vehicles has always been a topic of interest. Almost every car has its own sound system, since listening to music or the radio is the only type of entertainment that can be combined with driving a car. However, a vehicle is far from an ideal listening environment. The audio spectrum is heavily influenced by reflections on windows and by resonances due to the small volume of the space. Moreover, speakers are often placed at non-conventional positions [1].

An interesting approach to improve the music experience is to create a virtual 3D audio environment, making it possible to place sound sources anywhere in space without the need for a physical source to be present. Creating the exact sound field over a large area requires a large number of loudspeakers and is thus impractical. Alternatively, the sound field can be controlled only at the two ears of the listener to deliver sound with spatial information. This method is referred to as binaural audio [2].

II. BINAURAL REPRODUCTION

Binaural signals can be obtained in two different ways. A first way is to record them using an artificial head with microphones at the positions of the eardrums. A second way is to create a synthetic binaural signal by adding spatial information to a mono sound. A key component in binaural audio is the Head-Related Transfer Function (HRTF). A set of HRTFs comprises the main localization cues of the auditory system, being the interaural level and time differences and the monaural spectral deformation introduced by the pinna. Convolving a mono signal with a set of HRTFs results in a pair of binaural signals, a process referred to as binaural synthesis [2].

Binaural signals should be delivered exactly at the ears, so they are well suited to headphones. However, headphones are not comfortable to wear while driving a car. External loudspeakers can be used, but this introduces the problem of crosstalk: sound can no longer be sent to each ear independently. The crosstalk cancellation technique is implemented to actively control the sound field at both ears [2].

III. CROSSTALK CANCELLATION

A. Theory of Operation

A listening situation is characterized by a plant matrix H. For an S-channel loudspeaker setup, this is a 2 × S matrix containing the transfer functions from each of the speakers to both ears [3]. The transfer functions include the speaker response, the head-related transfer function (HRTF) and the room influence. The plant matrix describes the deformation of the sound before it reaches the ears. A filter matrix C is added before sound is sent to the speakers, to compensate for the plant matrix. A system in which the signals are delivered to the ears perfectly is described by the identity matrix, so the matrix C has to be the inverse of the plant matrix.

B. Inverse Filtering

It is generally very hard to calculate the exact inverse of the plant matrix. The response will be non-minimum phase because sound is present in echoes resulting from room and pinna reflections. The inverse of a non-minimum phase response is only stable when it is acausal, so a modelling delay has to be included [3]. Moreover, when deconvolving an impulse response, the optimal filter is inevitably of infinite duration, which makes it unrealizable. The responses at the ears contain deep notches at certain frequencies due to interference of reflections at the pinnas and the room response. Hence, a perfect equalization would send a large amount of energy to the loudspeakers in an attempt to compensate for these notches.

A method to calculate the inverse filters is presented by Tokuno et al. [3] and combines least squares inversion in the frequency domain with zeroth-order regularization. The solution for the crosstalk cancellation matrix is given by

$$\mathbf{C} = \left[\mathbf{H}^{H}\mathbf{H} + \beta\mathbf{I}\right]^{-1}\mathbf{H}^{H} \qquad (1)$$

in which β is the regularization parameter and the superscript H denotes the Hermitian (conjugate) transpose. Regularization makes it possible to control the effective duration of the filters and to limit their energy. It introduces a trade-off between performance and effort.

IV. VEHICLE SETUP

A cabin was built out of metal with a Plexiglas front window and a roof made of wooden panels in which the loudspeakers are mounted. Absorbent material is placed against the walls on the inside of the cabin to create a realistic sound environment.

The positions of the loudspeakers are displayed in figure 1. The plant matrix is measured using a B&K HATS. The impulse responses are truncated to a limited number of samples to design the inverse filters. The transfer functions show a severe spectral deformation. At low frequencies, peaks due to the resonances in the cabin are present, while at higher frequencies deep notches occur, indicating destructive interference effects caused by room and pinna reflections. There is no natural channel separation as a result of the room influence. The system in the cabin is designed and tested for a limited frequency range of 80 Hz to 8000 Hz.

Fig. 1. Positions of the loudspeakers. The position of the head is marked with a cross.

V. RESULTS

In order to compare the efficiency of different setups, a linear sweep is sent to the left channel and different criteria are extracted to quantify the performance. Ideally one would recover the sweep exactly at the left ear and record silence at the right ear. The channel separation gives the ratio of the sound level at the ipsilateral ear to the level at the contralateral ear. The crosstalk filters are designed for one specific position, so a movement of the head results in a reduced performance. The area in which crosstalk cancellation is achieved is called the sweet spot. In this work, only the rotational sweet spot is considered. Distortion is the last property that is checked.

A four-channel system, comprising speakers 1, 2, 4 and 5, results in the best performance. A channel separation over 20 dB was achieved for the optimal listening position, a value matching the (anechoic) performance of common systems [4]. The two-channel system with speakers 1 and 2 effectively cancels crosstalk, but has a limited sweet spot for high frequencies. Speakers 4 and 5 are placed closely together and form a stereo dipole [5]. They perform poorly at low frequencies but result in a broad sweet spot for higher frequencies. The four-channel system manages to combine the assets of both two-channel systems. Results for the channel separation and the frequency response at the ears can be found in figures 2 and 3. There is almost no distortion, indicating that the inverse filtering also succeeds in equalizing the room response.

The virtual 3D reproduction is tested by playing back binaural signals through the crosstalk cancellation system and comparing the ITD and ILD of the recorded signals with those of the original signals. At the optimal listening position, both the two-channel and the four-channel setup produce differences smaller than the human discrimination threshold. At a head rotation of 30 degrees, the cues are lost for the two-channel setup, but the ITD is preserved with the four-channel setup. Since the ITD cue dominates the ILD cue [2], the virtual source direction will be reproduced correctly for broadband sources.

Fig. 2. Channel separation (dB) versus frequency (Hz) for different angles of head rotation, calculated per 1/3 octave band.

Fig. 3. Magnitude response (dB) versus frequency (Hz) at the left and right ears for a linear sweep through the left channel.

VI. CONCLUSION

A four-channel crosstalk cancellation system was implemented in a vehicle environment, improving on the performance of a basic two-channel setup. A channel separation over 20 dB was achieved for the optimal listening position, matching common systems [4]. Virtual source reproduction was correct for a rotational sweet spot of 30 degrees, though only for broadband sources. Subjective tests should be performed to validate this. Further research consists of integrating the system in real time and possibly upgrading it to a dynamic system, which updates the inverse filters according to the position of the head.

REFERENCES

[1] A. Farina and E. Ugolotti, Spatial equalization of sound systems in cars, in Audio Engineering Society Conference: 15th International Conference: Audio, Acoustics and Small Spaces.
[2] W. Gardner, 3D Audio Using Loudspeakers. Springer.
[3] H. Tokuno, O. Kirkeby, P. Nelson, and H. Hamada, Inverse filter of sound reproduction systems using regularization, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 80, no. 5.
[4] B. Masiero, J. Fels, and M. Vorländer, Review of the crosstalk cancellation filter technique, in International Conference on Spatial Audio.
[5] O. Kirkeby, P. Nelson, and H. Hamada, Local sound field reproduction using two closely spaced loudspeakers, The Journal of the Acoustical Society of America, vol. 104, no. 4, 1998.
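As a numerical illustration of equation (1) in the extended abstract above, the following Python sketch evaluates the regularized least-squares inverse in each frequency bin and computes the resulting channel separation. It is only a sketch under stated assumptions: the plant matrix is a hypothetical random stand-in rather than measured data, β is taken constant over frequency, the function names `regularized_inverse` and `channel_separation_db` are illustrative, and the conversion to causal time-domain filters with a modelling delay is left out.

```python
import numpy as np


def regularized_inverse(H, beta):
    """Evaluate C = (H^H H + beta*I)^-1 H^H independently in every frequency bin.

    H    : complex array of shape (n_bins, 2, S), i.e. 2 ears and S loudspeakers
    beta : non-negative regularization parameter (assumed constant over frequency)
    Returns C with shape (n_bins, S, 2).
    """
    H_herm = np.conj(np.swapaxes(H, -1, -2))      # Hermitian transpose, (n_bins, S, 2)
    n_speakers = H.shape[-1]
    A = H_herm @ H + beta * np.eye(n_speakers)    # (n_bins, S, S)
    return np.linalg.solve(A, H_herm)             # solves A C = H^H per bin


def channel_separation_db(H, C):
    """Left-ear level relative to right-ear level when only the left input is driven."""
    R = H @ C                                     # overall response, (n_bins, 2, 2)
    return 20.0 * np.log10(np.abs(R[:, 0, 0]) / np.abs(R[:, 1, 0]))


# Hypothetical stand-in for a measured plant: random complex gains,
# 16 frequency bins, 2 ears, 4 loudspeakers. Not data from the thesis.
rng = np.random.default_rng(0)
H_demo = rng.standard_normal((16, 2, 4)) + 1j * rng.standard_normal((16, 2, 4))

for beta in (1e-6, 1e-1):
    C_demo = regularized_inverse(H_demo, beta)
    effort = np.mean(np.abs(C_demo) ** 2)         # mean squared filter gain
    worst = channel_separation_db(H_demo, C_demo).min()
    print(f"beta={beta:g}: worst-bin separation {worst:.1f} dB, mean filter energy {effort:.3f}")
```

Running the loop with a small and a larger β gives a feel for the trade-off mentioned in the text: a larger β lowers the filter energy at the cost of channel separation.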



Contents

1 Introduction
2 3D Sound Localization and Reproduction
    Spatial Hearing
        Source Direction Extraction
        Source Distance Estimation
    Virtual 3D Sound Reproduction
        Binaural Sound Synthesis
        Binaural Reproduction
3 Crosstalk Cancellation
    Theory of Operation
    Inverse Filtering
    Stereo Dipole
4 Sound in a Vehicle Environment
    Crosstalk Cancellation in Vehicles
    Loudspeaker Room Interaction
        Modal Theory
        Reflections
5 System Design
    Modelling a Vehicle Cabin
    Loudspeaker Topologies
    Plant Matrix Measurements
    Filter Design
    Amplifier Design
        Influence of Amplifier Characteristics
        Circuit Board Design
        Test
6 Results
    Crosstalk Cancellation
        Quality Measures
        Visualization using 1/3 Octave Bands
        System Performance: Channel Separation and Sweet Spot
        System Performance: Distortion
        Performance with Small Loudspeakers
        Performance for Shorter Filters
    3D Sound Virtualization
7 Conclusion
A EAGLE design TDA
B 1/3 Octave Bands
Bibliography
List of Figures

Chapter 1

Introduction

Audio in vehicles has always been a topic of interest. Almost every car has its own sound system, since listening to music or the radio is the only type of entertainment that can be combined with driving a car. However, a vehicle is far from an ideal listening environment. The audio spectrum is heavily influenced by reflections on windows and by resonances due to the small volume of the space. The loudspeakers have to be integrated in the design of the car and cannot be placed at conventional heights and distances. Moreover, the setup is usually asymmetric with respect to the listener, so a conventional stereo setup cannot be used. A lot of effort is put into sound quality in car environments, as illustrated by the many high-end audio companies, such as Bang & Olufsen, D+M and Bose, that are active in the market of automotive audio solutions.

This master thesis describes a way to create a 3D audio environment, making it possible to place sound sources anywhere in space. Reproducing the exact sound field over a large area is not possible, since this requires a high number of transducers surrounding the listener, which cannot be realized in a vehicle. A more suitable approach is to reproduce the exact sound field only in a limited area around the head of the listener. Using a limited number of loudspeakers, such systems can produce virtual sources at places where no physical sound source is present. This property is very interesting, since it could overcome the problem of oddly placed loudspeakers in a car. One possibility, for example, is to place virtual sources in front of the driver, forming a conventional stereo setup.

A possible way to present 3D audio is to make use of binaural signals. They consist of a mono sound signal encoded with the spatial information of a certain location. This spatial information is contained in a set of Head-Related Transfer Functions. When two binaural signals are delivered to the ears, a human will perceive the mono sound as originating from that particular location. It is crucial that the binaural signals are delivered exactly to the ears without any deformation. This is fairly easy when using headphones, since the transducers are placed very close to the ears and no sound other than the correct binaural signals is heard. When driving a vehicle it is not desirable to wear headphones all the time, so regular loudspeakers have to be used. However, playing binaural signals through separate loudspeakers results in sound being perceived by both ears rather than just the target ear. This effect is referred to as crosstalk. To solve this problem, the technique of crosstalk cancellation will be used to actively control the sound at the ears of the listener, so that binaural signals can be delivered unchanged. Active control of sound refers to the use of digital processing to drive sound sources and let them interfere with each other so that the sound field can be shaped, whereas passive control refers to effects such as reflection and absorption.

Commercial surround systems are already available [1], [2], but they are mainly discrete surround systems. They consist of a multichannel system directing sound to speakers placed at different locations in the car to create spatial sound. Their main added value lies in the signal processing, which creates the optimal sound for each speaker in the car, starting from conventional audio formats. The crosstalk cancellation technique was already implemented in a car environment by Farina [3], using a hardware DSP board. Although only a limited crosstalk cancellation was achieved, subjective tests showed that listeners valued the system higher than a traditional sound system.

Goal

The goal of this master thesis is to create virtual 3D audio in a vehicle environment. For this, the crosstalk cancellation technique will be implemented, taking into account the influence of the environment. Transfer functions will be measured to characterize the sound propagation from the loudspeakers to the ears of the listener. These will be used to create a filter matrix through which sound can be played back. Not only is it desired to be able to steer sound to the ears separately, it is also beneficial if a flat frequency response is obtained. The acoustic response of a vehicle generally introduces a severe deformation of the sound, so the filters can be used to equalize this response.

Chapter 2 explores the basics of human sound source localization and 3D sound reproduction. In chapter 3, the crosstalk cancellation technique is discussed in detail. Chapter 4 discusses the influence of the environment on the sound propagation. In chapter 5 the design of the system is discussed. Chapter 6 then presents the results of the implemented crosstalk cancellation systems and their capability of reproducing 3D sound. The last chapter gives a short summary of the work and some perspectives for future work.
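The idea of a binaural signal introduced in this chapter, a mono sound encoded with the spatial information of one location, can be sketched in a few lines of Python. This is a minimal illustration and not the processing used in the thesis: the impulse-response pair produced by `toy_hrir_pair` is a hypothetical stand-in that only models a delay and an attenuation at the far ear, whereas a measured head-related impulse response would also contain the spectral shaping of the head and outer ear. The sample rate, delay and gain are likewise assumed values.

```python
import numpy as np


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Convolve one mono signal with a left/right impulse-response pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])                # shape (2, len(mono) + len(hrir) - 1)


def toy_hrir_pair(delay_samples, far_ear_gain, length=64):
    """Hypothetical impulse responses for a source on the listener's left:
    the right (far) ear receives a delayed, attenuated copy of the sound."""
    hrir_left = np.zeros(length)
    hrir_right = np.zeros(length)
    hrir_left[0] = 1.0
    hrir_right[delay_samples] = far_ear_gain
    return hrir_left, hrir_right


fs = 44100                                        # assumed sample rate in Hz
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs // 10)              # 100 ms of noise as a test signal
hl, hr = toy_hrir_pair(delay_samples=20, far_ear_gain=0.5)
left, right = binaural_synthesis(mono, hl, hr)

# Recover the time and level differences between the two ear signals.
lags = np.arange(-(len(left) - 1), len(right))
delay = lags[np.argmax(np.correlate(right, left, mode="full"))]
level_db = 20.0 * np.log10(np.sqrt(np.mean(left**2)) / np.sqrt(np.mean(right**2)))
print(f"time difference = {delay / fs * 1e3:.2f} ms, level difference = {level_db:.1f} dB")
```

Played over headphones, such a pair reaches each ear separately. Over loudspeakers each ear would receive a mixture of both signals, which is the crosstalk that the filter matrix described in the goal is meant to remove.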


Chapter 2 3D Sound Localization and Reproduction

In a virtual 3D audio environment, an auditory event can be placed at a location where no physical sound source is present. There are basically two approaches to presenting 3D sound to a listener. A first solution is to produce the exact sound field in the space enclosing the listener. However, this typically requires a high number of transducers. Since the input of the human auditory system consists solely of the two acoustic signals at the ears, it is sufficient to control the sound at the ears and deliver signals with localization cues to create a virtual 3D sound. This technique is referred to as binaural reproduction.

To deceive the human hearing one needs to be aware of the mechanisms responsible for sound localization. Not only the physics but also psychoacoustic effects play a major role [4]. It should be mentioned that sound localization is in fact an audiovisual process. Visual cues can be very important, but since the goal here is to control sound, the focus will be on auditory cues. In this chapter the different auditory mechanisms will be explored, as well as possible ways to reproduce 3D sound.

2.1 Spatial Hearing

Source Direction Extraction

The principal cues for sound localization are attributed to the differences between the sound at the right and the left ear. This seems evident, since one has two ears at two different positions, and because of the analogy with human vision, where the two different images from the eyes give the ability to see a 3D environment. The different interaural cues were already identified by Lord Rayleigh and resulted in the Duplex theory [5]. Two concepts can be distinguished: the interaural time difference (ITD) and the interaural level difference (ILD) [6].

When a sound source is positioned at one side of the head, the sound wave reaches the ipsilateral ear before it reaches the contralateral ear. The difference in arrival time is referred to as the ITD. Thus, the phase difference between the two signals at the ears gives information about the location of the source. This mechanism mainly works up to frequencies of about 1500 Hz [4], the frequency at which the wavelength is similar to the dimensions of the head. At higher frequencies the phase information becomes ambiguous since the phase shift is more than one period, although localization is still possible by looking at time differences in the signal envelopes [6].

The difference in sound pressure level, referred to as the ILD, provides useful information at higher frequencies [4]. Due to the shadowing effect of the head, the sound is attenuated at the contralateral ear, so one perceives a sound as coming from the side at which the ear receives the highest sound level [6].

It quickly becomes clear that the ITD and ILD are insufficient to unambiguously determine the position of a sound source. A well-known example is the case of front-back confusion illustrated in figure 2.1. The source point S and its image S' result in the same ILD and ITD, so the listener cannot determine whether the sound is originating from the back or the front using only the interaural cues. In the more general case, points resulting in the same ITD and ILD lie on a conical surface called the cone of confusion. The interaural cues play an important role for sound localization in the horizontal plane [6].

Figure 2.1: Front-back reversal [7]

Additional spatial information is provided by monaural spectral cues.
Before reaching the eardrums, sound is influenced by the upper body, the head and, more specifically, the outer ear. This results in a spectral filtering of the incoming sound which adds spatial information to the signal [4]. The multipath reflections at the pinnae result in different interference patterns depending on the direction of incidence, providing an extra cue for sound source localization. Studies have shown that these spectral cues contribute significantly to both elevation and front-back discrimination [8].

The spectral information and the interaural cues are contained in a set of transfer functions, the so-called Head-Related Transfer Functions (HRTFs) [6]. They consist of the transfer functions from a certain source point to both ears. For example, the ITD is contained in the phase difference while the ILD influences the magnitude of the transfer functions. HRTFs can be recorded by using an artificial head which has microphones at the positions of the eardrums. Another possibility is to measure individualized HRTFs by placing small microphones in the ears. Each human has a unique shape of the head, torso and ears, and thus HRTFs are slightly different for each person. There are databases available which contain extensive sets of HRTFs measured in an anechoic environment for several points in space. Two widely used databases are the CIPIC [9] and the MIT [10] database.

An example is shown in figure 2.2. At low frequencies the spectral shape for both ears is similar, while the difference increases for higher frequencies due to the shadowing effect of the head and pinna reflections. A broad peak can be seen around 2-3 kHz, caused by the resonance of the ear canal. The sharp features at high frequencies are the result of reflections at the pinna. It is also possible to measure the transfer functions in a reverberant environment, but then of course these are limited to the particular room. HRTF measurements can be divided into two regions: a proximal region and a distal region [11].
Within the proximal region a high accuracy is required to determine the transfer functions, while in the distal region only the HRTF for a certain direction is needed and the distance is corrected for by adding an attenuation factor. A transfer function is strictly speaking a frequency domain function, while its time domain counterpart is the Head-Related Impulse Response (HRIR). However, the term HRTF will in general be used to indicate the influence of the head.

When the previous cues still give rise to confusion, head movements can provide the extra information needed to decide upon the correct direction [8]. A listener tends to turn his head if the auditory system has difficulties localizing a sound source, to get a second point of reference. Consider the front-back reversal case displayed in figure 2.1, with the head turned to the right side. If the interaural cues become smaller, that is, a smaller ILD and ITD, the listener perceives sound coming from the front source, while increasing differences indicate sound from the rear source. The dynamic cues are not limited to the interaural cues: shifting peaks or dips in the spectrum also provide extra information.
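As an illustration of how the interaural cues are contained in a pair of head-related impulse responses, the following minimal Python sketch estimates a broadband ITD from the lag of the cross-correlation maximum and an ILD from the energy ratio. The function name `interaural_cues`, the sampling rate and the two synthetic impulse responses are assumptions made for this example only; they are not measured HRIRs and not the method used in this thesis.

```python
import numpy as np

def interaural_cues(hrir_ipsi, hrir_contra, fs):
    """Estimate ITD (seconds) and broadband ILD (dB) from an HRIR pair."""
    # The lag of the cross-correlation maximum is the delay of the
    # contralateral response relative to the ipsilateral response.
    xcorr = np.correlate(hrir_contra, hrir_ipsi, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(hrir_ipsi) - 1)
    itd = lag / fs
    # The ILD is taken here as the ratio of the total energies.
    ild = 10 * np.log10(np.sum(hrir_ipsi ** 2) / np.sum(hrir_contra ** 2))
    return itd, ild

# Hypothetical impulse responses: the contralateral ear receives the
# same pulse 30 samples later and at half the amplitude.
fs = 48000
ipsi = np.zeros(64)
ipsi[10] = 1.0
contra = np.zeros(64)
contra[40] = 0.5

itd, ild = interaural_cues(ipsi, contra, fs)
print(f"ITD = {itd * 1e3:.3f} ms, ILD = {ild:.2f} dB")
```

With measured HRIRs the ILD is strongly frequency dependent, so a per-band evaluation would be more informative than the single broadband value computed in this sketch.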

Figure 2.2: HRTF for a source azimuth angle of 30 degrees (to the right of the listener) in the horizontal plane [8]. The solid line is the ipsilateral response, the dashed line is the contralateral response.

Room Influence

In a room, or any other type of enclosure, the sound reaching a listener consists not only of the direct sound, but also of acoustic reflections from the walls and any other objects present. These secondary sources could disturb the localization cues of the auditory system. However, it appears that reverberant environments have little effect on the ability of humans to localize sounds [12]. A simple illustration of this is the so-called precedence effect [4]. When two subsequent sounds are perceived within a very short time interval, the perceived location is determined by the first sound observed. This implies that the direct sound dominates over reflections for source localization. Still, the reverberant sound contributes to the sound level and the perceived spaciousness. The precedence effect is also the key component of many contemporary sound reinforcement systems, in which a delay line is included in the nearest speakers [13].

When the time interval between two coherent sounds is too small, they are perceived as one sound. A guess for the position is made depending on the amplitude and time properties of both sounds. This process is referred to as summing localization and provides the basis for stereophonic sound [4]. More room acoustic properties will be addressed in chapter 4, where the vehicle environment is discussed.

Source Distance Estimation

Up until now, the mentioned auditory cues only provide information about the angle of incidence. The human auditory system also attributes a distance to a sound source. An important cue for distance estimation is the loudness of a sound. Since the sound pressure of an acoustic wave decreases when propagating, nearby sources are perceived as louder than distant ones emitting the same energy [4]. The sound level depends not only on the acoustic path, but also on the characteristics of the room. Thus, reverberation is another aspect of distance estimation [14]. The ratio of direct sound to reflected sound gives information about how close a source is situated [14]. The acoustic propagation generally also depends on the frequency, so there are some spectral differences as well, but these are only of minor importance.

An interesting binaural cue is motion parallax [15], which is in fact a dynamic cue. The already mentioned cues for direction estimation change when moving the head, but this is more noticeable for nearby sources. So by looking at the variation of the cues, a distance estimation can be made. The parallax mechanism is also used by human vision to gain depth precision. Another cue is the familiarity of certain sounds [4]. For example, humans know the characteristics associated with normal talking, whispering and shouting, which allows them to judge the distance.

2.2 Virtual 3D Sound Reproduction

The goal of a 3D audio system is to have the ability to position a sound source at an arbitrary spot. When no physical sound source is present at that spot, a virtual source is created. Such systems can either try to reproduce the complete sound field, typically requiring many transducers, or reproduce the sound field in a limited area around the head. The optimal listening position is referred to as the sweet spot.
Rendering spatial audio can be done by making use of the properties of the human auditory system. As an introduction, the well-known stereophonic system will be addressed. Many basic ideas can be illustrated with this simple audio setup. A stereo setup, using two speakers, is strictly speaking a virtual sound system, since it is able to place sound in between the physical positions of the two speakers. The auditory mechanism that allows this is the summing localization effect, which was already touched upon when discussing the room influence.

There are two possible ways to create stereo sound. A first way is to record sound with a stereo setup, which uses two microphones to record time or level differences. The similarity with the ITD and ILD discussion is no coincidence. A second way is to transform a mono sound into a stereo sound by using volume or time panning techniques. If the mono sound is played simultaneously through both loudspeakers at the same volume, the sound appears to originate from a location in between the two speakers. Raising the volume of one of the channels makes the sound move towards that channel until the virtual source coincides with the speaker position. Alternatively, introducing a delay in one of the channels pushes the sound towards the opposite channel.

For stereo sound, produced with a pair of speakers at an angle of ±30 degrees, there is a sweet spot. The stereo image is only optimal when the listener is placed exactly in the median plane between the two speakers [16]. Consider an initial situation with the virtual sound positioned in the center. Moving away from the sweet spot towards one of the speakers makes the sound from that speaker arrive earlier and at a higher intensity, thus causing the virtual source to move towards that particular speaker.

An extension of the stereo system can be found in conventional surround sound systems such as the 5.1 surround system. Five loudspeakers are placed around the listener, with an additional subwoofer. However, this system is still limited to the horizontal plane. More spatial positions can be generated by adding even more speakers. Such systems are in general referred to as discrete surround systems [8]. Different techniques are used to determine the contribution of each speaker. One of them is Vector Base Amplitude Panning [17].

Increasing the number of loudspeakers makes it possible to approach an exact reconstruction of the sound field. The concept of Wave Field Synthesis is based on the principle that it is sufficient to know the wave front on a surrounding surface to know the wave field [18]. In practice, a large number of discrete loudspeakers are used to generate a wave front.
The source origin is the same for the complete listening area, so the system doesn't suffer from a sweet spot. Ambisonics is another technique which is able to exactly reproduce the sound field if an infinite number of loudspeakers are placed on a sphere [19]. However, reduced order systems are implemented for practical reasons. A first order Ambisonics system uses four audio channels recorded with a soundfield microphone [8]. The microphone records the omnidirectional sound pressure together with the pressure gradient along three directions. These four channels are then used to recreate the sound field at the listening position. The optimal reproduction is limited to a sweet spot, however.

A very different way to render sound in three dimensions is to make use of binaural sound, which typically requires fewer transducers [8]. The sound field at both ears can be controlled to present signals with spatial cues to a listener. The technique of binaural reproduction is explored further below.

A number of sound systems combining multiple techniques also exist. An example is the Ambiophonics system, which combines binaural sound reproduction for the direct sound and early reflections with an array of surround speakers for adding room reverberation [20].

2.2.1 Binaural Sound Synthesis

Binaural sound consists of sound signals which include spatial information cues. Much as in the case of the stereo system, there are two possible ways of acquiring these signals. A first option is to record them directly; a second option is to start from a monaural signal and add spatial cues by convolving the mono signal with the HRTFs for a desired source position.

Binaural signals are recorded using a dummy head as shown in figure 2.3, which simulates the upper body, head and ears. Microphones are present at the location of the eardrums and thus directly measure what would be the input of the auditory system.

Figure 2.3: B&K Head and Torso Simulator

A synthetic binaural signal can be created by adding spatial information to a mono sound. As illustrated before, the main localization cues are contained in a set of HRTFs, so convolving a monaural signal with a set of HRTFs results in a binaural signal. This process is referred to as binaural synthesis [8]. The mathematical representation in the frequency domain is as follows:

\[ \mathbf{X} = \begin{bmatrix} X_L \\ X_R \end{bmatrix} = \begin{bmatrix} H_L \\ H_R \end{bmatrix} X = \mathbf{H} X \tag{2.1} \]

$H_L$ and $H_R$ are the transfer functions corresponding to a certain source position and $X$ is a mono sound signal. $X_L$ and $X_R$ are then the binaural signals as if the mono source had been placed in that particular position. It is easy to extend this representation to a system with multiple inputs at different locations, which allows a virtual sound environment to be created:

\[ \mathbf{X} = \sum_{i=1}^{N} \mathbf{H}_i X_i \tag{2.2} \]

Figure 2.4: Binaural synthesis for multiple sources

Binaural Reproduction

A second step in presenting a virtual 3D environment to a listener is to play back the binaural signals. Since an acoustic transfer function is already encoded in the sound, any further deformation of the signal must be avoided to achieve the desired effect. A straightforward approach is to use headphones. The transducers are placed very close to the ear, so the transmission path has little influence on the sound. Binaural signals are directly suited to be reproduced by headphones.

However, headphones also suffer from some drawbacks. First of all, they are not always comfortable to wear. Especially in a vehicle environment, people prefer not to have anything attached to the head. Another problem is that sound is often perceived inside the head. The most important cues for an external sound image are individualized pinna cues, reverberation cues, dynamic localization cues and corresponding visual cues [8]. This effect can also arise when using loudspeakers, but is more frequently present in headphone listening.

A second approach, and the one followed in this master thesis, is to deliver binaural sound using loudspeakers. The situation is now more complex, since there is a severe deformation of the source signal by the transmission path to the ear. A major issue introduced is the problem of crosstalk. Sound from a certain loudspeaker is now perceived by both ears, which was not the case when using headphones. The approach taken will be to actively control each listening channel by using a technique called crosstalk cancellation (CTC). Digital filtering makes it possible to compensate for the deformation of the sound, so that it will sound as if a virtual headphone is created. The sound will be optimized for a single position of the ears, so the playback will only be valid for a limited sweet spot. Only one listener will be considered, although an extension to multiple listeners is possible. However, this increases the complexity enormously [8]. Crosstalk cancellation will be discussed in detail in the next chapter.

The crosstalk cancellation technique has a limited sweet spot. The filters are designed for the position of the ears of the listener. When the listener's head moves away from the ideal position, the performance deteriorates and the spatial cues from the binaural signals are lost. Increasing the sweet spot is an important aspect of ongoing research in crosstalk cancellation. A dynamic system can be implemented using a head-tracker to update the filters to the position of the head [21].

It is also possible to implement a dynamic binaural synthesis [21]. Up to now, when binaural signals are delivered to a listener using either headphones or the CTC technique, the sound source moves together with the head. If the position of the head is tracked, the set of HRTFs for the binaural synthesis can be updated accordingly.
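The binaural synthesis of equations 2.1 and 2.2 can be sketched in the time domain, where the multiplication by a transfer function becomes a convolution with the corresponding impulse response. The Python example below is a minimal sketch under stated assumptions: the function name `binaural_synthesis` and the very short impulse responses are hypothetical placeholders, chosen only so that the result can be checked by hand. In practice the impulse response pairs would be taken from a measured HRIR set.

```python
import numpy as np

def binaural_synthesis(sources, hrirs):
    """Sum of mono sources, each convolved with its own HRIR pair.

    sources : list of 1-D mono signals X_i
    hrirs   : list of (h_left, h_right) impulse response pairs H_i
    Returns an array of shape (2, n) holding the left and right signal.
    """
    n_out = max(len(x) + max(len(hl), len(hr)) - 1
                for x, (hl, hr) in zip(sources, hrirs))
    out = np.zeros((2, n_out))
    for x, (hl, hr) in zip(sources, hrirs):
        left = np.convolve(x, hl)    # contribution to X_L
        right = np.convolve(x, hr)   # contribution to X_R
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out

# Two hypothetical sources with placeholder impulse responses.
x1 = np.array([1.0, 0.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0])
h1 = (np.array([1.0, 0.0]), np.array([0.0, 0.5]))  # source on the left
h2 = (np.array([0.5]), np.array([1.0]))            # source on the right

binaural = binaural_synthesis([x1, x2], [h1, h2])
print(binaural)
```

For a dynamic binaural synthesis, the same routine could be called block by block with the HRIR pairs selected according to the tracked head position; how to switch filters without audible artifacts is outside the scope of this sketch.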


Chapter 3 Crosstalk Cancellation

To introduce the concepts of crosstalk cancellation, a classic two channel setup will be looked at first. This setup has already been studied for decades [22] and illustrates the problems that arise when dealing with crosstalk. A review of the crosstalk cancellation technique can be found in [23]. Filter inversion is the key component of the system and will not be straightforward due to the ill-conditioned nature of the problem. The solution will be applicable to multichannel problems as well. It is assumed that all systems in this thesis are linear and time-invariant, so they are fully determined by an impulse response or the associated transfer function.

3.1 Theory of Operation

A listening situation for a two loudspeaker setup is depicted in figure 3.1. The goal is to deliver a pair of binaural signals to the ears, but unlike when using headphones, there are no separate paths to the ears. Sound from each loudspeaker reaches both ears and this crosstalk has to be cancelled. A basic system can be characterized by a 2 × 2 filter matrix, called the plant matrix, which contains the acoustic transfer functions from the loudspeakers to the ears. These include the air propagation and the head-related transfer function, but can also include the speaker response and the room influence, which has a major effect in a vehicle environment. In a dynamic system, the plant matrix can be updated according to the position of the listener. The filtering can be written down as follows:

\[ \begin{bmatrix} E_L \\ E_R \end{bmatrix} = \begin{bmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{bmatrix} \begin{bmatrix} Y_L \\ Y_R \end{bmatrix} \tag{3.1} \]

in which $\mathbf{E}$ is a vector of the signals delivered at each ear, $\mathbf{Y}$ is a vector of speaker signals and $\mathbf{H}$ is the plant matrix. $H_{AB}$ denotes the transfer function from speaker A to ear B.

Figure 3.1: Listening situation for a two loudspeaker setup

To equalize this filtering by the plant matrix, the binaural signals $\mathbf{X}$ are filtered by an extra 2 × 2 matrix $\mathbf{C}$ before being sent to the speakers:

\[ \begin{bmatrix} Y_L \\ Y_R \end{bmatrix} = \begin{bmatrix} C_{LL} & C_{RL} \\ C_{LR} & C_{RR} \end{bmatrix} \begin{bmatrix} X_L \\ X_R \end{bmatrix} \tag{3.2} \]

When the binaural signals are exactly reproduced at the ears, $\mathbf{E}$ equals $\mathbf{X}$ and the complete system is represented by the identity matrix. It becomes clear that the crosstalk cancellation matrix should be the inverse of the plant matrix:

\[ \mathbf{H}\mathbf{C} = \mathbf{H}\mathbf{H}^{-1} = \mathbf{I} \tag{3.3} \]

Thus, the solution for $\mathbf{C}$ can be found as:

\[ \mathbf{C} = \begin{bmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{bmatrix}^{-1} = \frac{1}{H_{LL}H_{RR} - H_{LR}H_{RL}} \begin{bmatrix} H_{RR} & -H_{RL} \\ -H_{LR} & H_{LL} \end{bmatrix} \tag{3.4} \]

Equation 3.4 can be rewritten by dividing numerator and denominator by $H_{LL}H_{RR}$:

\[ \mathbf{C} = \begin{bmatrix} 1/H_{LL} & 0 \\ 0 & 1/H_{RR} \end{bmatrix} \begin{bmatrix} 1 & -\mathrm{ITF}_R \\ -\mathrm{ITF}_L & 1 \end{bmatrix} \frac{1}{1 - \mathrm{ITF}_L\,\mathrm{ITF}_R} \tag{3.5} \]

where

\[ \mathrm{ITF}_L = \frac{H_{LR}}{H_{LL}}, \qquad \mathrm{ITF}_R = \frac{H_{RL}}{H_{RR}} \tag{3.6} \]

are called the interaural transfer functions (ITFs) and describe the difference in propagation to the ears at the two sides of the head [25].

Figure 3.2: Filter topologies. (a) General filter topology. (b) Crosstalk cancellation by implementing the inverse plant matrix, with the determinant $D = H_{LL}H_{RR} - H_{LR}H_{RL}$. This form was first presented by Schroeder and Atal [24].

This notation gives a physical interpretation to the crosstalk cancellation process [8]. The crosstalk cancellation is effected by the interaural transfer functions present in the off-diagonal positions of the right-hand matrix. The crosstalk is predicted by the $-\mathrm{ITF}$ terms and subtracted from the opposite channel. For example, the right input signal is filtered with $-\mathrm{ITF}_R$, which predicts the crosstalk at the left ear. As a result, an out-of-phase cancellation signal is sent to the left channel. The common factor $1/(1 - \mathrm{ITF}_L\,\mathrm{ITF}_R)$ compensates for higher order crosstalk effects: each cancellation signal in turn results in crosstalk again, revealing the recursive nature of the cancellation process. It is a power series in the product of the ITFs, and it is clear that the higher order crosstalk is the same for both channels. The left-hand matrix is a diagonal matrix and equalizes the ipsilateral transfer functions.

When the number of speakers is increased, the plant matrix is non-square and thus it doesn't have an inverse. The notion of a matrix inverse can be extended to non-square matrices by introducing the Moore-Penrose pseudo-inverse:

\[ \mathbf{C} = [\mathbf{H}^H \mathbf{H}]^{-1} \mathbf{H}^H \tag{3.7} \]

It can easily be verified that equation 3.7 reduces to the regular inverse of the matrix when $\mathbf{H}$ is square and invertible:

\begin{align}
\mathbf{C} &= [\mathbf{H}^H \mathbf{H}]^{-1} \mathbf{H}^H \tag{3.8} \\
&= \mathbf{H}^{-1} (\mathbf{H}^H)^{-1} \mathbf{H}^H \tag{3.9} \\
&= \mathbf{H}^{-1} \tag{3.10}
\end{align}

The Moore-Penrose pseudo-inverse follows as the least-squares solution of a linear system, as presented in the next section [26]. Adding extra loudspeakers relaxes the constraints of the inversion by adding an extra degree of freedom. When the 2 × 2 plant matrix is nearly singular, for example, adding a third loudspeaker can be beneficial.

3.2 Inverse Filtering

A numerical expression for the inverse of the plant matrix is generally very hard to calculate. The impulse response will be non-minimum phase because sound is present in echoes resulting from room and pinna reflections. Minimum-phase responses have their energy concentrated at the start. Due to the reflections it is possible that, at certain frequencies, the delayed sound is stronger than the direct sound, resulting in a non-minimum phase response. Such responses are characterized by poles or zeros outside the unit circle [7]. The inverse of a non-minimum phase response is only stable when it is acausal, so a modelling delay has to be included [27]. Moreover, when deconvolving an impulse response, the optimal filter is inevitably of infinite duration, which makes it not realizable.

Calculations of the filters in the frequency domain using the Discrete Fourier Transform suffer from circular convolution effects. Multiplying DFTs corresponds to a circular convolution in discrete time, which is a periodic summation of the linear convolution, so overlapping periods can give a wrong result. When multiplying in the frequency domain, it is possible to avoid negative effects by using zero-padding. However, when deconvolving responses by dividing in the frequency domain, zero-padding does not help, since it would have to be infinitely long. It is clear that the effective duration of the filters has to be reduced to be realizable.

The responses at the ears contain deep notches at certain frequencies due to interference of reflections at the pinnae and the room response.
Hence, a perfect equalization would result in a large amount of energy being sent to the loudspeakers to try to compensate for these notches. This results in clipping, or in a serious decrease in dynamic range if the overall gain is reduced [23]. At frequencies where the responses at both ears are almost equal, the plant matrix is close to singular.

A method to calculate the inverse filters is presented by Tokuno et al. [27] and combines least squares inversion in the frequency domain with zeroth-order regularization. This method is preferentially employed due to its speed and robustness [28]. The inversion problem is displayed in block diagram form in figure 3.3. The problem is not limited to the 2 × 2 crosstalk situation described before, but can be treated as a general multichannel inversion. $\mathbf{u}$ is a vector of $T$ input signals, being the two binaural signals in the more specific case of crosstalk cancellation. $\mathbf{v}$ is a vector of $S$ source input signals to the original filter matrix $\mathbf{H}$. This corresponds to an $S$ loudspeaker setup with a plant matrix defined by $\mathbf{H}$. $\mathbf{d}$ and $\mathbf{w}$ are vectors of $R$ desired and reproduced signals, with $\mathbf{e}$ being the resulting error. The performance of a crosstalk cancellation system is measured at the two ears and thus $R = 2$.

The matrices $\mathbf{A}$, $\mathbf{H}$ and $\mathbf{C}$ are multichannel filtering matrices. $\mathbf{A}$ is an $R \times T$ target matrix, which can be taken equal to the identity matrix of order 2 because an exact reproduction at the ears is desired. However, a modelling delay is included to take into account the non-causal part of the inverse. $\mathbf{H}$ is the $R \times S$ plant matrix and $\mathbf{C}$ the $S \times T$ crosstalk cancellation matrix, which aims to minimize the error. Therefore, a cost function $J$ is defined (equation 3.11). A first term is the performance error $\mathbf{e}^H\mathbf{e}$, which is the traditional measure of how well the desired signals are approximated. When only this term is considered, an exact least squares solution is obtained. The second contribution is an effort penalty term $\beta\mathbf{v}^H\mathbf{v}$, in which $\beta$ is a regularization parameter that controls the relative weight of the effort term. The cost function is given by:

\[ J = \mathbf{e}^H\mathbf{e} + \beta\,\mathbf{v}^H\mathbf{v} \tag{3.11} \]

For $\beta = 0$ only the performance error is minimized, while the effort is minimized for $\beta$ going to infinity. Filters which have a large amount of energy at certain frequencies to compensate for notches have a high effort penalty and thus can be controlled by regularization.
This is a common technique used with the least-squares method, and in machine learning terminology it is used as a way to prevent overfitting of data (cf. [26]). An exact inverse of the plant matrix, in the least-squares sense, doesn't necessarily perform better in reality, since it would be very sensitive to small errors in the plant matrix. It also turns out that $\beta$ can be used to control the duration of the inverse filters [27]. Increasing the regularization shortens the duration of the filter, which makes it possible to avoid the undesirable wrap-around effect of circular convolution. The solution of the least squares problem is given by:

\[ \mathbf{C} = [\mathbf{H}^H \mathbf{H} + \beta\mathbf{I}]^{-1} \mathbf{H}^H \tag{3.12} \]

Figure 3.3: Multichannel inversion problem

Kirkeby et al. [27] describe a method to calculate stable, causal and finite filters:

1. Calculate the N-point FFT of the impulse responses in the system to obtain the $R \times S$ plant matrix $\mathbf{H}$.
2. For each of the N values, calculate the $S \times T$ filter matrix $\mathbf{C}$ using equation 3.12.
3. Take the inverse FFT of each of the elements and apply a cyclic shift over N/2 samples to implement a modelling delay.

The exact value of the modelling delay is not critical, nor is the value of the regularization parameter. The rule of thumb is to choose the modelling delay equal to half the filter length [27], implemented by the cyclic shift. The mentioned method only calculates N samples of an inverse filter that is ideally infinitely long. The regularization parameter controls the length of the filters, so an approximate filter is obtained by limiting the duration of the inverse so that it fits in these N samples. The advice is to limit the duration of the filter to N/2 samples to prevent any negative circular convolution effects. The energy of the filter is thus concentrated in the central part between N/4 and 3N/4.

3.3 Stereo Dipole

A particular two channel crosstalk cancellation system is obtained by using two closely spaced loudspeakers. This setup is commonly referred to as a stereo dipole [29] and is discussed in more detail below.

Crosstalk cancellation is achieved by destructive interference of sound waves. Due to the recursive nature of the process, as indicated in section 3.1, many pulses are sent out to deliver just one pulse at the desired ear. This causes interference patterns to be located around the head and thus limits the sweet spot. When placing two speakers close together, the nature of the sound field is changed completely. The speakers act as a dipole source for which the null is steered to the ear at which cancellation is required. The inputs of the system will appear to be almost exactly out of phase, much as in the case of a real dipole.

Figure 3.4 illustrates the sound field for two different source spans. The target signal is a Hanning pulse with its first zero at 6.4 kHz. In figure 3.4a a sequence of positive pulses from the left speaker and a sequence of negative pulses from the right speaker can be seen. The first pulse is heard by the left ear, while subsequent pulses cancel out at the ears alternately. It is clear that the equalization zone is strictly limited due to copies of the signals being present around the head. In figure 3.4b the reproduced sound field is very different. Due to the reduced source span, subsequent pulses overlap, resulting in only a single wave front arriving at the ears. The sound is directed at the left ear and a cancellation zone is present at the right ear. This extends the sweet spot significantly. The source inputs for the stereo dipole are formed by overlapping adjacent pulses, which causes the amount of low-frequency energy needed to increase compared to a setup with a wider source span. This makes the stereo dipole mainly interesting for achieving a broader sweet spot at high frequencies. For an angle of 10 degrees this is the case up to 11 kHz.

Figure 3.4: The sound field produced by two sources to achieve crosstalk cancellation [29]. (a) 60 degree loudspeaker span. (b) 10 degree loudspeaker span.

The example of the stereo dipole again shows that crosstalk cancellation is a frequency dependent process. Gardner [8] looks at a frequency range of 100 Hz to 6000 Hz in which crosstalk cancellation performs well. For lower frequencies no localization cues occur, while at higher frequencies the transfer functions depend highly on slight variations in the listening situation as well as on the individual HRTFs of the listener. For higher frequencies, the crosstalk cancellation is omitted and an energy-compensation system is proposed to extend the range of the audio reproduction system. This technique is similar to the panning of a source to the closest speaker and relies on the natural channel separation which is present due to the shadowing of the head. However, it will appear that in a highly reflective environment such as a vehicle there is almost no natural channel separation present, so this extension will not be valid. In this thesis, the filters are designed and tested for a frequency range of 80 Hz to 8000 Hz.

Crosstalk cancellation filters with a matched plant matrix can usually deliver over 20 dB of channel separation in anechoic environments [23]. The system implemented in a car by Farina [3] resulted in a channel separation of 10 dB.
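The three-step inversion procedure of section 3.2 and the notion of channel separation can be illustrated with a short Python sketch. Everything below is an assumption made for illustration: the function names, the array layout, the synthetic plant (pure delays with an attenuated crosstalk path), the FFT length and the value of the regularization parameter. It is not the implementation used in this thesis, and the separation obtained for such an idealized, perfectly matched plant says nothing about the performance with measured responses.

```python
import numpy as np

def ctc_filters(h, n_fft, beta):
    """Regularized frequency-domain inversion of a plant (equation 3.12).

    h    : array (R, S, L), impulse response from speaker s to ear r
    beta : regularization parameter (weight of the effort term)
    Returns an array (S, R, n_fft) of FIR filters, delayed by n_fft // 2.
    """
    # Step 1: N-point FFT of every impulse response, one matrix per bin.
    H = np.moveaxis(np.fft.fft(h, n=n_fft, axis=-1), -1, 0)   # (N, R, S)
    Hh = np.conj(np.swapaxes(H, 1, 2))                         # (N, S, R)
    # Step 2: C = [H^H H + beta I]^-1 H^H, solved for each bin.
    n_spk = H.shape[2]
    C = np.linalg.solve(Hh @ H + beta * np.eye(n_spk), Hh)     # (N, S, R)
    # Step 3: inverse FFT and cyclic shift over N/2 samples.
    c = np.real(np.fft.ifft(np.moveaxis(C, 0, -1), axis=-1))
    return np.roll(c, n_fft // 2, axis=-1)

def reproduced_responses(h, c):
    """Linear convolution of plant and filters: e[r][t] is the response
    at ear r to binaural input t."""
    n_ears, n_spk, _ = h.shape
    return [[sum(np.convolve(h[r, s], c[s, t]) for s in range(n_spk))
             for t in range(c.shape[1])] for r in range(n_ears)]

def channel_separation_db(e):
    """Ratio of wanted (diagonal) to leaked (off-diagonal) energy in dB."""
    wanted = sum(np.sum(e[r][r] ** 2) for r in range(len(e)))
    leaked = sum(np.sum(e[r][t] ** 2)
                 for r in range(len(e)) for t in range(len(e[r])) if r != t)
    return 10 * np.log10(wanted / leaked)

# Hypothetical symmetric plant: direct path after 20 samples,
# crosstalk path 6 samples later and attenuated to 0.6.
h = np.zeros((2, 2, 64))
h[0, 0, 20] = h[1, 1, 20] = 1.0
h[0, 1, 26] = h[1, 0, 26] = 0.6

c = ctc_filters(h, n_fft=1024, beta=1e-3)
e = reproduced_responses(h, c)
print(f"channel separation: {channel_separation_db(e):.1f} dB")
```

The reproduced responses are evaluated with a linear convolution on purpose: if the inverse filters did not fit in the N samples, the wrap-around error would show up here, whereas a check by multiplication in the frequency domain would hide it. Increasing `beta` in this sketch trades channel separation for shorter filters with less effort.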

Chapter 4 Sound in a Vehicle Environment

4.1 Crosstalk Cancellation in Vehicles

The general benefits of creating a virtual 3D environment using loudspeakers also apply in a vehicle environment. Spatial audio can be presented to a listener without the need to wear headphones, which is particularly uncomfortable when driving a car. At the same time it is a very specific listening situation. According to Gardner [8], the specific constraints of car audio systems are well suited to the technology. Most of the time there is only one listener, the driver, which excludes the multi-user problem. Another asset is that the position of the head is known a priori. Gardner states that head tracking is not necessary [8]. However, a limited head movement is still possible. In this thesis, the performance with respect to head rotation is considered.

Farina [30] states that the sound in cars is heavily influenced by the unusual position of the speakers. The path lengths can be quite different for each speaker and the sound arrives under an elevation angle. A virtual environment could be used to equalize the system and place virtual loudspeakers in front of the listener, as in a conventional stereo setup. Farina also indicates that the small volume of the compartment and highly reflecting surfaces, such as windows, produce evident reflections and resonances, causing large alterations in the frequency response. Crosstalk cancellation makes it possible to compensate for these spectral deformations and present a nearly flat response to the listener.

The targeted frequency range for crosstalk cancellation is 80 Hz to 8 kHz. For lower frequencies no localization cues occur and the control is limited due to the resonances of the room. At higher frequencies the transfer functions depend highly on slight variations in the listening situation as well as on the individual HRTFs of the listener. Due to the highly reflective environment, this effect is even more distinct.

4.2 Loudspeaker Room Interaction

Modal Theory

When sound is played in a room or small enclosure, the boundary conditions imposed by the walls result in the excitation of standing waves, referred to as the eigenmodes of the room. The resonance frequencies depend on the dimensions of the enclosure. As an illustration one can look at the ideal case of a rectangular room with rigid surfaces [31]. A rectangular box is only a crude approximation of the cabin, but it gives some feeling for what is physically happening. Assuming an $e^{j\omega t}$ time dependence, the wave equation in three dimensions is given by:

$$\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} + k^2 p = 0 \tag{4.1}$$

where $p$ is the sound pressure and $k$ is the wave number. The solution can be found by separation of variables and can be written as:

$$p = X(x)\,Y(y)\,Z(z)\,e^{j\omega t} \tag{4.2}$$

and thus the wave equation becomes

$$\frac{1}{X}\frac{\partial^2 X}{\partial x^2} + \frac{1}{Y}\frac{\partial^2 Y}{\partial y^2} + \frac{1}{Z}\frac{\partial^2 Z}{\partial z^2} + k^2 = 0 \tag{4.3}$$

The solution is independent and similar for each direction. In the x-direction this yields the one-dimensional equation

$$\frac{1}{X}\frac{\partial^2 X}{\partial x^2} + k_x^2 = 0 \tag{4.4}$$

for which the general solution is

$$X(x) = C\cos(k_x x + \varphi) \tag{4.5}$$

The boundary conditions are imposed by the rigid surfaces. This implies that the normal component of the particle velocity should be zero at the surface:

$$u_x = -\frac{1}{j\omega\rho}\frac{\partial p}{\partial x} = 0 \quad \text{for } x = 0 \text{ and } x = l_x \tag{4.6}$$

This results in $\varphi = 0$ and $k_x = \pi n_x / l_x$, where $n_x = 0, 1, 2, 3, \dots$ Applying these boundary conditions in three dimensions gives the solution for the wave equation

$$p = p_0 \cos\left(\frac{\pi n_x x}{l_x}\right) \cos\left(\frac{\pi n_y y}{l_y}\right) \cos\left(\frac{\pi n_z z}{l_z}\right) \tag{4.7}$$

The eigenfrequencies are then found as

$$f_n = \frac{c}{2}\sqrt{\left(\frac{n_x}{l_x}\right)^2 + \left(\frac{n_y}{l_y}\right)^2 + \left(\frac{n_z}{l_z}\right)^2} \tag{4.8}$$

Depending on how many of the indices $n_x$, $n_y$ and $n_z$ are non-zero, different types of modes can be distinguished (in descending order of importance): axial modes are one-dimensional, tangential modes are two-dimensional and oblique modes are three-dimensional. A resonance results in a sharp coloration of the frequency response. At higher frequencies the resonances lie very close to each other and modal theory is no longer relevant. Absorption is also higher there, so the Q-factor of the resonances is lower. As can be seen in equation 4.8, the resonance frequencies are inversely proportional to the dimensions of the room, and hence for the cabin they are shifted upwards in the audible range.

Some fundamental modes for a two-dimensional enclosure are shown in figure 4.1. It can be seen that the value of $n$ corresponds to the number of nodes in a certain direction. Since the normal component of the particle velocity is zero at a rigid wall, the pressure reaches a maximum there. This results in an anti-node at the boundaries.

Figure 4.1: Acoustic pressure modes in a rectangular enclosure [32]. Panels: (a) $n_x = 1$, $n_y = 0$; (b) $n_x = 1$, $n_y = 1$; (c) $n_x = 3$, $n_y = 0$; (d) $n_x = 3$, $n_y = 2$.

The transfer functions depend on the positions of the loudspeakers and the listener. If the listener is situated in a node of a certain eigenmode, a notch will be present at the corresponding frequency, while the response shows a peak when the listener is situated at an anti-node. Loudspeakers placed in a node are not able to excite that particular mode, but they can excite it effectively when positioned at an anti-node. These effects can be noticed when comparing the responses from loudspeakers at different positions. Figure 4.2 shows the frequency response at the left ear for a speaker mounted in the back corner and a speaker mounted in the middle of the front panel. A big peak is present at 128 Hz in the response of the rear speaker. Since it is placed in the corner, it is likely to excite fundamental modes in different axial directions, which together contribute to a big resonance. In the response of the front speaker there is a peak as well, but it is less strong. A first peak is visible at 115 Hz, probably corresponding to the top-down axial mode. The roof of the cabin measures 1.7 m by 1.4 m, the floor measures 1.4 m by 1.4 m and the height is 1.5 m. Predicting the fundamental modes in the top-down and left-right directions using equation 4.8 gives 114 Hz and 123 Hz respectively, which matches the response quite well. Higher order modes are harder to predict because the rectangular model is not realistic. Towards 200 Hz the response of the rear speaker rises, while that of the front speaker falls into a deep notch. This could for example indicate a tangential mode, which has anti-nodes in the corners and nodes in the middle (cf. figure 4.1b). The front wall consists of the window, which is placed at an angle. This influences the front-back axial mode that would exist in a rectangular box. It could be expected that a mode exists somewhere in between an axial front-back mode and a tangential mode that also includes the roof and the floor.

Figure 4.2: Response from two different speaker positions (front speaker and back speaker) measured with the HATS. Axes: magnitude (dB) versus frequency (Hz); the 128 Hz peak is marked.

Reflections

As mentioned before, modal analysis is not relevant at higher frequencies since the modes are closely spaced. When the wavelength becomes smaller, the acoustic paths tend to behave as rays and one can think in terms of reflections. If two correlated acoustic waves arrive at the ear they can interfere constructively or destructively, which is of course also the physical principle behind crosstalk cancellation. These interferences can again cause peaks and notches in the frequency response. It is also instructive to look at the impulse response as shown in figure 4.3. The response shown is the impulse response from the speaker in the back corner to the left ear. The first peak is the strongest and is referred to as the direct sound. Subsequently, a number of discrete reflections are observed. These are the early reflections, and they arrive very shortly after the direct sound due to the small dimensions of the space. The speed of sound in air is 345 m/s and sound reflecting on one wall has an increased path length of less than half a meter, so the first reflections arrive within milliseconds after the direct sound. For the speakers placed further away from the side walls, these first reflections arrive slightly later. The tail of the impulse response contains the late, more diffuse reflections which contribute to the reverberant field. Careful inspection shows that a strong low-frequency contribution is present which decays more slowly. This corresponds to the resonance frequency visible in the frequency response. For the front speaker, for example, the resonance is less strong, so the contribution is also less visible in the impulse response.
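The modal prediction of equation 4.8 can be checked numerically. The following is a minimal sketch, not the procedure used in the thesis: it assumes a rigid rectangular box with the floor dimensions and height quoted above, and a speed of sound of 343 m/s (an assumed value; 345 m/s changes the results by less than 1 Hz). The function names and the `CABIN` tuple are hypothetical.

```python
import itertools
import math


def mode_frequency(n, dims, c=343.0):
    """Eigenfrequency in Hz of mode n = (nx, ny, nz) in a rigid box (equation 4.8).

    dims holds the box dimensions (lx, ly, lz) in metres; c is an assumed
    speed of sound in m/s.
    """
    return (c / 2.0) * math.sqrt(sum((ni / li) ** 2 for ni, li in zip(n, dims)))


def mode_type(n):
    """Classify a mode by the number of non-zero indices."""
    nonzero = sum(1 for ni in n if ni > 0)
    return {1: "axial", 2: "tangential", 3: "oblique"}[nonzero]


def lowest_modes(dims, max_index=2, c=343.0):
    """Return (frequency, indices, type) for all modes up to max_index, sorted by frequency."""
    modes = []
    for n in itertools.product(range(max_index + 1), repeat=3):
        if any(n):  # skip the trivial (0, 0, 0) solution
            modes.append((mode_frequency(n, dims, c), n, mode_type(n)))
    return sorted(modes)


# Assumed idealisation of the cabin: left-right 1.4 m, front-back 1.4 m, height 1.5 m.
CABIN = (1.4, 1.4, 1.5)
```

With these assumptions, `mode_frequency((0, 0, 1), CABIN)` gives about 114.3 Hz for the top-down axial mode and `mode_frequency((1, 0, 0), CABIN)` gives 122.5 Hz for the left-right axial mode, in line with the 114 Hz and 123 Hz quoted in the text.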

Figure 4.3: Impulse response from the speaker in the back corner to the left ear (horizontal axis: time in s)
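The timing of the early reflections can be made concrete with a short worked calculation. It uses the speed of sound and the path length difference quoted in the text; the symbols $\Delta d$, $\Delta t$ and $f_k$ are introduced here for illustration only.

$$\Delta t = \frac{\Delta d}{c} < \frac{0.5\ \mathrm{m}}{345\ \mathrm{m/s}} \approx 1.45\ \mathrm{ms}$$

As an idealization (an assumption, not a measured result), consider a single reflection with the same amplitude as the direct sound. The two waves then cancel at the frequencies where the delay corresponds to an odd number of half periods:

$$f_k = \frac{2k + 1}{2\,\Delta t}, \qquad k = 0, 1, 2, \dots$$

For $\Delta t = 1.45$ ms the first notch would lie near 345 Hz. Real reflections are weaker than the direct sound, so the dips are finite, but this shows how delays of about a millisecond place interference notches well inside the 80 Hz to 8 kHz range.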

Chapter 5 System design

5.1 Modelling a vehicle cabin

Figure 5.1: Cabin set-up, panels (a) and (b)

A cabin was made to represent a vehicle environment, as shown in figure 5.1a. The shape was designed to resemble an agricultural machine cockpit for ongoing research on active noise control. The construction is made out of metal with a Plexiglas front window and a roof made out of wooden panels. The roof measures 1.7 m by 1.4 m, the floor measures 1.4 m by 1.4 m and the height is 1.5 m. Absorbent material is placed against the walls on the inside of the cabin to create a realistic sound environment. Loudspeakers are mounted in the wooden roof panels. This provides some flexibility for changing the speakers and avoids the need to place equipment inside the cabin, which could influence the sound field. Two types of speakers are used, mainly Visaton FR13 WP speakers and also some smaller Visaton FRWS 5 SC speakers. Both have a high Q-factor and are therefore suited to open-baffle mounting. The frequency responses are shown in figure 5.2.

Figure 5.2: Loudspeaker responses

Figure 5.3: Positions of the loudspeakers. The position of the head is marked with a cross.

5.2 Loudspeaker Topologies

The positions of the loudspeakers are displayed in figure 5.3. Speakers 1 to 5 are all of the larger type Visaton FR13, while 6 and 7 are two smaller Visaton FRWS speakers. Initially only the two speakers in the back and the speaker in the front were present. A basic two channel crosstalk system was used first to test whether the filter design was working properly and to test the influence of parameters such as truncation, regularization and the amplifier response. Next, the front speaker was included to check for any change in performance. To improve the crosstalk cancellation, a stereo dipole is added above the head. Since these speakers are mainly of interest for high frequencies, it is tested whether the dipole can be formed by the smaller loudspeakers, which only have an accurate response in a higher frequency range. This reduces the cost and, which is also important in an automotive environment, the space needed. The setups which will be tested are:

• Two channel system with the back speakers 1+2
• Three channel system with the back speakers and the front speaker
• Two channel system with the middle speakers 4+5
• Four channel system with the back and middle speakers
• Four channel system with the back and small speakers

5.3 Plant Matrix Measurements

Characterizing the system is of utmost importance for designing the compensation filters. To determine the plant matrix, the impulse responses from the different loudspeakers to both ears were measured. When an input signal is presented to a device under test, the output can be obtained by convolving the impulse response with the input:

$$y(t) = h(t) * x(t) = \int_{-\infty}^{+\infty} h(\tau)\,x(t - \tau)\,d\tau \tag{5.1}$$

In the frequency domain this corresponds to multiplying the input spectrum with the transfer function:

$$Y(f) = H(f)\,X(f) \tag{5.2}$$

An impulse response can be determined by exciting each frequency and measuring the response of the system. In theory a Dirac pulse could be sent out, which contains all frequencies in a very short time interval. Inserting a Dirac δ function in equation 5.1 shows that the impulse response is obtained instantly. However, in practice it is very hard to generate such a pulse with sufficient power at each frequency. For the deconvolution of the measured test signal it is possible to use any excitation signal as long as it has enough energy in the frequency range of interest [33]. In this thesis a linear sweep was used. It excites each frequency, starting at the lower ones, with the same amount of energy, and thus the spectrum is white.

Equation 5.2 shows that the transfer function can be obtained by dividing the spectrum of the output by the spectrum of the input signal; the impulse response then follows from an inverse FFT. A problem with this method is that it generates a lot of noise outside the range of excitation. The spectrum of the input is very low outside the excited range, and dividing by this spectrum boosts these frequencies in the transfer function. Band-pass filtering can be used to filter out this noise. Looking back at the time domain representation, it can be seen that the impulse response can also be obtained by convolving the output response with the inverse filter of the input. For the linear sweep, the inverse filter is its time-inverse since the spectrum is white [33]. A convolution with a time-reversed signal is equivalent to the cross-correlation:

$$y(t) * x(-t) = \int_{-\infty}^{+\infty} y(\tau)\,x(\tau - t)\,d\tau \tag{5.3}$$

Substituting $u = \tau - t$ gives

$$y(t) * x(-t) = \int_{-\infty}^{+\infty} y(t + u)\,x(u)\,du \tag{5.4}$$

$$\overset{\mathrm{def}}{=} R_{xy}(t) \tag{5.5}$$

The operation is illustrated in figure 5.4. It is also advisable to switch to the frequency domain at this point, to benefit from the reduced computation time of the FFT. The output and the inverse filter can be transformed and multiplied to obtain the transfer function. Note that the time-inverse becomes the complex conjugate of the input in the frequency domain.

Figure 5.4: Calculation of the impulse response for a white excitation signal [33]

Measurements were performed using a LabVIEW interface. The test signals were sent out using an external Terratec DMX sound card to an audio amplifier which drives the speakers. A Brüel & Kjær Head and Torso Simulator (HATS) was placed inside the cabin to record a left and a right ear signal. For the data acquisition a PXIe-1082 chassis with a PXIe-4498 module from National Instruments was used. It takes the (preamplified) microphone signal and feeds it back to LabVIEW using a 24-bit ADC. The sampling frequency was set to 48 kHz. The linear sweep used had a duration of 60 s and ranged from 80 Hz to 8000 Hz. The LabVIEW program saves the input signal, but the delay to the output was not the same each time a measurement was performed. When two different speakers were measured one after the other, crucial phase information for the cancellation was therefore lost. For this reason it was decided to measure an analog voltage in the signal path and feed it back to a second input of the PXI system to use as a reference signal for the deconvolution. The voltage can be measured at the output of the sound card, which makes it possible to include a compensation for the amplifier in the filters. The plant matrix describing the system thus encompasses the amplifier response, the speaker response, the room response and the HRTF. Since these responses are part of the plant matrix, they do not need to be maximally flat, but a flat response reduces the compensation effort required of the inverse filters.

5.4 Filter Design

FIR filters are designed based on the measured impulse responses in the plant matrix. The impulse responses are band-pass filtered so that any noise outside the range of excitation is removed. The impulse responses are subsequently truncated to a length of 1 s. This length allows the reverberant tail to drop below the noise floor, so no information is lost. The plant matrix is normalized to its maximum value for convenience. This only influences the amplitude of the filters and not their shape. The inverse filtering is performed in the frequency domain using equation 3.12, repeated here:

$$C = \left[H^H H + \beta I\right]^{-1} H^H \tag{5.6}$$

This equation actually denotes a set of equations, since the inversion is performed for each frequency separately. The regularization parameter β is chosen such that the energy of the filter is concentrated in its central part. An example of an inverse filter is shown in figure 5.5. An optimal value has to be determined by trial and error, but the exact value is not very critical [27]. In this work a value of β = 0.05 was chosen. For a further improvement it is advisable to use a frequency-dependent regularization parameter [34]. Using a single parameter results in an attempt to equalize the response over the full frequency range. However, the frequency bands outside the range of excitation have a low level in the transfer function, so they are boosted unnecessarily in the inverse filter. By applying a very large value for the regularization parameter in these regions, the inverse filters have no energy outside the range of interest. As a final step, the inverse filters are transformed back to the time domain and a cyclic shift over half the length is applied. This implements a modelling delay and creates causal filters.
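The two central operations of sections 5.3 and 5.4 can be sketched in a few lines. This is an illustrative sketch in plain Python, not the LabVIEW and MATLAB code used in the thesis, and all function names are hypothetical. It assumes periodic (circular) signals for the cross-correlation and exactly two ears. For the inversion it uses the identity $[H^H H + \beta I]^{-1} H^H = H^H [H H^H + \beta I]^{-1}$, which is a swapped-in but equivalent form of equation 5.6 that only requires a 2 x 2 inverse, whatever the number of loudspeakers.

```python
def cross_correlate(y, x):
    """Circular cross-correlation R_xy[t] = sum_u y[t + u] * x[u] (equations 5.3 to 5.5).

    For a white excitation x this returns the impulse response scaled by the
    energy of x.
    """
    n = len(x)
    return [sum(y[(t + u) % n] * x[u] for u in range(n)) for t in range(n)]


def hermitian(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def regularized_inverse(H, beta=0.05):
    """Crosstalk filter matrix C for ONE frequency bin.

    H is the 2 x n_speakers plant matrix at that bin (rows: left and right ear).
    Computes C = H^H (H H^H + beta I)^-1, equivalent to equation 5.6.
    """
    Hh = hermitian(H)
    G = matmul(H, Hh)                      # 2 x 2 matrix H H^H
    a, b = G[0][0] + beta, G[0][1]
    c, d = G[1][0], G[1][1] + beta
    det = a * d - b * c
    G_inv = [[d / det, -b / det], [-c / det, a / det]]
    return matmul(Hh, G_inv)               # n_speakers x 2


def cyclic_shift(taps):
    """Rotate a filter by half its length, which implements the modelling delay."""
    half = len(taps) // 2
    return taps[-half:] + taps[:-half]
```

In a full design, `regularized_inverse` would be called once per FFT bin of the measured plant matrix, with a larger `beta` outside the excited range if frequency-dependent regularization is wanted, and `cyclic_shift` would be applied to each filter after the inverse FFT.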

Figure 5.5: Example of an inverse filter with its energy concentrated in the central part (horizontal axis: samples n, ×10^4)

5.5 Amplifier Design

Influence of amplifier characteristics

Initially, two types of Pioneer audio amplifiers were used to drive the speakers, a VSX-826 and an A-607R. The frequency responses of both were measured for the 80 Hz to 8000 Hz excitation signal from the sound card, with a 4 Ohm load. The responses are depicted in figure 5.6. The VSX-826 has an almost perfectly flat response, while the A-607R shows a fairly big drop-off above 1 kHz. According to the specifications, the response should be flat up to 100 kHz, so an internal component is probably failing. This is not necessarily a problem, since the amplifier characteristic can be part of the plant matrix and is thus equalized as well. However, above 250 Hz the response also starts showing some modulation, probably due to nonlinear effects in the amplifier. These nonlinearities cannot be compensated for, since using impulse responses implies dealing with linear systems. Crosstalk cancellation will still be possible but, as can be seen in figure 5.7, the performance for the same crosstalk system is worse than with the other amplifier. For this reason it was decided to build a set of new small mono amplifiers, so that a multichannel set-up could be tested without its performance being limited by the amplifier.

Figure 5.6: Frequency responses of the Pioneer amplifiers (A-607R and VSX-826), magnitude (dB) versus frequency (Hz). The absolute value of the magnitude has no significance.

Figure 5.7: Comparison of the two channel crosstalk cancellation with the two different Pioneer amplifiers, panels (a) and (b), each showing the magnitude (dB) at the left and right ear versus frequency (Hz). The measurements were taken one after the other with the HATS at the same position. A two channel system with speakers 1 and 2 in the back was used.

Circuit

The amplifier chip which was initially chosen is the TDA2050 from STMicroelectronics. It is a class AB audio amplifier suited for hi-fi applications. It can deliver a high output power over a wide range of input voltages and can be used with a single supply voltage. This allows it to be used with a commercially available laptop power supply, for example, and removes the need for a dedicated power supply circuit. With a 20 V single supply voltage the amplifier is able to deliver 5 W to an 8 Ohm load (the small speakers) and 8 W to a 4 Ohm load (the big speakers), which is more than sufficient for playback in the cabin. The influence of the external components is discussed next and simulated where possible. The amplifier is modelled in PSpice (cf. figure 5.8) by an ideal op-amp, and the loudspeaker is modelled as a resistance in series with an inductor. The values correspond to those of the FRWS 5 SC speaker.

Figure 5.8: PSpice simulation circuit

Bias Network

The power supply provides a bias voltage at the input of the amplifier. At DC the capacitors can be considered open circuits. $R_1$ and $R_2$ form a resistive voltage divider and, since they are equal, the bias voltage will be half of the supply voltage. The input impedance of the op-amp can be considered very high, so no current flows through $R_3$ and the bias voltage appears at the non-inverting input. The DC voltages shown in the circuit verify this prediction. The power supply is decoupled for AC by $C_3$ and $C_5$. These form the common combination of a larger capacitor able to store more energy and a smaller one with a good high frequency response.

Input

At the input it is necessary to block DC voltages, because these would decrease the dynamic range of the amplifier or even cause saturation. In a first approximation $C_2$ can be considered a short circuit at AC and, since $C_2$ is connected in parallel with $R_1$ and $R_2$, the influence of these resistors can be neglected as well. $R_3$ is then connected to ground and a first order high-pass filter, formed by $C_1$ and $R_3$, is present at the input. The cutoff frequency can be found as:

$$f_c = \frac{1}{2\pi R_3 C_1} = \frac{1}{2\pi \cdot 22\ \mathrm{k\Omega} \cdot 2.2\ \mathrm{\mu F}} = 3.3\ \mathrm{Hz} \tag{5.7}$$

Using this value, the assumptions about the other components can be checked. The magnitude of the impedance of capacitor $C_2$ at the cutoff frequency is:

$$|Z| = \frac{1}{2\pi f_c C_2} = \frac{1}{2\pi \cdot 3.3\ \mathrm{Hz} \cdot 100\ \mathrm{\mu F}} = 482\ \Omega \tag{5.8}$$

When put in parallel with two resistors of 22 kΩ, this leads to an impedance of 462 Ω, so neglecting the two resistors only gives a minimal difference of less than 5%. In series with the 22 kΩ resistor $R_3$, the 482 Ω impedance is simply added. Neglecting the influence of the capacitor again results in a difference of less than 5%, so the assumption that $R_3$ is connected to ground is justified as well. The input transfer function is simulated and depicted in figure 5.9a. The 3 dB point appears to be close to 3 Hz, as was predicted.

Gain

The gain is determined by the feedback network formed by $R_4$, $R_5$ and $C_4$. The non-inverting amplifier topology has the following transfer function:

$$G = 1 + \frac{R_5}{R_4 + \dfrac{1}{j\omega C_4}} = 1 + \frac{j\omega R_5 C_4}{1 + j\omega R_4 C_4} \tag{5.9}$$

$$\approx 1 + \frac{R_5}{R_4} \quad \text{for high frequencies} \tag{5.10}$$

$$= 1 + \frac{22\ \mathrm{k\Omega}}{0.68\ \mathrm{k\Omega}} \tag{5.11}$$

$$\approx 30.5\ \mathrm{dB} \tag{5.12}$$
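The numbers in equations 5.7 to 5.12 can be reproduced with a few helper functions. This is only a sketch of the arithmetic, with hypothetical function names. The component values are those quoted in the text; $R_4$ = 680 Ω is taken from the feedback cutoff calculation, and treating the capacitor reactance as a plain resistance when combining it with resistors follows the approximation used in the text.

```python
import math


def rc_cutoff(r_ohm, c_farad):
    """Cutoff frequency in Hz of a first order RC filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def capacitor_reactance(f_hz, c_farad):
    """Magnitude of the impedance of a capacitor at frequency f_hz, in ohm."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)


def parallel(*values):
    """Parallel combination, treating every value as a plain resistance."""
    return 1.0 / sum(1.0 / v for v in values)


def noninverting_gain_db(r_feedback, r_ground):
    """High frequency gain 1 + R5/R4 of the non-inverting stage, in dB."""
    return 20.0 * math.log10(1.0 + r_feedback / r_ground)


f_in = rc_cutoff(22e3, 2.2e-6)             # equation 5.7: R3 with C1
x_c2 = capacitor_reactance(3.3, 100e-6)    # equation 5.8: C2 at 3.3 Hz
z_par = parallel(x_c2, 22e3, 22e3)         # C2 combined with R1 and R2
gain_db = noninverting_gain_db(22e3, 680)  # equations 5.10 to 5.12
```

Evaluating these expressions gives roughly 3.3 Hz, 482 Ω, 462 Ω and 30.5 dB, which agrees with the values in the text.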

It is clear that the gain is mainly determined by the ratio of $R_5$ to $R_4$. The low-frequency cutoff is determined by the combination of $R_4$ and $C_4$ and can be found as:

$$f_c = \frac{1}{2\pi R_4 C_4} = \frac{1}{2\pi \cdot 680\ \Omega \cdot 22\ \mathrm{\mu F}} = 10.6\ \mathrm{Hz} \tag{5.13}$$

The main reason for this high-pass filtering is to provide a proper DC bias at the output of the amplifier. At DC, $R_4$ is decoupled and $R_5$ is the only component left in the feedback path. This connects the output back to the input and results in unity gain. As a consequence, the DC bias at the input is reproduced at the output. The transfer function can again be verified in PSpice (cf. figure 5.9b).

Output

Since DC voltages could damage a loudspeaker, it is necessary to prevent them from reaching the output. $C_7$ is a blocking capacitor which forms a high-pass filter together with the resistance of the speaker. For an 8 Ohm speaker this becomes:

$$f_c = \frac{1}{2\pi \cdot 1000\ \mathrm{\mu F} \cdot 8\ \Omega} = 20\ \mathrm{Hz} \tag{5.14}$$

This is also observed in the simulated transfer function from the output of the amplifier to the speaker input in figure 5.9c. There is also a Zobel network present at the output, formed by $R_6$ and $C_6$. Since loudspeaker impedances are generally inductive at higher frequencies because of the voice coil, the load impedance seen by the amplifier increases with frequency. A Zobel network tries to compensate for that by introducing a series RC combination in parallel with the output. By choosing $R_6$ equal to the loudspeaker resistance $R_{LP}$ and $C_6 = L_{LP}/R_{LP}^2$, the amplifier would always see a purely resistive load equal to the loudspeaker resistance. Since a better model for a voice coil would be a lossy inductor rather than an ideal one, some more complicated calculations can be made [35]. Keeping an exact resistive value for the load is mainly important when designing (passive) crossover networks. For the amplifier it is important to maintain stability at higher frequencies, and the Zobel network can be considered to be a low-pass filter. The recommended values in the datasheet introduce a pole around 150 kHz to prevent ultrasonic oscillations.
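The statement that the ideal Zobel choice makes the load purely resistive can be verified with a short derivation. It is a sketch under the ideal loudspeaker model used above (a resistance $R_{LP}$ in series with a lossless inductance $L_{LP}$); the symbols $Z_1$, $Z_2$ and $Z$ are introduced here for illustration only.

$$Z_1 = R_{LP} + j\omega L_{LP}, \qquad Z_2 = R_6 + \frac{1}{j\omega C_6}$$

With $R_6 = R_{LP}$ and $C_6 = L_{LP}/R_{LP}^2$, so that $L_{LP}/C_6 = R_{LP}^2$, the product of the two branch impedances is

$$Z_1 Z_2 = R_{LP}^2 + \frac{L_{LP}}{C_6} + R_{LP}\left(j\omega L_{LP} + \frac{1}{j\omega C_6}\right) = R_{LP}\left(2R_{LP} + j\omega L_{LP} + \frac{1}{j\omega C_6}\right) = R_{LP}\,(Z_1 + Z_2)$$

and the parallel combination seen by the amplifier becomes

$$Z = \frac{Z_1 Z_2}{Z_1 + Z_2} = R_{LP}$$

which is independent of frequency. With a lossy voice-coil inductance this cancellation is only approximate, which is why the datasheet values are chosen for stability rather than for an exactly resistive load.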

Figure 5.9: PSpice simulations: (a) input transfer function, (b) gain transfer function, (c) output transfer function

Second Design

A second design is made using an LM3875 amplifier chip from Texas Instruments. This chip can be embedded with almost identical external components, but outperforms the TDA2050 on properties such as output power, distortion and noise floor. The LM3875 is able to deliver up to 56 W into an 8 Ohm load at a distortion of 0.1%, while the TDA2050 can deliver at most 32 W into an 8 Ohm load at a distortion of 10%. The EAGLE schematic in figure 5.10 shows the final amplifier circuit. A series RC circuit with a pole around 150 kHz is added in shunt with the feedback path. This lowers the gain at high frequencies and thus increases the phase margin. A small capacitor is also added across the input pins of the amplifier. This capacitor forms a low-pass filter with the source impedance, again to avoid high frequency oscillations. Currents from the output lead could be coupled to the input when the source impedance is high. The higher the source impedance, the lower the cutoff frequency of the filter.

Figure 5.10: EAGLE schematic

Board design

EAGLE was used to make a PCB design. The schematic and layout design are depicted in figures 5.10 and 5.11. The limited number of components made it possible to limit the board to a one-layer design. The copper layer and the SMD components were put on the bottom layer, so through-hole components could be placed on the top. Screw terminals were chosen for the connections to the supply, the input signal and the loudspeaker. This provides some flexibility for future use. A heat sink was placed following the recommendations in the datasheet. The high current traces for the supply and the output on the one hand, and the signal and feedback paths on the other hand, are grouped separately to avoid coupling as much as possible. A ground plane was included. No extra volume control was added, to avoid creating different gains for different channels. A picture of the finished amplifier is shown in figure 5.13. A PCB for the TDA2050 chip was designed and tested as well. The EAGLE files are included in appendix A.

Figure 5.11: EAGLE layout

Test

The frequency response of both the TDA2050 and the LM3875 PCB was measured over a 20 Hz to 20 kHz range for a 4 Ohm load impedance. The results are shown in figure 5.12. Both amplifiers have a nearly flat response up to the high frequencies. The cutoff frequency is around 40 Hz, as predicted by equation 5.14 for a 4 Ohm load. The shape of the response is the same for both amplifiers, indicating that it is primarily determined by the external components. The TDA2050 appears to be noisier than the LM3875 chip, though the 50 Hz component of the mains supply is coupled more strongly into the latter. As can be seen in the next chapter, the amplifiers perform well when used for the crosstalk cancellation system.

Figure 5.12: Amplifier frequency response, magnitude (dB) versus frequency (Hz). The absolute value of the magnitude has no significance.
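The 40 Hz cutoff quoted for a 4 Ohm load follows directly from equation 5.14. The sketch below also estimates how much the three high-pass stages of the amplifier attenuate the lower edge of the 80 Hz to 8000 Hz range. It is an idealization with hypothetical function names: the stages are treated as independent first order filters with the cutoff frequencies from equations 5.7, 5.13 and 5.14, and the loudspeaker is treated as a pure resistance.

```python
import math


def rc_highpass_cutoff(r_ohm, c_farad):
    """Cutoff frequency in Hz of a first order RC high-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def highpass_gain_db(f_hz, fc_hz):
    """Magnitude in dB of a first order high-pass filter at frequency f_hz."""
    ratio = f_hz / fc_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))


def cascade_gain_db(f_hz, cutoffs_hz):
    """Total magnitude in dB of independent first order high-pass stages."""
    return sum(highpass_gain_db(f_hz, fc) for fc in cutoffs_hz)


C7 = 1000e-6                                # output blocking capacitor in farad
fc_8ohm = rc_highpass_cutoff(8.0, C7)       # small speakers
fc_4ohm = rc_highpass_cutoff(4.0, C7)       # big speakers
# Input stage (3.3 Hz), feedback stage (10.6 Hz) and output stage with 4 ohm load
droop_80hz = cascade_gain_db(80.0, [3.3, 10.6, fc_4ohm])
```

Under these assumptions the output stage dominates: the cutoff doubles to about 40 Hz for the 4 Ohm speakers, and the combined attenuation at 80 Hz is on the order of 1 dB, which the inverse filters can absorb since the amplifier is part of the plant matrix.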

Figure 5.13: Finished amplifiers

Chapter 6 Results

This chapter covers the objective performance of the designed crosstalk cancellation system. Filtered test signals were played back in the cabin and recorded with the HATS to measure the quality of the crosstalk cancellation. Typical figures of merit are channel separation, sweet spot size and spectral distortion. Binaural signals can be played back after being filtered with the crosstalk filters, and the quality of the reproduction can be quantified by calculating the interaural differences.

6.1 Crosstalk Cancellation Quality Measures

The test setup used is similar to the one used for the determination of the plant matrix. Signals can be filtered and played back in MATLAB using the open source C library PortAudio and an ASIO driver. ASIO makes it possible to address the hardware directly, in this case the Terratec sound card, and to route the different channels to the corresponding outputs. The sound is recorded with the HATS and the data is acquired in LabVIEW through the National Instruments data acquisition system.

Channel Separation

The goal of a crosstalk cancellation system is essentially to control exactly how much sound is sent to one ear without influencing the sound at the other ear. A criterion to quantify this is the channel separation. When a signal is sent to one channel of the crosstalk filters and nothing to the other, the ratio of the recorded sound level at the ipsilateral ear to the sound level at the contralateral ear makes it possible to evaluate the performance. For the tests a linear sweep is sent to the left channel, so ideally one would recover the sweep exactly at the left ear and record silence at the right ear. If no directional cues are attached to the emitted signal, the listener is then not able to assign a direction to that sound. He only hears the original monaural sweep at the left ear and nothing at the right ear. In practice this is impossible to achieve and some crosstalk will always be present. The system in the cabin is designed and tested for a frequency range of 80 Hz to 8000 Hz.

Sweet Spot

In a virtual 3D environment the sound is only reproduced correctly at the position of the ears. The crosstalk filters are designed for a specific listening position, and changing this position results in a decrease in performance. An automotive listening situation has the advantage that the position of the driver is known in advance, in contrast to applications where the listener can move around freely. The driver can still move his head within a limited area, in which the performance deteriorates with respect to the optimal position. In the tests, only the rotational sweet spot is considered. The channel separation is examined for rotation angles up to 30 degrees. Since the listening situation is not perfectly symmetrical, rotations to the left and to the right will yield slightly different results. However, this provides no new insights, so only the sweet spot for rotations to one side is examined. A dynamic system could improve the sweet spot by adapting the filters to the position of the head.

Distortion

It is not only important to present the correct interaural differences to a listener; it is also desirable to preserve the monaural characteristics of the sound. A sound system tries to reproduce music without adding any spectral coloration which could change the music experience. Moreover, in a binaural reproduction system, distortion could change the spatial information contained in the spectral cues. As a reference, the response measured with the HATS for a linear sweep through the left loudspeaker in the back without any filtering is shown in figure 6.1. The shape of an HRTF as in 2.2 can be recognized by the resonance of the ear canal at 2 kHz. The many sharp peaks and dips show the big influence of the cabin environment. Unlike in the case of an anechoic HRTF, there is no natural channel separation. In certain frequency ranges the level at the contralateral ear is even higher than at the ipsilateral ear.
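The channel separation defined above is a level ratio, so it is naturally expressed in dB per frequency bin. The following is a minimal sketch with a hypothetical function name; it assumes that the magnitude spectra at the ipsilateral and contralateral ear are already available as equally long lists.

```python
import math


def channel_separation_db(ipsi_mag, contra_mag):
    """Per-bin channel separation in dB from two magnitude spectra.

    Positive values mean the ipsilateral ear is louder. Negative values mean
    the contralateral ear is louder, as observed in parts of the unfiltered
    cabin response.
    """
    separation = []
    for ipsi, contra in zip(ipsi_mag, contra_mag):
        if contra == 0.0:
            separation.append(math.inf)      # perfect cancellation
        elif ipsi == 0.0:
            separation.append(-math.inf)     # nothing arrives at the intended ear
        else:
            separation.append(20.0 * math.log10(ipsi / contra))
    return separation
```

For example, a bin where the ipsilateral magnitude is ten times the contralateral magnitude corresponds to 20 dB of channel separation, and a bin where it is half the contralateral magnitude corresponds to about -6 dB.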

Figure 6.1: Measured HATS response (left and right ear, magnitude in dB versus frequency in Hz) from the left loudspeaker in the back without crosstalk cancellation filtering

Visualization using 1/3 Octave Bands

The crosstalk cancellation results in a rapidly varying response at the contralateral ear and thus also in cumbersome data for the channel separation. To reduce the resolution and to be able to make comparisons between different sets, the frequency range is divided into 1/3 octave bands and the energy is calculated per band. The center frequencies of the bands are spaced as

$$f_{n+1} = \sqrt[3]{2}\, f_n \tag{6.1}$$

and feature a constant relative bandwidth. The bands are centered around 1000 Hz and are given in appendix B. The 80 Hz and 8000 Hz bands are at the limit of the excited range, so their values are not fully valid. However, based on a more narrow-band analysis, the level gives a good indication of the channel separation, so it was decided to include the outer bands in the plots. An example is shown in figure 6.2. This approach is justified since the human auditory system does not have an infinite resolution either. An energy integration is performed in bands much like 1/3 octave bands [6].
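The band analysis can be sketched as follows. This is an illustrative implementation with hypothetical function names, not the code used for the plots. It assumes exact base-2 center frequencies generated from equation 6.1 around 1000 Hz (so the lowest band is centered near 78.7 Hz, nominally labelled 80 Hz), band edges a sixth of an octave either side of the center, and power spectra given as lists alongside their bin frequencies.

```python
import math


def third_octave_centres(f_low=80.0, f_high=8000.0, f_ref=1000.0):
    """Exact 1/3 octave center frequencies around f_ref covering f_low to f_high."""
    n_low = round(3 * math.log2(f_low / f_ref))
    n_high = round(3 * math.log2(f_high / f_ref))
    return [f_ref * 2 ** (n / 3) for n in range(n_low, n_high + 1)]


def band_energies(freqs, power, centres):
    """Sum the power of all bins falling inside each 1/3 octave band."""
    edge = 2 ** (1 / 6)  # half a band width, as a frequency ratio
    energies = []
    for fc in centres:
        lo, hi = fc / edge, fc * edge
        energies.append(sum(p for f, p in zip(freqs, power) if lo <= f < hi))
    return energies


def band_separation_db(freqs, ipsi_power, contra_power, centres):
    """Channel separation per band: ratio of band energies in dB.

    Assumes every band contains at least one bin with non-zero power.
    """
    e_ipsi = band_energies(freqs, ipsi_power, centres)
    e_contra = band_energies(freqs, contra_power, centres)
    return [10.0 * math.log10(a / b) for a, b in zip(e_ipsi, e_contra)]
```

With the default arguments `third_octave_centres()` returns 21 bands from about 78.7 Hz to 8000 Hz. Summing energies before taking the ratio means that a narrow, deep dip in the contralateral response does not dominate the band value, which is the smoothing effect described above.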

Figure 6.2: Comparison of the original channel separation and the 1/3 octave band average (channel separation in dB versus frequency in Hz)

6.2 System performance: Channel Separation and Sweet Spot

The playback results for the static crosstalk cancellation system, for the different loudspeaker topologies mentioned in the previous chapter, are shown in the figures that follow. For each topology, the measured response at the ears for the linear sweep through the left channel is displayed, as well as the channel separation for different angles of head rotation while keeping the inverse filters designed for the initial head orientation. The plant matrix was remeasured before each set of measurements to exclude errors caused by the positioning of the HATS.

The first setup tested was the two channel system using only speakers 1 and 2 in the back. The crosstalk cancellation filters already perform well at the central position: above 200 Hz a channel separation of 18 to 25 dB is achieved in the 1/3 octave bands. When the head is rotated, the performance decreases quickly. The deterioration is faster and more severe at higher frequencies. This can be expected since the cancelling sound waves create interference patterns with a spacing proportional to the wavelength, so at short wavelengths a rotation of the head more easily moves the ears from a maximum to a minimum or vice versa. This can even lead to more sound being received at the right ear than at the left ear, as is for example the case for the 1/3 octave band centered at 2500 Hz.

From the response at the ears, it can be seen that the crosstalk cancellation is hard to realize at certain discrete frequencies. These are the frequencies at which the inversion problem is ill-conditioned. This typically happens at frequencies where the measured transfer functions are almost equal for both ears. The plant matrix is then almost singular, so a good inversion is hard to obtain. It can be seen, for example, that the resonance peak of the cabin around 130 Hz is hard to control. Small errors can also be introduced by noise or by non-stationarity of the plant matrix.

In the next step, the front loudspeaker is added, so the filtering now happens through a three channel system. Figure 6.4 shows that the overall performance is improved by the inclusion of a third loudspeaker. The channel separation is now in the range of 18 to 30 dB in the 1/3 octave bands from 200 Hz onwards. The response also shows that the number of discrete frequencies at which the inversion is hard to realize is strongly reduced, thanks to the added degree of freedom. The rotational sweet spot, however, is not increased much: the channel separation decreases in a similar way to the two channel setup. This indicates that simply adding loudspeakers does not necessarily improve the sweet spot, so the loudspeaker positions have to be chosen carefully.

As already mentioned in the previous chapter, a stereo dipole pair is tested as well. Previous research shows that this setup can realize a broad sweet spot at higher frequencies at the expense of a reduced performance at lower frequencies. The stereo dipole is formed by the two middle speakers located above the head. The plots in figure 6.5 confirm the expected behavior.
The performance of the crosstalk cancellation at the central position is a lot worse than with the previous setups, but a clear improvement in the sweet spot is achieved for frequencies above 2 kHz.

In a last setup, a four channel system is tested which tries to combine the assets of two setups. The system with the two back speakers performs well at low frequencies but suffers from a strong reduction at high frequencies when the head is rotated, whereas the two middle speakers perform worse at low frequencies while offering a broader sweet spot at high frequencies. The results are shown in figure 6.6. Crosstalk cancellation is achieved over the full frequency range in the central position. From the 200 Hz band onwards the channel separation does not drop below 21 dB, matching the performance of common systems in anechoic environments [23]. The response shows that the four channel system does not suffer from discrete frequencies at which the inversion cannot be realized. Compared with the system using only the back speakers, the sweet spot is increased by including the middle speakers.

To give an overview of the performance of the systems, the results at the central position and at an angle of 30 degrees are plotted together for the different topologies in figure 6.7. It is clear that the four channel system has the best overall performance and manages to combine the assets of the middle and back loudspeaker positions.

A further improvement could consist of implementing a dynamic crosstalk cancellation system. A head tracker would then need to be included to detect the position of the head and update the plant matrix accordingly. Figure 6.8 shows the channel separation for a head rotation of 30 degrees when the plant matrix is adapted to the orientation. The results for the two channel system, using speakers 1+2, and for the three channel system are compared with the performance of the static crosstalk canceller at 0 and 30 degrees of rotation. It can be seen that updating the plant matrix for the rotated head yields a gain of more than 20 dB at the highest frequencies. For the two channel system, the matched plant matrix at 30 degrees shows a decrease in performance of up to 8 dB compared to the matched plant matrix at 0 degrees. This is caused by the different positions of the loudspeakers relative to the rotated head. In the limiting case of a rotation of 90 degrees, both loudspeakers are located at one side of the head, so it is hard to control the sound at the ear directed away from the speakers. The three channel system does not suffer from this problem, since at least one speaker is always located at each side of the head. The plot shows that the matched plant matrix at 30 degrees gives a performance comparable to that of the plant matrix at 0 degrees.
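The ill-conditioning discussed in this section can be illustrated with a small numerical sketch. It assumes a frequency-domain least squares inverse with a single regularization parameter, applied to one frequency bin of a 2 x L plant matrix; the matrices, the value of `beta` and the function names are made up for illustration and are not taken from the measurements.

```python
import numpy as np

def regularized_inverse(C, beta):
    """Regularized least squares inverse H = (C^H C + beta I)^-1 C^H for one frequency bin.

    C is the (2 ears x L loudspeakers) plant matrix, H the (L x 2) filter matrix.
    """
    n_speakers = C.shape[1]
    gram = C.conj().T @ C + beta * np.eye(n_speakers)
    return np.linalg.solve(gram, C.conj().T)

def channel_separation_db(C, H):
    """Level at the left ear relative to the right ear when only the left input is driven."""
    R = C @ H  # reproduced 2 x 2 response, ideally the identity matrix
    return 20.0 * np.log10(np.abs(R[0, 0]) / np.abs(R[1, 0]))

beta = 1e-3  # assumed regularization parameter

# Hypothetical bins: clearly different paths versus almost equal paths to both ears.
C_well = np.array([[1.0, 0.5], [0.5, 1.0]])
C_ill = np.array([[1.0, 0.99], [0.99, 1.0]])

cond_well = np.linalg.cond(C_well)
cond_ill = np.linalg.cond(C_ill)
sep_well = channel_separation_db(C_well, regularized_inverse(C_well, beta))
sep_ill = channel_separation_db(C_ill, regularized_inverse(C_ill, beta))
```

In this toy case the nearly singular matrix has a much larger condition number, and with the same `beta` the regularized inverse gives up almost all channel separation at that bin rather than demanding very large filter gains. The same function accepts plant matrices with three or four columns, which is one way to picture the added degrees of freedom of the multichannel setups.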

Figure 6.3: Two channel system with back speakers. (a) Response at the two ears (0°), magnitude (dB) versus frequency (Hz), left and right ear. (b) Channel separation (dB) versus frequency (Hz) for different angles.

Figure 6.4: Three channel system with back speakers and front speaker. (a) Response at the two ears (0°), magnitude (dB) versus frequency (Hz), left and right ear. (b) Channel separation (dB) versus frequency (Hz) for different angles.

Figure 6.5: Two channel system with middle speakers. (a) Response at the two ears (0°), magnitude (dB) versus frequency (Hz), left and right ear. (b) Channel separation (dB) versus frequency (Hz) for different angles.

Figure 6.6: Four channel system with back speakers and middle speakers. (a) Response at the two ears (0°), magnitude (dB) versus frequency (Hz), left and right ear. (b) Channel separation (dB) versus frequency (Hz) for different angles.

Figure 6.7: Comparison of channel separation and sweet spot for different loudspeaker topologies (Back LP, Back+Front LP, Middle LP, Back+Middle LP). (a) Channel separation (dB) versus frequency (Hz) when the head is not rotated. (b) Channel separation (dB) versus frequency (Hz) at an angle of 30 degrees.

Figure 6.8: Comparison of channel separation and sweet spot for different loudspeaker topologies. (a) Comparison of the channel separation (dB) versus frequency (Hz) with dynamic and static crosstalk cancellation (CTC) at 0° and 30° for the two channel setup. (b) The same comparison for the three channel setup.

6.3 System performance: Distortion

The frequency response at the left ear can be used to get an impression of the distortion in the system. Since a linear sweep is used as the excitation signal, a system without distortion would result in a perfectly flat response at the ear. Distortion occurs at frequencies at which the equalization of the plant matrix is hard to realize. Figure 6.9a shows the shifted frequency responses for the different loudspeaker topologies at the central position. The four channel system has the best crosstalk cancellation performance and thus also the least distortion.

When the head is rotated, the performance decreases at higher frequencies and this also introduces more distortion. Sharp notches and peaks appear due to interference patterns. When the sweet spot is larger, these notches and peaks are less pronounced, so there is also less distortion. The responses at an angle of 30 degrees are shown in figure 6.9b. Again the four channel system gives the best performance. Compared with the reference response in figure 6.1, the crosstalk cancellation results in a major reduction of the distortion by equalizing the room response.

6.4 Performance with Small Loudspeakers

The four channel system showed the best performance because the inclusion of a stereo dipole increases the sweet spot at higher frequencies. This leads to the idea that the stereo dipole speakers could be replaced by smaller speakers which only have a good frequency response at higher frequencies. Figure 6.10 shows the results for a four channel system with the smaller loudspeakers 6 and 7 and the back speakers 1 and 2. A very good channel separation is obtained at the central position, but there is almost no improvement in the sweet spot size compared to the two channel system. This indicates that the cancellation is dominated by the two back speakers.
The small speakers have a lower power response than the bigger ones, so an equal contribution of the two types would require higher energy filters for the former. The plant matrix takes level differences into account, but the inversion also limits the energy in the filters. There is a trade-off between a better performance, with more energy sent to the smaller speakers, and a lower filter energy, with mainly the larger speakers in use and a reduced performance. To improve the sweet spot, one could try to force more energy into the filters of the smaller speakers by introducing a channel-dependent regularization, but this possibility was not explored any further. Another possible cause of the reduced performance is the different position of the stereo dipole. It is located more to the back, towards the other speakers, compared to the original stereo dipole. The four speakers are then at the same side of the head, so this topology could result in a more limited performance.
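One way to picture the channel-dependent regularization mentioned above, which was not implemented or tested in this work, is to replace the single regularization parameter by a diagonal weighting matrix. The notation is assumed for illustration: $\mathbf{C}(f)$ is the plant matrix from the $L$ loudspeakers to the two ears, $\mathbf{H}(f)$ the filter matrix, and $\beta_l$ the penalty on the filter energy of loudspeaker $l$.

```latex
% Cost function: reproduction error plus a per-channel effort penalty
J(f) = \left\lVert \mathbf{C}(f)\,\mathbf{H}(f) - \mathbf{I} \right\rVert_F^2
     + \operatorname{tr}\!\left( \mathbf{H}^{H}(f)\,\mathbf{B}\,\mathbf{H}(f) \right),
\qquad
\mathbf{B} = \operatorname{diag}(\beta_1, \dots, \beta_L)

% Minimizing J gives the regularized least squares solution
\mathbf{H}(f) = \left[ \mathbf{C}^{H}(f)\,\mathbf{C}(f) + \mathbf{B} \right]^{-1} \mathbf{C}^{H}(f)
```

Setting all $\beta_l$ equal gives back a single regularization parameter. Choosing smaller values of $\beta_l$ for the small loudspeakers penalizes their filter energy less, so the solution is free to route more energy to them. Whether this actually widens the sweet spot would have to be verified by measurement.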

Figure 6.9: Comparison of the frequency responses for different loudspeaker topologies (Back LP, Back+Front LP, Middle LP, Back+Middle LP) to illustrate distortion. (a) Frequency response at the left ear at an angle of 0 degrees, magnitude (dB) versus frequency (Hz). (b) Frequency response at the left ear at an angle of 30 degrees, magnitude (dB) versus frequency (Hz).

Figure 6.10: Four channel system with back speakers and small speakers. (a) Channel separation (dB) versus frequency (Hz) for different angles. (b) Comparison of the two four channel setups (Back+Small LP and Back+Middle LP, including curves at 30°), channel separation (dB) versus frequency (Hz).

6.5 Performance for Shorter Filters

A real-time crosstalk canceller was previously developed at Intec Acoustics. The filtering was based on a database of HRTFs measured in an anechoic environment. The challenge for integration with the current filters is that the transfer functions measured in the cabin are a lot longer due to the reverberant environment. A length of up to 1 s is needed before the reverberant energy in the impulse response drops below the noise floor. In contrast, an HRTF measured in an anechoic chamber can be truncated to a duration of only 11.6 ms. The real-time filtering is done in the frequency domain, so the computation speed depends on the block length of the FFT. Testing the application shows that the computer used can only handle block lengths up to a certain maximum. Increasing the block length beyond that results in audible artifacts, because the computations do not happen fast enough to fill the output buffer. The current filters are longer than this and are therefore not suited for the real-time application.

To shorten the filters, the measured impulse responses could be truncated to fewer samples before being inverted. A decrease in performance can be expected, since some relevant data in the impulse response is cut away and, more generally, because the filter has a lower order. Figure 6.11 shows the performance for two setups compared to the original filters. A slight reduction in the channel separation can be noted. The four channel system suffers slightly less from the reduction in filter length, which is another asset of this system. The maximum loss is 4.5 dB for the four channel system and 5.5 dB for the two channel system.

The regularization parameter can also be used to control the duration of the filters, but choosing it higher than necessary strongly reduces the performance before a substantial reduction in filter length is achieved.
So it is not advisable to use this approach to obtain shorter filters. Moreover, in the current implementation, the inverse filters are calculated in real time and thus the inversion operation is a limiting factor. Shortening the inverse filters will not gain much speed.

Figure 6.11: Performance for shorter filters. (a) Two channel system with back speakers. (b) Four channel system with back speakers and middle speakers. Both panels show channel separation (dB) versus frequency (Hz).

6.6 3D Sound Virtualization

Crosstalk cancellation was introduced as a technique to deliver binaural signals to the ears. To create a 3D illusion, it is important that the spatial cues in the signals are preserved. When the channel separation is not high enough, the binaural signals are deformed and a wrong source position is perceived. In [36] a channel separation of 12 dB is stated as sufficient to produce accurate virtual sources. However, in [37] a minimum value of 15 dB to 20 dB is indicated, so the limit is implementation-dependent. As became clear in section 2.1, many cues contribute to the sound localization mechanism in the auditory system. To estimate the quality of the 3D reproduction, only the interaural time difference (ITD) and the interaural level difference (ILD) are used.

Binaural synthesis was performed using HRTFs from the CIPIC dataset [9] to add spatial information to a monaural music sample. The binaural signals were played back in the cabin after being filtered with the crosstalk cancellation filters. The signals recorded with the HATS can then be compared with the original binaural signals to estimate the quality of the reproduction.

The ITD can be estimated by calculating the interaural cross-correlation between the left-ear signal and the right-ear signal [38]. The time difference is then found as the lag at which the cross-correlation is maximized. The threshold for untrained listeners to discriminate differences in ITD at a frequency of 0.5 kHz is given in [39]. This value can be used as a reference to quantify the reproduction. The ILD can be calculated by taking the ratio of the energy spectra of the two binaural signals:

$$\mathrm{ILD} = 10 \log_{10} \frac{\int_{f_1}^{f_2} |X_{\mathrm{left}}(f)|^2 \, df}{\int_{f_1}^{f_2} |X_{\mathrm{right}}(f)|^2 \, df} \tag{6.2}$$

The integration range is chosen from 1 kHz to 5 kHz, since in this region the ILD is a primary cue for localization [40]. As a reference, the threshold for discrimination between differences in ILD is around 4 dB at 4 kHz for untrained listeners [39].

The generated virtual sources lie in the horizontal plane at the following angles: 0°, 10°, 20°, 30°, 45° and 65°. The 0° source indicates a spot directly ahead of the listener, while increasing angles indicate sources displaced to the right. Only sources in the horizontal plane are considered, since the interaural cues mainly provide spatial information in this plane [6]. In tables 6.1 and 6.2 the results are compared for the two channel system using loudspeakers 1+2 in the back and the four channel system that also includes loudspeakers 4+5 in the middle. The differences are measured with the head in the optimal position at an angle of 0 degrees and with the head rotated over 30 degrees.

At the optimal listening position the reproduction of the ITD is nearly perfect for both systems. Only for the source at 10° is a difference measured, but it is below the discrimination threshold. When the head is rotated, the performance of the crosstalk cancellation is worse, so the ITD is reproduced less accurately. For the two channel system, differences of up to almost 400 µs occur and the spatial information in the ITD cue will be lost. The four channel system provides a broader rotational sweet spot, which results in smaller differences in the ITD. These differences are just below or above the threshold, so the spatial information is likely to be preserved.

For the ILD the results are again very good in the optimal listening position. The differences are not higher than 1.1 dB, which is below the threshold of 4 dB. With the head rotated to 30 degrees, the differences are a lot higher. The increase in performance of the four channel system is not sufficient in the frequency range which determines the ILD.
Previous research showed that, when the cues are conflicting, the ITD cue dominates over the ILD cue [8]. Thus, a broadband source will still be perceived correctly with the four channel system. It is clear from the results that a matched plant matrix results in a very good reproduction of the interaural cues. When a dynamic crosstalk cancellation system is used which adapts the plant matrix to the head position, it can be expected that the ITD and ILD will also be preserved for positions other than the optimal listening position.
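The two interaural measures used in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for tables 6.1 and 6.2: the function names, the 48 kHz sampling rate (at which one sample corresponds to about 21 µs), the 1 ms lag limit and the synthetic test signal are all hypothetical.

```python
import numpy as np

def estimate_itd(left, right, fs, max_lag_s=1e-3):
    """Lag in seconds that maximizes the interaural cross-correlation.

    A positive value means that the right-ear signal lags the left-ear signal.
    """
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    keep = np.abs(lags) <= int(round(max_lag_s * fs))  # assumed plausible ITD range
    return lags[keep][np.argmax(xcorr[keep])] / fs

def estimate_ild(left, right, fs, f1=1000.0, f2=5000.0):
    """Energy ratio of the two ear signals between f1 and f2, in dB (discrete form of 6.2)."""
    freqs = np.fft.rfftfreq(len(left), d=1.0 / fs)
    band = (freqs >= f1) & (freqs <= f2)
    e_left = np.sum(np.abs(np.fft.rfft(left)[band]) ** 2)
    e_right = np.sum(np.abs(np.fft.rfft(right)[band]) ** 2)
    return 10.0 * np.log10(e_left / e_right)

# Synthetic check: the right-ear signal is the left-ear signal delayed by 10 samples
# and attenuated by a factor of two in amplitude.
fs = 48000
rng = np.random.default_rng(0)
left = rng.standard_normal(4800)
right = 0.5 * np.concatenate([np.zeros(10), left[:-10]])

itd = estimate_itd(left, right, fs)
ild = estimate_ild(left, right, fs)
```

Applying both functions to the original binaural signals and to the signals recorded at the ears, and subtracting the results, would give differences of the kind reported in the tables. Note that with this estimator the ITD is resolved only in whole samples.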

Table 6.1: Differences in ITD when crosstalk cancellation is performed with the two channel system (LP 1+2) and the four channel system (LP 1+2 and 4+5) for the head in the optimal position at 0° and for a head rotation of 30°

(a) Head rotation 0°

Source position    2 channel    4 channel
0°                 0 µs         0 µs
10°                …            21 µs
20°                0 µs         0 µs
30°                0 µs         0 µs
45°                0 µs         0 µs
65°                0 µs         0 µs

(b) Head rotation 30°

Source position    2 channel    4 channel
0°                 …            42 µs
10°                …            63 µs
20°                …            84 µs
30°                …            42 µs
45°                …            62 µs
65°                …            62 µs

Table 6.2: Differences in ILD when crosstalk cancellation is performed with the two channel system (LP 1+2) and the four channel system (LP 1+2 and 4+5) for the head in the optimal position at 0° and for a head rotation of 30°

(a) Head rotation 0°

Source position    2 channel    4 channel
0°                 …            0.17 dB
10°                …            0.36 dB
20°                0 dB         0.2 dB
30°                …            0.4 dB
45°                …            0.3 dB
65°                …            0.5 dB

(b) Head rotation 30°

Source position    2 channel    4 channel
0°                 …            7.5 dB
10°                …            12.2 dB
20°                …            11.0 dB
30°                …            …
45°                …            …
65°                …            21.2 dB


Chapter 7 Conclusion

In this thesis the possibility of creating virtual 3D audio in a vehicle environment is investigated. Binaural signals can be used, but this requires control over the sound at the ears of the listener. Measurements in the cabin show that the sound field is heavily influenced by the environment. Reflections and resonances result in spectral deformation of the sound and in the absence of natural channel separation.

The desired control is obtained with the crosstalk cancellation technique. The unwanted contribution from a loudspeaker at the contralateral ear is cancelled with sound from a second speaker. The impulse responses from the loudspeakers to the ears are measured in the cabin and used for the design of an inverse filter matrix. An exact inverse is not realizable, so approximate filters are calculated using a least squares method combined with a regularization parameter. A channel separation above 18 dB in the 200 Hz to 8 kHz range was already obtained at the central position using a classic two channel system, approaching the 20 dB performance of common systems in anechoic environments [23]. When the head was rotated, the performance quickly decreased above 1 kHz, indicating a limited sweet spot.

An attempt was made to improve the system by implementing a multichannel loudspeaker system. A set of small audio amplifiers was built to guarantee an identical amplification for each channel and to overcome the problems of combining commercial amplifiers. Different multichannel crosstalk cancellation systems have been investigated. The results for the three loudspeaker setup showed that the channel separation can be improved by increasing the number of transducers, but the increase in sweet spot was limited. Additional loudspeakers were therefore added in a specific topology: two closely spaced loudspeakers, forming a stereo dipole, were mounted above the head.
Crosstalk cancellation with these two speakers provides a broad rotational sweet spot at high frequencies but performs poorly at low frequencies, confirming what is reported in the literature [29]. A four channel system was designed, combining the stereo dipole with the initial two channel system. This results in a channel separation of over 21 dB above 200 Hz and an increased sweet spot due to the two closely spaced speakers.

A dynamic crosstalk cancellation system could prove to be a major improvement for the sweet spot by updating the plant matrix according to the head position. A multichannel system is preferred, since it allows at least one loudspeaker to be located at each side of the head. However, a head tracker needs to be included to update the plant matrix.

The crosstalk cancellation also has the asset of equalizing the room response, resulting in a less distorted sound at the ears. The more crosstalk cancellation is achieved, the flatter the response. This implies that more distortion is present when the head is rotated. To minimize the distortion, the four channel system proves to be the best choice.

Since the stereo dipole is meant to increase the high frequency performance, it was tested whether smaller loudspeakers can be used, which do not need a response extending to low frequencies. The results show that the channel separation is not as good as for the regular four channel system. Almost no improvement in the rotational sweet spot is noticed compared to the two channel system, so it is thought that the sound field is dominated by the larger speakers due to their higher output power. Future work could look at a way to increase the relative amount of energy going to the small speakers, for example by introducing a channel-dependent regularization parameter. It is also possible that the different positioning of the speakers influences the performance negatively.

The capability of rendering virtual 3D sound is tested by synthesizing virtual sources and playing back the binaural signals in the cabin using the crosstalk cancellation systems.
To quantify the 3D reproduction, the differences in ITD and ILD between the original binaural signals and the reproduced signals at both ears are calculated for a number of source positions. A comparison between the two channel and the four channel systems showed that both spatial cues are preserved at the optimal listening position: the differences are below the discrimination thresholds for untrained listeners. When the head is rotated over an angle of 30 degrees, the differences become larger. Neither of the cues can be reproduced correctly by the two channel setup. The four channel setup renders differences in ITD which are close to the threshold, so the spatial information is likely to be preserved. It is not capable of reproducing the correct ILD, indicating that the increase in rotational sweet spot is not sufficient at higher frequencies. Since the ITD dominates the ILD, broadband sources will still be perceived correctly [8].

Human hearing is a complex process that comprises more than the interaural differences: not only physics but also psychoacoustic effects play an important role. Therefore, future work should include subjective experiments with listeners judging the capability of creating virtual 3D sources and comparing the sound quality of the systems. In the crosstalk cancellation implementation in a car by Farina [3], a channel separation of 10 dB is already sufficient to outperform a traditional audio system in subjective tests.

Another aspect to be looked at is the integration of the designed filters in the real-time application. Tests showed that the length of the filters can be reduced at the expense of a small reduction in performance. Informal tests with these filters showed that real-time filtering can be executed without artifacts, but further tests are needed to validate that the filtering is performed correctly. Other ways of reducing the filter length could be investigated as well. A passive optical head tracker, compatible with the real-time software, was found. The head tracker can be included in the real-time system to create a dynamic crosstalk cancellation system, further improving the virtual 3D environment.


Appendix A EAGLE design TDA2050

Figure A.1: EAGLE schematic TDA2050

Figure A.2: EAGLE layout TDA2050
