3D Sound System with Horizontally Arranged Loudspeakers


3D Sound System with Horizontally Arranged Loudspeakers

Keita Tanno

A DISSERTATION SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE AND ENGINEERING

Graduate Department of Computer and Information Systems, The University of Aizu, 2014

Copyright by Keita Tanno, 2014. All Rights Reserved.

Contents

Chapter 1 Introduction
    1.1 Demand for spatial sound
    1.2 How humans perceive a sound image
        Head-related transfer function
        Left-right perception
        Front-rear and above-below perception
        Influence of reverberation and head movement
        Equal loudness
    1.3 Measurement of impulse response
    1.4 Related research
        Transfer function composition method
        Solid angle division method
        Direct synthesis and compose sound field method
    1.5 Overview

Chapter 2 A 3D Sound Generation System with Five Horizontally Arranged Loudspeakers
    2.1 Introduction
        A study of panning method for the side area
            Hysteresis in sequential sound presentation in the side area
            Random sound presentation in the side area
            HRTF with amplitude panning for side areas
    2.2 Basis of 3D sound system
        HRTF convolution
        Amplitude panning based on the control points
    Subjective experiment
        Experiment 1: Comparison between 3D sound system and amplitude-panning-only system
            Results and discussion of elevation responses; results and discussion of azimuth responses; experiment summary
        Experiment 2: Comparison between 3D sound system with and without reverberation
            Results and discussion of elevation responses; results and discussion of azimuth responses; experiment summary
    Conclusion
    Future Work

Chapter 3 A New 5-Loudspeaker 3D Sound System with a Reverberation Reconstruction Method
    Introduction
    Headphone-based reverberation reconstruction method
        Conventional method
        New reverberation reconstruction method
        Four-point room impulse response measurement
        Calculation of sound intensity vectors
        Estimation of image sound sources
        Reconstruction of spatial room impulse response
        Experiment; results and discussion; experiment summary
    Loudspeaker-based reverberation reconstruction method
        Conventional method
        Spatial reverberation for five loudspeakers
        Experiment; results and discussion; experiment summary
    Conclusion
    Future Work

Chapter 4 Development of a User Interface for 3D Sound Creation
    Introduction
    System structure
        Programming libraries
        Hardware
        Real-time operation
    User interface
        Dual Shock
        Wii Remote
        Callback function
    Experiments
    Results and discussion
    Conclusion

Chapter 5 Conclusion
    Contributions

References

List of Figures

Figure 1.1 Sound evaluation system
Figure 1.2 A cross section of the ear
Figure 1.3 Interaural time difference and sound image perception
Figure 1.4 Interaural level difference and sound image perception
Figure 1.5 Equal distance sound image localization and amplitude panning
Figure 1.6 Linear amplitude panning and sound image perception
Figure 1.7 Traditional stereo loudspeaker system
Figure 1.8 Level difference of side two loudspeakers and perception
Figure 1.9 Cone of confusion
Figure 1.10 The delay of reflection and distance perception
Figure 1.11 Equal-loudness curves, ISO 226:2003
Figure 1.12 A-characteristic curve
Figure 1.13 Synchronous and summation method
Figure 1.14 A spectrogram of time-stretched pulse
Figure 1.15 A spectrogram of inverse time-stretched pulse
Figure 1.16 Expected signals and crosstalk
Figure 1.17 Transaural system
Figure 1.18 Binaural system
Figure 1.19 A block diagram of HRTF player
Figure 1.20 22.2 multichannel audio format
Figure 1.21 Wave field synthesis (WFS)
Figure 1.22 Boundary surface control (BoSC)
Figure 1.23 Spherical harmonic function
Figure 2.1 Sequential sound level presentation and azimuth perception
Figure 2.2 Random sound level presentation and azimuth perception
Figure 2.3 Sound presentation level correction curves
Figure 2.4 Averaged error of azimuth responses
Figure 2.5 Loudspeakers and their groups
Figure 2.6 The positions of the control points
Figure 2.7 Block diagram of this system
Figure 2.8 Averaged error degrees of elevation responses related to treatment, target azimuth, and target elevation in Experiment 1
Figure 2.9 Averaged error degrees of elevation responses related to treatment and target azimuth in Experiment 1
Figure 2.10 Averaged error degrees of elevation responses related to treatment and target elevation in Experiment 1
Figure 2.11 Averaged error degrees of elevation responses related to target azimuth and target elevation in Experiment 1
Figure 2.12 Averaged error degrees of azimuth responses related to treatment, target azimuth, and target elevation in Experiment 1
Figure 2.13 Averaged error degrees of elevation responses related to treatment, target azimuth, and target elevation in Experiment 2
Figure 2.14 Averaged error degrees of elevation responses related to target azimuth and target elevation in Experiment 2
Figure 2.15 Averaged error degrees of azimuth responses related to treatment, target azimuth, and target elevation in Experiment 2
Figure 3.1 Model of conventional reverberation in headphone-based system
Figure 3.2 Model of new reverberation including the effects of HRTFs
Figure 3.3 Four-point microphone set used for IR measurement
Figure 3.4 Obtained image sound sources
Figure 3.5 Reverberation model
Figure 3.6 Reconstruction model using the image sound sources of the measurement point
Figure 3.7 Process flow of binaural room reverberation including the effect of HRTFs
Figure 3.8 Averaged spatiality scores in headphone system related to treatment, reverberation source, and sound source
Figure 3.9 Averaged clarity scores related to treatment and reverberation source in headphone system
Figure 3.10 Averaged clarity scores related to reverberation source and sound source in headphone system
Figure 3.11 Averaged clarity scores related to reverberation source, treatment, and sound source in headphone system
Figure 3.12 Averaged naturalness scores related to treatment, reverberation source, and sound source in headphone system
Figure 3.13 Averaged distance scores related to treatment, reverberation source, and sound source in headphone system
Figure 3.14 Averaged room size scores related to reverberation source, treatment, and sound source in headphone system
Figure 3.15 Model of conventional unidirectional reverberation
Figure 3.16 Model of conventional reverberation adding in 3D sound system
Figure 3.17 Model of new reverberation adding in 3D sound system
Figure 3.18 Reconstruction flow of spatial reverberation
Figure 3.19 Averaged subjective scores in loudspeaker system
Figure 3.20 Averaged distance and azimuth response in loudspeaker system
Figure 4.1 Sony PlayStation 2 Controller SCPH
Figure 4.2 Nintendo Wii Controller RVL-003, RVL-004, RVL
Figure 4.3 A block diagram of device connection
Figure 4.4 Continuous convolution calculation flow
Figure 4.5 RISSICS's main window
Figure 4.6 Playback setup window
Figure 4.7 Partial convolution setup window
Figure 4.8 Azimuth and elevation operation by Dual Shock
Figure 4.9 Analog stick operation of Dual Shock
Figure 4.10 Three-axis motion sensor and axes assignment
Figure 4.11 The cause of the commutative function's problem
Figure 4.12 Wii Remote azimuth operation
Figure 4.13 Wii Remote elevation operation
Figure 4.14 Callback calculation cycle
Figure 4.15 Averaged score related to operation
Figure 4.16 Averaged score related to treatment and operation
Figure 4.17 Auto mode sound image trajectory

List of Tables

Table 1.1 Channel mapping of 22.2 multichannel audio format
Table 2.1 The positions of the control points and energy ratios
Table 3.1 Dimensions of room used for measurements
Table 3.2 Evaluation criteria and definitions
Table 3.3 Dimensions of room used for measurements
Table 3.4 Evaluation items
Table 4.1 HRTF data set information
Table 4.2 Sound motion recording format
Table 4.3 Questions for inspection of the usability and the operability
Table 4.4 p-values of multiple comparison of operation
Table 4.5 Auto mode key assignments
Table 4.6 Key assignments of Wii Remote
Table 4.7 Wii Remote and Nunchuk button key assignments
Table 4.8 Wii Remote with Classic Controller key assignments
Table 4.9 CG camera operation by a mouse
Table 4.10 Keyboard key assignments

Abstract

Imaging technology has improved in recent years, and televisions can now express visual depth on a single flat panel, while surround loudspeaker systems provide a sound field only on the horizontal plane. Real 3D images and sound are among the next-generation forms of virtual-reality expression. Various methods exist for creating precise sound fields, including transfer function composition methods, solid angle division methods, and direct synthesis and sound field composition methods. However, these methods have limitations, such as a small listening area and the need for large-scale equipment, which have prevented them from becoming widespread. On the other hand, 5.1-channel loudspeaker systems are already popular; if a five-loudspeaker system could reproduce spatial sound, many people could experience next-generation sound. This dissertation therefore presents a panning algorithm that displays spatial sound on five horizontally arranged loudspeakers.

The principal cues humans use to perceive the location of a sound image are the interaural time difference (ITD), the interaural level difference (ILD), and the spectral cues provided by the pinnae. Stereophonic loudspeakers, one of the most popular loudspeaker configurations, can place a sound image between the two loudspeakers because the system controls ILD with a panning algorithm based on trigonometric functions. A five-loudspeaker system cannot simply reuse that algorithm, because human hearing is nonlinear and its sensitivity differs by direction; a new panning algorithm was therefore proposed. The algorithm defines a set of control points, each with an energy distribution ratio. Given the direction of a sound image, it calculates each loudspeaker's amplitude from the control points, accounting for the asymmetrical loudspeaker arrangement and for human auditory characteristics. With the panning algorithm alone, a listener perceives the sound image only on the loudspeaker plane, so the system also uses head-related transfer functions (HRTFs), which carry much of the information needed for spatial localization, to lift the perceived image above the plane. The combination of the panning algorithm and HRTFs provides spatial sound localization: a listener perceives a sound image in three dimensions from five horizontally arranged loudspeakers. Moreover, the new system can be used not only in an anechoic chamber but also in an ordinary echoic room.

This dissertation then explains a new reverberation reconstruction method. Almost every sound we hear contains reverberation, which provides many perceptual cues such as early reflections and reverberation time. To enhance listener envelopment, reverberation is commonly added to sound sources.

The standard way to add reverberation is to convolve an impulse response (IR) with the sound source. A single IR is not enough, however, because reverberation consists of many reflections, each of which carries both a room transfer function and an HRTF, whereas an IR alone contains no directional information. To obtain the direction of each reflection, a closely spaced array of four omnidirectional microphones forming three-dimensional rectangular coordinates was used. The four impulse responses were analyzed by the sound intensity method and transformed into image sound sources (ISSs). A new reverberation was then reconstructed from all ISSs: the delay, attenuation, direction, and head-related effect of every ISS were calculated and summed, so the reconstructed reverberation contains directional information. The new reverberation reconstruction method and a conventional reverberation method were compared in a headphone-based system and in the loudspeaker-based 3D sound system. The results show that the new method improved sound impressions such as spatiality, clarity, and naturalness in both systems.

The relation between audio content and a listener has usually been passive: the listener starts the content, and all that remains is to watch or listen. An active relationship with the content is an important cue for a sense of immersion in virtual reality. A real-time system named RISSICS (Real-time and Intuitive Spatial Sound Image Creation System) was therefore developed. A Nintendo Wii Remote controller, which has a three-axis motion sensor, was adopted as the direction input device, so a user can intuitively indicate the direction of a sound image. RISSICS performs all signal processing immediately and distributes the calculated signals to the five loudspeakers. Its refresh rate is shorter than human temporal resolution, so the user perceives a smoothly moving sound image. An experiment evaluating how easily a user can indicate a sound image position was conducted; ten motions were evaluated, and the results show that the Wii Remote can serve as the input device.

A conventional horizontally arranged multichannel loudspeaker system could not present a spatial sound image, but the combination of the five-channel panning algorithm using control points and HRTFs can. Moreover, the new reverberation reconstruction method produces reverberation with directional information, which improved the sound impression. Furthermore, the real-time 3D sound system with an intuitive input device offers a new experience in which listeners relate actively to the sound image. These results contribute to the development of multichannel audio and virtual reality.

Chapter 1 Introduction

1.1 Demand for spatial sound

Imaging technology has improved in recent years, and televisions can express visual depth on a single flat panel, while surround loudspeaker systems provide a sound field only on the horizontal plane. Real 3D images and sound are among the next-generation forms of virtual-reality expression. A virtual reality (VR) close to the real situation is effective in cases where failure cannot be excused, such as pilot training, and high-immersion VR with remote control is required for work and operation in dangerous areas that people cannot enter. Humans require three elements to perceive VR: real-time interaction, 3D space, and self-projection [1]. In acoustics, real-time interaction corresponds to real-time manipulation of a sound source and of the listening position; 3D space corresponds not only to left-right and front-rear perception but also to above-below perception; and self-projection requires consistency across sensory modalities, so VR needs high-precision auditory expression. Many first-person shooting (FPS) games have been released on video game consoles recently, and spatial sound is important in these games for detecting the direction of an enemy. Exploring the human auditory sense from the standpoint of spatial sound therefore plays an important role in enriching people's lives.

Figure 1.1: Sound environment evaluation system. A listener's total subjective evaluation of a sound field produced by a sound source.

1.2 How humans perceive a sound image

Fig. 1.1 shows an evaluation system whose input is an acoustic signal emitted from a sound source and whose output is the listener's total evaluation of that input [2-4]. In physical space, an acoustic signal S(ω) emitted from a sound source reaches the listener's position modified by the room transfer function (RTF) R(ω), which accounts for reflections and diffractions from the ceiling and walls. The head-related transfer function (HRTF) then shapes the signal into the ear input signal P_{L,R}(ω) = S(ω)R(ω)H_{L,R}(ω), the acoustic stimulus delivered to the listener's auditory organs. The entrance of the external ear canal is taken as the interface between physical and psychological space because the acoustic stimulus is easy to measure there. Fig. 1.2 is a diagrammatic representation of an ear [5]. The acoustic stimulus vibrates the eardrum via the external auditory canal; the vibration travels through the cochlea, and nerves fire according to frequency. The nerves send pulses to the brain, and the human recognizes a sound [5-9]. In psychological space, a listener perceives a sound image through modality elements such as time, space, and quality (f_n(P_{L,R})). The listener then subjectively evaluates each modality element according to personal judgment (g_n(f_n)), weights the subjective evaluations according to personal judgment (w_n), and finally decides the total evaluation (Σ_i w_i g_i(f_i)).

Subjective information also influences the total evaluation; for example, an image influences the perceived position of a sound image [10].

Figure 1.2: A cross section of the ear (adapted from Möricke and Mergenthaler, 1959). 1: semicircular canals. 2: cochlea. 3: eardrum-tensioning muscles. 4: Eustachian tube. 5: cavum conchae. 6: external auditory canal. 7: eardrum. 8: hammer.

Head-related transfer function

The HRTF is a transfer function that carries important information about direction, distance, and the effect of the head. The HRTF H_{L,R} describes the head-related impulse response in the frequency domain, as in Eq. (1.1),

H_{L,R}(ω) = G_{L,R}(ω) / F(ω),   (1.1)

where G_{L,R}(ω) is the transfer function from a sound source to the entrance of the external ear canal measured in an anechoic chamber [11], and F(ω) is the transfer function from the sound source to the position of the center of the listener's head, also measured in an anechoic chamber. The HRTF depends on the direction of the sound source because the human head and external ears are asymmetric in all directions. Two HRTFs exist for each direction, and the two are usually used simultaneously. Because the HRTF contains complex information, which parts of it influence spatial sound perception is still under research; the peaks and notches of the HRTF are among the spectral cues [12]. To obtain a pair of HRTFs, impulse responses (IRs) are measured by microphones set at the entrances of the listener's external ear canals, and the measured IRs are transformed into the frequency domain. Time-stretched pulses (TSP) and M-sequence noise are used for these IR measurements because they give a better signal-to-noise (S/N) ratio than an impulsive sound.
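To make Eq. (1.1) concrete, the following sketch estimates an HRTF pair from measured impulse responses by division in the frequency domain. It is an illustration only, not the dissertation's code; the array names, the FFT length, and the regularization guard are assumptions.

    import numpy as np

    def hrtf_pair(ir_left_ear, ir_right_ear, ir_head_center, n_fft=2048, eps=1e-12):
        """Estimate an HRTF pair per Eq. (1.1): H_{L,R}(w) = G_{L,R}(w) / F(w).

        ir_left_ear, ir_right_ear: IRs measured at the ear-canal entrances;
        ir_head_center: IR measured at the head-center position (listener
        absent), all in an anechoic chamber.
        """
        F = np.fft.rfft(ir_head_center, n_fft)
        F = np.where(np.abs(F) < eps, eps, F)   # guard against division by ~0
        H_L = np.fft.rfft(ir_left_ear, n_fft) / F
        H_R = np.fft.rfft(ir_right_ear, n_fft) / F
        return H_L, H_R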

HRTFs differ between individuals. Optimizing for each person requires measuring every listener's HRTFs, but the measurement takes a long time because loudspeaker repositioning and measurement must be repeated. During a long measurement, the position of the listener's head changes [13], and this change causes measurement error [14]. Research on the reciprocal method shows that many HRTFs can be measured in a short time with sufficient accuracy [15-18]. An HRTF database is necessarily discrete, and several studies estimate an HRTF by interpolating between the measurement points [19-23]. One study measures many HRTFs in a short time using a revolving chair controlled by a servo motor and a continuous measurement method [24]. Some HRTF databases are open to the public [25-27], and there are studies that search such databases for an HRTF suited to a particular listener [28-35]. Humans can also adapt to another person's HRTF [36-40]; the learning effect lasts about a month, and for efficient adaptation, active head movement and feedback on the correct answer are effective [41].

Left-right perception

In the left-right direction, sound image localization responds linearly to the interaural time difference (ITD). A typical lateralization curve for brief impulse sounds is shown in Fig. 1.3 [5]: the x-axis is the ITD (µs) and the y-axis the position of the localized sound image (arbitrary units; 0 is the front, 5 is the entrance of the external ear canal). When the ITD is 1 ms, the sound image is localized completely to the side; between ±1 ms, delay and localization have a linear relationship. When a sound source is off the median plane, the contralateral ear is shadowed by the head, and its received energy becomes weaker than that of the ipsilateral ear. This interaural level difference (ILD) is the other main cause of left-right localization (Fig. 1.4) [5, 8]. With no level difference, the sound image is localized at the front; when the level difference reaches ±10 dB, the image is localized completely to the side.

Figure 1.3: Interaural time difference and sound image perception for brief impulse sounds. X-axis: ITD (µs). Y-axis: position of the localized sound image (arbitrary units); 0 is the front, 5 is the entrance of the external ear canal.

Figure 1.4: Interaural level difference and sound image perception. X-axis: ILD (dB). Y-axis: position of the localized sound image (arbitrary units); 0 is the front, 5 is the entrance of the external ear canal.

Figure 1.5: Equal-distance sound image localization and amplitude panning.

The stereophonic loudspeaker system uses this characteristic: a sound image appears between the two loudspeakers when the level difference between the left and right loudspeakers is controlled. In a stereophonic system, equidistant sound image localization can be achieved using Eqs. (1.2a) and (1.2b) (Fig. 1.5),

a_L = (√2/2)(cos θ + sin θ) = sin(π/4 + θ),   (1.2a)
a_R = (√2/2)(cos θ - sin θ) = sin(π/4 - θ),   (1.2b)

where a_L is the amplitude at the left loudspeaker and a_R the amplitude at the right loudspeaker. The total energy should be kept constant because the human auditory sense correlates with energy, not amplitude [7, 42]. The following equations, Eqs. (1.3a) and (1.3b), instead keep the total amplitude constant, and the sound image between the loudspeakers then appears slightly farther away, as shown in Fig. 1.6:

a_L = 1/2 + θ/(2 θ_max) = 1/2 + θ/90,   (1.3a)
a_R = 1/2 - θ/(2 θ_max) = 1/2 - θ/90.   (1.3b)
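As an illustration of the two panning laws, here is a minimal sketch (not from the dissertation) comparing the constant-energy law of Eq. (1.2) with the constant-amplitude law of Eq. (1.3); θ_max = 45° of the standard stereo setup is assumed.

    import numpy as np

    def constant_energy_pan(theta_deg):
        """Eq. (1.2): a_L = sin(45 + theta), a_R = sin(45 - theta), theta in
        [-45, 45] degrees; a_L**2 + a_R**2 == 1 for every theta."""
        t = np.radians(theta_deg)
        return np.sin(np.pi / 4 + t), np.sin(np.pi / 4 - t)

    def constant_amplitude_pan(theta_deg, theta_max=45.0):
        """Eq. (1.3): linear panning; a_L + a_R == 1, but the summed energy
        dips between the loudspeakers, so the image appears slightly farther."""
        a_l = 0.5 + theta_deg / (2.0 * theta_max)
        return a_l, 1.0 - a_l

For example, constant_energy_pan(0.0) returns (0.707, 0.707), whose energies sum to one, while constant_amplitude_pan(0.0) returns (0.5, 0.5), whose energies sum to only 0.5.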

Figure 1.6: Bi-linear amplitude panning and sound image perception. θ_max and θ are defined as in Fig. 1.7.

Front-rear and above-below perception

Fig. 1.8 shows the level difference between two left-side loudspeakers and the perceived direction. The two loudspeakers are mounted at 60° and 120° azimuth from the front. A sound image appears at the stronger side once the level difference reaches 6 dB or more; within ±6 dB, the sound image moves quickly. It is therefore difficult for a listener to perceive a sound image at an arbitrary position in the side area using amplitude panning alone [5]. Moreover, regions including this arrangement are called the cone of confusion, and many perception errors occur there (Fig. 1.9): within the cone of confusion, ILD and ITD are the same for different directions, so misjudgments often occur. The amplitude spectrum of the HRTF, the so-called spectral cue, is important for judging whether a sound image is in the front or the rear area. In the cone of confusion, humans have fewer cues than in other areas, and front-rear confusion occurs often. On the median plane, sound localization can be improved by combining the median-plane HRTF with ITD control [43]; in the side area, panning control combined with HRTFs improves sound image localization [44].

Figure 1.7: Traditional stereo loudspeaker system.

Figure 1.8: Level difference of two side loudspeakers and sound image perception.

Figure 1.9: Cone of confusion. (a) shows the curve on which r1 - r2 is constant, where r1 is the distance between the right ear and a point p, and r2 is the distance between the left ear and the point p. (b) shows the 3D model of the curve. If a sound image appears on the surface of the cone, the rate of misjudgment increases.

Spectral cues still have undiscovered characteristics. The lowest and second-lowest notch frequencies above 4 kHz and the peak around 4 kHz are important for sound localization on the median plane [12].

Influence of reverberation and head movement

A change of ILD in the near field, within 2 m, influences distance perception [45], and controlling the early reflections and late reverberation also controls distance perception [46]. Fig. 1.10 shows that as the delay of the reflection becomes larger, the perceived distance becomes greater [47]. A head movement changes the ear input signals: when the head moves, the position of a sound image leaves the cone of confusion, so head motion decreases sound localization errors [48-54].

Equal loudness

Sound is a very small vibration of air pressure superimposed on atmospheric pressure. The minimum air pressure vibration humans can perceive is 2 × 10⁻⁵ Pa (20 µPa) at an air temperature of 20 °C and an air pressure of 1,013 hPa (1 atm). Sound pressure level (dB SPL) is defined using this standard level of 2 × 10⁻⁵ Pa. The maximum level humans can perceive without pain or damage is about 120 dB SPL.
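A one-line check of the definition above (an illustration, not the dissertation's code): with the 20 µPa reference, the threshold pressure maps to 0 dB SPL and the pain threshold to about 120 dB SPL.

    import numpy as np

    P0 = 20e-6  # reference pressure, 20 uPa

    def spl_db(p_rms_pa):
        """Sound pressure level in dB SPL for an RMS pressure in pascals."""
        return 20.0 * np.log10(p_rms_pa / P0)

    print(spl_db(20e-6))  # 0 dB SPL, the threshold of hearing
    print(spl_db(20.0))   # 120 dB SPL, roughly the threshold of pain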

Figure 1.10: The delay of reflection and distance perception. As the delay becomes larger, the perceived distance becomes greater.

The audible frequency range is approximately 20 to 20,000 Hz. The perception of sound pressure level depends on the frequency of the sound: the hearing threshold is frequency dependent [6, 8], and the loudness level (phon) depends on both frequency and sound pressure level. The phon is defined as a psychological scale for measuring the loudness of a sound, with a 1 kHz sine signal used as the reference [55, 56]: the loudness level of a sound in phon is the SPL of a 1 kHz pure tone heard as equally loud (see Fig. 1.11), and 0 phon is the limit of perception. Fig. 1.11 shows that a given change in dB SPL and the resulting change in phon are not the same; when the overall level of a sound changes, the relative perceived level of each frequency changes. The equal-loudness curves are used to design sound level meters, which contain frequency-correction circuits and calculate a weighted loudness based on the equal-loudness curves. When a sound is quiet, low-frequency components hardly affect the overall loudness. Fig. 1.12 shows the A-characteristic curve, which is designed from the 40-phon equal-loudness curve [55].

Figure 1.11: Equal-loudness curves, ISO 226:2003.

Figure 1.12: The A-characteristic curve (weighting in dB versus frequency in Hz) is defined from the 40-phon equal-loudness curve.
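For reference, the A-characteristic curve can be evaluated with the standard rational approximation from IEC 61672; this formula comes from that standard, not from the dissertation, and is shown only as an illustration.

    import numpy as np

    def a_weighting_db(f_hz):
        """IEC 61672 A-weighting in dB (0 dB at 1 kHz); derived from the
        40-phon equal-loudness contour, as the text notes."""
        f2 = np.asarray(f_hz, dtype=float) ** 2
        ra = (12194.0**2 * f2**2) / (
            (f2 + 20.6**2)
            * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
            * (f2 + 12194.0**2)
        )
        return 20.0 * np.log10(ra) + 2.0

    print(a_weighting_db([100, 1000, 10000]))  # approx. [-19.1, 0.0, -2.5]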

1.3 Measurement of impulse response

The impulse response (IR) is one of the most basic and important pieces of information in signal processing; it is used to obtain the RTF, the HRTF, and so on. Various methods are used for IR measurement, such as the explosion of gunpowder, a starter pistol for competitions, the burst of a balloon, a pulse of spark discharge, a pulse signal from a loudspeaker, M-sequence noise from a loudspeaker [57-60], and a TSP from a loudspeaker [61, 62]. The response of a system to an input signal equals the convolution of the system's IR with that input. The reverberation time, an important parameter in designing a hall or a room, can be obtained from an IR. The causes of measurement quality degradation are fluctuation of the system due to time variance, nonlinear characteristics of the transducers, and noise in the system [63]: a system changes when the temperature changes during a measurement, when air conditioning moves the air, and so on [64], while physical limitations cause nonlinear characteristics, so a loudspeaker cannot reproduce a signal perfectly identical to the original [65].

A measurement method using an explosive sound is useful because the measurer can judge the rough response by ear, but the S/N ratio is poor; combining an explosive sound with the synchronous summation method is possible, but the S/N improvement is small. M-sequence noise has a good S/N ratio but is weak against time variance [66, 67]. The TSP is robust against time variance, and combining it with the synchronous summation method reduces the influence of noise (see Fig. 1.13) [68-70]. The linear TSP and the logarithmic TSP are both widely used, and each has its own advantage in S/N reduction; a warped TSP that connects a linear TSP and a logarithmic TSP has also been proposed because it is robust against both background noise and high-frequency distortion [71]. The TSP is defined by the following equations:

S(k) = exp(j4mπk²/N²),   0 ≤ k ≤ N/2,
     = S*(N - k),         N/2 < k < N,   (1.4a)

S⁻¹(k) = exp(-j4mπk²/N²),   0 ≤ k ≤ N/2,
       = S⁻¹*(N - k),        N/2 < k < N,   (1.4b)

H(k) = Y(k) S⁻¹(k) = Y(k)/S(k) = Y(k) S*(k) / (S(k) S*(k)),   (1.4c)

where S(k) is the Fourier-transformed TSP, H(k) the Fourier-transformed IR, Y(k) the Fourier-transformed TSP response, S⁻¹(k) the Fourier-transformed inverse TSP (ITSP), k the discrete frequency index, * the complex conjugate, m a number affecting the pulse width, and N the signal length [64].
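A minimal sketch of TSP-based IR measurement following Eqs. (1.4a) and (1.4c); illustrative only, with an assumed power-of-two length n and sweep parameter m, and with the measured response assumed to fit within n samples.

    import numpy as np

    def linear_tsp(n, m):
        """Frequency-domain linear TSP per Eq. (1.4a); n is the signal
        length (power of two), m controls the pulse (sweep) width."""
        k = np.arange(n)
        half = n // 2
        s = np.empty(n, dtype=complex)
        s[: half + 1] = np.exp(1j * 4.0 * m * np.pi * k[: half + 1] ** 2 / n**2)
        s[half + 1 :] = np.conj(s[1:half][::-1])  # S(k) = S*(N - k), Eq. (1.4a)
        return s

    def measure_ir(response, s):
        """Cross-spectral IR estimate, Eq. (1.4c): H = Y S* / (S S*)."""
        y = np.fft.fft(response, len(s))
        h = y * np.conj(s) / (s * np.conj(s))
        return np.real(np.fft.ifft(h))

The excitation signal to play through the loudspeaker is the inverse transform of the TSP spectrum, e.g. np.real(np.fft.ifft(linear_tsp(4096, 2))).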

Figure 1.13: Synchronous summation method. The first waveform includes noise between 0 ms and about 8 ms; after 64 summations, that noise is reduced.

Figure 1.14: A spectrogram of the time-stretched pulse. X-axis: frequency (kHz). Y-axis: time (s).

S(k) and S⁻¹(k) form an inverse-filter pair, so H(k) can be calculated rapidly by the cross-spectral method. Figs. 1.14 and 1.15 show a TSP and an ITSP.

1.4 Related research

Transfer function composition method

Acoustic wave propagation from a sound source to a listener is described by a transfer function, and several transfer functions can be combined with a digital filter. In the transfer function composition method, all physical phenomena from a sound source to the two ears are regarded as transfer functions [72-75]. A loudspeaker system involves crosstalk, and a crosstalk cancellation calculation is required because crosstalk is an undesirable signal. In a headphone system, the left-channel signal is emitted by the left transducer and reaches only the left ear; in a loudspeaker system, however, the signal emitted by the left loudspeaker reaches both the left and right ears.

Figure 1.15: A spectrogram of the inverse time-stretched pulse. X-axis: frequency (kHz). Y-axis: time (s).

The signal reaching the opposite-side ear is the crosstalk. In Fig. 1.16, A_LL and A_RR are the expected signals, A_LR and A_RL are the crosstalk, y_L and y_R are the input signals, and e_L and e_R are the ear input signals. A system that can calculate crosstalk cancellation signals is called a transaural system, defined by the following equations (Figs. 1.16 and 1.17):

S_r = G_{r,r}(ω)X_r(ω) + G_{l,r}(ω)X_l(ω),   (1.5a)
S_l = G_{r,l}(ω)X_r(ω) + G_{l,l}(ω)X_l(ω),   (1.5b)
Y_r(ω) = [H_{r,r}(ω)G_{r,r}(ω) + H_{l,r}(ω)G_{r,l}(ω)] X_r(ω) + [H_{r,r}(ω)G_{l,r}(ω) + H_{l,r}(ω)G_{l,l}(ω)] X_l(ω),   (1.5c)
Y_l(ω) = [H_{r,l}(ω)G_{r,r}(ω) + H_{l,l}(ω)G_{r,l}(ω)] X_r(ω) + [H_{r,l}(ω)G_{l,r}(ω) + H_{l,l}(ω)G_{l,l}(ω)] X_l(ω),   (1.5d)

where the two pairs of filters {G_{r,r}(ω), G_{r,l}(ω)} and {G_{l,r}(ω), G_{l,l}(ω)} are called the crosstalk canceler; X_l(ω) and X_r(ω) are the ear input signals for the left and right ears in the original sound field; Y_l(ω) and Y_r(ω) are the ear input signals for the left and right ears in the playback sound field; H_{r,r}(ω) is the transfer function from the right loudspeaker to the entrance of the right external ear canal, H_{r,l}(ω) from the right loudspeaker to the left ear canal, H_{l,r}(ω) from the left loudspeaker to the right ear canal, and H_{l,l}(ω) from the left loudspeaker to the left ear canal.
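As an illustration of the crosstalk canceler, one common realization (an assumption here, not necessarily the method of the cited works) inverts the 2x2 loudspeaker-to-ear matrix per frequency bin with Tikhonov regularization, since the text notes below that plain inverse filters can be unstable:

    import numpy as np

    def crosstalk_canceller(H, beta=1e-3):
        """H: array of shape (n_bins, 2, 2), where H[i][ear, speaker] is the
        transfer function from each loudspeaker to each ear at bin i.
        Returns G of shape (n_bins, 2, 2) with G[i][speaker, ear], so the
        loudspeaker feeds are s = G @ x for a desired binaural signal
        x = (x_left, x_right); beta is a regularization constant."""
        G = np.empty_like(H)
        I = np.eye(2)
        for i in range(H.shape[0]):
            Hi = H[i]
            # regularized inverse: (H^H H + beta I)^-1 H^H
            G[i] = np.linalg.solve(Hi.conj().T @ Hi + beta * I, Hi.conj().T)
        return G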

Figure 1.16: Expected signals A_LL and A_RR, and crosstalk A_LR and A_RL, in a loudspeaker system. y_L and y_R are input signals; e_L and e_R are ear input signals.

In an anechoic chamber with the listener's head fixed, when a sound convolved with the listener's own HRTFs is played, the listener perceives the sound image as precisely as a real sound source [72]; when a sound convolved with another person's HRTFs is played, many front-rear and above-below confusions occur. The transaural system has the weakness that the sweet spot of the listening point is small: a slight movement of the listening position or of the listener's posture changes the transfer functions from the loudspeakers to the ear-canal entrances. The inverse filter also has the problem that it is sometimes unstable and degrades sound quality [76, 77]. One study proposes a loudspeaker arrangement selection method that reduces signal distortion and the inverse-filter computation required for robustness against small head movements [78].

In headphone systems, the distance between a transducer and the entrance of the ear canal is almost zero, so crosstalk can be ignored. A listener perceives a sound image when listening to a sound convolved with an HRTF because the HRTF reproduces the measurement situation [6, 8, 74] (Fig. 1.18).

Figure 1.17: The transaural system has the crosstalk cancelers G_{r,r}(ω), G_{r,l}(ω), G_{l,r}(ω), and G_{l,l}(ω). X_l(ω) and X_r(ω) are the ear input signals in the original sound field; Y_l(ω) and Y_r(ω) are the ear input signals in the playback sound field; S_l and S_r are the loudspeaker output signals.

The sound localization accuracy of a headphone system with HRTFs is high, but the system cannot present a frontal sound image, and the mechanism of this failure remains undiscovered. In a headphone system, the sound image moves together with the listener's head; a listener perceives a fixed sound image when the system monitors head motion and selects HRTFs in real time [79]. There is also a system in which the listener controls the position of a sound image with a controller while the system convolves the sound with HRTFs in real time [80]; this active relationship improves localization accuracy (Fig. 1.19).

Solid angle division method

A vertically mounted loudspeaker array expands the sound localization area vertically [81]. A sound can be recorded with as many directional microphones as there are loudspeakers; the microphones are mounted equidistantly, with the same directions as the loudspeakers. One channel, one microphone, one loudspeaker is the basic recording concept. Increasing the number of channels improves the resolution of direction perception and the accuracy of sound localization. In multichannel audio, the capability of the playback sound field is determined by the arrangement of the channels, so the recording-side and playback-side arrangements should be the same.

Figure 1.18: A listener can perceive a virtual sound source spatially in a binaural system. The system convolves a sound source with a pair of HRTFs.

Figure 1.19: A block diagram of the HRTF player (PC: Windows XP, Pentium 4 2.4 GHz; sound file, HRTF selector and filter, HRTF data, PortAudio/DirectSound audio output, joystick input). A listener controls the sound image position using a game controller, and the system convolves HRTFs with short-time signals in real time.

Figure 1.20: The 22.2 multichannel audio format. Table 1.1 shows the channel mapping.

Some multichannel audio formats are defined as international standards [82-85]. The important point of a multichannel audio format is to reproduce spatial impressions for humans; reproduction accuracy is less important. Early multichannel audio formats aimed to enrich the playback sound field in the horizontal plane; recent formats aim at spatial sound localization. The 22.2 multichannel audio format has been adopted for the Super Hi-Vision video format [83-85]; Fig. 1.20 shows the loudspeaker positions, and Table 1.1 shows the channel mapping.

Direct synthesis and compose sound field method

This method is based on a large-scale loudspeaker array and aims to enlarge the sweet spot from the entrances of the ear canals to the whole region within a boundary. Loudspeakers are mounted on the boundary according to the Kirchhoff-Helmholtz integral formula [86]. The playback signal depends only on the boundary, so the system does not require individual optimization; this low dependency is the advantage of the approach. However, an unrealistic number of loudspeakers is required for accurate composition at high frequencies.

Wave field synthesis (WFS) approximates the Kirchhoff-Helmholtz integral formula on an infinite plane and mounts loudspeakers at the manipulation points; it synthesizes a wave front within the manipulation points [86-91]. Unlike psychoacoustic approaches, WFS faithfully synthesizes the physical wave front.

Table 1.1: Channel mapping of the 22.2 multichannel audio format

Channel No.  Label  Name
1            FL     Front left
2            FR     Front right
3            FC     Front center
4            LFE1   Low frequency effect 1
5            BL     Back left
6            BR     Back right
7            FLc    Front left center
8            FRc    Front right center
9            BC     Back center
10           LFE2   Low frequency effect 2
11           SiL    Side left
12           SiR    Side right
13           TpFL   Top front left
14           TpFR   Top front right
15           TpFC   Top front center
16           TpC    Top center
17           TpBL   Top back left
18           TpBR   Top back right
19           TpSiL  Top side left
20           TpSiR  Top side right
21           TpBC   Top back center
22           BtFC   Bottom front center
23           BtFL   Bottom front left
24           BtFR   Bottom front right

Figure 1.21: Wave field synthesis (WFS). WFS with inverse filters can avoid the influence of a monitor mounted within the boundary.

WFS with inverse filters can avoid the influence of mounting a monitor within the boundary (Fig. 1.21). Boundary surface control (BoSC) spatially discretizes the Kirchhoff-Helmholtz integral formula and synthesizes the sound field within a closed curved surface (Fig. 1.22). A BoSC system with 70 loudspeakers provides quite high-quality spatial sound; the reproduced sound field is recorded by 80 microphones [86, 92]. WFS and BoSC have the great advantages that they need not consider HRTF individualization or listener motion, and that many people can listen simultaneously.

Figure 1.22: Boundary surface control (BoSC). S is a closed surface; q_i is a control point; p(q_i) is the sound pressure at q_i; V is the region surrounded by S; p(S) is the sound field; h_ij is the transfer function matrix of the inverse system; S' is a closed surface congruent to S; q'_i is the control point corresponding to q_i; p(q'_i) is the sound pressure at q'_i; V' is the region congruent to V; p(S') is the playback sound field; and q_ij is the transfer function matrix.

However, to raise the upper limit of the playback frequency on the boundary, the microphones and loudspeakers must be mounted within one-half wavelength of each other; WFS and BoSC therefore cannot yet reproduce sound containing frequencies up to 20 kHz.

There is also research on recording a sound field and playing it back via spherical harmonic analysis [93-95]. A method that analyzes and reproduces the sound pressure and the sound pressure gradient at one point of a sound field using the four spherical harmonic functions of the zeroth order (W) and first order (X, Y, Z) is called B-Format; using second-order or higher spherical harmonic functions is called higher-order Ambisonics (HOA) (Fig. 1.23) [96]. HOA analyzes and plays back plane waves gathered at a point, not on a boundary. To analyze a sound field using only spherical harmonic functions, spatially uniformly sampled data are required; however, ideal analysis is possible only up to third-order spherical harmonics because the largest regular polyhedron is the icosahedron, so fourth- and higher-order arrays must be heterogeneous. Moreover, since the manipulated object is a point rather than a boundary, the playback area is narrow.
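A minimal sketch of B-Format encoding with the zeroth- and first-order components W, X, Y, Z described above; the 1/√2 weighting on W follows the traditional B-Format convention. Illustrative only, not code from the dissertation.

    import numpy as np

    def encode_bformat(mono, azimuth_deg, elevation_deg):
        """Encode a mono signal into first-order B-Format (W, X, Y, Z),
        the zeroth- and first-order spherical-harmonic signals."""
        az = np.radians(azimuth_deg)
        el = np.radians(elevation_deg)
        w = mono / np.sqrt(2.0)               # omnidirectional component
        x = mono * np.cos(az) * np.cos(el)    # front-back
        y = mono * np.sin(az) * np.cos(el)    # left-right
        z = mono * np.sin(el)                 # up-down
        return w, x, y, z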

Figure 1.23: Zeroth-, first-, and second-order spherical harmonic functions. m is the order of the spherical harmonic function; n is the degree of the m-th spherical harmonic function.

1.5 Overview

The numbers of microphones and loudspeakers used to create 3D sound and to improve sound fields have been increasing recently. However, the multichannel loudspeaker system popularized in typical households is the 5.1-channel system, and systems with six or more channels are difficult to popularize because of cost and space. Therefore, if the 5.1-channel loudspeaker system, one of the most basic multichannel systems, can reproduce 3D sound, many people can experience 3D sound, and the technique can benefit many other systems.

Chapter 2 presents the development of a signal panning algorithm for five horizontally arranged loudspeakers and its evaluation. First, the relation between azimuth responses and the level changes of the L and LS loudspeakers was studied; the relation was nonlinear and asymmetrical, and level correction curves were made from the measured azimuth responses. The results showed that the HRTF does not affect azimuth perception, leading to the hypothesis that the panning algorithm affects azimuth perception much more strongly than the HRTF does. A panning algorithm with HRTFs for five loudspeakers was therefore proposed. The algorithm introduces the concept of control points to decide each loudspeaker's amplitude; when the target direction of a sound image is not at a control point, an interpolation method decides the amplitudes from the nearest three or four control points. The algorithm keeps the total energy constant, which is important for constant distance perception. A sound source is convolved with an HRTF and panned to the five loudspeakers; the combined effect of the panning algorithm and the HRTF achieves elevation localization with five horizontally arranged loudspeakers.

A comparison experiment was conducted in an anechoic chamber to evaluate the effect of including the HRTF. Twenty-five directions of sound images were presented in random permutation to avoid hysteresis, and listeners answered the perceived direction of each sound image. The results showed that listeners perceived higher elevation with the panning algorithm plus HRTF than with the panning algorithm alone. Another comparison experiment, conducted in the anechoic chamber and in an ordinary echoic room to evaluate the influence of room reverberation, showed no significant difference for the panning-with-HRTF method between the two rooms.

Chapter 3 explains a new reverberation reconstruction method. Almost every sound we hear contains reverberation, which provides many cues for sound perception, such as early reflections and reverberation time. To enhance listener envelopment, reverberation is commonly added to sound sources by convolving an impulse response (IR) with the source. Convolving a single IR is not enough, however, because reverberation contains many reflections; each reflection carries a room transfer function and an HRTF, whereas an IR contains no directional information. To obtain the directional information of each reflection, a closely spaced array of four omnidirectional microphones forming three-dimensional rectangular coordinates was used. The four impulse responses were analyzed by the sound intensity method and transformed into image sound sources (ISSs). A new reverberation was reconstructed from all ISSs: the delay, attenuation, direction, and head-related effect of every ISS were calculated and summed, so the reconstructed reverberation contains directional information. The new method and a conventional reverberation method were compared in a headphone-based system and in the loudspeaker-based 3D sound system; the results show that the new method improved sound impressions such as spatiality, clarity, and naturalness in both systems.

Chapter 4 explains the real-time 3D sound system. The relation between audio content and a listener has been passive: the listener starts the content, and all that remains is to watch or listen. An active relationship with the content is an important cue for a sense of immersion in virtual reality. A real-time system named RISSICS (Real-time and Intuitive Spatial Sound Image Creation System) was developed.

A Nintendo Wii Remote controller, which has a three-axis motion sensor, was adopted as the direction input device, so a user can intuitively indicate the direction of a sound image. RISSICS performs all signal processing immediately and distributes the calculated signals to the five loudspeakers. The refresh rate of RISSICS is shorter than human temporal resolution, so a user perceives a smoothly moving sound image. An experiment evaluating how easily a user can indicate a sound image position was conducted; ten motions were evaluated, and the results show that the Wii Remote can be used as the input device.
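The dissertation does not list the RISSICS code here; as a hedged illustration of how a real-time convolution loop with a short refresh interval can work, below is a generic block-wise overlap-add sketch (a fixed filter is assumed for clarity; a real system would swap HRTFs between blocks as the indicated direction changes).

    import numpy as np

    def overlap_add_stream(blocks, hrir, block_len):
        """Convolve an audio stream block by block (overlap-add). Each
        iteration costs one FFT per block, so a short block_len keeps the
        refresh rate below the listener's temporal resolution."""
        n_fft = 1 << int(np.ceil(np.log2(block_len + len(hrir) - 1)))
        tail = np.zeros(n_fft - block_len)
        H = np.fft.rfft(hrir, n_fft)
        for block in blocks:  # each block has block_len samples
            y = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
            y[: n_fft - block_len] += tail   # add the previous block's tail
            tail = y[block_len:].copy()      # carry the new tail forward
            yield y[:block_len]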

Chapter 2 A 3D Sound Generation System with Five Horizontally Arranged Loudspeakers

2.1 Introduction

Several spatial sound reproduction technologies exist, such as binaural recording and reproduction [97], transaural reproduction [98], wave field synthesis (WFS) [86], and the 22.2 multichannel audio format [83, 84]. However, headphone systems have the problem that a listener cannot perceive a frontal sound image; the transaural method requires measurement of the environment and computation before a sound can be played; WFS needs many microphones and loudspeakers and large-scale pre-calculation; and mounting many loudspeakers above ear level is difficult in a typical household. A spatial sound reproduction system using five horizontally arranged loudspeakers is therefore proposed. Five-channel loudspeaker systems are already popular, so the benefit of achieving spatial sound expression with them is large.

First, the azimuth responses in the side area of the five-loudspeaker system were studied. Sound image localization in the side area is nonlinear with respect to level changes, as shown in Fig. 1.8. Kim et al. improved azimuth localization by determining a new gain ratio between the left and left-surround channels; their results showed that the new gain ratio provides precise and robust azimuth localization [99]. I hypothesized that combining a gain correction with the HRTF would improve azimuth localization further, because the HRTF contains much information about sound localization.

An experiment was conducted to obtain the azimuth responses for given level changes, and level correction curves were made from the results. Another experiment evaluated the combination of the level correction curves and the HRTF [44]; the results showed that the HRTF does not affect azimuth perception, leading to the hypothesis that the panning algorithm affects azimuth perception more strongly than the HRTF does.

Second, a new system for five loudspeakers that combines a new panning algorithm with HRTFs was studied. The system convolves an HRTF with a sound source and pans the five loudspeaker amplitudes using the concept of control points. Combinations of the five loudspeakers can form a vector indicating any position inside the loudspeaker circle, and I hypothesized that adding the HRTF to this situation provides spatial sound localization. An experiment was conducted in an anechoic chamber in which 25 directions of sound images were presented in random permutation and listeners indicated the perceived direction; the treatments were the new panning algorithm with and without HRTF. The results showed that the new panning algorithm with HRTF provides higher elevation localization, so the combined effect of the panning algorithm and the HRTF is important. Another experiment, conducted in the anechoic chamber and in an ordinary echoic room to evaluate the influence of room reverberation, showed no significant difference for the panning-with-HRTF method between the two rooms; the proposed system can be used in both environments [100, 101].

A study of panning method for the side area

Hysteresis in sequential sound presentation in the side area

Participants

The participants were 3 males from 21 to 23 years old (M = 21.6, SD = 1.2), all Japanese students of the University of Aizu with normal hearing.

Materials and Procedure

The traditional amplitude panning method was investigated in the range from 30° (L: left) to 120° (LS: left surround) of a five-loudspeaker system. The total level ratio of loudspeakers L and LS was kept equal to 1 (a normalized value) so that the energy was unchanged. The experiments were conducted in an anechoic chamber.

Figure 2.1: Sequential sound level presentation and azimuth perception. X-axis: level of the L loudspeaker (relative level, dB); Y-axis: perceived azimuth (degrees). A, B, and C denote the participants. Front->back means the sound image was shifted from left toward left surround; back->front means from left surround toward left.

The devices used in the experiments were a Sony F500 Super Legato Linear amplifier and two Yamaha NS-10MM loudspeakers. The experimental procedure was to change the level ratio between the two loudspeakers and ask the listeners for the positions of the sound images. A monaural stimulus of a water-dripping sound with a 44.1 kHz sampling rate and 1,019 ms duration was used. The participants were asked to adjust the level ratios themselves so that the sound image moved from the direction of the L loudspeaker to the LS loudspeaker in 5° steps.

Results

The results are shown in Fig. 2.1. They show hysteresis between the increasing and decreasing azimuth conditions. The hysteresis property matters for precise sound image creation: by measuring and incorporating the hysteresis curves, a spatial sound system can correct the disparities caused by hysteresis and create more precise sound images.

Random sound presentation in the side area

Participants

The participants were 5 males from 21 to 23 years old (M = 21.4, SD = 0.9), all Japanese students of the University of Aizu with normal hearing.

Materials and Procedure

Amplitude panning in the side area without hysteresis was investigated. The target direction was chosen randomly to eliminate hysteresis, and the level ratios were changed by the examiner without informing the participants. The participants were asked to report the directions of the sound images they perceived.

Results

The results of this experiment are shown in Fig. 2.2. Although amplitude panning in the side area was highly individual-dependent, a consistent tendency of recognition was obtained. When the intensity ratio was changed from L:LS = 1.0:0.0 to 0.0:1.0 in 0.1 steps, the sound image moved from the direction of L to LS; however, the motion was not linear with respect to the level changes. The sound image moves faster in the middle area between the two loudspeakers (50°-95°) than in the areas near the loudspeakers (30°-50° and 95°-120°): in the middle area, the sound image moved from 50° to 95° (about half the range) while the level ratio changed only from L:LS = 0.5:0.5 to 0.3:0.7 (one fifth of the whole range). Thus amplitude panning in the side area is neither linear nor symmetrical. Moving the image from L to the center position requires a larger level-ratio change than moving it from LS to the center, and most of the motion occurs in the middle of the level-ratio range. This phenomenon may be caused by the asymmetrical arrangement of the L and LS loudspeakers. Level correction curves were made from these results; Fig. 2.3 shows the curves.

HRTF with amplitude panning for side areas

Participants

The participants were the same as in the random-presentation experiment above.

Materials and Procedure

The devices and environment were the same as in the previous experiments. Sound presentation levels were calculated from the correction curves shown in Fig. 2.3. The loudspeakers were located on the left side of the participant, so the left-channel HRTF was convolved with the sound source; the right-channel HRTF was not used.
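A sketch of how such correction curves can be applied; the sample points below are made up for illustration and are not the measured curves of Fig. 2.3.

    import numpy as np

    # Hypothetical correction curve sampled from a figure like Fig. 2.3:
    # x = intended L-loudspeaker level, y = level to actually present so
    # that the perceived image motion becomes linear for one listener.
    level_in = np.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
    level_out = np.array([0.0, 0.08, 0.25, 0.42, 0.62, 0.88, 1.0])

    def corrected_levels(l_level):
        """Piecewise-linear lookup of the L presentation level; LS gets the
        complement so that the total stays normalized to one."""
        l = np.interp(l_level, level_in, level_out)
        return l, 1.0 - l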

Figure 2.2: Random sound level presentation and azimuth perception. X-axis: level of the L loudspeaker (relative level, dB); Y-axis: perceived azimuth (degrees). A to E denote the participants; the ideal line shows the azimuth under a linear relationship.

Figure 2.3: Sound presentation level correction curves. X-axis: level of the L loudspeaker before correction (dynamic range); Y-axis: level after correction. A to E denote the participants; "Linear" is the uncorrected line.

Figure 2.4: Averaged error of azimuth responses for the conventional and corrected treatments, by target azimuth (degrees). Error bars correspond to 95% confidence intervals; shorter bars are better.

Results and Discussion

A mixed-effects two-way ANOVA was run on the collected data with treatment (conventional panning method, new panning method) and target azimuth (30°, 35°, ..., 120°) as factors and the error between the perceived azimuth and the target azimuth as the dependent variable. The analysis showed a significant effect of target azimuth, F(18, 72) = 6.25, p < .001, η² = .60, and a first-order interaction between treatment and target azimuth, F(18, 72) = 1.97, p = .023, η² = .13. Treatment had no significant effect, F(1, 4) = 2.31, p > .05, η² = .12. Fig. 2.4 shows the averaged azimuth error between the perceived and target azimuths; error bars correspond to 95% confidence intervals. A post-hoc Tukey's HSD analysis of the treatment-by-azimuth interaction showed no significant difference between treatments. This experiment did not show that the HRTF affects azimuth perception; the results rejected the hypothesis.

2.2 Basis of 3D sound system

The combination of the level correction curve and the HRTF showed no significant effect relative to conventional panning. It is assumed that HRTFs must be used pairwise and that a single HRTF does not strongly affect azimuth perception. In headphone systems, HRTFs provide elevation localization, so I hypothesized that five-channel panning with pairwise HRTFs would provide elevation localization in a horizontally arranged loudspeaker system.

HRTF convolution

The loudspeaker arrangement of the system follows ITU-R BS.775 [82]. The azimuth of the center (C) loudspeaker is defined as 0°, and the others are defined counterclockwise: left (L) at 30°, left surround (LS) at 110°, right surround (RS) at 250°, and right (R) at 330°. The system does not use the LFE (low-frequency effects) channel. Fig. 2.5 shows the loudspeaker groups: L and LS belong to the left group, R and RS to the right group, and C to both groups. The left group receives the signal convolved with the left-ear HRTF, and the right group receives the signal convolved with the right-ear HRTF. The HRTF database used was measured with a head and torso simulator (HATS) in an anechoic chamber at a distance of 1 m between the HATS and the sound source; it contains 10,565 HRTFs at a resolution of 2.5° in azimuth and 5° in elevation [80].
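A minimal sketch of the group-wise HRTF convolution of Fig. 2.5; illustrative only, with assumed function and variable names. The amplitude-panning gains of the next subsection would be applied to these channel signals afterwards.

    import numpy as np
    from scipy.signal import fftconvolve

    def spatialize(source, hrir_l, hrir_r):
        """Distribute one source to the five channels per Fig. 2.5: the
        left group (L, LS) gets the left-ear convolution, the right group
        (R, RS) the right-ear convolution, and C half of each."""
        left = fftconvolve(source, hrir_l)
        right = fftconvolve(source, hrir_r)
        return {
            "L": left, "LS": left,
            "R": right, "RS": right,
            "C": 0.5 * (left + right),
        }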

Figure 2.5: Loudspeakers and their groups. L and LS belong to the left group; the left group received the signal convolved with HRTF_L(θ, ϕ). R and RS belong to the right group; the right group received the signal convolved with HRTF_R(θ, ϕ). C received 1/2 of the left-group signal and 1/2 of the right-group signal.

When the direction of a sound image is not the direction of a control point, amplitudes are calculated by a linear combination of three or four control points. Four control points are selected when the elevation of a sound image is more than 0° and less than 20°. When the elevation of a sound image is more than 20°, three control points are selected. For example, to localize a sound image at (θ, ϕ) = (90°, 10°), the four chosen control points are CP_L, CP_B, L, and LS. To localize a sound image at (θ, ϕ) = (270°, 30°), the three chosen control points are CP_T, CP_B, and CP_R. An amplitude ratio E(θ, ϕ) is decided by the following Eq. (2.1), where θ_RB, θ_LB, θ_RT and θ_LT are the azimuths of the right-bottom, left-bottom, right-top and left-top control points surrounding the position of the sound image the system is providing; ϕ_B is the elevation of the control points on the lower elevation plane; and ϕ_T is the elevation of the control points on the higher elevation plane. k denotes the coefficients of the use ratio that decide how much each control point is used: k_RB is the right-bottom, k_LB the left-bottom, k_RT the right-top and k_LT the left-top control point's use ratio. These coefficients are calculated from the angle ratios between the control points and the target direction.

\[ E(\theta, \phi) = k_{RB} E(\theta_{RB}, \phi_B) + k_{LB} E(\theta_{LB}, \phi_B) + k_{RT} E(\theta_{RT}, \phi_T) + k_{LT} E(\theta_{LT}, \phi_T) \tag{2.1} \]

Figure 2.6: The positions of the control points (top view). Table 2.1 shows the names, positions and energy ratios of the control points.

Table 2.1: The positions of the control points and their energy ratios. Energy ratios are expressed as ratios, with dB values in parentheses. The total energy is one.

Control point | θ    | ϕ   | C          | L          | LS         | RS         | R          | Σ
C             | 0°   | 0°  | 1 (0)      | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 1
L             | 30°  | 0°  | 0 (−∞)     | 1 (0)      | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 1
LS            | 110° | 0°  | 0 (−∞)     | 0 (−∞)     | 1 (0)      | 0 (−∞)     | 0 (−∞)     | 1
RS            | 250° | 0°  | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 1 (0)      | 0 (−∞)     | 1
R             | 330° | 0°  | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 1 (0)      | 1
CP_F          | 0°   | 20° | 0 (−∞)     | 1/2 (−3.0) | 0 (−∞)     | 0 (−∞)     | 1/2 (−3.0) | 1
CP_L          | 70°  | 20° | 0 (−∞)     | 1/2 (−3.0) | 1/2 (−3.0) | 0 (−∞)     | 0 (−∞)     | 1
CP_B          | 180° | 20° | 0 (−∞)     | 0 (−∞)     | 1/2 (−3.0) | 1/2 (−3.0) | 0 (−∞)     | 1
CP_R          | 290° | 20° | 0 (−∞)     | 0 (−∞)     | 0 (−∞)     | 1/2 (−3.0) | 1/2 (−3.0) | 1
CP_T          | −    | 90° | 1/6 (−7.8) | 1/6 (−7.8) | 1/4 (−6.0) | 1/4 (−6.0) | 1/6 (−7.8) | 1

When four control points are chosen, each coefficient is defined by the following equations:

\[ k_{RB} = \frac{\theta_{LB} - \theta}{\theta_{LB} - \theta_{RB}} \cdot \frac{\phi_T - \phi}{\phi_T - \phi_B}, \tag{2.2a} \]
\[ k_{LB} = \frac{\theta - \theta_{RB}}{\theta_{LB} - \theta_{RB}} \cdot \frac{\phi_T - \phi}{\phi_T - \phi_B}, \tag{2.2b} \]
\[ k_{RT} = \frac{\theta_{LT} - \theta}{\theta_{LT} - \theta_{RT}} \cdot \frac{\phi - \phi_B}{\phi_T - \phi_B}, \tag{2.2c} \]
\[ k_{LT} = \frac{\theta - \theta_{RT}}{\theta_{LT} - \theta_{RT}} \cdot \frac{\phi - \phi_B}{\phi_T - \phi_B}. \tag{2.2d} \]

In the case where three control points are chosen, E(θ, ϕ) defined in Eq. (2.1) is replaced with Eq. (2.3). Moreover, the number of upper-side control points becomes one; then Eq. (2.2c) is not used, and Eq. (2.2d) is replaced with Eq. (2.4).

\[ E(\theta, \phi) = k_{RB} E(\theta_{RB}, \phi_B) + k_{LB} E(\theta_{LB}, \phi_B) + k_{LT} E(-, \phi_T) \tag{2.3} \]
\[ k_{LT} = \frac{\phi - \phi_B}{\phi_T - \phi_B} \tag{2.4} \]

In an ordinary echoic room, human auditory sensitivity correlates with amplitude rather than energy; therefore, Eq. (2.5) is used to obtain the amplitude coefficients [103]:

\[ E' = \sqrt{E}. \tag{2.5} \]

Eqs. (2.2a)-(2.2d) and Eq. (2.4) are linear interpolation (LERP). In the case of spherical linear interpolation (SLERP), Eqs. (2.2a)-(2.2d) are replaced as follows:

\[ k_{RB} = \left\{ \sin\!\left( \frac{\theta_{LB} - \theta}{\theta_{LB} - \theta_{RB}} \frac{\pi}{2} \right) \sin\!\left( \frac{\phi_T - \phi}{\phi_T - \phi_B} \frac{\pi}{2} \right) \right\}^2 \tag{2.6a} \]
\[ k_{LB} = \left\{ \sin\!\left( \frac{\theta - \theta_{RB}}{\theta_{LB} - \theta_{RB}} \frac{\pi}{2} \right) \sin\!\left( \frac{\phi_T - \phi}{\phi_T - \phi_B} \frac{\pi}{2} \right) \right\}^2 \tag{2.6b} \]
\[ k_{RT} = \left\{ \sin\!\left( \frac{\theta_{LT} - \theta}{\theta_{LT} - \theta_{RT}} \frac{\pi}{2} \right) \sin\!\left( \frac{\phi - \phi_B}{\phi_T - \phi_B} \frac{\pi}{2} \right) \right\}^2 \tag{2.6c} \]
\[ k_{LT} = \left\{ \sin\!\left( \frac{\theta - \theta_{RT}}{\theta_{LT} - \theta_{RT}} \frac{\pi}{2} \right) \sin\!\left( \frac{\phi - \phi_B}{\phi_T - \phi_B} \frac{\pi}{2} \right) \right\}^2 \tag{2.6d} \]

Eq. (2.4) is replaced as follows:

\[ k_{LT} = \left\{ \sin\!\left( \frac{\phi - \phi_B}{\phi_T - \phi_B} \frac{\pi}{2} \right) \right\}^2 \tag{2.7} \]
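For concreteness, the following Python sketch computes these use ratios and the resulting loudspeaker amplitudes. It is a minimal illustration of Eqs. (2.1)-(2.7), not the original implementation; the selection of the surrounding control points and azimuth wrap-around at 360° are assumed to be handled by the caller.

```python
import numpy as np

# Energy ratios (C, L, LS, RS, R) of the control points used in the worked
# example below, taken from Table 2.1.
CP_ENERGY = {
    "L":    np.array([0.0, 1.0, 0.0, 0.0, 0.0]),
    "LS":   np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
    "CP_L": np.array([0.0, 0.5, 0.5, 0.0, 0.0]),
    "CP_B": np.array([0.0, 0.0, 0.5, 0.5, 0.0]),
}

def use_ratios(theta, phi, th_rb, th_lb, phi_b, th_rt, th_lt, phi_t,
               slerp=False):
    """Use ratios (k_RB, k_LB, k_RT, k_LT) of Eqs. (2.2a)-(2.2d),
    or of Eqs. (2.6a)-(2.6d) when slerp=True."""
    g = (lambda x: np.sin(x * np.pi / 2.0) ** 2) if slerp else (lambda x: x)
    u = (th_lb - theta) / (th_lb - th_rb)    # horizontal ratio, lower plane
    ut = (th_lt - theta) / (th_lt - th_rt)   # horizontal ratio, upper plane
    v = (phi_t - phi) / (phi_t - phi_b)      # vertical ratio toward lower plane
    return g(u) * g(v), g(1 - u) * g(v), g(ut) * g(1 - v), g(1 - ut) * g(1 - v)

def loudspeaker_amplitudes(ks, corner_energies):
    """Eq. (2.1) combines the corner energy vectors; Eq. (2.5) converts the
    summed energy ratios into per-channel amplitude coefficients."""
    E = sum(k * e for k, e in zip(ks, corner_energies))
    return np.sqrt(E)

# Worked example from the text: a sound image at (90, 10) uses L and LS on
# the lower plane and CP_L and CP_B on the 20-degree plane.
ks = use_ratios(90.0, 10.0, th_rb=30.0, th_lb=110.0, phi_b=0.0,
                th_rt=70.0, th_lt=180.0, phi_t=20.0)
amps = loudspeaker_amplitudes(
    ks, [CP_ENERGY["L"], CP_ENERGY["LS"], CP_ENERGY["CP_L"], CP_ENERGY["CP_B"]])
print(amps)  # amplitudes for (C, L, LS, RS, R)
```

Note that for LERP the four ratios sum to one, and for SLERP the squared sine terms pair up as sin² + cos² = 1, so either interpolation preserves the total distributed energy.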

Figure 2.7: Block diagram of this system. S_L and S_R are the input signals, and (θ, ϕ) is the direction input by a listener. The system selects a pairwise HRTF H_L(θ, ϕ) and H_R(θ, ϕ) from the database, and convolves the HRTFs and the input signals in the frequency domain. The energy distribution ratio E(θ, ϕ) is calculated and square-rooted according to a room input. Then the calculated signals are sent to each loudspeaker.

Fig. 2.7 shows a block diagram of the signal and calculation flow. First, the system receives a direction (θ, ϕ) as the direction of the sound image the system is presenting. A pairwise HRTF H_L(θ, ϕ), H_R(θ, ϕ) is extracted from the HRTF database. The system Fourier transforms a stereo sound source S_L, S_R into the frequency domain. The Fourier-transformed signals and the HRTFs are convolved in the frequency domain. The convolved signal is inverse Fourier transformed into the time domain and multiplied by the amplitude coefficients E_L, E_LS, E_C, E_RS, E_R. Then, an amplifier outputs the signal. Here, the C channel is composed of one half of the L signal and one half of the R signal.
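The block processing above can be summarized in a few lines of Python. This is a minimal sketch under the stated design (frequency-domain convolution, then amplitude distribution), not the system's actual code; the HRIR pair h_l, h_r and the energy-ratio dictionary E are assumed to come from the database and panning stages described earlier.

```python
import numpy as np

CHANNELS = ("L", "LS", "C", "RS", "R")

def render_block(s_l, s_r, h_l, h_r, E):
    """One rendering pass of Fig. 2.7: convolve the stereo source with the
    pairwise HRTF in the frequency domain, then distribute amplitudes.

    s_l, s_r : stereo source samples;  h_l, h_r : HRIRs for (theta, phi)
    E        : dict of energy ratios per channel from the panning stage
    """
    n = len(s_l) + len(h_l) - 1                   # full linear-convolution length
    x_left = np.fft.irfft(np.fft.rfft(s_l, n) * np.fft.rfft(h_l, n), n)
    x_right = np.fft.irfft(np.fft.rfft(s_r, n) * np.fft.rfft(h_r, n), n)
    a = {ch: np.sqrt(E[ch]) for ch in CHANNELS}   # Eq. (2.5)
    return {
        "L":  a["L"] * x_left,
        "LS": a["LS"] * x_left,                   # left group
        "C":  a["C"] * 0.5 * (x_left + x_right),  # C takes half of each group
        "RS": a["RS"] * x_right,                  # right group
        "R":  a["R"] * x_right,
    }
```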

2.3 Subjective experiment

Experiment 1: Comparison between 3D sound system and amplitude panning only system

Participants

The number of participants was 17 males and 3 females from 20 to 28 years old (M=22.6, SD=2.3). They all were students of the University of Aizu. They were Japanese, and their hearing was normal. Two treatments, 3D sound and amplitude panning only (AP only), were evaluated, but AP only was evaluated more than one year after the 3D sound treatment. Two participants evaluated both treatments, but they were counted as separate participants because of the time span. The 3D sound treatment was evaluated by 9 male and 1 female students from 20 to 28 years old (M=23.3, SD=2.9). The AP only treatment was evaluated by 8 male and 2 female students from 21 to 24 years old (M=21.9, SD=1.2).

Materials and Procedure

Experiments were conducted to evaluate the capacity of 3D sound reproduction. The experiments were conducted in an anechoic chamber. The compared methods were the 3D sound system (amplitude panning with HRTF) and the amplitude panning only system, named 3D sound and AP only, respectively. The interpolation method used was LERP. The height of the loudspeakers (Bose 101MM) was 120 cm from the floor, and the ear level of the listener was the same as the loudspeakers. The amplifiers used were Bose 4702-III. The arrangement refers to ITU-R BS [82]. The size of the anechoic chamber is 5 m in length, 5 m in width and 5 m in height, and the reverberation time T60 is 70 ms. The background noise level was measured by a sound level meter

(Rion NA-20), and the A-weighted sound level was less than 20 dB SPL. The distance between the listener and the loudspeakers was 150 cm. The experimenter presented 25 directions in random permutation. The directions were 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315° in azimuth, and 0°, 30°, 60° and 90° in elevation. The presented signal was a piano performance (Les Frères, Piano Pittoresque, Shamrock), 10.7 s in duration. This sound was chosen because the timbre of the piano is one of the most familiar sounds. The sound was presented once; therefore, a sound longer than 4 s was chosen [104]. The presented A-weighted sound pressure level was 75 dB SPL at the peak and 72.4 dB SPL on average, measured by a sound level meter. The sound pressure level was measured at the center of the loudspeaker circle. The listener listened to the sound once, then indicated the perceived direction of the sound image. Listeners kept themselves still and their eyes closed while the sound was being presented. After a sound finished playing, the listener was allowed to move to indicate the direction of the sound image. The listener used a Wii Remote controller (RVL-003) to indicate the direction. The Wii Remote returned an elevation value at 2° resolution from its three-axis motion sensor. The experimenter recorded the azimuth at 5° resolution using the 10°-resolution angle scale on the floor, and the elevation at 2° resolution from the Wii Remote.

Results of elevation response

A mixed-effects 3-way ANOVA was run on the collected data with treatment (3D sound system, AP only system) as the between factor, with target elevation (ϕ = 0°, 30°, 60°, 90°) and target azimuth (θ = 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) as within factors, with the error angle of elevation perception as the dependent variable, and with the error angle of azimuth perception as a covariate. Here, the zenith (−, 90°) was calculated as (0°, 90°). This analysis showed significant effects of treatment, F(1, 18)=11.52, p=.003, η²=.10, target azimuth, F(7, 126)=3.85, p<.001, η²=.05, target elevation, F(3, 54)=21.97, p=.001, η²=.34, the first-order interaction between treatment and target azimuth, F(7, 126)=2.27, p=.033, η²=.03, the first-order interaction between treatment and target elevation, F(3, 54)=5.62, p=.002, η²=.09, and the first-order interaction between target azimuth and target elevation, F(14, 252)=4.97, p<.001, η²=.10. The second-order interaction between treatment, target azimuth and target elevation did not show a significant effect, F(14, 252)=1.34, p>.05, η²=.03. Fig. 2.8 shows the averaged error degrees of elevation responses.

Figure 2.8: Averaged error degrees of elevation responses related to treatment, target azimuth and target elevation in experiment 1. Error bars correspond to 95% confidence intervals. Shorter bars are better.

A post-hoc analysis of Tukey's HSD was run on treatment. 3D sound and AP only showed a significant effect, p<.001. The treatments, numbers of cases, means, standard errors and 95% confidence intervals are as follows: 3D sound (n=250, M=17.36, SE=1.21, 95% CI [14.98, 19.73]), AP only (n=250, M=26.92, SE=1.24, 95% CI [24.48, 29.35]).

A post-hoc analysis of Tukey's HSD was run on target azimuth. Only one of the 28 pairs showed a significant effect, θ=135° and 180°, p=.045; the others had no significant difference. The target azimuths, numbers of cases, means, standard errors and 95% confidence intervals are as follows: θ=0° (n=80, M=26.79, SE=2.96, 95% CI [20.90, 32.68]), θ=45° (n=60, M=20.07, SE=2.44, 95% CI [15.19, 24.95]), θ=90° (n=60, M=20.35, SE=2.05, 95% CI [16.26, 24.44]), θ=135° (n=60, M=18.63, SE=2.17, 95% CI [14.35, 23.02]), θ=180° (n=60, M=27.67, SE=2.60, 95% CI [22.46, 32.87]), θ=225° (n=60, M=21.85, SE=2.42, 95% CI [17.01, 26.69]), θ=270° (n=60, M=20.32, SE=2.39, 95% CI [15.54, 25.10]), θ=315° (n=60, M=19.82, SE=2.31, 95% CI [15.20, 24.43]).

A post-hoc analysis of Tukey's HSD was run on target elevation. One pair had no significant difference, ϕ=0° and 30° (p>.05); the others showed significant effects (all p<.001).

Figure 2.9: Averaged error degrees of elevation responses related to treatment and target azimuth in experiment 1. Error bars correspond to 95% confidence intervals. Shorter bars are better. Only target azimuth 0° shows a significant difference.

The target elevations, numbers of cases, means, standard errors and 95% confidence intervals are as follows: ϕ=0° (n=160, M=14.24, SE=2.01, 95% CI [10.26, 18.21]), ϕ=30° (n=160, M=18.63, SE=1.04, 95% CI [16.56, 20.69]), ϕ=60° (n=160, M=29.70, SE=2.58, 95% CI [24.61, 34.79]), ϕ=90° (n=20, M=52.90, SE=35.67, 95% CI [−21.76, 127.56]).

A post-hoc analysis of Tukey's HSD was run on the first-order interaction between treatment and target azimuth. One pair showed a significant effect, 3D sound: θ=0° and AP only: θ=0° (p<.001). The other pairs of target azimuths had no significant difference, e.g., 3D sound: θ=45° and AP only: θ=45°, 3D sound: θ=90° and AP only: θ=90°, and so forth, as shown in Fig. 2.9.

A post-hoc analysis of Tukey's HSD was run on the first-order interaction between treatment and target elevation. One pair showed a significant effect, 3D sound: ϕ=60° and AP only: ϕ=60° (p<.001). The other pairs of target elevations had no significant difference, i.e., 3D sound: ϕ=0° and AP only: ϕ=0°, 3D sound: ϕ=30° and AP only: ϕ=30°, and 3D sound: ϕ=90° and AP only: ϕ=90°, as shown in Fig. 2.10.

A post-hoc analysis of Tukey's HSD was run on the first-order interaction between target azimuth and target elevation. Four pairs showed significant effects: θ=0°: ϕ=0° and θ=0°: ϕ=60° (p<.001), θ=0°: ϕ=0° and θ=0°: ϕ=90° (p<.001), θ=0°: ϕ=30° and θ=0°: ϕ=90° (p<.001), and θ=315°: ϕ=0° and θ=315°: ϕ=60° (p<.001). The other 23 pairs had no significant difference, as shown in Fig. 2.11.

Figure 2.10: Averaged error degrees of elevation responses related to treatment and target elevation in experiment 1. Error bars correspond to 95% confidence intervals. Shorter bars are better. Only target elevation 60° shows a significant difference. The 95% CIs at 90° elevation are [ , ] and [20.74, ], respectively.

Figure 2.11: Averaged error degrees of elevation responses related to target azimuth and target elevation in experiment 1. Error bars correspond to 95% confidence intervals. Shorter bars are better.

Discussion of elevation response

The ANOVA and post-hoc analyses on treatment show that the error of 3D sound is smaller than that of AP only. The post-hoc analysis on target elevation shows a tendency for the error elevation to become larger at higher target elevations, except between ϕ=0° and ϕ=30°, but the post-hoc analysis on the first-order interaction between treatment and target elevation shows that only ϕ=60° has a significant difference. From Fig. 2.10, participants perceived about 15° of elevation at a target elevation of 0°. No cues for elevation were included, yet listeners perceived elevation. It is assumed that the proposed panning algorithm contains some elevation cues. There is a tendency that the higher the target elevation, the larger the error.

Results of azimuth response

A mixed-effects 3-way ANOVA was run in the same way as for the elevation response, but the dependent variable was the error angle of azimuth perception. This analysis showed significant effects of target azimuth, F(7, 126)=8.83, p<.001, η²=.24, and target elevation, F(3, 54)=11.99, p<.001, η²=.10. The others did not show significant effects, and the results are as follows: treatment, F(1, 18)=2.73, p>.05, η²=.01, first-order interaction between treatment and target azimuth, F(7, 126)=1.12, p>.05, η²=.03, first-order interaction between treatment and target elevation, F(3, 54)=0.77, p>.05, η²=.01, first-order interaction between target azimuth and target elevation, F(14, 252)=0.74, p>.05, η²=.01, and second-order interaction between treatment, target azimuth and target elevation, F(14, 252)=0.92, p>.05, η²=.01. Fig. 2.12 shows the averaged error degrees of azimuth responses.

A post-hoc analysis of Tukey's HSD was run on target azimuth. All combinations with azimuth 180° had significant effects: θ=0°:θ=180°, θ=45°:θ=180°, θ=90°:θ=180°, θ=135°:θ=180°, θ=225°:θ=180°, θ=270°:θ=180°, θ=315°:θ=180° (all seven combinations have p<.001). Four further pairs had significant effects: θ=0°:θ=135° (p=.006), θ=0°:θ=225° (p=.022), θ=135°:θ=315° (p=.014), θ=225°:θ=315° (p=.045). The target azimuths, numbers of cases, means, standard errors and 95% confidence intervals are as follows: θ=0° (n=80, M=18.16, SE=5.82, 95% CI [6.58, 29.74]), θ=45° (n=60, M=24.20, SE=3.79, 95% CI [16.61, 31.79]), θ=90° (n=60, M=26.03, SE=3.11, 95% CI [19.81, 32.26]), θ=135° (n=60, M=44.62, SE=5.28, 95% CI [34.06, 55.17]), θ=180°

(n=60, M=80.27, SE=10.41, 95% CI [59.44, 101.10]), θ=225° (n=60, M=41.77, SE=4.23, 95% CI [33.30, 50.23]), θ=270° (n=60, M=29.57, SE=3.19, 95% CI [23.18, 35.05]), θ=315° (n=60, M=18.32, SE=3.34, 95% CI [11.63, 25.00]).

Figure 2.12: Averaged error degrees of azimuth responses related to treatment, target azimuth and target elevation in experiment 1. Error bars correspond to 95% confidence intervals. Shorter bars are better.

A post-hoc analysis of Tukey's HSD was run on target elevation. Four combinations had significant effects: ϕ=0°:ϕ=60°, ϕ=0°:ϕ=90°, ϕ=30°:ϕ=60°, and ϕ=30°:ϕ=90° (all four combinations have p<.001). The target elevations, numbers of cases, means, standard errors and 95% confidence intervals are as follows: ϕ=0° (n=160, M=24.34, SE=3.04, 95% CI [18.33, 30.36]), ϕ=30° (n=160, M=29.53, SE=3.33, 95% CI [22.94, 36.11]), ϕ=60° (n=160, M=48.04, SE=4.01, 95% CI [40.12, 55.96]), ϕ=90° (n=20, M=51.65, SE=17.87, 95% CI [14.25, 89.05]).

Discussion of azimuth response

In azimuth perception, the main effect, first-order interaction effects and second-order interaction effect of treatment show no significant difference. The multiple comparison of the main effect of target azimuth shows that rear-area azimuth perception is poor. The multiple comparison of the main effect of target elevation shows that the higher the target elevation, the worse the azimuth perception becomes.

Experiment summary

In elevation perception, the main effect and first-order interaction effects of treatment show significant differences, but in azimuth perception treatment does not affect the perception. It is assumed that the effect of HRTF caused the differences in elevation perception.

Experiment 2: Comparison between with and without reverberation of 3D sound system

Another experiment was conducted in an anechoic chamber and in an ordinary echoic room to evaluate the effect of room reverberation. Reverberation is one of the spatial localization cues, so reverberation may affect sound image localization. The 3D sound system was evaluated in an anechoic chamber and in an ordinary echoic room.

Participants

The number of participants was 9 males and 1 female from 20 to 28 years old (M=23.2, SD=3.0). They all were students of the University of Aizu. They were Japanese, and their hearing was normal.

Materials and Procedure

The interpolation method used was LERP. The size of the echoic room is 8 m in length, 8 m in width and 4.7 m in height, and the reverberation time T60 is 0.38 s. The background A-weighted noise level of the echoic room was 37 dB SPL. The other conditions were the same as in Experiment 1.

Results of elevation response

A mixed-effects 3-way ANOVA was run on the collected data with treatment (in an ordinary echoic room, in an anechoic chamber), with target elevation (ϕ = 0°, 30°, 60°, 90°) and target azimuth (θ = 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) as within factors, with the error angle of elevation perception as the dependent variable, and with the error angle of azimuth perception as a covariate. Here, the zenith (−, 90°) was calculated as (0°, 90°). This analysis showed significant effects of target azimuth, F(7, 63)=2.89,

p=.011, η²=.04, target elevation, F(3, 27)=4.20, p=.015, η²=.08, and the first-order interaction between target azimuth and target elevation, F(14, 126)=2.60, p=.002, η²=.05. The others did not show significant effects: treatment, F(1, 9)=0.8, p>.05, η²=.00, first-order interaction between treatment and target azimuth, F(7, 63)=0.82, p>.05, η²=.01, first-order interaction between treatment and target elevation, F(3, 27)=0.70, p>.05, η²=.00, and second-order interaction between treatment, target azimuth and target elevation, F(14, 126)=1.00, p>.05, η²=.02. Fig. 2.13 shows the averaged error degrees of elevation responses.

Figure 2.13: Averaged error degrees of elevation responses related to treatment, target azimuth and target elevation in experiment 2. Error bars correspond to 95% confidence intervals. Shorter bars are better.

A post-hoc analysis of Tukey's HSD was run on target azimuth. Only one of the 28 pairs showed a significant effect, θ=135° and 180°, p<.001. The target azimuths, numbers of cases, means, standard errors and 95% confidence intervals are as follows: θ=0° (n=80, M=15.13, SE=2.38, 95% CI [10.40, 19.85]), θ=45° (n=60, M=16.35, SE=2.15, 95% CI [12.04, 20.66]), θ=90° (n=60, M=15.57, SE=2.11, 95% CI [11.35, 19.79]), θ=135° (n=60, M=9.10, SE=1.58, 95% CI [5.93, 12.27]), θ=180° (n=60, M=22.80, SE=2.68, 95% CI [17.43, 28.17]), θ=225° (n=60, M=16.63, SE=2.43, 95% CI [11.76, 21.51]), θ=270° (n=60, M=17.98, SE=2.45, 95% CI [13.09, 22.87]), θ=315° (n=60, M=17.67, SE=2.13, 95% CI [13.40, 21.93]).

A post-hoc analysis of Tukey's HSD was run on target elevation. Four of the six pairs showed significant effects: ϕ=0° and 90° (p<.001), ϕ=30° and 60° (p=.049), ϕ=30°

and 90° (p<.001), and ϕ=60° and 90° (p=.002). The target elevations, numbers of cases, means, standard errors and 95% confidence intervals are as follows: ϕ=0° (n=160, M=14.28, SE=1.49, 95% CI [11.34, 17.21]), ϕ=30° (n=160, M=13.83, SE=1.07, 95% CI [11.72, 15.94]), ϕ=60° (n=160, M=18.89, SE=1.50, 95% CI [15.93, 21.85]), ϕ=90° (n=20, M=32.85, SE=6.53, 95% CI [19.19, 46.51]).

A post-hoc analysis of Tukey's HSD was run on the first-order interaction between target azimuth and target elevation. Two pairs showed significant effects, θ=0°: ϕ=90° and θ=0°: ϕ=0° (p<.001), and θ=0°: ϕ=90° and θ=0°: ϕ=30° (p=.007), as shown in Fig. 2.14.

Figure 2.14: Averaged error degrees of elevation responses related to target azimuth and target elevation in experiment 2. Error bars correspond to 95% confidence intervals. Shorter bars are better.

Discussion of elevation response

In elevation perception, the main effect, first-order interaction effects and second-order interaction effect of treatment have no significant effect. Both treatments show similar results, such as smaller errors at 135° azimuth and larger errors at 180° azimuth. Higher target elevations produce larger errors.

Results of azimuth response

A mixed-effects 3-way ANOVA was run in the same way as for the elevation response, but the dependent variable was the error angle of azimuth perception. This analysis showed significant effects of target azimuth, F(7, 63)=5.43, p<.001, η²=.22, and target

elevation, F(3, 26)=6.26, p=.002, η²=.05. The others did not show significant effects, and the results are as follows: treatment, F(1, 9)=0.19, p>.05, η²=.00, first-order interaction between treatment and target azimuth, F(7, 63)=0.55, p>.05, η²=.01, first-order interaction between treatment and target elevation, F(3, 27)=1.06, p>.05, η²=.01, first-order interaction between target azimuth and target elevation, F(14, 126)=1.05, p>.05, η²=.02, and second-order interaction between treatment, target azimuth and target elevation, F(14, 126)=0.67, p>.05, η²=.01. Fig. 2.15 shows the averaged error degrees of azimuth responses.

Figure 2.15: Averaged error degrees of azimuth responses related to treatment, target azimuth and target elevation in experiment 2. Error bars correspond to 95% confidence intervals. Shorter bars are better.

A post-hoc analysis of Tukey's HSD was run on target azimuth. All combinations with azimuth 180° had significant effects: θ=0°:θ=180°, θ=45°:θ=180°, θ=90°:θ=180°, θ=135°:θ=180°, θ=225°:θ=180°, θ=270°:θ=180°, θ=315°:θ=180° (all seven combinations have p<.001). The target azimuths, numbers of cases, means, standard errors and 95% confidence intervals are as follows: θ=0° (n=80, M=17.00, SE=5.71, 95% CI [5.63, 28.37]), θ=45° (n=60, M=18.83, SE=3.21, 95% CI [12.42, 25.25]), θ=90° (n=60, M=19.33, SE=2.61, 95% CI [14.12, 24.55]), θ=135° (n=60, M=26.83, SE=3.66, 95% CI [19.51, 34.16]), θ=180° (n=60, M=78.50, SE=11.41, 95% CI [55.69, 101.31]), θ=225° (n=60, M=36.33, SE=4.48, 95% CI [27.38, 45.29]), θ=270° (n=60, M=24.42, SE=3.24, 95% CI [17.93, 30.90]), θ=315° (n=60, M=18.08, SE=3.38, 95% CI [10.32, 23.85]).

A post-hoc analysis of Tukey's HSD was run on target elevation. Three of the six

combinations showed significant effects: ϕ=0°:ϕ=60° (p=.003), ϕ=0°:ϕ=90° (p=.045), and ϕ=30°:ϕ=60° (p=.009). The target elevations, numbers of cases, means, standard errors and 95% confidence intervals are as follows: ϕ=0° (n=160, M=22.78, SE=3.39, 95% CI [16.09, 29.47]), ϕ=30° (n=160, M=24.34, SE=3.17, 95% CI [18.07, 30.61]), ϕ=60° (n=160, M=39.75, SE=4.10, 95% CI [31.66, 47.84]), ϕ=90° (n=20, M=37.00, SE=16.43, 95% CI [2.61, 71.39]).

Discussion of azimuth response

In azimuth perception, the main effect, first-order interaction effects and second-order interaction effect of treatment have no significant effect. Both treatments show similar results, such as smaller errors at 135° azimuth and larger errors at 180° azimuth. Higher target elevations produce larger errors.

Experiment summary

In both elevation and azimuth perception, treatment did not show a significant effect. Room reverberation does not affect the perception of the system.

2.4 Conclusion

This chapter described how five horizontally arranged loudspeakers can provide spatial sound using the panning algorithm with HRTF. Listeners can perceive elevation localization better with the 3D sound system than with the panning-algorithm-only system. The 3D sound system is not affected by reverberation; the system can therefore be used in ordinary rooms. A great advantage of the 3D sound system is its low calculation cost: the system calculates only a two-channel convolution and the energy distribution coefficients, so it can run in real time. The greatest advantage is that the 3D sound system extends the expressive limits of horizontally arranged multichannel systems. Conventional horizontally arranged multichannel systems cannot express elevation localization; the 3D sound system thus contributes to the development of multichannel systems.

2.5 Future Work

Listeners perceived elevation even without the HRTF effect, and the higher the target elevation, the larger the error became. Moreover, in the rear area, the 3D sound system has large presentation errors. To reduce these errors, optimization studies of the number of control points, the positions of the control points, the energy distribution ratios of the control points, and so on, are needed. Moreover, a study of the influence of changing LERP to SLERP is needed.

Chapter 3

A New 5-Loudspeaker 3D Sound System with a Reverberation Reconstruction Method

3.1 Introduction

Two types of spatial sound synthesis methods are popular: headphone-based and loudspeaker-based systems. In headphone-based systems, spatial sound localization can be achieved by convolving sound sources with head-related transfer functions (HRTF) [5, 105]. However, such systems have problems of inside-the-head localization and front-rear confusion. To address these problems, digital audio workstations (DAW) provide a convolution reverb effect; however, this function convolves an impulse response (IR) that includes room reverberation with a sound source. Information on the traveling direction of sound cannot be recorded via a single microphone. Reverberation consists of many directional reflections, and the information on traveling direction is important. The closely located four-microphone method together with the sound intensity method can obtain this directional information. An IR with directional information can improve the impression of sound. Therefore, this chapter describes a new reverberation reconstruction method. Image sound sources (ISS) can be obtained from four IRs measured by the closely located four-microphone method and analyzed by the sound intensity method. A new reverberation is reconstructed from all paths from all ISSs to the reconstruction point, i.e., all distances, directions, delays, attenuations, and head-related effects are calculated to reconstruct a new reverberation.

An experiment was conducted in a headphone-based system to compare a

conventional reverberation and a new reverberation. In this experiment, three room IRs were measured and reconstructed, and the two treatments (conventional method, new method) were compared. The results showed that spatiality, clarity, naturalness, and distance were improved [ ]. Another experiment was conducted in the 3D sound system (Chapter 2) to compare a conventional reverberation and a new reverberation. In this experiment, a room IR was re-measured and reconstructed. The results showed that spatiality, reverberation, clarity, and naturalness were improved, and the azimuth response showed a significant difference [109].

3.2 Headphone-based reverberation reconstruction method

Conventional method

A traditional method is to add the reverberations to the sources in the stage before the convolution with HRTFs, as shown in Fig. 3.1 [110, 111]. In this figure, S is the dry source, H_L and H_R are the HRTFs from the sound source to the left and right ears, I_L and I_R are the impulse responses (IRs) from the sound source to the positions of the left and right ears, and S_L and S_R are the sound signals perceived at the left and right ears, respectively:

\[ S_L = I_L H_L S, \tag{3.1} \]
\[ S_R = I_R H_R S. \tag{3.2} \]

Figure 3.1: Model of conventional reverberation in a headphone-based system.

New reverberation reconstruction method

In a real situation, since the reflected sounds arrive from different directions, each reflection should be convolved with a different HRTF of a different direction, as shown

in Fig. 3.2. Thus, the following equations hold:

\[ S_L = S \sum_i I_{Li} H_{Li}, \tag{3.3} \]
\[ S_R = S \sum_i I_{Ri} H_{Ri}, \tag{3.4} \]

where H_Li and H_Ri are the HRTFs to the left and right ears corresponding to the direction of the ith reflected sound, and I_Li and I_Ri are the IRs from the ith reflected sound to the positions of the left and right ears, respectively.

Figure 3.2: Model of new reverberation including the effects of HRTFs.

The spatial information of a real environment can be included in a recording, such as a four-channel B-format recording for loudspeaker systems [ ]. However, when we create virtual sound images with different directions and distances, we require the same number of transfer functions from the positions of the sound images to the receiving points. Obviously, measuring the transfer functions for every source position and every receiving point in a real environment is not realistic. The creation of reverberation by a room simulator is also time-consuming because it is necessary to model the detailed shape of a room and obtain the acoustic data of the materials of its surfaces. Measurement by the closely located four-microphone method and analysis by the sound intensity method provide the intensities and the 3D coordinates of ISSs. In a rectangular parallelepiped room, by the method of images, a new reverberation can be estimated even when the positions of the sound source and the reconstruction point are moved. Therefore, reverberations near the measurement point can be reconstructed [115].

Four-point room impulse response measurement

A four-point microphone set was used to measure the IRs from a fixed sound source to a fixed measurement point inside rooms [69, 116]. Usually, the IRs of each reflection cannot be precisely separated from the entire IR. The measured IRs were therefore divided into frames of 2.93 ms, and one reflected sound was obtained from each time frame by the sound intensity method. Four omni-directional microphones (ACO Type 7146 with Type 5006 amplifiers) were mounted together to form the Cartesian coordinate system shown in Fig. 3.3. The length from microphone O, located at the origin, to microphones X, Y, and Z is 50 mm. In the measurements, a dodecahedral loudspeaker (Solid Acoustics SA-355) and an audio interface (Roland UA-1000) were used. The signal used for IR measurement was a time-stretched pulse (TSP) with a duration of 0.74 s. Measurements were conducted by a three-time synchronous summation and averaging method [117, 118]. The dimensions of each room used for the measurements are listed in Table 3.1.

Table 3.1: Dimensions of the rooms used for measurements.

Parameter                                | Small room | Medium-size room | Gymnasium
Length                                   | 6.5 m      | 11.5 m           | 22.0 m
Width                                    | 7.9 m      | 11.0 m           | 36.0 m
Height                                   | 3.0 m      | 3.0 m            | 10.5 m
Distance from microphone to loudspeaker  | 3.0 m      | 3.0 m            | 3.0 m
Direction from microphone to loudspeaker |            |                  |
T60                                      |  s         | 1.07 s           | 3.92 s
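As an illustration of this measurement procedure, the sketch below generates a TSP and its inverse filter and deconvolves an averaged response into an IR. It is a minimal sketch assuming an Aoshima-style TSP phase and a hypothetical play_and_record() I/O callback; it is not the measurement code actually used, and the time-origin shift of the pulse is omitted.

```python
import numpy as np

def make_tsp(n=32768, m=None):
    """Time-stretched pulse and its inverse filter (a sketch after Aoshima's
    TSP). n = 32768 samples matches the 0.74 s signal at 44.1 kHz;
    m controls how far the energy is stretched in time."""
    m = n // 4 if m is None else m
    k = np.arange(n // 2 + 1)
    H = np.exp(1j * 4.0 * np.pi * m * (k / n) ** 2)  # unit-magnitude spectrum
    tsp = np.fft.irfft(H, n)
    inv = np.fft.irfft(np.conj(H), n)                # inverse TSP
    return tsp, inv

def measure_ir(play_and_record, tsp, inv, repeats=3):
    """Synchronous summation and averaging over `repeats` runs, followed by
    circular deconvolution with the inverse TSP. `play_and_record` is a
    hypothetical callback that plays `tsp` and returns the recorded block."""
    n = len(tsp)
    acc = sum(play_and_record(tsp) for _ in range(repeats)) / repeats
    # |H| = 1, so multiplying by the inverse spectrum undoes the TSP exactly.
    return np.fft.irfft(np.fft.rfft(acc, n) * np.fft.rfft(inv, n), n)
```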

Calculation of sound intensity vectors

This section describes the calculation of sound intensity vectors for each time frame of the IRs. The instantaneous intensity vector I(t) is defined as the product of the instantaneous sound pressure p and the particle velocity u:

\[ \mathbf{I}(t) = p\,\mathbf{u}. \tag{3.5} \]

Figure 3.3: Four-point microphone set used for IR measurement.

The component normal to any chosen surface having unit normal vector n is defined by

\[ I_n(t) = \mathbf{I} \cdot \mathbf{n} = p\,\mathbf{u} \cdot \mathbf{n} = p\,u_n. \tag{3.6} \]

The fluid momentum equation for the direction of n is given as

\[ \frac{\partial p}{\partial n} = -\rho_0 \frac{\partial u_n}{\partial t}, \tag{3.7} \]

where u_n is the particle velocity in the direction of n, and ρ_0 is the average density of the fluid. Then, u_n(t) can be approximated by

\[ u_n(t) \approx -\frac{1}{\rho_0 d} \int_{-\infty}^{t} \left[ p_1(\tau) - p_2(\tau) \right] d\tau, \tag{3.8} \]

where d is the distance separating the acoustic centers of the microphones, and p_1 and p_2 are the two microphone signals. Then, the instantaneous intensity component is approximated by the following equation:

\[ I_n(t) \approx -\frac{1}{2\rho_0 d} \left[ p_1(t) + p_2(t) \right] \int_{-\infty}^{t} \left[ p_1(\tau) - p_2(\tau) \right] d\tau. \tag{3.9} \]

For time-stationary signals, since the time averages of the derivatives of their second-order moments equal zero, i.e., \( \overline{x(t)x'(t)} = \overline{y(t)y'(t)} = 0 \) and \( \overline{(x(t)y(t))'} = \overline{x(t)y'(t)} + \overline{y(t)x'(t)} = 0 \), we

obtain

\[ \bar{I}_n(t) \approx \frac{1}{\rho_0 d}\, \overline{p_1(t) \int_{-\infty}^{t} p_2(\tau')\, d\tau'} = \frac{1}{\rho_0 d}\, \overline{p_1(t)\, z_2(t)}, \tag{3.10} \]

where z_2 is defined as

\[ z_2(t) = \int_{-\infty}^{t} p_2(\tau')\, d\tau'. \tag{3.11} \]

Note that τ has been replaced with τ′ to avoid confusion with the correlation time delay. The cross-correlation function R_pz between p_1(t) and z_2(t) is given by

\[ R_{pz}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T p_1(t)\, z_2(t + \tau)\, dt, \tag{3.12} \]

so that

\[ \bar{I}_n = \lim_{T \to \infty} \frac{1}{T} \int_0^T I_n(t)\, dt = \frac{1}{\rho_0 d}\, R_{pz}(0) = \frac{1}{\rho_0 d} \int_{-\infty}^{\infty} S_{pz}(\omega)\, d\omega. \tag{3.13} \]

The corresponding cross-spectral density function S_pz is

\[ S_{pz}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} R_{pz}(\tau)\, e^{-i\omega\tau}\, d\tau. \tag{3.14} \]

Rewriting e^{−iωτ} as e^{−iω(t+τ)} e^{iωt} and dτ as d(t + τ) for fixed t,

\[ S_{pz}(\omega) = \frac{1}{2\pi} \lim_{T \to \infty} \frac{1}{T} \int_0^T p_1(t)\, e^{i\omega t} \left[ \int_{-\infty}^{\infty} z_2(t+\tau)\, e^{-i\omega(t+\tau)}\, d(t+\tau) \right] dt. \tag{3.15} \]

Integrating by parts, and noting that dz_2/d(t+τ) = p_2(t+τ),

\[ \int z_2(t+\tau)\, e^{-i\omega(t+\tau)}\, d(t+\tau) = \frac{i}{\omega}\, z_2(t+\tau)\, e^{-i\omega(t+\tau)} - \frac{i}{\omega} \int p_2(t+\tau)\, e^{-i\omega(t+\tau)}\, d(t+\tau). \tag{3.16} \]

Hence, dropping the vanishing boundary term, we have the following equation:

\[ S_{pz}(\omega) = -\frac{i}{2\pi\omega} \int_{-\infty}^{\infty} R_{p_1 p_2}(\tau)\, e^{-i\omega\tau}\, d\tau = -\frac{i}{\omega}\, S_{p_1 p_2}(\omega). \tag{3.17} \]

Therefore,

\[ I_n(\omega) = \frac{2}{\rho_0 d}\, \mathrm{Re}\{S_{pz}(\omega)\} = \frac{2}{\rho_0 \omega d}\, \mathrm{Im}\{S_{p_1 p_2}(\omega)\} = \frac{1}{\rho_0 \omega d}\, \mathrm{Im}\{G_{p_1 p_2}(\omega)\}, \tag{3.18} \]

where

\[ G_{pz}(\omega) = \begin{cases} 2 S_{pz}(\omega), & \omega > 0 \\ S_{pz}(\omega), & \omega = 0 \\ 0, & \omega < 0. \end{cases} \tag{3.19} \]

The total intensity is given by

\[ \bar{I}_n = \int_0^{\infty} I_n(\omega)\, d\omega. \tag{3.20} \]

Estimation of image sound sources

From the sound intensity vector calculated from a single time frame of the four room IRs, the position of one image sound source can be estimated [119, 120]. Suppose that the sound intensity vector is (I_xi, I_yi, I_zi) for the ith time frame. The azimuth θ_i and elevation ϕ_i of the image sound source are given by

\[ \theta_i = \arctan2(I_{yi}, I_{xi}), \tag{3.21a} \]
\[ \phi_i = \arctan2\!\left(I_{zi}, \sqrt{I_{xi}^2 + I_{yi}^2}\right). \tag{3.21b} \]

The distance of the image sound source d_i can be obtained as

\[ d_i = c\, t_f\, i, \tag{3.22} \]

where t_f is the time length of each time frame and c is the sound velocity. The coordinates of the image sound source (x_i, y_i, z_i) are given by

\[ x_i = d_i \cos\phi_i \cos\theta_i, \tag{3.23a} \]
\[ y_i = d_i \cos\phi_i \sin\theta_i, \tag{3.23b} \]
\[ z_i = d_i \sin\phi_i. \tag{3.23c} \]
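A compact numerical version of Eqs. (3.18)-(3.23) might look as follows. This is a sketch under simplifying assumptions: the absolute scale of the cross spectrum is glossed over, and the sign of the intensity depends on the microphone ordering and the direction of n. It is not the analysis code used for the measurements.

```python
import numpy as np

RHO0 = 1.2         # air density [kg/m^3]
C = 340.0          # sound velocity [m/s]
D = 0.05           # microphone spacing [m]
T_FRAME = 2.93e-3  # time-frame length [s]

def axis_intensity(p_o, p_axis, fs):
    """Band-integrated intensity along one axis for one time frame,
    I_n(w) ~ Im{G_p1p2(w)} / (rho0 * w * d), Eqs. (3.18) and (3.20)."""
    n = len(p_o)
    G = 2.0 * np.conj(np.fft.rfft(p_o)) * np.fft.rfft(p_axis)  # one-sided
    w = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    I = np.imag(G[1:]) / (RHO0 * w[1:] * D)  # skip the w = 0 bin
    return I.sum()

def image_source(frames_o, frames_x, frames_y, frames_z, i, fs):
    """Direction, distance and position of the ith image sound source,
    Eqs. (3.21)-(3.23), from one time frame of the four-point IRs."""
    I = np.array([axis_intensity(frames_o[i], frames_x[i], fs),
                  axis_intensity(frames_o[i], frames_y[i], fs),
                  axis_intensity(frames_o[i], frames_z[i], fs)])
    theta = np.arctan2(I[1], I[0])                # azimuth, Eq. (3.21a)
    phi = np.arctan2(I[2], np.hypot(I[0], I[1]))  # elevation, Eq. (3.21b)
    d = C * T_FRAME * i                           # distance, Eq. (3.22)
    pos = d * np.array([np.cos(phi) * np.cos(theta),
                        np.cos(phi) * np.sin(theta),
                        np.sin(phi)])             # Eq. (3.23)
    return theta, phi, d, pos
```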

Figure 3.4: Obtained image sound sources. The blue-filled circle is the sound source, the open blue circles are image sound sources, and the red dot is the microphone array. The geometry of the room is defined using the position of the microphone array.

Fig. 3.4 shows the distribution of the image sound sources measured in the small room of Table 3.1. The wire-frame cube is the room, the red-filled circle is the measurement point, the blue-filled circle is the direct sound and the open blue circles are image sound sources. The size of each circle indicates the intensity. The room reverberation can thus be transformed into a set of image sound sources. Therefore, a new reverberation can be reconstructed using the image sound sources, assuming the image sound sources radiate in a free field with no reflecting surfaces, as shown in Fig. 3.5.

Reconstruction of spatial room impulse response

Reverberation in a room varies with listener position. However, the estimated image sound sources do not change when the reconstruction position is changed slightly. Thus, the reverberation at a new position can be reconstructed from the geometrical positions of the image sound sources relative to the new reconstruction position. The reverberation can be approximated at any point near the measurement point using the same image sound sources.

Figure 3.5: Reverberation model. Reflected sounds can be transformed into image sound sources.

Figure 3.6: Reconstruction model using the image sound sources of the measurement point.

Figure 3.7: Process flow of binaural room reverberation including the effect of HRTFs (IFT: inverse Fourier transform).

Assume that (0, 0, 0) is the actual measurement point and (x_r, y_r, z_r) is the reconstruction point. Then the distance d′_i between the reconstruction point and the ith image sound source is given by

\[ d'_i = \sqrt{(x_i - x_r)^2 + (y_i - y_r)^2 + (z_i - z_r)^2}. \tag{3.24} \]

Then, the new delay time t′_i, azimuth θ′_i and elevation ϕ′_i of each image sound source can be calculated using

\[ t'_i = \frac{d'_i}{c}, \tag{3.25} \]
\[ \theta'_i = \arctan2(y_i - y_r,\; x_i - x_r), \tag{3.26a} \]
\[ \phi'_i = \arctan2\!\left(z_i - z_r,\; \sqrt{(x_i - x_r)^2 + (y_i - y_r)^2}\right). \tag{3.26b} \]

Similarly to the direct sound source, the signals of the image sound sources at the listener's ears are also affected by the HRTFs depending on their incident directions. Therefore, the signal of each image sound source must be convolved with the left and right HRTFs, H′_Li and H′_Ri, of the direction (θ′_i, ϕ′_i). Since each image sound source is generated by one time frame of the room IRs, the signal from each image sound source can be approximated by the signal of the ith time frame of the room IRs, I_Li and I_Ri. A change in the distance between an image sound source and the reconstruction point changes the sound energy of time frame i by a factor of d_i/d′_i. Thus, the reconstructed room IRs for the left ear I′_L and the right ear I′_R can be estimated by the following equations:

\[ I'_L(t) = \sum_{i=1}^{n} \frac{d_i}{d'_i}\, I_{Li}(t) * H'_{Li}, \tag{3.27a} \]
\[ I'_R(t) = \sum_{i=1}^{n} \frac{d_i}{d'_i}\, I_{Ri}(t) * H'_{Ri}, \tag{3.27b} \]

where n is the total number of time frames. Fig. 3.7 shows the process flow used for reproduction.
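Putting Eqs. (3.24)-(3.27) together, a sketch of the reconstruction might look as follows. The hrtf_db lookup is a hypothetical stand-in for the measured HRTF database, and fractional-sample delays and output-length bookkeeping are simplified; this illustrates the method rather than reproducing the original implementation.

```python
import numpy as np

def reconstruct_binaural_ir(iss, frames_l, frames_r, hrtf_db, r,
                            n_out, fs=44100, c=340.0):
    """Rebuild the left/right room IRs at reconstruction point r, Eq. (3.27).

    iss      : per-frame records (x_i, y_i, z_i, d_i) of the image sources
    frames_* : i-th time frame of the measured room IRs (I_Li, I_Ri)
    hrtf_db  : hypothetical lookup, hrtf_db(theta, phi) -> (h_l, h_r) HRIRs
    r        : reconstruction point (x_r, y_r, z_r)
    """
    out_l, out_r = np.zeros(n_out), np.zeros(n_out)
    for i, (x, y, z, d_i) in enumerate(iss):
        vx, vy, vz = x - r[0], y - r[1], z - r[2]
        d_new = np.sqrt(vx**2 + vy**2 + vz**2)  # Eq. (3.24)
        delay = int(round(d_new / c * fs))      # Eq. (3.25), in samples
        if delay >= n_out:
            continue
        theta = np.arctan2(vy, vx)              # Eq. (3.26a), radians here
        phi = np.arctan2(vz, np.hypot(vx, vy))  # Eq. (3.26b)
        h_l, h_r = hrtf_db(theta, phi)
        g = d_i / d_new                         # distance attenuation d_i/d'_i
        for out, frame, h in ((out_l, frames_l[i], h_l),
                              (out_r, frames_r[i], h_r)):
            seg = g * np.convolve(frame, h)     # per-frame term of Eq. (3.27)
            end = min(n_out, delay + len(seg))
            out[delay:end] += seg[:end - delay]
    return out_l, out_r
```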

Experiment

Participants

The participants were 9 males from 21 to 24 years old (M=22.3, SD=1.2). They all were students of the University of Aizu. They were Japanese, and their hearing was normal.

Materials and Procedure

An experiment was conducted in an anechoic chamber to compare the treatments (conventional method, proposed method). Three room IRs were measured as shown in Table 3.1 and used as the reverberation sources for reconstruction. The TSP used had a time duration of 0.74 s (32,768 points) and a sampling rate of 44.1 kHz. The measured IRs were divided into frames of 2.93 ms (128 points), and the number of obtained ISSs was 256. A canal-type earphone, Etymotic Research ER-4B, was used as the presentation device. Two sound sources were used: one was a girl's laughing voice of 2.05 s duration, and the other was a piano performance of 7.56 s. The peak amplitude was 76 dB. The distance from the subject to the virtual sound source was set to 3.0 m. Windows Media Player was used to play the sounds. The listeners could not relate the sounds to the treatments because the two sounds were labeled A and B and the experimenter did not disclose the relation. The listeners manipulated a computer mouse and listened to the two sounds in an arbitrary order. The experiment was evaluated

in terms of the following five criteria on a one-to-five scale with one-decimal-point resolution, as shown in Table 3.2. Spatiality indicates how reverberation affects sound localization; the listeners evaluated whether the sound including reverberation was localized inside or outside the head. Clarity indicates how clear or vague the sound is. Naturalness indicates how natural or artificial the sound is. Distance indicates how far the direct sound image is; the listeners evaluated the distance to the virtual sound source. Room size indicates how large the virtual room is.

Table 3.2: Evaluation criteria and definitions.

Criteria     | 1                  | 2                  | 3                   | 4              | 5
Spatiality   | center of the head | inside of the head | surface of the head | within reach   | out of reach
Clarity      | very poor          | poor               | fair                | good           | excellent
Naturalness  | highly artificial  | artificial         | somewhat artificial | fairly natural | natural
Distance (d) | d < 2 m            | 2 m ≤ d < 4 m      | 4 m ≤ d < 6 m       | 6 m ≤ d < 8 m  | 8 m ≤ d
Room size    | bathroom           | small room         | medium room         | large room     | gymnasium

Results and discussion

Mixed-effects 3-way ANOVAs were run on the collected data with treatment (conventional reverberation method, new reverberation method), reverberation source (small room, medium-size room, gymnasium) and sound source (piano sound, girl's laughing voice) as within factors, with spatiality, clarity, naturalness, distance and room size as dependent variables, respectively.

Spatiality

The analysis of spatiality showed a significant effect of treatment, F(1, 8)=46.12, p<.001, η²=.42. The others did not show significant effects: reverberation source, F(2, 16)=1.81, p>.05, η²=.04, sound source, F(1, 8)=0.84, p>.05, η²=.15, first-order interaction between reverberation source and treatment, F(2, 16)=1.30, p>.05, η²=.02, first-order interaction between reverberation source and sound source, F(2, 16)=0.36, p>.05, η²=.00, first-order interaction between treatment and sound source, F(1, 8)=1.22, p>.05, η²=.01, and second-order interaction between reverberation source, treatment and sound source, F(2, 16)=1.07, p>.05, η²=.03. Fig. 3.8 shows the averaged spatiality scores.

Figure 3.8: Averaged spatiality scores related to treatment, reverberation source and sound source in the headphone system. Error bars correspond to 95% confidence intervals. Longer bars are better. The gymnasium reverberation source with the laugh sound and the medium-size reverberation source with the piano sound show significant effects, p<.001 and p=.021 respectively.

A post-hoc analysis of Tukey's HSD was run on treatment. Conventional reverberation and new reverberation showed a significant effect, p<.001. The treatments, numbers of cases, means, standard errors and 95% confidence intervals are as follows: conventional reverberation (n=54, M=2.09, SE=0.14, 95% CI [1.81, 2.38]), new reverberation (n=54, M=3.65, SE=0.12, 95% CI [3.40, 3.89]). The main effect of treatment shows that the new method is better. Two of the six combinations between treatments show significant effects, as shown in Fig. 3.8.

Clarity

The analysis of clarity showed significant effects of reverberation source, F(2, 16)=12.43, p<.001, η²=.15, treatment, F(1, 8)=45.29, p<.001, η²=.34, the first-order interaction between reverberation source and treatment, F(2, 16)=17.11, p<.001, η²=.10, the first-order interaction between reverberation source and sound source, F(2, 16)=9.51, p=.002, η²=.14, and the second-order interaction between reverberation source, treatment and sound source, F(2, 16)=7.92, p=.004, η²=.04. The others did not show significant effects: sound source, F(1, 8)=0.41, p>.05, η²=.01, and the first-order interaction between treatment and sound source, F(1, 8)=0.21, p>.05, η²=.00.

A post-hoc analysis of Tukey's HSD was run on reverberation source. Small room and gymnasium had p=.002, and small room and medium-size room had p=.001. The

reverberation sources, numbers of cases, means, standard errors and 95% confidence intervals are as follows: small room (n=36, M=3.58, SE=0.16, 95% CI [3.26, 3.91]), medium-size room (n=36, M=2.92, SE=0.18, 95% CI [2.54, 3.29]), gymnasium (n=36, M=2.94, SE=0.15, 95% CI [2.63, 3.26]).

A post-hoc analysis of Tukey's HSD was run on treatment. Conventional reverberation and new reverberation showed a significant effect, p<.001. The treatments, numbers of cases, means, standard errors and 95% confidence intervals are as follows: conventional reverberation (n=54, M=2.63, SE=0.14, 95% CI [2.36, 2.90]), new reverberation (n=54, M=3.67, SE=0.11, 95% CI [3.45, 3.89]).

A post-hoc analysis of Tukey's HSD was run on reverberation source and treatment. The medium-size room and the gymnasium show significant effects between treatments, both p<.001, as shown in Fig. 3.9.

Figure 3.9: Averaged clarity scores related to treatment and reverberation source in the headphone system. Error bars correspond to 95% confidence intervals. Longer bars are better.

A post-hoc analysis of Tukey's HSD was run on reverberation source and sound source. Small room:piano and gymnasium:laugh had p=.010, medium-size:piano and medium-size:laugh had p=.005, medium-size:piano and small room:laugh had p=.010, small room:piano and gymnasium:piano had p=.010, and small room:piano and medium-size:piano had p<.001, as shown in Fig. 3.10.

A post-hoc analysis of Tukey's HSD was run on reverberation source, treatment and sound source. The medium-size reverberation source with the piano sound and the gymnasium reverberation source with the laugh sound have significant effects between treatments, p=.021 and p<.001 respectively, as shown in Fig. 3.11.

Figure 3.10: Averaged clarity scores related to reverberation source and sound source in the headphone system. Error bars correspond to 95% confidence intervals. Longer bars are better.

Clarity evaluations were affected not only by treatment but also by reverberation source and sound source. The reverberation source measured in the small room did not affect the clarity response, whereas the reverberation sources measured in the medium-size room and the gymnasium affected the clarity response. It is assumed that the clarity response is affected by the reverberation duration or the room volume.

Naturalness

The analysis of naturalness showed a significant effect of treatment, F(1, 8)=14.34, p=.005, η²=.11. The others did not show significant effects: reverberation source, F(2, 16)=1.83, p>.05, η²=.03, sound source, F(1, 8)=3.50, p>.05, η²=.05, first-order interaction between reverberation source and treatment, F(2, 16)=0.78, p>.05, η²=.01, first-order interaction between reverberation source and sound source, F(2, 16)=0.93, p>.05, η²=.02, first-order interaction between treatment and sound source, F(1, 8)=0.01, p>.05, η²=.00, and second-order interaction between reverberation source, treatment and sound source, F(2, 16)=0.45, p>.05, η²=.00. Fig. 3.12 shows the averaged naturalness scores.

A post-hoc analysis of Tukey's HSD was run on treatment. Conventional reverberation and new reverberation showed a significant effect, p<.001. The treatments, numbers of cases, means, standard errors and 95% confidence intervals are as follows:

Figure 3.11: Averaged clarity scores related to reverberation source, treatment and sound source in the headphone system. Error bars correspond to 95% confidence intervals. Longer bars are better. The medium-size reverberation source with the piano sound and the gymnasium reverberation source with the laugh sound show significant effects, p=.021 and p<.001 respectively.

Figure 3.12: Averaged naturalness scores related to treatment, reverberation source and sound source in the headphone system. Error bars correspond to 95% confidence intervals. Longer bars are better.

conventional reverberation (n=54, M=2.80, SE=0.13, 95% CI [2.53, 3.06]), new reverberation (n=54, M=3.48, SE=0.14, 95% CI [3.20, 3.77]). Naturalness shows a significant main effect of treatment; reverberation source and sound source do not affect the naturalness evaluation.

Distance

The analysis of distance showed a significant effect of reverberation source, F(2, 16)=6.21, p=.010, η²=.09. The others did not show significant effects: treatment, F(1, 8)=2.73, p>.05, η²=.04, sound source, F(1, 8)=1.61, p>.05, η²=.02, first-order interaction between reverberation source and treatment, F(2, 16)=0.61, p>.05, η²=.00, first-order interaction between reverberation source and sound source, F(2, 16)=2.47, p>.05, η²=.04, first-order interaction between treatment and sound source, F(1, 8)=0.96, p>.05, η²=.01, and second-order interaction between reverberation source, treatment and sound source, F(2, 16)=2.14, p>.05, η²=.02. Fig. 3.13 shows the averaged distance scores.

Figure 3.13: Averaged distance scores related to treatment, reverberation source and sound source in the headphone system. Error bars correspond to 95% confidence intervals. The ideal score is two; closer to two is better.

A post-hoc analysis of Tukey's HSD was run on reverberation source. Small room and gymnasium, and small room and medium-size room showed significant effects, p=.046 and p=.015 respectively. The reverberation sources, numbers of cases, means, standard errors and 95% confidence intervals are as follows: small room (n=36, M=2.36, SE=0.14, 95% CI [2.08, 2.64]), medium-size room (n=36, M=3.11, SE=0.20, 95% CI [2.71, 3.51]), gymnasium (n=36, M=3.00, SE=0.23, 95% CI [2.54, 3.46]).

Treatment does not affect the distance evaluation; only the main effect of reverberation source has a significant effect. It is assumed that the delay of early reflections or the reverberation duration affects the distance evaluation.

Room size

The analysis of room size showed significant effects of reverberation source, F(2, 16)=12.76, p<.001, η²=.30, and the second-order interaction between reverberation source, treatment and sound source, F(2, 16)=5.40, p=.016, η²=.06. The others did not show significant effects: treatment, F(1, 8)=0.82, p>.05, η²=.01, sound source, F(1, 8)=2.88, p>.05, η²=.03, first-order interaction between reverberation source and treatment, F(2, 16)=3.01, p>.05, η²=.03, first-order interaction between reverberation source and sound source, F(2, 16)=1.99, p>.05, η²=.03, and first-order interaction between treatment and sound source, F(1, 8)=1.77, p>.05, η²=.02.

A post-hoc analysis of Tukey's HSD was run on reverberation source. All combinations showed significant effects: medium-size room and gymnasium, p=.002, small room and gymnasium, p<.001, and small room and medium-size room, p=.012. The reverberation sources, numbers of cases, means, standard errors and 95% confidence intervals are as follows: small room (n=36, M=2.64, SE=0.16, 95% CI [2.32, 2.95]), medium-size room (n=36, M=3.31, SE=0.18, 95% CI [2.94, 3.67]), gymnasium (n=36, M=4.11, SE=0.16, 95% CI [3.78, 4.44]).

A post-hoc analysis of Tukey's HSD was run on reverberation source, treatment and sound source. There was no significant effect between treatments, as shown in Fig. 3.14.

In the room size evaluation, the main effect of reverberation source and the second-order interaction effect of reverberation source, treatment and sound source show significant effects. The multiple comparison results of reverberation source show that all combinations have significant effects. Therefore, the room size evaluation is affected by the reverberation source.

Experiment summary

Spatiality, clarity, and naturalness are improved by the new reverberation method. Reverberation source affects the results of clarity, distance and room size.

Figure 3.14: Averaged room size scores related to reverberation source, treatment and sound source in the headphone system. Error bars correspond to 95% confidence intervals. There is no significant effect between treatments. The red bars are the ideal scores.

3.3 Loudspeaker-based reverberation reconstruction method

Conventional method

A traditional and very simple method is to add reverberations to the sound sources, as shown in Fig. 3.15 [110, 111, 121]. In this figure, S_L and S_R are the left and right channels of a sound source, I_L and I_R are the IRs in a room from the sound source to the positions of the left and right microphones, and S′_L and S′_R are the sound signals at the left and right ears, respectively:

\[ S'_L = I_L S_L, \qquad S'_R = I_R S_R. \tag{3.28} \]

Figure 3.15: Model of conventional unidirectional reverberation.

Fig. 2.7 shows the model of the 3D sound system, and Fig. 3.16 shows the 3D sound system with the conventional reverberation adding method. I_L and I_R are IRs in a room, including the characteristics of the measurement situation.

Signals are panned after convolving the IR and the HRTF:

\[
\begin{aligned}
S'_{LS} &= S_L I_L H_L\, a_{LS} \\
S'_{L} &= S_L I_L H_L\, a_{L} \\
S'_{C} &= \tfrac{1}{2}\left( S_L I_L H_L + S_R I_R H_R \right) a_C \\
S'_{R} &= S_R I_R H_R\, a_{R} \\
S'_{RS} &= S_R I_R H_R\, a_{RS}
\end{aligned}
\tag{3.29}
\]

Figure 3.16: Model of conventional reverberation adding in the 3D sound system.

Spatial reverberation for five-loudspeaker

The proposed system is shown in Fig. 3.17. The system realizes better reverberation adding by considering the room transfer function (RTF) and HRTF of each reverberation element:

\[
\begin{aligned}
S'_{LS} &= S_L \sum_i I_{Li} H_{Li}\, a_{LSi} \\
S'_{L} &= S_L \sum_i I_{Li} H_{Li}\, a_{Li} \\
S'_{C} &= \tfrac{1}{2}\left( S_L \sum_i I_{Li} H_{Li}\, a_{Ci} + S_R \sum_i I_{Ri} H_{Ri}\, a_{Ci} \right) \\
S'_{R} &= S_R \sum_i I_{Ri} H_{Ri}\, a_{Ri} \\
S'_{RS} &= S_R \sum_i I_{Ri} H_{Ri}\, a_{RSi}
\end{aligned}
\tag{3.30}
\]

Fig. 3.18 shows the calculation flow of the new system. The system chooses an HRTF from the database according to the polar information θ_i and ϕ_i of the ith image

Figure 3.17: Model of new reverberation adding in the 3D sound system

Fig. 3.18 shows the calculation flow of the new system. The system chooses an HRTF from the database according to the polar coordinates θ_i and ϕ_i of the i-th image sound source. If no HRTF exists for θ_i and ϕ_i, the nearest HRTF is used. Then the system calculates E(θ_i, ϕ_i) as described in Chapter 2. The system convolves the time frame of the chosen image sound source with the chosen HRTF and adjusts the energy of the convolved signal to that of the original time frame. The last calculation for each time frame applies the delay of the chosen frame to the energy-adjusted signal. The spatial reverberation for the five-loudspeaker system is obtained by summing all of these calculations.

HRTFs measured at the University of Aizu were used; their measurement resolutions are 2.5° in azimuth and 5° in elevation. The HRTFs were measured in an anechoic chamber with a B&K Head and Torso Simulator Type 4128C [80].
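The per-frame processing just described can be sketched as follows. This is an illustrative reading of the flow of Fig. 3.18 rather than the system's actual code, and the grid and database variables are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve

def nearest_hrtf(hrtf_db, grid_az, grid_el, theta, phi):
    """Pick the measured HRTF closest to (theta, phi) on the 2.5-degree
    azimuth / 5-degree elevation grid. `hrtf_db` is a hypothetical dict
    {(azimuth, elevation): impulse_response}."""
    d_az = np.abs(((grid_az - theta + 180.0) % 360.0) - 180.0)  # wrap azimuth
    az = grid_az[np.argmin(d_az)]
    el = grid_el[np.argmin(np.abs(grid_el - phi))]
    return hrtf_db[(az, el)]

def add_image_source(frame, hrtf, delay_samples, out):
    """Convolve one image-source frame with its HRTF, restore the frame's
    energy, apply the frame's delay, and add the result into `out`
    (assumed long enough to hold the delayed signal)."""
    sig = fftconvolve(frame, hrtf)
    e_frame, e_sig = np.sum(frame ** 2), np.sum(sig ** 2)
    if e_sig > 0.0:
        sig *= np.sqrt(e_frame / e_sig)  # energy adjustment
    out[delay_samples:delay_samples + len(sig)] += sig
```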

Figure 3.18: The reconstruction flow of the five-channel spatial reverberation for the five-loudspeaker system

Experiment

Participants

The participants were 8 males and 2 females between 20 and 28 years old (M=21.8, SD=2.27). All of them were students of the University of Aizu, all were Japanese, and all had normal hearing.

Materials and Procedure

The effectiveness of the proposed system is demonstrated by listening tests compared with a conventional system. The IR used was measured with TSPs in the room described in Table 3.3 and was divided into short time frames; the number of image sound sources obtained was 1,024.

Table 3.3: Dimensions of the room used for measurements

Parameter                                   Small room
Length                                      6.5 m
Width                                       7.9 m
Height                                      3.0 m
Distance from microphone to loudspeaker     2.1 m
Direction from microphone to loudspeaker

The listening tests were conducted in an anechoic chamber. The sound source used as the test signal is 2.04 s long, with 16-bit quantization and a 44.1 kHz sampling rate (a recording of a little girl laughing). The playback A-weighted sound pressure level at the listening position was 75 dB SPL at the peak and 67.5 dB SPL on average, measured with a Rion NA-20 sound level meter. The test signals were played in random permutation by an experimenter to avoid hysteresis. The evaluation items are listed in Table 3.4. The experiment was conducted in Japanese. The first five questions were answered on a five-point scale (1: very poor, 2: poor, 3: fair, 4: good, 5: excellent); the last two questions were answered directly in meters and in degrees of azimuth.

Table 3.4: Evaluation items

Spatiality       Can you feel the expanse of the sound? Can you feel the expanse of the virtual room?
Reverberation    Can you feel the reverberation?
Clarity          How clear is the entire sound?
Sharpness        How sharp or clear is the sound image? Can you feel the impact of the sound image?
Naturalness      How natural is the entire sound? How is the balance of the direct sound and the reverberation?
Distance (m)     How far is the direct sound?
Azimuth (deg.)   Where does the sound come from?

Results and discussion

Mixed-effects one-way ANOVAs were run on the collected data with treatment (conventional reverberation method, new reverberation method) as the within-subject factor, and with spatiality, reverberation, clarity, sharpness, naturalness, distance, and azimuth as the dependent variables, respectively. The analyses showed significant effects of treatment on spatiality, F(1, 9)=27.56, p<.001, η²=.55; reverberation, F(1, 9)=21.00, p=.001, η²=.49; clarity, F(1, 9)=9.99, p=.011, η²=.32; naturalness, F(1, 9)=13.50, p=.005, η²=.46; and azimuth, F(1, 9)=6.31, p=.033, η²=.09. The others did not show significant effects: sharpness, F(1, 9)=3.86, p>.05, η²=.14; distance, F(1, 9)=0.31, p>.05, η²=.01. Fig. 3.19 shows the averaged subjective evaluations, and Fig. 3.20 shows the averaged distance and azimuth responses.

Figure 3.19: Averaged subjective scores in the loudspeaker system. Error bars correspond to the 95% confidence interval of loudspeaker-based reverberation. A longer bar is better.

Figure 3.20: a) Averaged distance and b) azimuth responses in the loudspeaker system. Error bars correspond to the 95% confidence interval of loudspeaker-based reverberation. The red bar marks the ideal value.

Experiment summary

The proposed method improves spatiality, reverberation, clarity and naturalness compared with the conventional method. There is no significant difference in the sharpness evaluation or the distance perception. In azimuth perception there is a significant difference; therefore, the reverberation affects azimuth perception.

3.4 Conclusion

A new reverberation reconstruction method is proposed to improve the impression of sound. The new method reconstructs IRs with directional information using ISSs obtained by the closely located four-microphone method and the sound intensity method. Comparison experiments were conducted with the headphone system and the 3D sound system (Chapter 2), and the results showed that the new method improves spatiality, clarity and naturalness. The advantage of the new method is that it does not require changing existing systems: the only step needed to add the new reverberation to a sound source is to convolve the created IR with the sound source. If the conventional IR set in a DAW is replaced with the new IR set, the DAW can use the new reverberation.

3.5 Future Work

In this dissertation, the frame sizes of the sound intensity method were experimentally determined as 2.9 ms and 11.6 ms. However, 10 ms was used in [116], and 33.3 ms or 6.6 ms were used in [122]. A study to find the optimum frame size is needed. In the loudspeaker system, the reverberation affects the azimuth response; therefore, a study of the azimuth and elevation responses of the 3D sound system with the new reverberation is needed as future work.


Chapter 4

Development of a User Interface for 3D Sound Creation

4.1 Introduction

The purpose of this research is to build an intuitive tool for 3D spatial sound creation using existing sound sources. RISSICS (Real-time and Intuitive Spatial Sound Image Creation System) is a real-time implementation of the 3D sound system of Chapter 2. RISSICS continuously performs the signal processing that spatializes sound and distributes the calculated signals appropriately to the five loudspeakers. The system offers a choice of input devices for the user's flexibility and shows its running state on a computer display in real time. Moreover, the system provides functions for recording and reproducing sound motion trajectories.

4.2 System structure

Programming libraries

RISSICS uses several libraries: for signal-stream manipulation, for CG (computer graphics) manipulation, and for GUI (graphical user interface) construction on Microsoft Windows.

Audio Stream Input/Output (ASIO) is an audio library developed by Steinberg. The library can manipulate audio streams with little latency, so it is used in professional recording applications that require real-time monitoring. RISSICS requires five output channels, so an audio library that can manipulate multi-channel signal streams is also required. ASIO is used in this research because it solves both of these problems.

PortAudio is a free, cross-platform library that can drive many audio stream back ends [123]. The library helps with allocating audio buffers and improves maintainability.

DirectX is a powerful multimedia library on Microsoft Windows [124]. Because the library is an embedded multimedia programming interface, developers can use hardware-specific functions without writing code for each particular device; the library provides a common programming interface. DirectX contains several sub-libraries, including DirectInput for device recognition and Direct3D for 3D rendering. Recent graphics cards have powerful rendering capacity and native DirectX decoding, so developers can use rich CG functions via DirectX. RISSICS uses DirectX for HID (human interface device) recognition, analysis of the listener's manipulations, and CG rendering.

.NET Framework is a software runtime framework developed by Microsoft. Constructing a GUI on Windows is easier with the .NET Framework than without it.

The wiiuse library is written in the C language and communicates with several Nintendo Wii Remote controllers. It captures button presses, motion sensing and IR tracking, and supports the Nunchuk and Classic Controller. Unlike similar projects, the library is single-threaded and non-blocking, making it a lightweight and clean API. RISSICS accesses a Wii Remote via the wiiuse library [125]. The wiiuse library runs as a device driver and therefore requires the device driver development kit WinDDK [126].
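RISSICS itself streams audio through ASIO and PortAudio from native Windows code. As a rough illustration of the callback-driven, block-based multichannel streaming these libraries provide, here is a sketch in Python using the sounddevice binding to PortAudio; the render_block function is a hypothetical stand-in for the per-frame spatialization.

```python
import numpy as np
import sounddevice as sd  # Python binding to PortAudio

SAMPLE_RATE = 44_100
BLOCK = 512   # one processing frame, matching RISSICS's frame length
N_CH = 5      # five loudspeaker feeds

def render_block(frames):
    """Hypothetical stand-in for the spatialization: returns the next
    (frames, N_CH) block of output samples (silence in this sketch)."""
    return np.zeros((frames, N_CH), dtype=np.float32)

def callback(outdata, frames, time, status):
    # PortAudio calls this on its real-time thread once per block; the work
    # done here must finish well within BLOCK / SAMPLE_RATE seconds.
    if status:
        print(status)
    outdata[:] = render_block(frames)

with sd.OutputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK,
                     channels=N_CH, callback=callback):
    sd.sleep(2000)  # keep streaming for two seconds
```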

Hardware

ASIO requires a hardware device that supports ASIO 2.0. We selected a USB audio interface, the Roland UA-1000, which has 10-in/10-out full-duplex capability [127]. RISSICS uses game controllers with analog inputs as input devices, because they allow continuous input detection. A Sony PlayStation 2 controller SCPH (Dual Shock 2) is used as the wired user interface; Fig. 4.1 shows the shape of the controller. A Nintendo Wii controller RVL-003 (Wii Remote), an expansion controller RVL-004 (Nunchuk) and another expansion controller RVL-005 (Classic Controller) are used as wireless user interfaces; Fig. 4.2 shows the shapes of these controllers.

Figure 4.1: Sony PlayStation 2 controller SCPH (Dual Shock 2)

Figure 4.2: Nintendo Wii controller RVL-003 (Wii Remote), RVL-004 (Nunchuk), RVL-005 (Classic Controller)

A Nunchuk or a Classic Controller is used by connecting it to the Expansion Port of a Wii Remote. These optional devices support hot-swapping, so they can be connected and disconnected freely. A HID interface, the Owltech PP-0301, is used to convert the PlayStation 2 interface to USB (universal serial bus). A Wii Remote is a Bluetooth device and requires a Bluetooth receiver connected to the computer for device recognition. Wii Remote controllers contain a Broadcom BCM2042 LSI chip as the Bluetooth controller; the chip is based on the Bluetooth 2.0 standard, and Wii Remote controllers can connect within 10 m. We use a USB Bluetooth dongle, the Buffalo BSHSBD02. Fig. 4.3 shows the relations between the devices.

Figure 4.3: A block diagram of device connection

Table 4.1: HRTF data set information

Sampling Rate          44.1 kHz
Coefficient Depth      16 bits
Window Length          512 points
Azimuth Resolution     144 directions, [0°, 360°), 2.5° resolution
Elevation Resolution   37 directions, [-90°, 90°], 5° resolution
Channels               2 (L and R)
Measurement Model      Head and Torso Simulator Type 4128C
Distance               1.8 m
Measurement Location   Anechoic chamber
Total                  10,656 HRTFs (83.25 MiB)

Real-time operation

A listener inputs a sound image position as an azimuth and an elevation using a HID device, and an appropriate HRTF is then selected according to that azimuth and elevation. The selected HRTF is convolved with the input signals. However, the listener's input changes instantly, so RISSICS divides the input signals into small frames and convolves each frame with the currently selected HRTF. The sound source and all HRTFs are allocated in memory, and the HRTFs are Fourier-transformed into the frequency domain in advance. Table 4.1 shows the detailed information of the HRTF database [80].

RISSICS has a continuous convolution function. The HRTFs that RISSICS uses are 512 points long at a 44.1 kHz sampling rate, and input signals are divided into 512-point frames to match the HRTF length. Fig. 4.4 shows the continuous convolution calculation flow. Convolution is calculated as multiplication in the frequency domain, and due to the characteristics of the Fourier transform, wraparound noise occurs. Therefore, to avoid this noise, two frames are used in each calculation cycle. The latter frame of the last calculation cycle and the frame of the current calculation cycle are joined into a 1,024-point frame, and the HRTF frame is also extended to 1,024 points by zero padding. The 1,024-point signal and the 1,024-point HRTF are multiplied and inverse-Fourier-transformed back into the time domain. The former half is discarded because it contains the wraparound noise; the latter half is used as the convolved signal of this calculation cycle. The refresh rate of RISSICS is 86 cycles per second and the calculation cycle time is less than 11.6 ms [128, 129].
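The calculation cycle described above is the overlap-save form of fast convolution. The following is a minimal NumPy sketch of such a cycle chain, not the RISSICS code itself; hrtf_for_frame is a hypothetical selector that returns the 512-point HRTF chosen from the listener's input for each frame.

```python
import numpy as np

FRAME = 512  # frame length, matching the 512-point HRTFs

def continuous_convolution(signal, hrtf_for_frame):
    """Sketch of the continuous convolution cycle: join the previous frame
    with the current one, multiply spectra, and keep only the latter half."""
    n_frames = len(signal) // FRAME
    out = np.zeros(n_frames * FRAME)
    prev = np.zeros(FRAME)                 # latter frame of the last cycle
    for k in range(n_frames):
        cur = signal[k * FRAME:(k + 1) * FRAME]
        block = np.concatenate([prev, cur])            # 1,024-point frame
        H = np.fft.rfft(hrtf_for_frame(k), 2 * FRAME)  # zero-pad to 1,024
        y = np.fft.irfft(np.fft.rfft(block) * H)
        # The former half contains the wraparound noise and is discarded;
        # the latter half is the convolved signal of this calculation cycle.
        out[k * FRAME:(k + 1) * FRAME] = y[FRAME:]
        prev = cur
    return out
```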

Figure 4.4: Continuous convolution calculation flow

Human time resolution for sound is about 50 ms [130], so the refresh rate of RISSICS is fast enough. Chapter 2 describes the details of the amplitude panning of this system.

User interface

RISSICS has a CG rendering function so that the location of a sound image can be recognized: a listener and a sound image object are displayed on a computer monitor in real time. RISSICS provides three input devices: Dual Shock 2, Wii Remote, and Wii Remote with Nunchuk. Each controller has strong and weak points depending on the situation; for this reason, RISSICS can change input devices arbitrarily.

The left side of the main window shows information such as signal playback conditions, radio buttons for switching panning algorithms, HID states and the HRTF selection state. The right side of the main window shows each loudspeaker's signal strength as dynamic range in dB and its root mean square (RMS) value in dB.

RISSICS can record trajectories of sound image localization. It records sound motions when the record radio button (2) is checked in the playback setup window (Fig. 4.6), and writes the sound motion trajectories to a file in the format of Table 4.2.

Figure 4.5: RISSICS's main window. 1: playback information area. 2: panning algorithm selection area. 3: HRTF information area. 4: controller information area. 5: CG rendering area. 6: an avatar of a sound image. 7: an avatar of the listener. 8: an avatar of a loudspeaker. 9: a toggle button for sound motion recording. 10: peak meters in dB. 11: RMS meters in dB. 12: gain control bar. 13: environment selection button.

Figure 4.6: Playback setup window. 1 is the playback sound motion toggle button; 2 is the record sound motion toggle button.

Table 4.2: Sound motion recording format

Data Type        Content Data
32-bit integer   System running time
64-bit real      Azimuth
64-bit real      Elevation
64-bit real      Distance

RISSICS can change the ratio of HRTF usage (Fig. 4.7). This function is implemented because duplicated HRTFs make sound localization worse.

Dual Shock 2

The analog sticks are mainly used to input the azimuth and elevation of a sound image, and they detect the listener's input sensitively. The left stick (L3 stick) is used for azimuth input and the right stick (R3 stick) for elevation input. This controller cannot control the distance between the sound image and the listener; the distance is fixed at 150 cm. A listener inputs an azimuth using the left stick (L3 stick) of the Dual Shock 2. Fig. 4.8 shows the azimuth and elevation operation. RISSICS initializes HIDs using the DirectInput library.
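As an illustration of this record layout, here is a minimal Python sketch. The little-endian packing and the helper names are assumptions; the dissertation does not specify the byte order of the file.

```python
import struct

# One trajectory record as in Table 4.2: a 32-bit integer system running
# time followed by three 64-bit reals (azimuth, elevation, distance).
RECORD = struct.Struct("<i3d")  # "<" assumes little-endian, no padding

def write_record(f, running_time, azimuth, elevation, distance):
    """Append one sound-motion sample to an open binary file `f`."""
    f.write(RECORD.pack(running_time, azimuth, elevation, distance))

with open("trajectory.bin", "wb") as f:
    write_record(f, 0, 90.0, 0.0, 1.5)  # example sample at time 0
```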

Figure 4.7: Partial convolution setup window

Figure 4.8: Azimuth and elevation operation by Dual Shock 2. The left stick (L3 stick) is used to input azimuth. The right stick (R3 stick) is used to input elevation.

Figure 4.9: Analog stick operation of Dual Shock 2. This figure shows the relations between the sticks' axes and the intensity directions. The L3 stick's inputs are recognized as X Axis and Y Axis. The R3 stick's inputs are recognized as Z Axis and Z Rotation.

The L3 stick input of the Dual Shock 2 is recognized as input intensities on the X Axis and Y Axis by the DirectInput library, and the R3 stick input as input intensities on the Z Axis and Z Rotation. The maximum values of these input intensities are set arbitrarily when a HID is initialized by DirectInput; RISSICS uses 1,000 as the maximum value and -1,000 as the minimum value. Fig. 4.9 shows the sticks' axes and intensity directions. An azimuth value is calculated by Eq. (4.1) according to Fig. 4.9:

$$\theta_{azimuth} = \operatorname{arctan2}(Y_{axis}, X_{axis}) \tag{4.1}$$

When the calculated azimuth value is negative, RISSICS adjusts it by Eq. (4.2):

$$\theta_{azimuth} \leftarrow \theta_{azimuth} + 2\pi \tag{4.2}$$

A listener inputs an elevation using the right stick (R3 stick) of the Dual Shock 2 (Fig. 4.8). The elevation range is between -90° and 90°, so the R3 stick is used only in the upward and downward directions. An elevation value is calculated by Eq. (4.3) according to Fig. 4.9:

$$\phi_{elevation} = 90 \cdot \frac{Z_{rotation}}{1000} \tag{4.3}$$
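Eqs. (4.1) to (4.3) translate directly into code. The following is a small sketch of the mapping; the function name is illustrative, not taken from the system's source.

```python
import math

STICK_MAX = 1000  # DirectInput intensity range configured by RISSICS

def stick_to_angles(x_axis, y_axis, z_rotation):
    """Map raw Dual Shock 2 stick intensities (each in [-1000, 1000]) to an
    azimuth in [0, 2*pi) radians and an elevation in [-90, 90] degrees."""
    azimuth = math.atan2(y_axis, x_axis)        # Eq. (4.1)
    if azimuth < 0.0:
        azimuth += 2.0 * math.pi                # Eq. (4.2)
    elevation = 90.0 * z_rotation / STICK_MAX   # Eq. (4.3)
    return azimuth, elevation
```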

Wii Remote

Input methods and button key assignments differ depending on whether an optional device is connected to the Wii Remote.

Figure 4.10: Three-axis motion sensor and axes assignment

Wii Remote controllers have a three-axis motion sensor. The sensor can acquire data within ±3 g with 10% sensitivity, measuring acceleration along the ±x-, ±y- and ±z-axes. Roll and pitch are derived from the three-axis motion sensor, and RISSICS assigns azimuth to roll and elevation to pitch. The motion sensor can detect the inclination of the controller: the tilt of a Wii Remote can be calculated directly from the force of gravity when the Wii Remote is not accelerating. When a Wii Remote is accelerating due to hand motion, the acceleration values are normalized to the ±1 g range, because the angles are calculated with the arcsine and arctangent trigonometric functions. We call the normalized values g-force. Roll and pitch values are calculated by Eqs. (4.4a) and (4.4b) over the ranges of Eq. (4.4c):

$$\theta_{Roll} = \frac{180}{\pi} \operatorname{arctan2}(Z_{gforce}, X_{gforce}) \tag{4.4a}$$

$$\phi_{Pitch} = \frac{180}{\pi} \operatorname{arctan2}(Z_{gforce}, Y_{gforce}) \tag{4.4b}$$

$$-1 \le X_{gforce} \le 1, \quad -1 \le Y_{gforce} \le 1, \quad -1 \le Z_{gforce} \le 1 \tag{4.4c}$$

Pitch values are calculated precisely by Eq. (4.4b). However, RISSICS cannot localize a sound image at the ideal location when the roll value exceeds ±68° because of the characteristics of the Wii Remote's three-axis sensor and of the arctangent function. Therefore, RISSICS uses Eq. (4.5a) instead when the roll value exceeds ±68°. The arctangent function changes values in a range near calcula-
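A minimal transcription of Eqs. (4.4a) and (4.4b) is shown below; it does not include the alternative Eq. (4.5a) branch for roll values beyond ±68°, which is not reproduced in this excerpt.

```python
import math

def roll_pitch_from_gforce(x_g, y_g, z_g):
    """Derive roll and pitch in degrees from the normalized accelerometer
    values (each assumed clamped to [-1, 1]), following Eq. (4.4)."""
    roll = math.degrees(math.atan2(z_g, x_g))   # Eq. (4.4a)
    pitch = math.degrees(math.atan2(z_g, y_g))  # Eq. (4.4b)
    return roll, pitch
```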
