Perceptual effects of visual images on out-of-head localization of sounds produced by binaural recording and reproduction.

Eiichi Miyasaka¹

1 Introduction

Large-screen HDTV sets with screens over 37 inches have become widespread in Japan, and 3DTV sets using passive or active special glasses were launched this year (2010). Home theaters with surround sound systems, however, have not spread widely in Japan, although they are widespread in the USA, Australia, and Europe. One reason is considered to be that rooms in Japan are smaller than in those countries. The 5.1-channel surround sound systems used in home theaters are therefore hard to set up, because they need a relatively large space that includes room for the rear loudspeakers. A new surround sound system, the 22.2-channel surround sound system designed for Super-HDTV, has been developed at NHK laboratories [1]. Researchers there are now trying to reduce the number of loudspeakers in the multi-channel system as far as possible while keeping impairments small. Multi-channel reproduction with loudspeakers localizes sound images in the 3D space around the loudspeakers, away from the listener. Binaural reproduction through headphones, by contrast, localizes them inside the listener's head. It can, however, localize them outside the head when the sounds are recorded through a dummy head. It is widely recognized that out-of-head localization weakens when sounds recorded with an arbitrary dummy head are reproduced through headphones, because the shape of the dummy head differs from that of the listener's real head. The HRTF (head-related transfer function) is the transfer function from a sound source to a listener's eardrum, and it is one useful tool for improving the externalization of out-of-head localization.
The HRTF, however, depends on the individual listener's torso, head, shoulders, and pinnae, because it captures the diffraction of a sound wave arriving at a given angle of incidence [2]. It will therefore be necessary to introduce a modified HRTF matched to the individual listener in order to improve localization accuracy. Many researchers are now strenuously studying large numbers of accurate measurements of individual HRTFs. Because HRTFs vary so widely, it is difficult to introduce accurate HRTFs into such systems, and this is considered to prevent successful externalization in out-of-head localization [3]. It is also recognized that a listener has difficulty perceiving sound images at the positions in free space intended by the creators or directors when no visual images are presented at the same time. For example, multi-channel radio dramas that use binaural recording and reproduction place various sound images at different positions in free space. Listeners with headphones frequently perceive these images at positions different from those the directors intended, because the listeners have no background knowledge of the dramas.

¹ Professor, Faculty of Environmental and Information Studies, Tokyo City University
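The paper itself relies on dummy-head recording rather than HRTF filtering. For readers unfamiliar with how an HRTF is applied, the following is a minimal sketch. It assumes that a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTF) is available for one source direction. The function name `render_binaural` and the toy HRIR values are hypothetical and are not measured data.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal as a two-channel (left, right) binaural signal.

    Each channel is the mono signal convolved with the head-related impulse
    response (HRIR) of that ear, i.e. the time-domain form of the HRTF for
    one source direction.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right HRIRs must have the same length")
    mono = np.asarray(mono, dtype=float)
    left = np.convolve(mono, hrir_left)    # full linear convolution
    right = np.convolve(mono, hrir_right)
    # Shape (2, len(mono) + len(hrir) - 1): row 0 = left ear, row 1 = right ear.
    return np.stack([left, right])


# Hypothetical toy HRIRs (not measured data) for a source on the listener's
# left: the right ear receives the sound 2 samples later and at half the
# amplitude, a crude stand-in for interaural time and level differences.
hrir_left = np.array([1.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.5])

signal = np.array([1.0, 0.5, 0.25])
out = render_binaural(signal, hrir_left, hrir_right)
print(out)
```

Measured HRIRs use the same two convolutions. The individual differences discussed above enter only through the choice of HRIR pair.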

One possible way to overcome the problems mentioned above is a new system that combines a head-mounted HDTV display with a multi-channel headphone. Such a system needs no physical space to establish multi-channel audio-visual images. It will succeed if visual images facilitate the externalization of out-of-head localization, so that the sounds are perceived as if they come from the corresponding visual images. This paper presents some trial perceptual experiments that examine whether visual images help externalize out-of-head localization of sounds that are binaurally recorded and reproduced through a headphone.

2 Experiment 1

2.1 Visual stimuli with the sounds

Table 1 shows the stimuli used in the experiment. Seven visual stimuli, each accompanied by sounds, were prepared. Stimuli ST 1 to ST 6 are moving pictures, while the remaining stimulus, ST 7, is a mobile phone with a flash lamp.

Table 1 List of the stimuli used in the experiment (visual stimulus / auditory stimulus)
ST 1: A golf ball rolls from right to left on a desk / The sound moves from right to left
ST 2: A golf ball rolls from front to back (depth) on a desk / The sound moves from front to back
ST 3: A golf ball bounces at a center point on a desk / The sound diminishes without movement
ST 4: A ping-pong ball bounces from right to left on a desk / The sound moves from right to left
ST 5: A ping-pong ball bounces from front to back on a desk / The sound moves from front to back
ST 6: A golf ball bounces at a center point on a desk / The sound diminishes without movement
ST 7: A mobile phone with a flash lamp / The sound keeps a constant sound level

The sounds were recorded through a KU 100 dummy head manufactured by Georg Neumann GmbH. The dummy head was connected to a laptop computer through an M-Audio FireWire 410 audio interface.

2.2 Experimental setup and the procedure

Fig.1 shows the experimental setup.
An observer wearing an Audio-Technica ATH-AD700 headphone sits upright in a chair on casters and watches a 24-inch display set up directly in front. The observer can move freely with the chair, forward or backward, along the indicated line within a range of 50 cm to 400 cm from the display.

Fig.1 Experimental setup

First, observers are asked whether they can find any area or position at which they perceive the sounds as coming from the corresponding visual images on the display. If they find such an area, they are asked to identify its range: (1) the nearest position to the display, (2) the farthest position from the display, and (3) the most appropriate position, at which the sounds and the visual images are perceived to match each other naturally. If they cannot find any such area or position, they are asked where they perceive the sounds to come from. Next, with no visual images, they are asked where the sounds come from. When they perceive the sounds outside their heads, they are asked to report the perceived distance between the sounds and their heads, and the direction of the sounds. Ten students with normal vision and hearing participated in the experiment as observers.

2.3 Results

2.3.1 Effects of visual images

Fig.2 shows the number of observers who perceived out-of-head localization of the sounds with and without the visual images. For ST 3, ST 4, ST 5, and ST 6, at least nine of the ten observers perceived out-of-head localization of the sounds with the visual images, whereas two to six observers perceived it without them.

Fig.2 The number of the observers who perceived out-of-head localization of the sounds with or without the visual images

2.3.2 Results for each stimulus

Fig.3 consists of seven graphs, one for each stimulus from ST 1 to ST 7, showing the results of the experiment. The abscissa indicates the observers and the ordinate indicates the distance from the display. The symbols in the graphs denote (a) the minimum (nearest) distance from the display at which an observer perceives the sounds as coming from the visual images, (b) the maximum (farthest) such distance, and (c) the most appropriate position, at which the sounds and the visual images are perceived to match each other naturally.
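The procedure above gives, for each observer and stimulus, either three distances or a report that no matching area was found. A minimal sketch of one way to record and sanity-check such answers follows. The class `AreaResponse` and its field names are hypothetical. The only constraints it encodes are the 50 to 400 cm movable range and the ordering nearest ≤ most appropriate ≤ farthest implied by the procedure.

```python
from dataclasses import dataclass
from typing import Optional

# Movable range of the chair in Experiment 1, in cm from the display.
MIN_CM, MAX_CM = 50.0, 400.0


@dataclass
class AreaResponse:
    """One observer's answer for one stimulus (hypothetical record layout).

    Distances are in cm from the display. All three are None when the
    observer found no area where the sounds seemed to come from the image.
    """
    observer: str
    stimulus: str
    nearest: Optional[float] = None
    farthest: Optional[float] = None
    appropriate: Optional[float] = None

    def found_area(self) -> bool:
        return self.appropriate is not None

    def validate(self) -> None:
        values = (self.nearest, self.appropriate, self.farthest)
        if all(v is None for v in values):
            return  # no matching area reported: nothing to check
        if any(v is None for v in values):
            raise ValueError("give all three distances or none")
        # The procedure implies nearest <= most appropriate <= farthest,
        # all within the chair's movable range.
        if not (MIN_CM <= self.nearest <= self.appropriate
                <= self.farthest <= MAX_CM):
            raise ValueError("distances out of order or out of range")


# Hypothetical answers, for illustration only.
ok = AreaResponse("A", "ST 3", nearest=80, farthest=220, appropriate=130)
ok.validate()                                    # passes silently
print(ok.found_area())                           # True
print(AreaResponse("B", "ST 7").found_area())    # False
```

An observer who found no area is stored with all three distances unset, so an "area found" count of the kind shown in Fig.2 can be taken directly from `found_area()`.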

Fig.3 Results of the experiment for each stimulus. The abscissa indicates the observers and the ordinate the distance from the display. The symbols mark each observer's minimum distance, most appropriate distance, and maximum distance at which the sounds are perceived to come from the visual images.
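The per-stimulus results that follow report how many observers found a matching area, together with an averaged appropriate distance taken from the Fig.3 data. The paper does not state how the average was computed. The sketch below assumes it is the arithmetic mean over only those observers who found an area. The function `summarize` and the example responses are hypothetical, not the paper's data.

```python
from statistics import mean
from typing import Dict, Optional


def summarize(appropriate_cm: Dict[str, Optional[float]]) -> dict:
    """Summarize one stimulus from per-observer 'most appropriate' distances.

    appropriate_cm maps an observer ID to a distance in cm from the display,
    or to None if that observer found no area where the sounds seemed to
    come from the visual image.
    """
    found = [d for d in appropriate_cm.values() if d is not None]
    return {
        "observers": len(appropriate_cm),
        "found_area": len(found),
        # Assumed definition: mean over observers who found an area only.
        "mean_appropriate_cm": mean(found) if found else None,
    }


# Hypothetical responses for illustration (not the paper's data).
example = {"A": 120.0, "B": 150.0, "C": None, "D": 180.0}
print(summarize(example))
# {'observers': 4, 'found_area': 3, 'mean_appropriate_cm': 150.0}
```

Leaving out observers who found no area keeps them from pulling the average toward an arbitrary value. Their absence still appears in the `found_area` count.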

(1) ST 1: Half of the ten observers perceived out-of-head localization of the sounds with the visual images, although no observer perceived out-of-head localization without visual images. The averaged appropriate distance is about 150 cm to 200 cm from the display.
(2) ST 2: Half of the observers perceived out-of-head localization of the sounds with the visual images. Two of them, observers G and I, however, could perceive out-of-head localization of the sounds for only ST 1 or ST 2.
(3) ST 3: All observers perceived out-of-head localization of the sounds, with an averaged appropriate distance of 100 to 150 cm from the display.
(4) ST 4: Nine observers perceived out-of-head localization of the sounds, with an averaged appropriate distance of 100 to 250 cm from the display. Observer D could not perceive out-of-head localization for ST 1, ST 2, or ST 4.
(5) ST 5, ST 6: All observers perceived out-of-head localization of the sounds, with an averaged appropriate distance of 100 to 200 cm from the display.
(6) ST 7: Only three observers perceived out-of-head localization of the sounds with the visual images, and no observer perceived it without them. The sounds were produced at a fixed point on the body of the mobile phone, and the only movement in the visual images is the flashing of the lamp attached to the phone. A stimulus of this kind appears to make out-of-head localization difficult to bring about.

2.3.3 Positions or direction of localization of the sounds

For stimuli ST 3, ST 4, ST 5, and ST 6, almost all observers perceived out-of-head localization of the sounds when the visual images were presented, although fewer than half of the observers perceived out-of-head localization without the visual images. Without visual images, they localized the sound images inside the head, in its front, upper, or rear area.
These results indicate that the visual images used in stimuli ST 3 to ST 6 can facilitate the perception of out-of-head localization.

2.3.4 Sizes of the visual images

The position of the movie camera that took the visual images may be one of the important parameters influencing the perception of localization, so an additional experiment was conducted. Two camera positions were used, at 1 m and 4 m from the object producing the sounds. Fig.4 shows one of the results. The abscissa indicates the stimulus. ST 1m is the visual image taken 1 m from the object, in which a ping-pong ball bounces from right to left on a desk, and ST 4m is the visual image taken 4 m from the same object. The ordinate indicates the number of observers who chose each of the two stimuli on the basis of how naturally the sounds and the visual images matched. The result shows that ST 4m is more natural than ST 1m. The ST 4m images present more visual information, which helps the observer identify the position at which the sounds are produced and so results in out-of-head localization.

Fig.4 The number of the observers who selected one of the two stimuli on the basis of the extent of naturalness between the sounds and the visual images

3 Discussion and conclusion

It is clear that out-of-head localization is difficult to realize with auditory stimuli alone. Introducing HRTFs would be effective if they closely matched the listener's own HRTFs, although precise measurement of individual HRTFs is clearly difficult. The results of this experiment imply that effective visual images facilitate out-of-head localization of the sounds. As Fig.3 shows, however, some visual images have little effect on out-of-head localization, and how visual images facilitate it remains to be clarified. An audio-visual system that combines binaural audio reproduction using a generic HRTF with a visual display may be one effective system. Display size may also be an important parameter for facilitating out-of-head localization, although it was fixed at 24 inches in this experiment. The author is now planning to test the effect of display size, using 50-inch as well as 24-inch displays.

Acknowledgement

The author thanks Mr. Tomoyuki Fujii, Mr. Yasuaki Abe, and Mr. Yuji Sato for their help in performing the experiments and gathering the experimental data. Part of this research was supported by the HBF (Hoso Bunka Foundation), 2009.

References

[1] K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama and A. Ando, "22.2 Multichannel Sound System for Ultra High-Definition TV," SMPTE Technical Conference Publication, 2007.
[2] M. Noisternig, T. Musil, A. Sontacchi and R. Hoeldrich, "3D Binaural Sound Reproduction using a Virtual Ambisonic Approach," International Symposium on Virtual Environments, Human-Computer Interfaces, and Measurement Systems, 27-29 (2003).
[3] Y. Iwaya, "Individualization of head-related transfer functions with tournament-style listening test: Listening with other's ears," Acoust. Sci. & Tech., 27, 340-343 (2006).