URBANA-CHAMPAIGN. CS 498PS Audio Computing Lab. 3D and Virtual Sound. Paris Smaragdis. paris.cs.illinois.


UNIVERSITY OF ILLINOIS @ URBANA-CHAMPAIGN
CS 498PS Audio Computing Lab
3D and Virtual Sound
Paris Smaragdis
paris@illinois.edu
paris.cs.illinois.edu

Overview
Human perception of sound and space: ITD, IID, HRTFs, and all that. 3D audio: measuring HRTFs and synthesizing 3D audio. Virtual audio: synthesizing virtual audio.

What is 3D audio?
It means fooling a listener into believing that a sound is coming from a specific location around them. There are two ways to get it. The easy way uses headphones. The hard way uses speakers.

What is virtual audio?
It models the effects of being in a virtual environment. This includes 3D audio effects, room effects, and additional environmental effects.

Why bother?
Entertainment: immersive gaming, 3D movies, virtual worlds, and so on.
Practical uses:
- Helping listeners parse more audio streams simultaneously.
- Helping users localize multiple sources, e.g. pilot discussions in plane cockpits.
- Grabbing people's attention, e.g. in auditory display interfaces.

A bit of hearing theory
To synthesize 3D audio we need to know how to fool the human ear. What are the cues that we need to use, and how do we implement them? There are many levels of complexity.

On having two ears
Why are our ears on the sides of our head? Why not one on the chin and one on the forehead? Horizontal placement maximizes our ability to pick up sounds across a terrain. That is good for left/right, but not so good for up/down.

Fundamentally different from vision
Unlike our eyes, which directly perceive 3D, our ears rely on the brain to compute it. Special neural circuits in the Superior Olivary Complex (SOC) compare the signals from both ears.

The Duplex Theory
Lord Rayleigh formulated this theory in 1907. A listener's two ears receive a sound with some minor differences, and these differences act as localization cues. There are two main cues: Interaural Time Differences (ITD) and Interaural Intensity/Level Differences (IID, or ILD).

Interaural Time Differences (ITD)
This is the simplest possible cue: the relative time difference between a sound reaching each of our two ears. Sound familiar?
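An ITD is a time difference of arrival. It can therefore be estimated the same way as with a microphone array: find the lag that maximizes the cross-correlation between the two ear signals. Below is a minimal NumPy sketch. The function name and the `max_lag` search limit are illustrative assumptions, not part of the lab code.

```python
import numpy as np

def estimate_itd(left, right, fs, max_lag):
    """Estimate the ITD in seconds from two equal-length ear signals.

    A positive value means the right-ear signal lags the left one,
    i.e. the sound reached the left ear first.
    """
    n = len(left)
    # Full cross-correlation; entry i corresponds to lag i - (n - 1)
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(n - 1), n)
    # Only search physically plausible lags (|lag| <= max_lag samples)
    valid = np.abs(lags) <= max_lag
    best_lag = lags[valid][np.argmax(xcorr[valid])]
    return best_lag / fs

# Example: white noise that reaches the right ear 20 samples later
fs = 44100
rng = np.random.default_rng(0)
src = rng.standard_normal(4096)
delay = 20
left = src
right = np.concatenate([np.zeros(delay), src[:-delay]])
itd = estimate_itd(left, right, fs, max_lag=40)
```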

ITD tradeoffs
Perceiving ITDs becomes increasingly unreliable at higher frequencies. Historically the cutoff was set at 1.5 kHz (any guess why?). We also extract ITDs from the envelopes of the sounds we hear, though, so we use higher frequencies as well.

How we will model it
We can simulate ITDs with delays, an idea similar to the mic array steering vector. There will be an upper limit to the delay. What is it?
[Figure: left-ear and right-ear impulse responses illustrating an ITD as a relative delay]
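One way to answer the upper-limit question is Woodworth's spherical-head approximation, ITD(θ) = (a/c)(θ + sin θ). It assumes a rigid spherical head and peaks when the source is at 90°. The sketch below assumes an average head radius of 8.75 cm and c = 343 m/s. It rounds the delay to whole samples, whereas a real renderer would use fractional delays. The helper names are hypothetical.

```python
import numpy as np

HEAD_RADIUS = 0.0875    # m, assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s at room temperature

def woodworth_itd(azimuth_rad, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Spherical-head (Woodworth) ITD approximation for |azimuth| <= pi/2."""
    return (a / c) * (azimuth_rad + np.sin(azimuth_rad))

def apply_itd(mono, azimuth_deg, fs):
    """Return (left, right) with the ear farther from the source delayed.

    Positive azimuth = source on the right, so the left ear is delayed.
    The delay is rounded to whole samples for simplicity.
    """
    az = np.deg2rad(np.clip(azimuth_deg, -90.0, 90.0))
    d = int(round(abs(woodworth_itd(az)) * fs))
    delayed = np.concatenate([np.zeros(d), mono])
    direct = np.concatenate([mono, np.zeros(d)])
    if azimuth_deg >= 0:
        return delayed, direct   # left ear lags
    return direct, delayed       # right ear lags

fs = 44100
max_itd = woodworth_itd(np.pi / 2)   # upper limit: source at 90 degrees
left, right = apply_itd(np.ones(8), 90.0, fs)
```

Under these assumptions the maximum ITD comes out at roughly 0.66 ms, or about 29 samples at 44.1 kHz.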

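One more thing
The precedence effect (a.k.a. the Haas effect): with delays of up to about 40 msec, the delayed copy fuses with the first arrival and acts as a localization cue, like an ITD, instead of being heard as a separate sound. With longer delays we form echo percepts.
[Figure: percept vs. approximate delay time to the left channel (msec), marked at 0, 0.6, 1.5, 10, and 40 msec]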

Interaural Intensity Differences (IID)
For wavelengths smaller than the listener's head, the head shadows the far ear. High frequencies get attenuated, while low frequencies pass mostly unharmed. Level differences at high frequencies are a very strong cue that helps us localize sounds. They are called IIDs or ILDs, for intensity or level differences.

IID tradeoffs
IIDs mostly apply to wavelengths shorter than the listener's head, which gives a cutoff of about 1.5 kHz. Lower frequencies diffract around the head. IIDs work better when the sound source is away from the plane midway between the two ears (the median plane). Otherwise there is no relative head shadowing. What's an example location?

How can we model it?
It is easy to model using a gain between the ears: the panpot model. This ignores frequency dependencies (more later). It can be implemented as a filter.
[Figure: left-ear and right-ear impulse responses with different gains]
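A gain-only ILD can be sketched as a panpot. The constant-power (sine/cosine) pan law used here is one common choice and is our assumption, not necessarily the lab's. Like the slide's model, it ignores frequency dependence.

```python
import numpy as np

def panpot(mono, pan):
    """Constant-power panpot: pan=-1 is full left, 0 is center, +1 is full right.

    The gains satisfy g_left**2 + g_right**2 == 1, so total power stays
    constant as the source moves. Frequency dependence is ignored.
    """
    p = (np.clip(pan, -1.0, 1.0) + 1.0) * np.pi / 4.0   # map pan to [0, pi/2]
    g_left, g_right = np.cos(p), np.sin(p)
    return g_left * mono, g_right * mono

left, right = panpot(np.ones(4), 0.5)
ild_db = 20 * np.log10(right[0] / left[0])   # resulting level difference in dB
```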

Lateralization
ITDs and IIDs tend to produce lateralization: the sound is perceived on the axis between the ears, an inside-the-head effect. This is useful for studying perception, but it is not quite 3D sound.

Combining ITDs and IIDs
We can very simply combine both cues to get a rudimentary 3D system. Each ear gets a filter. The filter imposes a time delay for the ITD and a gain factor for the ILD. Demo!
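The per-ear filter described above can be sketched as one delayed, scaled impulse per ear, applied by convolution. The sign convention and helper names are assumptions for illustration. Here the sign of `itd_s` picks the far ear, and `ild_db` is applied as that ear's attenuation.

```python
import numpy as np

def itd_ild_filters(itd_s, ild_db, fs):
    """Return (h_left, h_right), each a single scaled, delayed impulse.

    itd_s > 0 means the source is on the right, so the left ear is the
    far ear: it gets the delay and an attenuation of |ild_db| dB.
    """
    d = int(round(abs(itd_s) * fs))
    h_near = np.zeros(d + 1)
    h_near[0] = 1.0                              # near ear: no delay, unit gain
    h_far = np.zeros(d + 1)
    h_far[d] = 10 ** (-abs(ild_db) / 20.0)       # far ear: delayed and attenuated
    if itd_s >= 0:
        return h_far, h_near   # (left, right)
    return h_near, h_far

def render(mono, itd_s, ild_db, fs):
    """Filter a mono signal into a (left, right) pair."""
    h_left, h_right = itd_ild_filters(itd_s, ild_db, fs)
    return np.convolve(mono, h_left), np.convolve(mono, h_right)

fs = 44100
rng = np.random.default_rng(1)
mono = rng.standard_normal(fs // 10)   # 100 ms of noise as a test source
left, right = render(mono, 0.0005, 6.0, fs)
```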

Cones of confusion
Some parts of space produce the same ITD and IID values, so we cannot distinguish sounds coming from these locations, at least not well. In real life we resolve this by moving our heads.
[Figure: source positions a and b that yield identical interaural cues (axes x and y)]
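A quick numeric check of the front/back ambiguity uses a deliberately simplified model: two points in free field, with an assumed ear spacing of 18 cm. In this model the ITD depends only on the sine of the azimuth. A source at 30° (front right) therefore gives the same ITD as its mirror image at 150° (back right).

```python
import numpy as np

EAR_DISTANCE = 0.18     # m, assumed ear-to-ear distance
SPEED_OF_SOUND = 343.0  # m/s

def simple_itd(azimuth_deg):
    """Far-field ITD for two points in free field (no head).

    Azimuth is measured from straight ahead, positive to the right.
    """
    return EAR_DISTANCE / SPEED_OF_SOUND * np.sin(np.deg2rad(azimuth_deg))

# Mirror images about the interaural axis produce identical ITDs
front, back = simple_itd(30.0), simple_itd(150.0)
```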

Zoological intermission: the barn owl
It hunts by hearing in the dark. It can shape its face to funnel sound towards its ears. It has asymmetrical ears, so it can use ITDs for horizontal localization and IIDs for vertical localization.

Entomological intermission: Ormia ochracea
This fly finds host crickets by hearing. It is very good at localization, even though its ears are only 0.5 mm apart. How does it use ITD/IID? Its coupled eardrums create new cues. It is currently used as a model for new microphones.

One cue to rule them all!
ITDs and ILDs can be insufficient because they are a very simple model of the environment. Our ears adapt to localize and are in fact a lot smarter. Head Related Transfer Functions (HRTFs) incorporate more, and finer, cues for localization.

What do HRTFs capture?
They capture many effects relating to our body: funneling by the ears, reflections off our shoulders, sound absorption by the head, effects from hair, and so on. They also incorporate ITDs and ILDs.
[Figure: HRTF of a sound from the right, left-ear and right-ear responses vs. time (msec)]

What do they look like?
Sweep from front to back (right side).
[Figure: left-ear and right-ear impulse responses, time (msec) vs. azimuth (degrees)]

What do they look like?
Sweep from front to back (right side).
[Figure: left-ear and right-ear spectra, frequency (kHz) vs. azimuth (degrees)]

What do they look like?
Sweep from down to up on the right.
[Figure: left-ear and right-ear spectra, frequency (kHz) vs. elevation (degrees)]

How good are HRTFs?
Each person has a different head/torso shape. We often just use average HRTFs, which won't work for everyone. Being average helps in this case! But how do we get HRTFs?

Solution 1: Binaural recordings
Use a dummy head to make 3D recordings. Or stick microphones in your ears (but please don't stick anything in your ears!!).

Solution 2: Measure real HRTFs
If we measure real HRTFs, we can then apply them to arbitrary sounds to make 3D audio. We just use them as filters to generate the left/right signals. There are two ways to measure HRTFs. Measuring a dummy head's HRTFs should give an average set. Measuring your own HRTFs gives you a personalized copy.
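Applying a measured HRTF pair amounts to two convolutions of the same mono signal, one per ear. In this sketch, `hrir_left` and `hrir_right` stand for the time-domain HRTFs (head-related impulse responses) of one direction. The toy values in the example are made up, not measured data.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Filter a mono signal with one HRIR per ear; returns a (2, n) array."""
    n = len(mono) + max(len(hrir_left), len(hrir_right)) - 1
    out = np.zeros((2, n))
    out[0, :len(mono) + len(hrir_left) - 1] = np.convolve(mono, hrir_left)
    out[1, :len(mono) + len(hrir_right) - 1] = np.convolve(mono, hrir_right)
    return out

# Toy example with made-up 3-tap "HRIRs" (not measured data)
mono = np.array([1.0, 0.0, -1.0, 0.5])
hrir_l = np.array([0.0, 0.3, 0.1])
hrir_r = np.array([0.9, 0.2, 0.0])
stereo = binauralize(mono, hrir_l, hrir_r)
```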

How do we measure HRTFs?
It is the same process as measuring room responses:
1. Set up microphones in a dummy or human subject.
2. Play an MLS (maximum length sequence) from different locations.
3. For each location, measure the transfer function.
You should remove the speaker/mic responses, though. Pro tip: you should do this in an anechoic chamber. Why?

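In math
We record the excitation $x$ filtered by the impulse response $h_{\theta,\phi}$ for direction $(\theta,\phi)$:
$$y_{\theta,\phi}[t] = h_{\theta,\phi}[t] * x[t], \qquad Y_{\theta,\phi}[\omega] = H_{\theta,\phi}[\omega]\,X[\omega]$$
We deconvolve with:
$$H_{\theta,\phi}[\omega] = X^*[\omega]\,Y_{\theta,\phi}[\omega]$$
This holds up to a scale factor, because an MLS has a nearly flat power spectrum, so $X^*[\omega]X[\omega]$ is almost constant. We remove the speaker/mic responses using inverse filters of these responses. How do we measure these?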
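The following is a small simulation of this measurement. It assumes the MLS is played periodically, so one steady-state period of the recording is the circular convolution of the excitation with the response. SciPy's `max_len_seq` generates the excitation, and a made-up decaying response plays the role of the "true" HRIR. Speaker/mic compensation is left out. For a ±1 MLS of length N, the power spectrum equals N + 1 at every bin except DC. Dividing X*Y by N + 1 therefore recovers the response up to a tiny constant offset.

```python
import numpy as np
from scipy.signal import max_len_seq

nbits = 12
x = 2.0 * max_len_seq(nbits)[0] - 1.0   # one period of a +/-1 MLS
N = len(x)                              # 2**nbits - 1 samples

# Made-up "true" impulse response: 10 samples of delay, then a decaying tail
rng = np.random.default_rng(0)
h_true = np.zeros(256)
h_true[10:] = rng.standard_normal(246) * np.exp(-np.arange(246) / 40.0)

# Steady-state recording of a periodically played MLS = circular convolution
X = np.fft.fft(x)
y = np.real(np.fft.ifft(X * np.fft.fft(h_true, N)))

# Deconvolution H = X* Y, normalized by the (nearly flat) MLS power N + 1
Y = np.fft.fft(y)
h_est = np.real(np.fft.ifft(np.conj(X) * Y))[:len(h_true)] / (N + 1)
```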

One complication
This requires some serious lab space.

One more complication
We measure the transfer function from the source location to inside the ear. What will convolution with an HRTF give us? How do we reproduce it so that it sounds 3D? This only works with headphones/earphones. It does not compensate for the effects of distant loudspeakers.

Synthesizing 3D audio
1. Pick a location to position a source, usually as azimuth/elevation.
2. Select the appropriate filters from the HRTF set. Note that there is left/right symmetry, so there is no need to keep all of the HRTFs.
3. Filter the sound to model the 3D effects.
What about moving sounds?
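Filter selection with left/right symmetry can be sketched as a nearest-neighbor lookup over a half-circle of measurements. A source on the left reuses the mirrored right-side measurement with the ears swapped. The data layout and the azimuth convention (degrees, positive to the right) are assumptions for illustration. The example "HRIRs" are placeholders that just store their index.

```python
import numpy as np

def select_hrirs(hrir_set, azimuths_deg, azimuth_deg):
    """Pick the (left, right) HRIR pair nearest to azimuth_deg.

    hrir_set[i] = (h_left, h_right) measured at azimuths_deg[i], covering
    only 0..180 degrees (right side). Left-side sources reuse the mirrored
    measurement with the ears swapped, assuming a symmetric head.
    """
    az = (azimuth_deg + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    i = int(np.argmin(np.abs(np.asarray(azimuths_deg) - abs(az))))
    h_left, h_right = hrir_set[i]
    if az < 0:
        h_left, h_right = h_right, h_left        # mirror: swap the ears
    return h_left, h_right

azimuths = [0, 30, 60, 90, 120, 150, 180]
# Placeholder "HRIRs" that store their index so the selection is visible
hrirs = [(np.array([float(i)]), np.array([10.0 + i])) for i in range(len(azimuths))]
hl, hr = select_hrirs(hrirs, azimuths, -28.0)
```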

Fast convolution reminder
Convolution can be sped up significantly using the FFT, by performing it in the frequency domain:
$$z = x * y \quad\Longrightarrow\quad \mathrm{DFT}(z) = \mathrm{DFT}(x)\,\mathrm{DFT}(y)$$
The DFT product gives a circular convolution, so the signals are zero-padded to get the linear one. The complexity drops to the order of $N\log_2 N$. But is this useful for our case? No. It results in very large FFTs and doesn't allow for changing filters. Instead, we use the STFT for convolution: convolve each STFT frame with the desired filter at that time.
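Here is a NumPy sketch of FFT-based linear convolution. Both signals are zero-padded to a power of two at least len(x) + len(y) - 1 long. With that padding, the circular convolution computed through the DFT equals the linear one.

```python
import numpy as np

def fft_convolve(x, y):
    """Linear convolution of two real signals via the FFT."""
    n_out = len(x) + len(y) - 1
    n_fft = 1 << (n_out - 1).bit_length()   # next power of two >= n_out
    Z = np.fft.rfft(x, n_fft) * np.fft.rfft(y, n_fft)
    return np.fft.irfft(Z, n_fft)[:n_out]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(129)
z = fft_convolve(x, h)
```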

Overlap-add fast convolution
This is similar to computing spectrograms.
Step 1: Make frames. Zero-pad each frame to accommodate the convolution's output length. Hop size == frame size. Do not window.
Step 2: Convolve the frames using FFTs, i.e. multiply complex spectra: multiply each STFT frame by the DFT of the desired filter.
Step 3: Invert back to time and overlap-add the results. Do not window.
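The three steps above can be sketched as follows. Each frame may use a different filter, which is what lets a moving source switch HRIRs over time. The function name is hypothetical. If there are fewer filters than frames, the sketch reuses the last filter, which is an assumption of this example.

```python
import numpy as np

def ola_convolve(x, filters, frame_size):
    """Overlap-add convolution where frame k is filtered with filters[k].

    Frames are not windowed and hop size == frame size. If there are fewer
    filters than frames, the last filter is reused.
    """
    filt_len = max(len(h) for h in filters)
    # FFT size must hold frame_size + filt_len - 1 output samples
    n_fft = 1 << (frame_size + filt_len - 2).bit_length()
    n_frames = int(np.ceil(len(x) / frame_size))
    out = np.zeros(n_frames * frame_size + filt_len - 1)
    for k in range(n_frames):
        frame = x[k * frame_size:(k + 1) * frame_size]               # step 1: make frames
        h = filters[min(k, len(filters) - 1)]
        spec = np.fft.rfft(frame, n_fft) * np.fft.rfft(h, n_fft)     # step 2: multiply spectra
        seg = np.fft.irfft(spec, n_fft)[:len(frame) + len(h) - 1]    # step 3: back to time
        start = k * frame_size
        out[start:start + len(seg)] += seg                           # overlap-add
    return out[:len(x) + filt_len - 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(64)
y = ola_convolve(x, [h], frame_size=256)   # single fixed filter
```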

Usual problems
- Response mismatch: people with funny head shapes.
- Poor reproduction: e.g. bad headphones, MP3s.
- Front/back confusion: really prominent for many people.
- Head movements: these change the relative angle of a source.

Compensating for head movement
We can track the listener's head movements using a simple sensor on the headphones, or using computer vision to measure head pose. This allows us to find the angle between the virtual source and the rotated user's head. One drawback is time lag. One advantage is that we can resolve localization ambiguities, since we use head movements to deal with them.
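With head tracking, the renderer keeps the virtual source fixed in the world. On each tracker update it recomputes the source angle relative to the head and then re-selects the HRTF filters for that angle. Here is a minimal sketch for yaw only. The function name and the convention (degrees, positive to the right) are assumptions.

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source relative to the listener's nose.

    Both angles are in degrees, positive to the right; the result is
    wrapped to [-180, 180).
    """
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# Source straight ahead at 30 degrees; listener turns to face it
facing = relative_azimuth(30.0, 30.0)
```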

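What about speakers?
We need to perform crosstalk cancellation. Each ear also hears the opposite loudspeaker, so we add negated (cancelling) signals so that each ear receives only its intended HRTF-filtered signal. What are the complications here?
[Figure: a listener and a pair of stereo loudspeakers]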
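The slide only says to use negative (cancelling) signals. One common way to compute them, sketched here as an assumption rather than the lab's method, is to invert the 2×2 loudspeaker-to-ear transfer matrix at each frequency bin. A small Tikhonov regularization term `beta` limits the gain where that matrix is ill-conditioned. The transfer functions in the example are made up.

```python
import numpy as np

def crosstalk_canceller(H, beta=1e-3):
    """Per-bin crosstalk-cancellation filters.

    H has shape (n_bins, 2, 2): H[k, ear, speaker] is the transfer function
    from each loudspeaker to each ear at bin k. Returns C of the same shape
    with H[k] @ C[k] close to the identity, computed as
    (H^H H + beta I)^-1 H^H.
    """
    Hh = np.conj(np.transpose(H, (0, 2, 1)))   # Hermitian transpose per bin
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)

# Made-up transfer functions for 16 frequency bins
rng = np.random.default_rng(0)
H = np.eye(2) + 0.3 * (rng.standard_normal((16, 2, 2))
                       + 1j * rng.standard_normal((16, 2, 2)))
C = crosstalk_canceller(H)
```

Given binaural spectra B[k] = [B_left[k], B_right[k]], the loudspeaker feeds would be C[k] @ B[k] at each bin.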

Complications with speaker systems
Head movements: we need to compensate for moving ears! It is not trivial to cater to multiple people simultaneously, e.g. you won't get 3D sound in a movie theater.
Room effects: the speaker output gets convolved with the room and speaker responses, and it is difficult to compensate for all that.

Moving towards virtual sound
3D sound models source-to-ear effects. It creates a 3D percept, but this is not the whole story. There are more cues that we use to localize: movement cues, distance cues, context cues, and so on. Proper virtual audio also models these cues.

Movement cues
Moving sources exhibit an additional important cue for localization: the Doppler effect.

Modeling the Doppler effect
Use variable delay lines: we read off a delay line with interpolation. This is sort of like changing the sample rate. Good interpolation is tricky to get. More later in the semester.
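Here is a minimal variable delay line that uses linear interpolation through `np.interp`. Linear interpolation is the simplest choice, and the "tricky" better interpolators are left for later. The example derives the delay from a hypothetical source distance. A shrinking delay makes the read position advance faster than one sample per output sample, which raises the pitch. A growing delay lowers it.

```python
import numpy as np

def variable_delay(x, delay_samples):
    """Read x through a time-varying (fractional) delay with linear interpolation.

    delay_samples[n] is the delay applied at output sample n. Reads that
    fall outside the signal return silence.
    """
    n = np.arange(len(x))
    read_pos = n - np.asarray(delay_samples, dtype=float)
    return np.interp(read_pos, n, x, left=0.0, right=0.0)

fs = 8000
c = 343.0                                # speed of sound, m/s
t = np.arange(fs) / fs                   # 1 second
tone = np.sin(2 * np.pi * 440.0 * t)
distance = 30.0 - 20.0 * t               # hypothetical source approaching at 20 m/s
moving = variable_delay(tone, distance / c * fs)
```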

Distance cues
We can also perceive how far away a sound is. Static cues include level and the amount of reverberation. Dynamic cues include the change of source angle with head translation.
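The level cue can be sketched with the free-field point-source model. In that model the amplitude falls off as 1/r, or about -6 dB per doubling of distance. The 1 m reference distance and the clamp inside it are assumptions of this sketch. Real rooms add reverberation, which falls off more slowly than the direct sound.

```python
import numpy as np

def distance_gain(distance_m, ref_distance_m=1.0):
    """Inverse-distance (1/r) amplitude gain relative to ref_distance_m.

    Distances closer than the reference are clamped to unit gain.
    """
    r = np.maximum(np.asarray(distance_m, dtype=float), ref_distance_m)
    return ref_distance_m / r

gains_db = 20 * np.log10(distance_gain([1.0, 2.0, 4.0]))
```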

And some more context cues
Room acoustics: sounds in different parts of a room sound different. We can apply HRTF filters to all the reflections. It's overkill, but it makes a difference. And we know how to do that now! :)

Virtual sound can be complicated
Lots of effects combine. It is not completely clear which ones are necessary, and that depends on the usage scenario. It is also not fully clear how they all interact. This is still an open problem, but it sounds pretty good as is.

Surround sound
This is a potentially simpler approach: localization takes place using multiple speakers, optionally with sophisticated filtering. Common setups:
- 5.1/7.1 sets.
- Stereo surround: avoid like the plague! It ruins stereo imaging.
- A virtual acoustic room setup.

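Theater surround sound
A front channel carries dialog, which ensures consistent localization. Side and rear channels carry FX as well as ambience sounds. This is one of Dolby's claims to fame.
[Figure: theater layout with L, C, and R speakers at the screen and a 45° angle marked]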

Recap
- Some of the basics of 3D perception.
- HRTFs: how to measure them and how to use them.
- Additional cues for virtual audio.
- Surround sound.

Reading material
3D Sound for Virtual Reality and Multimedia: http://human-factors.arc.nasa.gov/publications/Begault_2000_3d_Sound_Multimedia.pdf

Next lab
Let's make some 3D sounds! Remember to bring your headphones/earphones. You won't be able to hear the results otherwise.