University of Huddersfield Repository


Lee, Hyunkook (2016) Capturing and Rendering 360º VR Audio Using Cardioid Microphones. In: AES Conference on Audio for Augmented and Virtual Reality, 30 Sep-1 Oct 2016, Los Angeles, USA. This version is available at http://eprints.hud.ac.uk/id/eprint/29582/

AES Audio for Virtual and Augmented Reality 2016
Capturing and Rendering 360° VR Audio Using Cardioid Microphones
Hyunkook Lee, h.lee@hud.ac.uk
Applied Psychoacoustics Lab (APL), University of Huddersfield, UK

Motivation
- Near-coincident mic arrays (ORTF, NOS, etc.) are arguably preferred to purely coincident or purely spaced techniques by most professional recording engineers.
- They rely on a trade-off between time and level differences: the best of both worlds (localisability and spaciousness).
- Cardioid microphones are the most popular and most widely available type.
- So: can we record for VR using our favourite cardioid mics arranged in a near-coincident fashion?

Contents
- Research background
- Localisation test in loudspeaker reproduction
- Localisation test in binaural reproduction
- Discussion
- Summary

Research Background

Existing methods for VR audio capture: First Order Ambisonics (FOA)
Pros:
- Very good localisability due to its coincident nature (but not necessarily good localisation accuracy).
- Virtual microphones from flexible decoding.
- Compact.
Cons:
- High interchannel correlation; lack of spaciousness.
- Comb filtering and rapid changes in image position even with a small head movement.

Existing methods for VR audio capture: Higher Order Ambisonics (HOA)
Pros:
- Higher spatial resolution.
- More accurate localisation.
Cons:
- Requires a large number of channels for proper decoding: N = (M + 1)².
- Very expensive.
- Tonal quality; spaciousness?
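The channel-count requirement can be made concrete: a full (periphonic) Ambisonics signal of order M needs N = (M + 1)² channels. A minimal sketch of how quickly this grows:

```python
# Channels needed for a full periphonic Ambisonics signal of order M:
# N = (M + 1) ** 2
def ambisonic_channels(order):
    return (order + 1) ** 2

counts = {m: ambisonic_channels(m) for m in range(1, 6)}
# counts == {1: 4, 2: 9, 3: 16, 4: 25, 5: 36}
```

Moving from first to third order already quadruples the channel count, which is where the cost and decoding burden noted above come from.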

Existing methods for VR audio capture: Quad Binaural
Pros:
- Direct pinnae filtering; no need for extra binaural synthesis.
Cons:
- Inaccurate localisation and comb filtering due to crossfading between ear signals.
- Not possible to use personal HRTFs.
- Only supports horizontal head rotation.
- Expensive.

Psychoacoustic considerations for VR
In VR, it is important to match the actual and perceived source positions.
[Diagram: a source at -45° in recording should be binauralised so that it is perceived at -45° in reproduction.]

Psychoacoustic considerations for VR
The perceived source position should stay the same as the head rotates.
[Diagrams: a source recorded at -45° is binauralised at the compensated angle relative to the head (e.g. +135° or +45° after rotation) so that it stays fixed in the scene.]

Psychoacoustic considerations for VR: limitations of FOA
- Quadraphonic cardioid decoding: FOA decoded to four virtual cardioids at ±45° and ±135° for quadraphonic playback.
- Only 6 dB of ICLD (interchannel level difference) between the front pair for a source at 45°: not sufficient for a full phantom image shift to 45°.
- Another 6 dB of ICLD between the left pair; the image is perceived almost at the front-left speaker (mainly one ear, so no effective interaural difference).
- The resulting image position in quadraphonic reproduction is still not fully shifted to 45°.
[Diagrams: quadraphonic layout (±45°, ±135°) with ICLD = 6 dB for the front and left pairs.]
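The 6 dB figures above can be checked with ideal first-order cardioid directivity, gain = 0.5·(1 + cos θ) for off-axis angle θ. The sketch below assumes perfect virtual cardioids at the quadraphonic angles; real decoders will deviate somewhat:

```python
import math

def cardioid_gain(source_az_deg, mic_az_deg):
    """Ideal first-order cardioid sensitivity: 0.5 * (1 + cos(off-axis angle))."""
    off_axis = math.radians(source_az_deg - mic_az_deg)
    return 0.5 * (1.0 + math.cos(off_axis))

def icld_db(source_az_deg, mic_a_deg, mic_b_deg):
    """Interchannel level difference between two virtual cardioids, in dB."""
    return 20.0 * math.log10(cardioid_gain(source_az_deg, mic_a_deg)
                             / cardioid_gain(source_az_deg, mic_b_deg))

# Source at +45°, virtual cardioids of the quadraphonic decode at ±45° and ±135°:
front_pair = icld_db(45.0, 45.0, -45.0)   # front-left vs front-right
left_pair = icld_db(45.0, 45.0, 135.0)    # front-left vs back-left
# Both come out at ~6.02 dB: too little for a full image shift to the speaker.
```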

Psychoacoustic considerations for VR
Problems of B-format (FOA) binauralisation for VR:
- Inaccurate localisation due to insufficient ICLD.
- The image follows you when you rotate the head.
[Diagram: a source recorded at 45° and binauralised from FOA.]

Proposed Technique

Design philosophy: Equal Segment Microphone Array (ESMA)
- A design concept proposed by Williams (1991) for 360° multichannel reproduction.
- Requirement 1: equal subtended angle for all stereo segments (±45° for a quadraphonic array).
- Requirement 2: the stereophonic recording angle (SRA) of each segment should match the subtended angle of the segment (±45°).

Design philosophy: existing near-coincident surround arrays
- IRT Cross (Theile): originally designed for ambience capture; d = 20 to 25 cm.
- ORTF Surround (and ORTF-3D): SRA not consistent for every segment, so not suitable as an ESMA.
[Diagram: array geometries with 110° and 70° angles.]

Design philosophy
[Photo: BBC Proms recording using ORTF-3D.]

Design philosophy
The SRA of ±45° for the quadraphonic ESMA means that a source at ±45° in recording should be localised at ±45° in reproduction, both over loudspeakers and after binauralisation.
[Diagrams: SRA = ±45° segment and its binauralised reproduction.]

Design philosophy
Suitable for VR applications with head tracking.
[Diagram: binauralisation with a head-tracked listener.]

Psychoacoustic basis
What spacing between the microphones produces the ±45° SRA? It depends on which psychoacoustic time-level trade-off model is used to calculate the SRA.

Model           | Microphone spacing | Source-to-array distance
Williams        | 23.8 cm            | unknown
Sengpiel        | 25 cm              | unknown
Wittek + Theile | 24 cm              | 2 m
Lee + Theile    | 30 cm              | 2 m
Lee             | 50 cm              | 2 m

All but the last are based on ICTD and ICLD trade-off data obtained with a ±30° loudspeaker setup; the Lee model (50 cm) is optimised for a ±45° setup.
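Whichever model is used, its inputs are the physical cues a segment generates. A simplified sketch of those cues for one ESMA segment, assuming far-field plane-wave incidence and ideal cardioid directivity (both idealisations, not the models' exact assumptions):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def segment_cues(source_az_deg, spacing_m, mic_angle_deg=45.0):
    """ICTD (ms) and ICLD (dB) for one ESMA segment: two cardioids spaced
    spacing_m apart and aimed at +/-mic_angle_deg, with a plane wave arriving
    from source_az_deg (0° = segment centre, positive toward the first mic)."""
    az = math.radians(source_az_deg)
    # Arrival-time difference of the plane wave across the spaced capsules.
    ictd_ms = 1000.0 * spacing_m * math.sin(az) / SPEED_OF_SOUND
    # Ideal cardioid sensitivity 0.5 * (1 + cos(off-axis angle)) per capsule.
    g1 = 0.5 * (1.0 + math.cos(math.radians(source_az_deg - mic_angle_deg)))
    g2 = 0.5 * (1.0 + math.cos(math.radians(source_az_deg + mic_angle_deg)))
    icld_db = 20.0 * math.log10(g1 / g2)
    return ictd_ms, icld_db

# A source at the edge of the intended ±45° SRA with the 50 cm (Lee) spacing:
ictd, icld = segment_cues(45.0, 0.50)
# Roughly 1.03 ms of ICTD on top of ~6 dB of ICLD.
```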

Designing a near-coincident VR mic array
Linear time-level trade-off functions (Lee 2016):
- Shift-region dependent and loudspeaker-base-angle dependent.
- ICTD and ICLD image shift factors change in proportion to the change of ITD and ILD.
- Shift factors for the ±45° base angle: 8.8 %/0.1 ms and 6 %/dB below 30° of shift; 4.4 %/0.1 ms and 3 %/dB between 30° and 45°.
[Plot: trade-off functions over ICTD (ms) and ICLD (dB).]
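The quoted shift factors can be read as a piecewise-linear predictor of image shift for the ±45° base angle. The sketch below is one illustrative reading of the slide's numbers (shift percentages taken relative to the 45° half base angle, time and level contributions summed); Lee's published functions may differ in detail:

```python
def image_shift_deg(ictd_ms, icld_db):
    """Predicted phantom-image shift (degrees) for a ±45° loudspeaker base,
    using the slide's linear shift factors (Lee 2016):
      below 30° of shift: 8.8 % per 0.1 ms and 6 % per dB,
      from 30° to 45°:    4.4 % per 0.1 ms and 3 % per dB,
    with percentages taken relative to the 45° half base angle.
    An illustrative reading of the slide, not the published model."""
    half_base = 45.0
    # Apply the region-1 factors to the whole cue first.
    pct = (ictd_ms / 0.1) * 8.8 + icld_db * 6.0
    shift = half_base * pct / 100.0
    if shift <= 30.0:
        return shift
    # Beyond 30°, the region-2 factors are exactly half the region-1 ones,
    # so the excess accrues at half the rate; cap at the loudspeaker angle.
    pct_at_30 = 30.0 / half_base * 100.0
    pct = pct_at_30 + (pct - pct_at_30) / 2.0
    return min(half_base, half_base * pct / 100.0)

# E.g. 0.1 ms of ICTD alone predicts a shift of about 4° toward the earlier side.
```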

Experiments

Aim
- To evaluate the localisation accuracy of the quadraphonic FOA and ESMA: can the SRA of ±45° be achieved?
- Loudspeaker and headphone reproduction tests in simulated head-rotation scenarios.
- Microphone spacings tested: 0 cm (FOA), 24 cm (Wittek + Theile), 30 cm (Lee + Theile) and 50 cm (Lee).

Stimuli creation: recording setup
- ITU-R BS.1116-compliant listening room.
- 8 Genelec 8040As arranged in an octagonal layout at a 2 m radius.
- Room impulse responses (RIRs) captured for the 0° and 45° source positions.
- SoundField SPS422B for FOA; Neumann KM184s for the ESMA.
[Diagram: octagonal loudspeaker layout (0° to 315° in 45° steps) around the mic array.]

Stimuli creation: Experiment 1 (loudspeaker playback)
- An anechoic speech signal was convolved with the direct sounds of the RIRs (reflections removed).
- Head rotations were simulated for 0°, ±45°, ±90°, ±135° and ±180° (soundfield rotation).
[Diagrams: the quadraphonic array (Mics 1-4) and the target positions for the 0° and 45° sources under 0°, 45°, 90° and 135° simulated head rotations.]

Stimuli creation: Experiment 2 (binaural playback)
- Same conditions as Experiment 1, but with the full RIRs (reflections included).
- The multichannel stimuli were binauralised with dry KU100 dummy-head HRIRs from the SADIE database (Kearney 2015).
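The binauralisation step described above is, in essence, a convolve-and-sum over the array channels. A generic sketch (not the authors' actual code), assuming one HRIR pair per loudspeaker direction:

```python
import numpy as np

def binauralise(channels, hrirs):
    """Render multichannel stimuli to binaural: convolve each loudspeaker
    feed with the HRIR pair for its direction, then sum over channels.
    channels: dict azimuth_deg -> 1-D mono signal (common sample rate).
    hrirs:    dict azimuth_deg -> (left_hrir, right_hrir) 1-D arrays.
    The study used dry KU100 HRIRs from the SADIE database."""
    out_len = max(len(s) for s in channels.values()) + \
              max(len(h[0]) for h in hrirs.values()) - 1
    out = np.zeros((2, out_len))
    for az, sig in channels.items():
        hrir_l, hrir_r = hrirs[az]
        left = np.convolve(sig, hrir_l)    # full convolution, len(sig)+len(h)-1
        right = np.convolve(sig, hrir_r)
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out  # rows: left ear, right ear
```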

Listening tests: Experiment 1 (loudspeaker playback)
- Loudspeakers hidden behind acoustically transparent curtains.
- Small markers placed on the curtain from 0° at 22.5° intervals.
- 70 dBA playback level.

Listening tests: Experiment 1 (loudspeaker playback)
- 9 experienced subjects; each test repeated twice.
- Task: mark the perceived image position on a horizontal circle in a GUI, with markers indicated at 22.5° intervals.

Listening tests: Experiment 2 (binaural playback)
- Same room, subjects, task and method as Experiment 1.
- Equalised Sennheiser HD650 headphones.
- Loudness matched to the playback levels of the multichannel stimuli.

Results: loudspeaker experiment, 0° source position
- 0° and 180° targets: accurate for all arrays.
- 45° target: statistically accurate for 50, 30 and 24 cm, but not for 0 cm (Wilcoxon tests).
- 90° target: front-back confusion (cone of confusion) in general.
- 135° target: significantly bimodal for 0 and 30 cm.

Results: loudspeaker experiment, 45° source position
- 0° target: accurate for all arrays.
- 45° target: accurate only for 50 cm.
- 90° target: accurate except for 0 cm (significantly bimodal).
- 135° target: accurate except for 0 cm (median = 152°).
- 180° target: accurate only for 50 cm.

Results: binaural experiment, 0° source position
- 0° target: significant bimodality for all arrays.
- 45° target: significant bimodality for 50 cm.
- 90° target: significant bimodality except for 50 cm.
- 135° target: significantly bimodal for all arrays.
- 180° target: accurate except for 30 cm.

Results: binaural experiment, 45° source position
- 0° target: bimodal (50 cm and 30 cm); inaccurate (24 cm and 0 cm).
- 45° target: accurate for 50 and 24 cm; median = 27° for 0 cm.
- 90° target: significant bimodality for 0 cm.
- 135° target: accurate only for 50 cm.
- 180° target: accurate only for 50 cm and 24 cm.

Results: real source
- Loudspeaker: accurate for all source angles.
- Binaural responses are generally more spread than loudspeaker ones:
  - 0°: significantly bimodal.
  - 45°: inaccurate, median = 52°.
  - 90°, 135°: accurate.
  - 180°: inaccurate, bimodal.

Discussion: microphone spacing effect
- 0 cm had the worst localisation performance overall: significantly bimodal distributions for many target angles, and the 45° source was perceived significantly narrower in both loudspeaker (median = 30°) and binaural (median = 27°) reproduction.
- 50 cm was the only spacing that achieved the SRA of ±45°, which seems to validate the new psychoacoustic model.
- 50 cm also had slightly better consistency and accuracy than the other configurations overall, but a smaller array might be more beneficial in practical situations.
- How important is localisation accuracy in practical VR applications?

Discussion: source angle effect
- The 0° source produced larger response spreads and more bimodal distributions than the 45° source.
- Front-back confusion (cone of confusion), especially for the 90° target angle.
- Lateral phantom image localisation is highly unstable (Theile and Plenge 1977; Martin et al. 1999).

Discussion: loudspeaker vs. binaural
- Front-back confusion was frequently observed in the binaural presentation but not in the loudspeaker one, and the binaural responses were more spread.
- The real-source results show similar tendencies for the 0° and 45° positions, so this might be due to the use of non-individualised HRTFs rather than the microphone arrays.
- Or is it more about the lack of head movement? Front-back confusion can occur even with individualised HRTFs when head rotation is not allowed (Wightman and Kistler 1999).
- The front-back confusion problem might therefore be largely resolved in practical VR applications with head tracking.

Discussion: higher-order ESMA
- For an octagonal setup, each segment should have an SRA of 45° (±22.5°).
- Could potentially solve the problem of unstable side-image localisation.
- Mic spacing d: 82 cm (Williams) or 55 cm (Lee).

Discussion: adding a vertical dimension to ESMA
- Cardioid + figure-of-eight in a vertically coincident fashion, with vertical mid-side decoding.
- Vertical microphone spacing has little effect on LEV (Lee and Gribben, JAES 2014).
- Vertical level panning can provide source imaging with limited resolution (Barbour 2003; Mironovs and Lee 2016).
- Vertical time panning is highly unstable (Wallis and Lee, JAES 2015).

Conclusions
- The ESMAs had better localisation accuracy than FOA.
- The 50 cm spacing had the best localisation accuracy, but 30 cm or 24 cm might still be acceptable.
- Front-back confusion occurs in binaural reproduction without head rotation.
Ongoing work:
- Investigations of other attributes: externalisation, tonal quality, spaciousness, naturalness, etc.
- Practical evaluations with head tracking.

Thank you for listening.
Hyunkook Lee, h.lee@hud.ac.uk