
Audio Engineering Society Convention e-Brief 400
Presented at the 143rd Convention, 2017 October 18-21, New York, NY, USA

This Engineering Brief was selected on the basis of a submitted synopsis. The author is solely responsible for its presentation, and the AES takes no responsibility for the contents. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Audio Engineering Society.

Audio Localization Method for VR Application

Joo Won Park, Columbia University
Correspondence should be addressed to Joo Won Park (jp3378@columbia.edu)

ABSTRACT

Audio localization is a crucial component of Virtual Reality (VR) projects, as it contributes to a more realistic VR experience for the user. This paper discusses a method for implementing localized audio that is synced with the user's head movement. The goal is to process an audio signal in real time so that it represents a three-dimensional soundscape. The paper introduces a mathematical concept, acoustic models, and audio processing steps that can be applied to general VR audio development. It also provides a detailed overview of an Oculus Rift-MAX/MSP demo.

1 Introduction

This paper introduces a method to localize audio in a Virtual Reality (VR) application. MAX/MSP serves as the front-end development platform that brings the processed audio and the VR display on the Oculus Rift together. The primary audience is developers who intend to localize audio in their own MAX/MSP VR projects so that the audio environment is synced with the user's head movement in an Oculus Rift-MAX/MSP setup. A broader audience is anyone seeking a general method for easily implementing user-synced audio on other VR platforms. The paper covers an essential mathematical concept, the quaternion, as well as the mathematical modeling that creates a sense of three-dimensional (3D) auditory space.

There are three parts to this paper. The first part covers the acoustic modeling that creates the three-dimensional auditory scene; methods for modeling the Interaural Level Difference (ILD) and for Head-Related Impulse Response (HRIR) convolution and interpolation are introduced. Note that the model is simplified by limiting rotation to the yaw axis on the horizontal plane. This simplification allows easier quaternion algebra and acoustic modeling, and the approach can be extended to rotation with an elevation angle. The second part covers the actual implementation in code using quaternion algebra. The last part presents the Oculus Rift-MAX/MSP demo as an example of a user-interactive audio environment in VR that uses the methods introduced in the paper.

2 Acoustic Modeling

Interaural Level Difference (ILD) and Interaural Time Difference (ITD) are the differences between the two ear signals that are most relevant for localizing a sound source on the horizontal plane [1]. The HRIRs of the two ears describe this difference and thus serve as the cue for the sound location in terms of the azimuth angle. The sound in VR should also accurately represent changes in the distance between the user's ears and the sound source: the further the user is from the sound source, the softer the sound should be, due to spreading loss [2]. The HRIRs are only measured uniformly at 1 meter away from the sound source, so an additional model is needed to reflect the change in sound level with distance beyond 1 meter.

The following mathematical functions are acoustic models of sound level as a function of distance. The sound level decreases non-linearly, as spherical spreading causes the level to decay more rapidly at greater distances [2]. A logarithmic function is applied for level decay at short distances (distance <= 15) to give a concave-down curve, and an inverse function is applied for long distances (distance > 15) to give a concave-up curve. Note that the constants in these models must be adjusted to the VR development environment, as well as by an adequate judgement of "closeness". In the demo for this paper, 15 is the unit length in the Oculus Rift-MAX/MSP setup, and the constants of the logarithmic and inverse functions are adjusted accordingly, as in equations (1) and (2). Figures 1 and 2 show that the sound level drops mildly at close range and more rapidly at long range. Here 0 < f(x) < 1 represents the sound level decrease, and x is the distance between the sound source and the listener.

Short distance (x <= 15):   f(x) = 0.33 log(21.4 - x)   (1)

Long distance (x > 15):   f(x) = 1/x   (2)

[Fig. 1: Sound level decay at short distance]

[Fig. 2: Sound level decay at long distance]
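For illustration, the level model can be written as a small piecewise function. The sketch below is in Python rather than the demo's MAX/MSP and JavaScript environment, and the constants (0.33, 21.4, and the crossover at 15 units) are assumptions standing in for the tuned values of equations (1) and (2); in practice they would be re-tuned to the scene, for instance so that the two branches meet at the crossover.

```python
import math

UNIT_LENGTH = 15.0   # assumed "unit length" of the Oculus Rift-MAX/MSP scene


def level_gain(x):
    """Gain 0 < f(x) < 1 for a source at distance x (scene units).
    Constants are illustrative assumptions, not the e-brief's tuned values."""
    x = max(x, 1.0)                       # HRIRs are referenced to roughly 1 m
    if x <= UNIT_LENGTH:                  # short range: concave-down logarithmic decay
        return 0.33 * math.log(21.4 - x)
    return 1.0 / x                        # long range: concave-up inverse decay


if __name__ == "__main__":
    for d in (1, 5, 10, 15, 20, 40):
        print(d, round(level_gain(d), 3))
```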

3 Implementation

A 16-second drum loop (mono) is used for the demo in this paper, but it can be substituted with any other audio sample. The audio sample is convolved with Head-Related Impulse Responses (HRIRs).

The HRIR data were chosen from New York University's Music and Auditory Research Laboratory (MARL) [3]. A Python notebook script that convolves an audio file with a selected HRIR data set can be found on GitHub [4]. The script extracts 24 HRIRs on the horizontal plane (0 degree elevation, 15 degree azimuth increments) and convolves them with the loaded audio file. It then creates 24 audio files of equal loudness from the convolutions. These 24 audio files are the ingredients for designing the 3D auditory scene and are saved in the local directory.

JavaScript is used for the quaternion computations that process the orientation information fed from the Oculus Rift. The JavaScript is implemented in MAX/MSP, where the set of HRIR-convolved audio files is weighted according to the user's orientation.
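The batch-convolution step could look roughly like the sketch below. This is not the author's notebook from [4]; the HRIR file names and channel layout are hypothetical stand-ins for the MARL-NYU data [3], and the soundfile and scipy packages are assumed to be available.

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

SOURCE = "drum_loop_mono.wav"          # any mono sample (assumed file name)
audio, sr = sf.read(SOURCE)

for az in range(0, 360, 15):           # 24 azimuths at 0 degree elevation
    # "hrir_<az>.wav" is assumed to hold a 2-channel (left/right) impulse response
    hrir, _ = sf.read(f"hrir_{az:03d}.wav")
    left = fftconvolve(audio, hrir[:, 0])
    right = fftconvolve(audio, hrir[:, 1])
    binaural = np.stack([left, right], axis=1)
    binaural /= np.max(np.abs(binaural)) + 1e-12   # crude equal-loudness normalisation
    sf.write(f"convolved_{az:03d}.wav", binaural, sr)
```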

3.1 Quaternion Algebra

The quaternion is a mathematical concept similar to imaginary numbers, and it is an integral part of representing the user's head orientation, and therefore each ear's location, after the user's head moves. This section illustrates the concept of the quaternion, its properties, and how it is applied to the tasks of this project. The idea of the quaternion was first described by William Rowan Hamilton [5]. A quaternion is represented by four real numbers q_1, q_2, q_3, q_4 and the imaginary units i, j, k. A quaternion q = q_1 i + q_2 j + q_3 k + q_4 = (q_1, q_2, q_3, q_4) represents a rotation if it can be expressed as follows [6]:

q = v_x sin(θ/2) i + v_y sin(θ/2) j + v_z sin(θ/2) k + cos(θ/2)

Here v = (v_x, v_y, v_z) is a unit vector along the axis of rotation and θ is the angle of rotation; such a q is called a "quaternion of rotation". This concept is useful because the Oculus Rift produces positional values and quaternion values that correspond to the user's head rotation about the yaw axis. This allows real-time computation of each ear's location and of the angle of rotation. The ear locations are used to calculate the distance between each ear and the sound source, and the angle of rotation is used to weight the convolved audio samples produced by the Python script.

Depending on the environment the developer is working in, the definition of the angle θ is adjusted. In this paper and the demo, the angle of rotation θ is the clockwise rotation angle from the z axis. The position of the sound source is fixed on the xz plane. The number representing the user's head diameter (2.0) is arbitrary and chosen for ease of computation; the length from the center of the head to each ear is assumed to be 1.0. Limiting the motion to yaw rotation simplifies the quaternion values: the axis of rotation is the y axis, so v = (0, 1, 0) and consequently q = (0, sin(θ/2), 0, cos(θ/2)). The Oculus Rift's head tracker returns the positional values x, y, z and the quaternion constituents q_1, q_2, q_3, q_4 through MAX/MSP. In the simplified task, limited to the xz plane and yaw-axis rotation, only the x, z positional values and the orientation values q_2 = sin(θ/2) and q_4 = cos(θ/2) are relevant.

3.2 Distance Computation

Given the user's positional value (x_0, z_0) and the quaternion components (q_2, q_4), I calculate each ear's position. As suggested in Figure 3, given the head's center position (x_0, z_0), the ears are located at (x_0 - cos θ, z_0 + sin θ) for the left ear and (x_0 + cos θ, z_0 - sin θ) for the right ear. Due to the properties of the quaternion of rotation, cos θ and sin θ can be expressed in terms of the quaternion constituents:

sin θ = 2 sin(θ/2) cos(θ/2) = 2 q_2 q_4
cos θ = cos^2(θ/2) - sin^2(θ/2) = q_4^2 - q_2^2

Thus, the location of each ear after a rotation by the angle θ is:

* Left ear: (x_0 - q_4^2 + q_2^2, z_0 + 2 q_2 q_4)
* Right ear: (x_0 + q_4^2 - q_2^2, z_0 - 2 q_2 q_4)

Each ear's location can therefore be calculated in real time as the Oculus Rift headset returns positional values and quaternions. This in turn allows the distance between the sound source and each ear to be calculated, which the acoustic model from Section 2 takes as its input x.

[Fig. 3: Location of each ear]
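As a compact reference, the Section 3.2 formulas can be written out as follows. The demo performs this computation in JavaScript inside MAX/MSP; the Python sketch below only restates the same arithmetic, with a hypothetical source position added to show how the distances feed the level model f(x).

```python
import math


def ear_positions(x0, z0, q2, q4):
    """Left/right ear positions on the xz plane, 1.0 unit from the head centre."""
    cos_t = q4 * q4 - q2 * q2       # cos(theta) = cos^2(theta/2) - sin^2(theta/2)
    sin_t = 2.0 * q2 * q4           # sin(theta) = 2 sin(theta/2) cos(theta/2)
    left = (x0 - cos_t, z0 + sin_t)
    right = (x0 + cos_t, z0 - sin_t)
    return left, right


def ear_distances(x0, z0, q2, q4, source):
    """Distances from each ear to source = (sx, sz); these are the x fed to f(x)."""
    (lx, lz), (rx, rz) = ear_positions(x0, z0, q2, q4)
    d_left = math.hypot(source[0] - lx, source[1] - lz)
    d_right = math.hypot(source[0] - rx, source[1] - rz)
    return d_left, d_right


# example: head at the origin rotated 90 degrees, source 15 units along the z axis
print(ear_distances(0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4), (0.0, 15.0)))
```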

3.3 Interpolation

24 audio files are created by the Python script [4]. These files are the drum sample convolved with HRIRs at angles in 15 degree increments. On the xz plane, when the angle of rotation is exactly 15, 30, ..., 345 degrees, simply playing the corresponding convolved audio file is an accurate representation of the localized audio in VR. For angles that are not exact multiples of 15 degrees, however, interpolation is necessary. Figure 4 describes the algorithm for weighting two audio files to interpolate the localized audio for any angle of rotation.

Algorithm:

1. Divide the parametric space (0 <= θ < 360 degrees) into 24 bins, each spanning 15 degrees.
2. Compute the angle of rotation θ from the quaternion values returned by the Oculus Rift: θ = 2 cos⁻¹(q_4). Note that there are two possible values of θ from the inverse cosine; compare the half-angle sine values of the two candidates and pick the one that is closer to q_2 = sin(θ/2).
3. Determine which bin the computed angle of rotation θ belongs to.
4. Interpolate the audio signal as a mix of the two audio signals that bound the bin. For example, in Figure 4, at an angle of rotation between 15 and 30 degrees, the audio signal should be a mix of File 2 (HRIR-convolved for 15 degrees) and File 3 (HRIR-convolved for 30 degrees). If θ is closer to 15 degrees, File 2 should dominate, and vice versa.

The exact computation is as follows. Given θ and the bin B_n, the interpolated audio signal S is a mix of the audio signals of the two files that bound the bin, S_n and S_{n+1}; the weights of all other audio signals are 0:

S = ((n + 1) - θ/15) S_n + (θ/15 - n) S_{n+1}

The JavaScript code implemented in the demo can be found on GitHub [7].

[Fig. 4: Algorithm for weighting the audio signals]
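The weighting rule can be restated compactly. As before, the demo's real implementation is the JavaScript in [7]; this Python sketch only illustrates the bin selection and the two non-zero weights, including the inverse-cosine disambiguation from step 2.

```python
import math


def rotation_angle(q2, q4):
    """Recover theta in degrees from (q2, q4), resolving the acos ambiguity with q2."""
    theta = math.degrees(2.0 * math.acos(max(-1.0, min(1.0, q4))))
    if math.sin(math.radians(theta / 2.0)) * q2 < 0:   # wrong branch: use 360 - theta
        theta = 360.0 - theta
    return theta % 360.0


def bounding_files(theta):
    """Return (n, w_n, w_n1): the lower bin index and the two non-zero weights."""
    n = int(theta // 15) % 24
    w_n1 = theta / 15.0 - n          # weight of file n + 1 (wraps back to 0 degrees after 345)
    w_n = 1.0 - w_n1                 # equivalently (n + 1) - theta/15
    return n, w_n, w_n1


# example: theta = 20 degrees mixes the 15-degree file (weight 2/3) and the 30-degree file (1/3)
theta = rotation_angle(math.sin(math.radians(10)), math.cos(math.radians(10)))
print(round(theta, 1), bounding_files(theta))   # -> 20.0 (1, 0.667, 0.333) approximately
```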

4 Demo

MAX/MSP is used as the bridging front-end platform: it receives locational information from the Oculus Rift headset, manipulates the loaded audio signals according to that information, and outputs the interpolated audio signal as well as the visual display to the headset. Professor Bradford Garton's MAX/MSP patch [8] runs a simple visual display and receives the Oculus Rift's locational data; the demo is built on top of this patch. JavaScript code is implemented to process the loaded audio signals (the HRIR-convolved drum samples) and output the interpolated audio signals, and MAX/MSP synchronizes the visual display with the audio signal.

Figure 5 shows the part of the MAX/MSP patch that manipulates and processes the audio signals. The patch receives the user's locational data from the Oculus Rift headset and plays the 24 HRIR-convolved audio files simultaneously (both marked with red boxes in Figure 5). These two elements are used as inputs to the JavaScript code that interpolates the audio signals and calculates the distance between the sound source and the user's ears (both marked with blue boxes in Figure 5).

[Fig. 5: MAX/MSP patch that outputs the interpolated audio signals]

4.1 MAX/MSP Demo Directions

This section describes how to use the demo. The objective of the demo is to play a desired audio sample (a mono signal) and manipulate it to create an auditory scene that is synchronized with the user's position and orientation in VR. The instructions are as follows:

1. Create the 24 HRIR-convolved audio files for the desired audio sample using the Python script [4].
2. Load the folder with the audio samples into MAX/MSP's polybuffer object.
3. Wear the Oculus Rift headset and earphones, and start the program. Toggle fullscreen.
4. Navigate in VR using the keys below and by moving the head.

Key commands:

- w/up arrow: move forward
- s/down arrow: move backward
- d: move right
- a: move left
- right arrow: rotate right
- left arrow: rotate left
- delete: reset
- escape: toggle fullscreen

5 Summary

The task of this paper is to interpolate HRIR-convolved audio signals to recreate a realistic auditory environment in VR when the user's head movement is limited to yaw rotation. A simple mixing-by-weights method is used to interpolate for angles of rotation that are not strictly at 15 degree increments.

This project can serve as a basic framework for developing realistic auditory environments in VR. Some adjustments that can be made are the choice of HRIR data set (which HRIR data set optimizes the accuracy?) and a reassessment of the acoustic models (how do distance and direction affect sound perception?). The project can be extended beyond the yaw-rotation limitation by employing the appropriate quaternion algebra. Another important task is to develop a method for evaluating the accuracy of the auditory scene created in the demo. For this project, I used my own subjective judgment to assess whether the recreated auditory scene was "good enough"; for rigorous test procedures, an objective metric for assessing the created (interpolated) auditory scene is necessary.

6 Acknowledgements

I would like to thank Professor Nima Mesgarani and Professor Bradford Garton of Columbia University for their guidance and helpful advice.

References

[1] Raspaud, M., Viste, H., and Evangelista, G., "Binaural Source Localization by Joint Estimation of ILD and ITD," IEEE Transactions on Audio, Speech, and Language Processing, 18, pp. 68-77, 2010.
[2] Truax, B., Handbook for Acoustic Ecology, World Soundscape Project, Simon Fraser University, and ARC Publications, 1978.
[3] Andreopoulou, A. and Roginska, A., "Documentation for the MARL-NYU file format: Description of the HRIR repository," 2011. Data retrieved from the NYU Music and Audio Research Laboratory.
[4] Park, J. W., 2017, GitHub.
[5] Hamilton, W. R., "On Quaternions, or on a New System of Imaginaries in Algebra," Philosophical Magazine, 1850.
[6] Trawny, N. and Roumeliotis, S. I., "Indirect Kalman Filter for 3D Attitude Estimation," Multiple Autonomous Robotic Systems Laboratory, 2005.
[7] Park, J. W., weight, 2017, GitHub.
[8] Garton, B., Oculus Rift, 2016, website.