A Study on Complexity Reduction of Binaural Decoding in Multi-channel Audio Coding for Realistic Audio Service


Contemporary Engineering Sciences, Vol. 9, 2016, no. 1, 11-19
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ces.2016.512315

Kwangki Kim
Korea Nazarene University, Department of Digital Contents
Wolbong-ro 48, Cheonan, Chungnam, Korea

Copyright 2015 Kwangki Kim. This article is distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this paper, we propose a simplified binaural decoding method that reduces the complexity of binaural decoding. In the proposed method, the high frequency components of the HRTF (head related transfer function) coefficients are excluded and the binaural decoding process in the high frequency regions is simplified. The experimental results confirm that the proposed method reduces the complexity of binaural decoding in the frequency domain by about 40 % while providing statistically the same sound quality as binaural decoding in the frequency domain.

Keywords: binaural decoding, multi-channel audio, spatial cue, down-mix signal, backward compatibility

1 Introduction

Recently, with the growth of realistic 3D video such as 3DTV, UHDTV (Ultra High Definition TV), and 3D movies, realistic audio has become more important in audio services. Realistic audio is generated not by stereo signals but by audio signals with 5.1 or more channels, and signals with more channels can produce more realistic and immersive sound. However, because the data rate of multi-channel audio signals increases in proportion to the number of audio channels, multi-channel audio signals cannot be provided directly through wired and wireless network systems. To solve this high bit-rate problem, spatial cue based multi-channel audio coding such as BCC (binaural cue coding), MPEG Surround, and SSLCC (sound source location coefficient coding) has been proposed and developed [1-4]. Because spatial cue based multi-channel audio coding represents the multi-channel audio signals as a down-mix signal plus additional side information, the data rate can be significantly reduced, and the multi-channel audio signals can be delivered efficiently to users through the network system.

Spatial cue based multi-channel audio coding generally has a unique functionality called backward compatibility. With backward compatibility, users can play the down-mix signal on their stereo playback system if they do not have a multi-channel audio coder or simply want to play the down-mix signal [5]. However, because the down-mix signal cannot reproduce the 3D sound generated by the multi-channel audio signals, the backward compatibility of spatial cue based multi-channel audio coding should be enhanced. For this reason, binaural decoding can be applied to add the multi-channel audio effect to the down-mix signal.

Binaural decoding generates a binaural stereo sound by convolving the multi-channel audio signals with HRTF (head related transfer function) coefficients. In its basic form, binaural decoding has very high complexity because of the linear convolution in the time domain, which makes real-time implementation impossible. To enable real-time implementation, binaural decoding in the SSLCC that combines the HRTF coefficients and the multi-channel audio signals in the synthesis domain, i.e., the frequency domain, was proposed [6].
Although binaural decoding in the frequency domain successfully reduces the complexity, the binaural decoding of the SSLCC still has rather high complexity. In this paper, we propose a simplified binaural decoding that consists of envelope and phase modifications in the frequency domain.

2 Overview of the SSLCC

The structure of the SSLCC is depicted in Fig. 1. The encoder represents the input multi-channel audio signals as a down-mix signal with additional side information. The decoder recovers the multi-channel audio signals from the transmitted down-mix signal and the side information.

The detailed process of the encoder is shown in Fig. 2. First, the input multi-channel audio signals are transformed into the frequency domain by the discrete time Fourier transform and are then fed to the analyzer, which extracts the spatial parameters. Virtual source location information (VSLI) is used as the spatial parameter; it indicates the spatial image in free space that the multi-channel audio signals would generate. The extracted spatial parameters are quantized for transmission. In addition, the multi-channel audio signals are summed to generate the down-mix signal.
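The down-mixing step can be illustrated with a minimal sketch. The source only states that the channels are summed; the grouping of the center channel into both outputs and the `center_weight` value below are assumptions made for illustration, and `stereo_downmix` is a hypothetical helper name, not part of the codec.

```python
import numpy as np


def stereo_downmix(channels, center_weight=1.0 / np.sqrt(2.0)):
    """Sum five channels into a stereo down-mix.

    channels: dict with keys 'C', 'Lf', 'Ls', 'Rf', 'Rs', each a 1-D array
              of equal length (time-domain samples or frequency bins).
    center_weight: ASSUMED weight for the center channel, which is shared
              between the left and right down-mix signals.
    """
    center = center_weight * np.asarray(channels["C"])
    left = np.asarray(channels["Lf"]) + np.asarray(channels["Ls"]) + center
    right = np.asarray(channels["Rf"]) + np.asarray(channels["Rs"]) + center
    return left, right
```

A real encoder would also carry the spatial parameters (VSLI) as side information; this sketch covers only the summation.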

The detailed process of the decoder is shown in Fig. 3. First, the down-mix signal is transformed into the frequency domain and the received spatial parameters are dequantized. Then the down-mix signal and the dequantized spatial parameters are fed to the synthesizer, which recovers the multi-channel audio signals in the frequency domain. The reconstructed multi-channel audio signals are transformed into time-domain output signals by the inverse transform. A detailed description of the analysis and the synthesis can be found in [3], [4].

Fig. 1. Basic structure of the SSLCC

Fig. 2. Procedure of the encoder

Fig. 3. Procedure of the decoder

3 Binaural Decoding in the SSLCC

Because binaural decoding in multi-channel audio coding carries the high computational load of the linear convolution between the multi-channel audio signals and the HRTF coefficients in the time domain, it cannot avoid the complexity problem and cannot be implemented in real time. To resolve this problem, binaural decoding performed in the frequency domain was proposed in [6]; it is shown in Fig. 4. The HRTF coefficients are transformed into the frequency domain and stored in memory. The gain factors of the multi-channel audio signals are estimated from the side information in the frequency domain and are multiplied with the HRTF coefficients in the frequency domain, which corresponds to convolution in the time domain.

Fig. 4. Binaural decoding in the SSLCC (Lf: left front, Ls: left surround, Rf: right front, Rs: right surround, C: center)

Using the down-mix signal in the frequency domain, $X_L(k)$ and $X_R(k)$, the calculated gain factors of the multi-channel audio signals in the frequency domain, $g_{1C}(k)$, $g_{2C}(k)$, $g_{Lf}(k)$, $g_{Ls}(k)$, $g_{Rf}(k)$, $g_{Rs}(k)$, and the stored HRTF coefficients in the frequency domain, $H_C^L(k)$, $H_C^R(k)$, $H_{Lf}^L(k)$, $H_{Lf}^R(k)$, $H_{Ls}^L(k)$, $H_{Ls}^R(k)$, $H_{Rf}^L(k)$, $H_{Rf}^R(k)$, $H_{Rs}^L(k)$, $H_{Rs}^R(k)$, the binaural rendering can be performed as

$$
\begin{aligned}
H_{LL}(k) &= g_{1C}(k)\,H_C^L(k) + g_{Lf}(k)\,H_{Lf}^L(k) + g_{Ls}(k)\,H_{Ls}^L(k) \\
H_{RL}(k) &= g_{2C}(k)\,H_C^L(k) + g_{Rf}(k)\,H_{Rf}^L(k) + g_{Rs}(k)\,H_{Rs}^L(k) \\
H_{LR}(k) &= g_{1C}(k)\,H_C^R(k) + g_{Lf}(k)\,H_{Lf}^R(k) + g_{Ls}(k)\,H_{Ls}^R(k) \\
H_{RR}(k) &= g_{2C}(k)\,H_C^R(k) + g_{Rf}(k)\,H_{Rf}^R(k) + g_{Rs}(k)\,H_{Rs}^R(k)
\end{aligned}
\tag{1}
$$

where $H_{LL}(k)$ and $H_{LR}(k)$ are the HRTF rendering elements for the left and right binaural outputs, respectively, contributed by the center, left front, and left surround channels, while $H_{RL}(k)$ and $H_{RR}(k)$ are the HRTF rendering elements for the left and right binaural outputs, respectively, contributed by the center, right front, and right surround channels. Here, $k$ indicates the frequency index.
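Finally, the binaural output signals can be obtained as

$$
\begin{bmatrix} O_L(k) \\ O_R(k) \end{bmatrix}
=
\begin{bmatrix} H_{LL}(k) & H_{RL}(k) \\ H_{LR}(k) & H_{RR}(k) \end{bmatrix}
\begin{bmatrix} X_L(k) \\ X_R(k) \end{bmatrix}
\tag{2}
$$

where $O_L(k)$ and $O_R(k)$ are the left and right binaural output signals, respectively.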
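The rendering in (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [6]: the dictionary layout and the function names `rendering_elements` and `binaural_output` are hypothetical, gains are taken to be real per-bin arrays, and HRTF coefficients are complex per-bin arrays.

```python
import numpy as np


def rendering_elements(gains, hrtf_left, hrtf_right):
    """Eq. (1): combine per-bin gain factors with frequency-domain HRTFs.

    gains:      dict with keys '1C', '2C', 'Lf', 'Ls', 'Rf', 'Rs'
    hrtf_left:  dict with keys 'C', 'Lf', 'Ls', 'Rf', 'Rs' (HRTF to left ear)
    hrtf_right: same keys (HRTF to right ear)
    All values are arrays with one entry per frequency bin k.
    """
    H_LL = (gains["1C"] * hrtf_left["C"] + gains["Lf"] * hrtf_left["Lf"]
            + gains["Ls"] * hrtf_left["Ls"])
    H_RL = (gains["2C"] * hrtf_left["C"] + gains["Rf"] * hrtf_left["Rf"]
            + gains["Rs"] * hrtf_left["Rs"])
    H_LR = (gains["1C"] * hrtf_right["C"] + gains["Lf"] * hrtf_right["Lf"]
            + gains["Ls"] * hrtf_right["Ls"])
    H_RR = (gains["2C"] * hrtf_right["C"] + gains["Rf"] * hrtf_right["Rf"]
            + gains["Rs"] * hrtf_right["Rs"])
    return H_LL, H_RL, H_LR, H_RR


def binaural_output(X_L, X_R, H_LL, H_RL, H_LR, H_RR):
    """Eq. (2): apply the 2x2 rendering matrix to the stereo down-mix."""
    O_L = H_LL * X_L + H_RL * X_R
    O_R = H_LR * X_L + H_RR * X_R
    return O_L, O_R
```

The point of this arrangement is that the five-channel signals are never reconstructed explicitly: the gains and HRTFs are folded into four per-bin factors that act directly on the two down-mix spectra.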

4 Proposed Simplified Binaural Decoding in the SSLCC

Fig. 5. Overall structure of the simplified binaural decoding in the SSLCC

The proposed simplified binaural decoding in the SSLCC consists of envelope and phase modifications in the frequency domain. The HRTF coefficients are pre-handled in the frequency domain to reflect the property that human hearing is insensitive to high frequency regions [7]. The high frequency components of the HRTF coefficients can therefore be excluded, and the binaural decoding process in the high frequency regions can be skipped or simplified. Fig. 5 shows the overall structure of the proposed simplified binaural decoding in the SSLCC.

In the proposed method, the HRTF coefficients are transformed into the frequency domain in advance, and their amplitude and phase information are calculated. The amplitude information of the HRTF coefficients is stored for all frequencies, whereas the phase information is stored only for frequencies below 3.5 kHz. Because human hearing is sensitive to phase information in the low frequency regions but insensitive to it in the high frequency regions, the phase information of the high frequency components of the HRTF coefficients can be excluded, and the HRTF rendering of the phase information in the high frequency regions can be skipped.

Using the pre-handled and stored HRTF coefficients in the frequency domain, the HRTF rendering, i.e., the envelope and phase modification, can be performed simply with modified versions of (1) and (2). First, (1) is divided into the following (3) and (4).
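$$
\begin{aligned}
H_{LL}(k) &= g_{1C}(k)\,H_C^L(k) + g_{Lf}(k)\,H_{Lf}^L(k) + g_{Ls}(k)\,H_{Ls}^L(k) \\
H_{RL}(k) &= g_{2C}(k)\,H_C^L(k) + g_{Rf}(k)\,H_{Rf}^L(k) + g_{Rs}(k)\,H_{Rs}^L(k) \\
H_{LR}(k) &= g_{1C}(k)\,H_C^R(k) + g_{Lf}(k)\,H_{Lf}^R(k) + g_{Ls}(k)\,H_{Ls}^R(k) \\
H_{RR}(k) &= g_{2C}(k)\,H_C^R(k) + g_{Rf}(k)\,H_{Rf}^R(k) + g_{Rs}(k)\,H_{Rs}^R(k)
\end{aligned}
\qquad \text{for } 0 \le k \le L
\tag{3}
$$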

$$
\begin{aligned}
H_{LL}(k) &= g_{1C}(k)\,|H_C^L(k)| + g_{Lf}(k)\,|H_{Lf}^L(k)| + g_{Ls}(k)\,|H_{Ls}^L(k)| \\
H_{RL}(k) &= g_{2C}(k)\,|H_C^L(k)| + g_{Rf}(k)\,|H_{Rf}^L(k)| + g_{Rs}(k)\,|H_{Rs}^L(k)| \\
H_{LR}(k) &= g_{1C}(k)\,|H_C^R(k)| + g_{Lf}(k)\,|H_{Lf}^R(k)| + g_{Ls}(k)\,|H_{Ls}^R(k)| \\
H_{RR}(k) &= g_{2C}(k)\,|H_C^R(k)| + g_{Rf}(k)\,|H_{Rf}^R(k)| + g_{Rs}(k)\,|H_{Rs}^R(k)|
\end{aligned}
\qquad \text{for } L+1 \le k \le N-1
\tag{4}
$$

Here, $L$ is the frequency bin index corresponding to 3.5 kHz and $N$ is the frame size. Equation (3) is the HRTF rendering for the frequency regions below 3.5 kHz, while (4) is the HRTF rendering for the high frequency regions above 3.5 kHz. Therefore, for the low frequency regions, both the envelope and the phase information are used for the binaural decoding, whereas for the high frequency regions only the envelope information is used.

5 Experimental Results

Table 1. Complexity comparison

| Classification | By transform | By convolution | Reduction (complexity relative to time domain) |
|---|---|---|---|
| Decoded multi-channel signals to binaural output (in time domain) | 5 x N log2 N | 10 x (N x N multiplications + N x N summations) | 100 % |
| HRTF rendering in frequency domain | 2 x 2N log2 2N | 2 x (28N multiplications + 28N summations) | about 10 % |
| HRTF rendering in frequency domain with pre-handled HRTF (spectral envelope shaping) | 2 x 2N log2 2N | 28N multiplications + 28N summations | about 5 % |
| HRTF rendering in frequency domain with pre-handled HRTF (spectral envelope shaping + phase modification) | 2 x 2N log2 2N | 1.15 x (28N multiplications + 28N summations) | about 6 % |

To validate the performance of the proposed simplified binaural decoding, we checked the complexity of various binaural decoding methods and performed a subjective listening test. Table 1 shows the complexity comparison. The HRTF rendering in the frequency domain reduces the complexity of the typical binaural decoding in the time domain by about 90 %. In addition, the proposed simplified HRTF rendering reduces the complexity of the HRTF rendering in the frequency domain by about 40 % (from about 10 % to about 6 % of the time-domain complexity).

For the subjective test, three multi-channel audio contents were used; they are listed in Table 2 [8].
The items were sampled at 44.1 kHz with 16-bit resolution and have a duration of 20 seconds. A MUSHRA test was performed [9] on the four systems listed in Table 3.
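As an illustration of the split rendering in (3) and (4) whose cost is compared in Table 1, the following sketch pre-handles one HRTF spectrum so that phase is kept only up to the cutoff bin. It is a sketch under assumptions, not the author's implementation: the helper names are hypothetical, and the bin mapping assumes that the `num_bins` bins uniformly cover 0 Hz up to half the sample rate, which the source does not state.

```python
import numpy as np


def cutoff_bin(num_bins, sample_rate_hz=44100.0, cutoff_hz=3500.0):
    """Index L of the last bin at or below the cutoff frequency.

    ASSUMPTION: bins 0..num_bins-1 uniformly span 0..sample_rate_hz/2.
    """
    return int(np.floor(cutoff_hz / (sample_rate_hz / 2.0) * num_bins))


def prehandle_hrtf(hrtf, L):
    """Keep magnitude and phase for bins 0..L, magnitude only above L."""
    hrtf = np.asarray(hrtf, dtype=complex)
    out = np.abs(hrtf).astype(complex)   # envelope for every bin, as in Eq. (4)
    out[: L + 1] = hrtf[: L + 1]         # restore phase below the cutoff, Eq. (3)
    return out


def simplified_element(gains, hrtfs, L):
    """One rendering element of Eqs. (3)/(4): sum of gain * pre-handled HRTF."""
    return sum(g * prehandle_hrtf(h, L) for g, h in zip(gains, hrtfs))
```

Because the bins above `L` hold real values after pre-handling, multiplying them with the gains needs real rather than complex arithmetic, which is where the saving in the high frequency regions comes from.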

Table 2. Test materials

| Material | Description |
|---|---|
| ARL_applause | Ambience |
| Chostakovitch | Music (back: direct) |
| Fountain_music | Pathological |

Table 3. System under test

| Classification | Description |
|---|---|
| | Reference signal generated with the original signals |
| | HRTF rendering in frequency domain |
| | HRTF rendering in frequency domain with pre-handled HRTF and only envelope modification |
| +PA | Proposed simplified HRTF rendering: envelope modification, and phase modification for the low frequency regions below 3.5 kHz |

Fig. 6 shows the subjective listening test results. For all test items, the HRTF rendering in the frequency domain and the proposed +PA system show good sound quality, while the envelope-only system has very poor sound quality. Although the absolute score of the +PA system is slightly lower than that of the HRTF rendering in the frequency domain, the two have statistically the same sound quality. The experimental results confirm that the proposed simplified binaural decoding method successfully reduces the complexity while maintaining good sound quality.

Fig. 6. Subjective listening test results (scores from 0 to 100 for ARL_applause, Chostakovitch, Fountain_music, and the average)

6 Conclusion

In this paper, we proposed a simplified binaural decoding method that reduces the complexity of binaural decoding. In the proposed method, the high frequency components of the HRTF coefficients are excluded and the binaural decoding process in the high frequency regions is simplified. The experimental results confirm that the proposed method reduces the complexity of binaural decoding in the frequency domain by about 40 % while providing statistically the same sound quality as binaural decoding in the frequency domain. As future work, binaural decoding for audio signals with more than 5.1 channels, i.e., the ultra multi-channel audio environment, will be studied.

Acknowledgements. This study was funded by the research fund of Korea Nazarene University in 2015.

References

[1] ISO/IEC 23003-1, Information Technology - MPEG Audio Technologies - Part 1: MPEG Surround, (2007).

[2] C. Faller and F. Baumgarte, Binaural cue coding Part II: Schemes and Applications, IEEE Trans. Speech Audio Processing, 11 (2003), no. 6, 520-531. http://dx.doi.org/10.1109/tsa.2003.818108

[3] Han-gil Moon, Jeong-il Seo, et al., A multi-channel audio compression method with virtual source location information for MPEG-4 SAC, IEEE Transactions on Consumer Electronics, 51 (2005), no. 4, 1253-1259. http://dx.doi.org/10.1109/tce.2005.1561852

[4] Seungkwon Beack, Jeongil Seo, et al., Angle-Based Virtual Source Location Representation for Spatial Audio Coding, ETRI Journal, 28 (2006), no. 2, 219-222. http://dx.doi.org/10.4218/etrij.06.0205.0079

[5] Kwangki Kim, Minsoo Hahn and Jinsul Kim, Mastering Processing in MPEG SAOC, IEICE Transactions on Information and Systems, E95.D (2012), no. 12, 3053-3059. http://dx.doi.org/10.1587/transinf.e95.d.3053

[6] Kwangki Kim and Jinsul Kim, Binaural decoding for efficient multi-channel audio service in network environment, 2014 IEEE 11th Consumer Communications and Networking Conference, (2014), 525-526. http://dx.doi.org/10.1109/ccnc.2014.6994429

[7] E. Zwicker and H. Fastl, Psychoacoustics, Springer-Verlag, Berlin, Heidelberg, 1999. http://dx.doi.org/10.1007/978-3-662-09562-1

[8] ISO/IEC JTC1/SC29/WG11 (MPEG), Procedures for the Evaluation of Spatial Audio Coding Systems, Document N6691, Redmond, 2004.

[9] ITU-R Recommendation, Method for the Subjective Assessment of Intermediate Sound Quality (MUSHRA), ITU, BS.1534-1, Geneva, 2001.

Received: December 15, 2015; Published: December 29, 2015