Super-Wideband Fine Spectrum Quantization for Low-rate High-Quality MDCT Coding Mode of The 3GPP EVS Codec


Presented by Srikanth Nagisetty and Hiroyuki Ehara, 15th Dec 2015

Topics of this Presentation
- Background
- EVS Encoder Overview
- LR-MDCT Coder: Envelope Coding, Spectrum Coding, Bit allocation, Gap Filling
- Evaluation Results
- Summary

Background
Challenge: encoding SWB signals at low delay and low bit rate.
- Mixed content and music sampled at 32 kHz
- Frame length: 20 ms
- Too few bits for quantizing the SWB spectrum (quantizing 560 bins using around 256 bits)
A highly efficient quantization algorithm is needed.
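A quick check of the numbers on this slide, as a short Python sketch. It assumes the MDCT yields one coefficient per new input sample, i.e. 640 coefficients per 20 ms frame spanning 0 to 16 kHz; under that assumption each bin is 25 Hz wide, the 560 coded bins cover 0 to 14 kHz, and the budget is under half a bit per bin.

```python
# Worked numbers for the SWB case. Rate, frame length, bin count and bit budget
# come from the slide; the one-coefficient-per-sample MDCT layout is an assumption.
SAMPLE_RATE_HZ = 32_000
FRAME_MS = 20
CODED_BINS = 560
BITS_PER_FRAME = 256  # "around 256 bits"

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 640 new samples per frame
bin_width_hz = (SAMPLE_RATE_HZ / 2) / samples_per_frame  # 25.0 Hz per MDCT bin
coded_bandwidth_hz = CODED_BINS * bin_width_hz           # 14000.0 Hz
bits_per_bin = BITS_PER_FRAME / CODED_BINS               # roughly 0.457 bits per bin
bitrate_kbps = BITS_PER_FRAME / FRAME_MS                 # bits per ms equals kbit/s

print(f"{samples_per_frame} coefficients/frame, {bin_width_hz} Hz/bin, "
      f"{coded_bandwidth_hz / 1000:.0f} kHz coded, {bits_per_bin:.3f} bits/bin, "
      f"{bitrate_kbps} kbit/s")
```

With fewer than 0.5 bits per bin on average, most bins cannot receive even one quantization level, which is why the later slides rely on selective bit allocation and gap filling.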

EVS Encoder Overview
[Figure: EVS encoder overview with separate paths for speech signals and for mixed and music signals.]
The Low Rate High Quality (LR-HQ) MDCT coding is one of the modes of the EVS MDCT coder.

LR-HQ MDCT SWB Encoder (1/2)
[Figure: encoder block diagram with signal classification (Transient, Normal, Harmonic), energy coding (Huffman coding), advanced bit allocation, spectrum coding, and quantized LF information for gap filling.]

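LR-HQ MDCT SWB Coder - Envelope coding (1/2)
Band energies:
$E(b) = \log_2\left(\sum_{k=k_{\mathrm{start}}(b)}^{k_{\mathrm{end}}(b)-1} X(k)^2 + \epsilon\right), \quad b = 0, \dots, N_{\mathrm{bands}}-1$
Scaled energies:
$I(b) = \mathrm{round}\left(q_{\mathrm{int}}\, E(b)\right), \quad b = 0, \dots, N_{\mathrm{bands}}-1$
Differential indices:
$\Delta I(0) = I(0) - I_{\mathrm{ref}}, \qquad \Delta I(b) = I(b) - I(b-1), \quad b = 1, \dots, N_{\mathrm{bands}}-1$
The coding method for the differential indices is selected based on 1. the range of the indices and 2. the bit consumption.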
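A minimal Python sketch of the envelope computation: E(b) is log2 of the band's summed squared MDCT coefficients plus a small epsilon, I(b) scales and rounds it to an integer, and the differential indices take band-to-band differences, with the first band coded relative to a reference I_ref. The values of `eps`, `q_int` and `I_ref`, the band layout, and the round-half-up convention are hypothetical; the slide does not give them.

```python
import math

def band_energies(X, band_edges, eps=1e-12):
    """E(b) = log2(sum_{k=kstart(b)}^{kend(b)-1} X(k)^2 + eps).

    band_edges is a list of (kstart, kend) pairs with kend exclusive.
    """
    return [math.log2(sum(X[k] ** 2 for k in range(ks, ke)) + eps)
            for ks, ke in band_edges]

def scaled_indices(E, q_int):
    """I(b) = round(q_int * E(b)), rounding half up (an assumed convention)."""
    return [math.floor(q_int * e + 0.5) for e in E]

def differential_indices(I, I_ref):
    """dI(0) = I(0) - I_ref and dI(b) = I(b) - I(b-1) for b >= 1."""
    return [I[0] - I_ref] + [I[b] - I[b - 1] for b in range(1, len(I))]

# Toy example: four bands over ten MDCT bins; q_int and I_ref are hypothetical.
X = [1.0, 1.0, 2.0, 2.0, 0.0, 4.0, 4.0, 4.0, 4.0, 4.0]
bands = [(0, 2), (2, 4), (4, 6), (6, 10)]
E = band_energies(X, bands)            # approximately [1, 3, 4, 6]
I = scaled_indices(E, q_int=2.0)       # [2, 6, 8, 12]
dI = differential_indices(I, I_ref=0)  # [2, 4, 2, 4]
```

Coding differences rather than absolute indices keeps the symbol values small when neighbouring bands have similar energy, which is what makes the Huffman tables in the appendix effective.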

LR-HQ MDCT Coder - Spectral Peak Tracking
[Figure: peak energy and average energy of the HF bands; n HF bands with non-zero bit allocation are selected for quantization.]
Problem: temporal discontinuity
-If an HF spectral peak is quantized in the current frame while it was not quantized in the previous frame, the result is perceivable degradation.
Spectral peak tracking combined with bit allocation reduces the width (the HF bands quantized) for the current frame, which gives:
- Bit savings
- Inter-frame continuity

LR-HQ MDCT Coder - Bit allocation (1/2)
Dynamic bit allocation: [Figure: Bands(1) ... Bands(n) receive Bits(1) ... Bits(n); each Bits(i) is compared with a threshold Ti, and a combination logic produces the bit allocation result.]
The allocation is iterative: bands whose bits fall below the threshold are identified, and their bits are relocated to other bands.
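The iterative relocation can be sketched as follows. The energy-proportional initial allocation, the rule of dropping only the weakest below-threshold band per pass, and giving rounding leftovers to the strongest band are all assumptions for illustration, not the EVS procedure.

```python
def dynamic_bit_allocation(energies, total_bits, thresholds):
    """Iteratively allocate bits and relocate bits away from bands below threshold.

    energies: non-negative linear band energies.
    thresholds: minimum useful bits per band (T1 ... Tn on the slide).
    """
    n = len(energies)
    active = [True] * n
    while True:
        weight = sum(e for e, a in zip(energies, active) if a)
        if weight <= 0:
            return [0] * n  # nothing left worth coding
        # Energy-proportional allocation over the active bands (assumption).
        bits = [int(total_bits * e / weight) if a else 0
                for e, a in zip(energies, active)]
        strongest = max((b for b in range(n) if active[b]), key=lambda b: energies[b])
        bits[strongest] += total_bits - sum(bits)  # hand out rounding leftovers
        below = [b for b in range(n) if active[b] and bits[b] < thresholds[b]]
        if not below:
            return bits
        # Drop the weakest under-threshold band; the next pass redistributes its bits.
        weakest = min(below, key=lambda b: energies[b])
        active[weakest] = False

alloc = dynamic_bit_allocation([8, 4, 2, 1, 1], total_bits=30, thresholds=[3] * 5)
```

Removing one band per pass is a conservative choice: a band that is just below its threshold in the first pass may clear it once a weaker band's bits have been redistributed.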

LR-HQ MDCT Coder - Bit allocation (2/2)
Adaptive bit allocation based on grouping:
1. Adaptive grouping: dominant bands (local maxima) and their valleys are identified to form groups Group(1), ..., Group(k).
2. Group energies are computed and the dominant group is identified.
3. Bits are allocated to the group and then distributed to the bands in the group by dynamic bit allocation.
4. The process is repeated for the next dominant group, using the group energy and variance.
Bits are allocated by 1) adaptively grouping the bands and 2) exploiting the relationship between the groups. This approach is more suitable for tonal (harmonic) signals, because the energy of the bands is mainly concentrated at discrete tones.
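A hedged sketch of grouping-based allocation: bands are split at valleys so that each group holds one local energy maximum, groups are visited from most to least energetic, and each group's share is spread over its bands in proportion to energy. The split rule and the proportional shares are assumptions, and the group-variance criterion mentioned on the slide is not modeled.

```python
def group_bands(energies):
    """Split band indices into groups that each end at a valley (local minimum)."""
    groups, start = [], 0
    for b in range(1, len(energies) - 1):
        # b is a valley: energy fell into it and does not fall further after it.
        if energies[b] < energies[b - 1] and energies[b] <= energies[b + 1]:
            groups.append(list(range(start, b + 1)))
            start = b + 1
    groups.append(list(range(start, len(energies))))
    return groups

def allocate_by_group(energies, total_bits):
    """Give each group, dominant group first, its share of the remaining bits."""
    groups = group_bands(energies)
    group_energy = [sum(energies[b] for b in g) for g in groups]
    order = sorted(range(len(groups)), key=lambda i: group_energy[i], reverse=True)
    positive = [i for i in order if group_energy[i] > 0]
    bits = [0] * len(energies)
    bits_left, energy_left = total_bits, sum(group_energy)
    for n, i in enumerate(positive):
        if n == len(positive) - 1:
            g_bits = bits_left  # the last group takes whatever remains
        else:
            g_bits = int(bits_left * group_energy[i] / energy_left)
        bits_left -= g_bits
        energy_left -= group_energy[i]
        # Spread the group's bits over its bands; leftovers go to the group peak.
        for b in groups[i]:
            bits[b] = int(g_bits * energies[b] / group_energy[i])
        peak = max(groups[i], key=lambda b: energies[b])
        bits[peak] += g_bits - sum(bits[b] for b in groups[i])
    return bits

energies = [1, 5, 2, 1, 6, 3, 1]
groups = group_bands(energies)          # [[0, 1, 2, 3], [4, 5, 6]]
alloc = allocate_by_group(energies, 38)
```

Visiting the dominant group first means the strongest tonal cluster is served before rounding losses accumulate, which matches the slide's point that this scheme suits harmonic signals.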

LR-HQ MDCT Coder - Gap Filling
[Figure: decoded spectrum, 0 to 14 kHz.]
- TCQ/USQ is performed on non-zero bands.
- Noise filling is performed on 0-bit-allocated LF bands.
Spectral gaps: zero-bit bands cause spectral gaps, which lead to audible artifacts if left unfilled. A gap-filling technique is therefore used.
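A sketch of noise filling for zero-bit bands. It assumes the decoder knows each band's target energy in the linear domain (for example, recovered from the transmitted envelope) and uses a seeded uniform generator; the noise source and scaling used in EVS may differ.

```python
import math
import random

def noise_fill(spectrum, band_edges, band_bits, band_energy, seed=0):
    """Replace zero-bit bands with random noise whose energy matches the band.

    band_edges: (kstart, kend) pairs, kend exclusive.
    band_bits: bits allocated to each band; 0 marks a spectral gap.
    band_energy: target sum of squares per band (linear domain).
    """
    rng = random.Random(seed)  # seeded so the fill is reproducible
    out = list(spectrum)
    for b, (ks, ke) in enumerate(band_edges):
        if band_bits[b] != 0:
            continue  # quantized band: keep the decoded coefficients
        noise = [rng.uniform(-1.0, 1.0) for _ in range(ke - ks)]
        noise_energy = sum(v * v for v in noise)
        gain = math.sqrt(band_energy[b] / noise_energy) if noise_energy > 0 else 0.0
        out[ks:ke] = [gain * v for v in noise]
    return out

decoded = [0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
filled = noise_fill(decoded, [(0, 2), (2, 6)], band_bits=[12, 0],
                    band_energy=[0.34, 2.0])
```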

LR-HQ MDCT Coder - Gap Filling (Normal mode)
[Figure: decoded spectrum, 0 to 14 kHz.]
- TCQ/USQ is performed on non-zero bands.
- Noise filling is performed on 0-bit-allocated LF bands.
- The best-matching segment of the normalized LF spectrum is filled into the high-frequency region.

LR-HQ MDCT Coder - Gap Filling (Harmonic mode)
[Figure: decoded spectrum, 0 to 14 kHz; the gap-filled HF spectrum is obtained by mirroring the normalized LF spectrum.]
- TCQ/USQ is performed on non-zero bands.
- Noise filling is performed on 0-bit-allocated LF bands.
- The best-matching spectrum is filled into the high-frequency region by mirroring.
Problems:
1. The harmonic relation between the reproduced HF and the input HF is not maintained.
2. Roughness is perceived during reproduction at the boundary regions.

LR-HQ MDCT Coder - Spectral Correction
Roughness is perceived when two tones fall within 30 to 600 Hz of each other and their amplitudes change rapidly.
[Figure: synthesized LF spectrum and replicated HF spectrum f(n) versus frequency, marking the last spectral peak in the LF, the 1st, 2nd and 3rd spectral peaks in the HF, and the estimated harmonic positions.]
Spectral correction uses harmonic transposition. This method retains both the fine structure of the spectrum and the harmonic relationship between the low-frequency tones and the replicated high-frequency tones.
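A simplified sketch of the harmonic-transposition idea, working on bin indices. The harmonic spacing is estimated as the average distance between LF peaks, and the n-th replicated HF peak is moved to the n-th estimated harmonic above the last LF peak. The spacing estimate and the peak-only move are assumptions; the slide does not give the EVS details.

```python
import math

def estimate_spacing(lf_peaks):
    """Average bin distance between consecutive LF spectral peaks."""
    if len(lf_peaks) < 2:
        raise ValueError("need at least two LF peaks")
    return (lf_peaks[-1] - lf_peaks[0]) / (len(lf_peaks) - 1)

def harmonic_targets(lf_peaks, n_hf_peaks):
    """Bins of the 1st ... n-th estimated harmonics above the last LF peak."""
    spacing = estimate_spacing(lf_peaks)
    last = lf_peaks[-1]
    return [math.floor(last + n * spacing + 0.5) for n in range(1, n_hf_peaks + 1)]

def transpose_peaks(spectrum, hf_peaks, lf_peaks):
    """Move each replicated HF peak, in order, onto its estimated harmonic bin."""
    targets = harmonic_targets(lf_peaks, len(hf_peaks))
    out = list(spectrum)
    for p in hf_peaks:
        out[p] = 0.0                 # clear the misplaced peak bins
    for p, t in zip(sorted(hf_peaks), targets):
        if 0 <= t < len(out):
            out[t] = spectrum[p]     # keep the peak amplitude, correct its position
    return out

spec = [0.0] * 80
spec[52], spec[61], spec[73] = 3.0, 2.0, 1.5   # replicated HF peaks, off-harmonic
corrected = transpose_peaks(spec, [52, 61, 73], lf_peaks=[10, 20, 30, 40])
```

A real implementation would shift a neighbourhood around each peak rather than a single bin, so that the replicated fine structure is kept intact.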

LR-HQ MDCT Coder - Listening Test Setup
- Methodology: Degradation Category Rating (DCR), ITU-T P.800
- Items: 24 mixed and music samples recorded in Japanese
- Listeners: 16 naïve Japanese listeners
- Codec under test: EVS SWB LR-HQ mode
- Reference codec: AMR-WB+
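For reference, DCR results are usually summarized as a degradation mean opinion score (DMOS), the average of listener votes on the five-point ITU-T P.800 degradation scale. A minimal sketch; the votes below are invented for illustration and are not results from this test.

```python
# Five-point degradation scale used in DCR tests (ITU-T P.800, Annex D).
DCR_SCALE = {
    5: "Degradation is inaudible",
    4: "Degradation is audible but not annoying",
    3: "Degradation is slightly annoying",
    2: "Degradation is annoying",
    1: "Degradation is very annoying",
}

def dmos(votes):
    """Mean of DCR votes for one codec condition."""
    if not votes:
        raise ValueError("no votes")
    for v in votes:
        if v not in DCR_SCALE:
            raise ValueError(f"invalid DCR vote: {v}")
    return sum(votes) / len(votes)

example_votes = [5, 4, 4, 3]   # hypothetical votes, not data from this test
score = dmos(example_votes)
```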

LR-HQ MDCT Coder - Evaluation Results
The performance of LR-HQ SWB is equal to or better than that of AMR-WB+, whose algorithmic delay is more than twice that of EVS (32 ms).

Summary
For encoding the SWB spectral coefficients with a low bit budget:
- Spectral band energies are quantized using efficient Huffman coding methods.
- Advanced bit allocation methods are used for an efficient representation of the spectrum.
- Spectral holes in the full-spectrum coding are filled using gap-filling techniques.
- The gap-filling techniques are improved by introducing fine spectrum normalization and adaptive sparse BWE coding.
Conclusion: the EVS LR-HQ SWB coder meets the performance requirements and has been adopted as part of the multi-mode MDCT coding in the EVS codec.

References
1. 3GPP TR 22.813, Study of Use Cases and Requirements for Enhanced Voice Codecs for the EPS, v10.0.0, Mar. 2010.
2. 3GPP TS 26.090, Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; General description, v11.0.0, Sep. 2012.
3. 3GPP TS 26.190, Speech codec speech processing functions; Adaptive Multi-Rate Wideband (AMR-WB) speech codec; General description, v11.0.0, Oct. 2012.
4. 3GPP TS 26.445, Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description, v12.0.0, Sep. 2014.
5. S. Bruhn et al., "Standardization of the new EVS codec," IEEE ICASSP, Apr. 2015.
6. S. Nagisetty et al., "Low bitrate high quality MDCT audio coding of the 3GPP EVS," IEEE ICASSP, 2015.
7. ITU-T G.719, Low-complexity, full-band audio coding for high-quality, conversational applications, ITU-T Recommendation G.719, Jun. 2008.
8. T. R. Fischer, H. Sung, J. Zhan, E. Oh, "High-quality audio transform coded excitation using trellis codes," IEEE ICASSP, 2008, pp. 197-200.
9. ITU-T G.718 Annex B, Superwideband scalable extension for ITU-T G.718, ITU-T Recommendation G.718 Amendment 2, Mar. 2010.
10. L. Laaksonen et al., "Superwideband extension of G.718 and G.729.1 speech codecs," Interspeech, 2010.
11. M. Oshikiri, H. Ehara, K. Yoshida, "Efficient spectrum coding for super-wideband speech and its application to 7/10/15 kHz bandwidth scalable coders," IEEE ICASSP, 2004, pp. I-481-484.
12. H. Fastl, E. Zwicker, Psychoacoustics: Facts and Models, Springer Series in Information Sciences, 3rd ed., Springer, 2007.
13. 3GPP TS 26.290, Audio codec processing functions; Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec.
14. 3GPP Tdoc S4-130522, EVS performance requirements, Apr. 2013.
15. ITU-T P.800, Methods for Subjective Determination of Transmission Quality, International Telecommunication Union (ITU), Series P, Aug. 1996.
16. 3GPP Tdoc S4-141065, Report of the Global Analysis Lab for the EVS Selection Phase, Aug. 2014.

APPENDIX

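LR-HQ MDCT SWB Encoder (2/2)
[Figure: energy versus frequency, 0 to 14 kHz; the spectral coefficients are grouped into spectral bands with band energies E(1), E(2), ..., E(b-1), E(b).]
Spectral coefficients are grouped into spectral bands, and a band energy is computed for each band.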

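LR-HQ MDCT Coder - Envelope coding (2/2)
The differential indices ΔI(0), ..., ΔI(b) are coded with one of two methods, depending on their range:
- Large-symbol Huffman coding (range exceeds [-64, 63]), with a Pulse mode and a Scale mode. In Pulse mode, values within [-4, 3] use an 8-symbol Huffman table; if ΔI(b) exceeds [-4, 3], 1. its position and 2. its amplitude are coded directly.
- Small-symbol Huffman coding (range within [-64, 63]), using 32-symbol Huffman tables. Context-based Huffman: ΔI(b-1) determines the best Huffman table for encoding the current band's ΔI(b). Resized Huffman: the span of ΔI(b) is narrowed down to fewer code words (21 symbols).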

LR-HQ MDCT Coder - Envelope coding (3/3)
Small-symbol coding method:
- Context-based Huffman: ΔI(b-1) determines the best Huffman table for encoding the current band.
- Resized Huffman: ΔI(b) is narrowed to a smaller range so that a Huffman table with fewer symbols (21 symbols) can be used:
$\Delta I'(b) = \begin{cases} \Delta I(b) - \min\left(\Delta I(b-1) - T,\ 3\right), & \Delta I(b-1) > T \\ \Delta I(b) - \max\left(\Delta I(b-1) - T_1,\ -3\right), & \Delta I(b-1) < T_1 \\ \Delta I(b), & \text{otherwise} \end{cases}$
where $\Delta I'(b)$ is the new differential index for band $b$, $T = 15 + \mathrm{thr}$ and $T_1 = 15 - \mathrm{thr}$.
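A sketch of the resized-Huffman index mapping, assuming the piecewise rule in which ΔI(b) is shifted by at most 3 depending on how far ΔI(b-1) lies above T = 15 + thr or below T1 = 15 - thr. Reading the constant 15 as the index offset for "no change between bands" is an assumption, and `thr` is left as a free parameter. Because the shift depends only on the previous, already decoded index, the decoder can undo it exactly.

```python
def _shift(prev, thr):
    """Shift applied to the current index, driven by the previous band's index."""
    T, T1 = 15 + thr, 15 - thr
    if prev > T:
        return min(prev - T, 3)
    if prev < T1:
        return max(prev - T1, -3)
    return 0

def resize_index(cur, prev, thr):
    """Encoder side: dI'(b) = dI(b) - shift(dI(b-1))."""
    return cur - _shift(prev, thr)

def restore_index(resized, prev, thr):
    """Decoder side: dI(b) = dI'(b) + shift(dI(b-1))."""
    return resized + _shift(prev, thr)
```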

LR-HQ MDCT Coder - Quantization (TCQ and USQ)
- Magnitudes are quantized by USQ (uniform scalar quantization) and TCQ (trellis coded quantization).
- Position, number, and sign are coded by arithmetic coding.
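A minimal sketch of the split between magnitudes and side information on the USQ path, assuming round-half-up magnitude quantization with a fixed step. TCQ and the arithmetic coder are not shown; the function only produces the symbols (positions, count, magnitudes, signs) that would then be entropy coded.

```python
def usq_decompose(coeffs, step):
    """Uniformly quantize magnitudes and split out positions, count and signs."""
    q = [int(abs(c) / step + 0.5) for c in coeffs]  # round-half-up magnitude index
    positions = [k for k, m in enumerate(q) if m]   # bins with a non-zero level
    magnitudes = [q[k] for k in positions]
    signs = [1 if coeffs[k] < 0 else 0 for k in positions]
    return positions, len(positions), magnitudes, signs

def usq_reconstruct(n_bins, step, positions, magnitudes, signs):
    """Inverse mapping: rebuild the quantized spectrum from the coded symbols."""
    out = [0.0] * n_bins
    for k, m, s in zip(positions, magnitudes, signs):
        out[k] = (-1.0 if s else 1.0) * m * step
    return out

pos, count, mags, sgn = usq_decompose([0.1, -2.2, 0.0, 3.9, -0.4], step=1.0)
rebuilt = usq_reconstruct(5, 1.0, pos, mags, sgn)
```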

LR-HQ MDCT Coder - Sparse Band Search (gap filling for Normal mode)
The sparse search reduces the computational load. The best-matching spectrum is filled into the high-frequency region. The best-match index is identified by finding the $k'$ that maximizes the correlation measure $S(k')$:
$S(k') = \dfrac{\mathrm{corr}(k')^2}{\mathrm{Ene}(k')}, \quad k' = 0, \dots, N_{\mathrm{lag}}-1$
$\mathrm{corr}(k') = \sum_{k=0}^{N_{\mathrm{cnt}}-1} X(\mathrm{Idx}[k])\, \tilde{X}(\mathrm{lag}[k'] + \mathrm{Idx}[k])$
$\mathrm{Ene}(k') = \sum_{k=0}^{N_{\mathrm{cnt}}-1} \tilde{X}(\mathrm{lag}[k'] + \mathrm{Idx}[k])^2$
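The search can be sketched as below, assuming the Ncnt sparse positions Idx[k] are the largest-magnitude bins of the HF target and that `lags` lists candidate start offsets into the normalized LF spectrum X̃. Only Ncnt products are evaluated per lag instead of the whole band, which is where the complexity saving comes from. The measure used is corr(k')^2 / Ene(k'); the position-selection rule and the energy-matching fill are assumptions.

```python
import math

def sparse_positions(x_hf, n_cnt):
    """Idx[k]: the n_cnt largest-magnitude bins of the HF target, ascending."""
    order = sorted(range(len(x_hf)), key=lambda k: -abs(x_hf[k]))
    return sorted(order[:n_cnt])

def best_lag_index(x_hf, x_lf_norm, lags, n_cnt):
    """Return k' maximizing S(k') = corr(k')^2 / Ene(k') over the sparse bins."""
    idx = sparse_positions(x_hf, n_cnt)
    best_k, best_s = 0, -1.0
    for kp, lag in enumerate(lags):
        src = [x_lf_norm[lag + i] for i in idx]
        corr = sum(x_hf[i] * s for i, s in zip(idx, src))
        ene = sum(s * s for s in src)
        score = corr * corr / ene if ene > 0 else 0.0
        if score > best_s:
            best_k, best_s = kp, score
    return best_k

def fill_hf(x_lf_norm, lag, hf_len, target_energy):
    """Copy the best-match LF segment and scale it to the HF target energy."""
    seg = x_lf_norm[lag:lag + hf_len]
    seg_energy = sum(v * v for v in seg)
    gain = math.sqrt(target_energy / seg_energy) if seg_energy > 0 else 0.0
    return [gain * v for v in seg]

lf = [0.0] * 24
lf[6], lf[11], lf[13], lf[14] = 1.0, 1.0, -2.0, 0.5
hf = [0.0, 2.0, 0.0, -4.0, 1.0, 0.0]   # equals 2 * lf[10:16]
lags = [0, 5, 10, 15]
k_best = best_lag_index(hf, lf, lags, n_cnt=3)
filled_hf = fill_hf(lf, lags[k_best], len(hf), sum(v * v for v in hf))
```

By the Cauchy-Schwarz inequality, S(k') never exceeds the energy of the target at the sparse bins and reaches it only when the LF segment is proportional to the target there, so the true lag (index 2) wins in this toy case.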