SILK Speech Codec. TDP 10/11. Xavier Anguera and Ciro Gracia

SILK Codec. Audio codec developed by Skype (February 2009). Previously, Skype used the SVOPC codec (Sinusoidal Voice Over Packet Coder): LPC analysis; quasi-harmonic modelling of the linear prediction (LPC) residual; both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models.

Requirements (Internet Wideband Audio Codec). Optimized for real-time operation. Flexible, real-time adaptation of parameters according to conditions: network, hardware, and audio signal.

Parameters (Internet Wideband Audio Codec). Bitrate: quality vs. bitrate. Low: < 10 kbps (speech in any language). High: excellent for any music signal. Sampling rate: narrowband (8 kHz) to wideband (24 kHz or more). Complexity: 50 MHz x86 core, wideband mode (16 kHz sampling rate). Packet loss resilience: minimize error propagation. Delay: < 30 ms. Discontinuous Transmission (DTX): low rate when only background noise is present.

Encoder. Sampling rate: 8, 12, 16, 24 kHz. Bitrate: 6-40 kbps (1 bit/sample good, 1.5 bits/sample transparent). Packet rate: 20 ms frames, 1-5 frames/packet; trade-off between bitrate and latency/sensitivity. Packet loss resilience: use of inter-frame dependencies to detect errors. Complexity: optimizations.
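The bits-per-sample figures on this slide translate into bitrates and packet sizes by simple arithmetic. The sketch below works that out; the function names are hypothetical, and packet headers and any side information are ignored, so the payload sizes are only rough estimates.

```python
def bitrate_kbps(sample_rate_hz, bits_per_sample):
    """Bitrate in kbps for a sampling rate and an average number of bits per sample."""
    return sample_rate_hz * bits_per_sample / 1000.0


def payload_bytes(rate_kbps, frame_ms=20, frames_per_packet=1):
    """Approximate payload of one packet in bytes (kbps * ms = bits), headers ignored."""
    bits = rate_kbps * frame_ms * frames_per_packet
    return bits / 8.0


def packet_duration_ms(frame_ms=20, frames_per_packet=1):
    """Audio duration carried by one packet, which lower-bounds the buffering delay."""
    return frame_ms * frames_per_packet


if __name__ == "__main__":
    for rate_hz in (8000, 12000, 16000, 24000):
        good = bitrate_kbps(rate_hz, 1.0)
        transparent = bitrate_kbps(rate_hz, 1.5)
        print(rate_hz, "Hz:", good, "kbps good,", transparent, "kbps transparent")
    # One 20 ms frame at 16 kbps carries 320 bits, i.e. 40 bytes.
    print(payload_bytes(16.0), "bytes per single-frame packet at 16 kbps")
```

At 24 kHz the two operating points come out at 24 and 36 kbps, which sits inside the 6-40 kbps range quoted on the slide. Packing more frames per packet lowers the relative header overhead but raises latency, which is the trade-off the slide mentions.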

Encoder scalability

Subjective quality evaluation: MOS (Mean Opinion Score)

Encoder (block diagram). Blocks: High-Pass Filter, Voice Activity Detector, Pitch Analysis, Noise Shaping Analysis, Prediction Analysis, LSF Quantizer, LTP Scaling Control, Gains Processor, PreFilter, Noise Shaping Quantization, Range Encoder.

Decoder (block diagram). Blocks: Range Decoder, Decode Parameters, Generate Excitation, LTP Synthesis, LPC Synthesis. Signals: 1) range-encoded bitstream; 2) coded parameters; 3) pulses and gains; 4) pitch lags and LTP coefficients; 5) LPC coefficients; 6) decoded signal.
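The last two decoder stages can be illustrated with two recursive filters: the excitation first passes through the long-term (pitch) synthesis filter and then through the short-term LPC synthesis filter. This is a minimal sketch under simplifying assumptions, not the SILK reference decoder: the function names are hypothetical, filter state is not carried across frames, and gains are assumed to be already applied to the excitation.

```python
def ltp_synthesis(excitation, lag, taps):
    """Long-term synthesis: add a prediction built from output one pitch lag back.

    `taps` is a short filter (e.g. 5 coefficients) centred on the pitch lag.
    """
    half = len(taps) // 2
    out = []
    for n, e in enumerate(excitation):
        pred = 0.0
        for k, b in enumerate(taps):
            idx = n - lag + half - k
            if 0 <= idx < n:  # only use samples that were already produced
                pred += b * out[idx]
        out.append(e + pred)
    return out


def lpc_synthesis(residual, lpc):
    """Short-term all-pole synthesis: y[n] = r[n] + sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, r in enumerate(residual):
        pred = sum(a * out[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0)
        out.append(r + pred)
    return out


def decode_frame(excitation, lag, ltp_taps, lpc, voiced):
    """Excitation -> (LTP synthesis if voiced) -> LPC synthesis -> decoded signal."""
    residual = ltp_synthesis(excitation, lag, ltp_taps) if voiced else list(excitation)
    return lpc_synthesis(residual, lpc)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 8
    # A single centre tap of 0.5 repeats the impulse every 4 samples at half amplitude.
    print(ltp_synthesis(impulse, lag=4, taps=[0.0, 0.0, 0.5, 0.0, 0.0]))
    print(decode_frame(impulse, 4, [0.0, 0.0, 0.5, 0.0, 0.0], [0.5], voiced=True))
```

For unvoiced frames the LTP stage is skipped, mirroring the encoder, which performs no long-term prediction in that case.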

Pitch analysis. Returns a pitch value every 5 ms and the voiced/unvoiced decision. LPC analysis is done with order 16, 12 or 8. Three levels of correlation are used to reduce complexity.
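To make the idea of correlation-based pitch analysis concrete, the following sketch searches for the lag that maximizes the normalized autocorrelation of a frame and uses the peak value for the voiced/unvoiced decision. It is a single-stage textbook estimator, not the three-level search mentioned on the slide; the function name, the 60-400 Hz search range, and the 0.5 voicing threshold are all assumptions made for illustration.

```python
import math


def estimate_pitch(frame, sample_rate_hz, min_hz=60.0, max_hz=400.0,
                   voicing_threshold=0.5):
    """Return (lag_in_samples, peak_correlation, voiced) for one frame."""
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = min(int(sample_rate_hz / min_hz), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[lag:]                 # frame delayed by `lag`
        b = frame[:len(frame) - lag]    # matching part of the undelayed frame
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        corr = num / den if den > 0.0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr, best_corr >= voicing_threshold


if __name__ == "__main__":
    rate = 8000
    # 40 ms of a 100 Hz sine: the period is 80 samples at 8 kHz.
    tone = [math.sin(2.0 * math.pi * 100.0 * n / rate) for n in range(320)]
    lag, corr, voiced = estimate_pitch(tone, rate)
    print(lag, round(corr, 3), voiced)
```

A full-rate exhaustive search like this is costly, which is presumably why the codec spreads the search over several correlation levels.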

Noise shaping analysis. Optimizes parameters to reduce the perceived effect of quantization noise. Balances quantization noise against bitrate. Shapes the spectrum of the quantization noise so that it follows the signal spectrum. De-emphasizes spectral valleys (where noise would be more noticeable). Matches the levels of the decoded speech formants to the original ones. The resulting parameters are applied to the signal in the PreFilter module.

Prediction analysis. It is done differently depending on whether the signal is voiced or unvoiced. Voiced: first, a 5-coefficient long-term prediction (LTP) analysis is performed on 20 ms of signal. The residual is input to an LPC analysis. The LPC coefficients are converted to Line Spectral Frequencies (LSF), which are less sensitive to quantization noise, and quantized.

Prediction analysis (continued). Unvoiced: no LTP analysis is needed. LPC analysis is performed, and the coefficients are converted to an LSF vector and quantized.
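The LPC step in both the voiced and unvoiced paths amounts to finding coefficients that predict each sample from the previous ones. The slides do not say which estimation method the codec uses, so the sketch below swaps in the textbook autocorrelation method with the Levinson-Durbin recursion; the function names are hypothetical, and windowing and the conversion to LSFs are omitted.

```python
def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of a frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for x[n] ~ sum_k a[k] * x[n-k], k = 1..order.

    Returns (coefficients a[1..order], final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # nothing left to predict (e.g. silent frame)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def prediction_residual(frame, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k] * x[n-1-k]."""
    return [frame[n] - sum(c * frame[n - 1 - k]
                           for k, c in enumerate(coeffs) if n - 1 - k >= 0)
            for n in range(len(frame))]


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order process with coefficient 0.5.
    coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, err)
    print(prediction_residual([1.0, 0.5, 0.25, 0.125], [0.5]))
```

For a voiced frame, the input to this analysis would be the residual left after long-term prediction rather than the signal itself, following the order given on the slide.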

LSF quantization. A codebook method with a non-uniform (variable) rate is used: rarely occurring values are quantized with low distortion but a high number of bits, while commonly occurring values are quantized with low error and a low number of bits. The codebook is trained beforehand on a large training set.

LTP quantization. It also uses a vector codebook, chosen from 3 possible codebooks (containing 10, 20 and 40 vectors respectively). For each frame, the best codebook is chosen by minimizing a rate-distortion function.
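Choosing among several codebooks by rate-distortion minimization can be sketched as follows: for each codebook, find the nearest vector, then pick the codebook whose cost D + lambda * R is smallest. Everything here is an illustrative assumption rather than the codec's actual procedure: the function names, the squared-error distortion, the weight `lam`, and in particular the rate, which is approximated by a fixed-length index of log2(codebook size) bits.

```python
import math


def nearest_vector(target, codebook):
    """Return (index, squared_error) of the codebook vector closest to target."""
    best_index, best_dist = -1, float("inf")
    for i, vec in enumerate(codebook):
        dist = sum((t - v) ** 2 for t, v in zip(target, vec))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


def choose_codebook(target, codebooks, lam):
    """Pick the (codebook, vector) pair minimizing distortion + lam * rate_bits."""
    best = None
    for cb_id, codebook in enumerate(codebooks):
        index, dist = nearest_vector(target, codebook)
        rate_bits = math.log2(len(codebook))  # assumed fixed-length index cost
        cost = dist + lam * rate_bits
        if best is None or cost < best[0]:
            best = (cost, cb_id, index)
    return best[1], best[2], best[0]


if __name__ == "__main__":
    small = [[0.0, 0.0], [1.0, 1.0]]
    large = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
    target = [0.5, 0.5]
    # A small lam favours accuracy (large codebook); a large lam favours fewer bits.
    print(choose_codebook(target, [small, large], lam=0.1))
    print(choose_codebook(target, [small, large], lam=1.0))
```

The weight `lam` sets the exchange rate between bits and distortion, which is how a single cost function can prefer the small codebook when bits are scarce and the large one when quality matters more.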

Noise shaping quantization. This module combines the outputs of all the other modules to generate the overall residual, which is quantized and sent.

Range encoder. A data compression method proposed in 1979 (now patent-free) and based on arithmetic coding. It uses the probability of occurrence of each pattern to encode those that occur more often with fewer bits. It encodes the following: the voiced/unvoiced decision, the LTP and LPC quantization indexes, the residual signal, and several intermediate gains.
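The link between probability and code length can be made concrete: an ideal entropy coder spends about -log2(p) bits on a symbol of probability p, and a range coder gets close to that figure. The sketch below only computes these ideal costs; it is not a range coder, the function names are hypothetical, and the probabilities are made-up illustrative values.

```python
import math


def ideal_bits(probability):
    """Ideal code length in bits for a symbol with the given probability."""
    return -math.log2(probability)


def expected_bits_per_symbol(probabilities):
    """Average ideal cost per symbol (the entropy of the distribution)."""
    return sum(p * ideal_bits(p) for p in probabilities if p > 0.0)


if __name__ == "__main__":
    # Illustrative distribution over four symbols (not taken from the codec).
    dist = [0.5, 0.25, 0.125, 0.125]
    for p in dist:
        print(p, "->", ideal_bits(p), "bits")
    # Frequent symbols are cheap, so the average is below the 2 bits
    # that a fixed-length code for four symbols would need.
    print(expected_bits_per_symbol(dist), "bits per symbol on average")
```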