SILK Speech Codec. TDP 10/11. Xavier Anguera and Ciro Gracia


1 SILK Speech Codec. TDP 10/11. Xavier Anguera and Ciro Gracia

2 SILK Codec. Audio codec developed by Skype (February 2009). Skype previously used the SVOPC codec (Sinusoidal Voice Over Packet Coder): LPC analysis. Quasi-harmonic modelling of the linear prediction (LPC) residual. Both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models.

3 Requirements (Internet Wideband Audio Codec). Optimized for real-time operation. Flexibility and real-time adaptation of parameters according to the conditions of: the network, the hardware, the audio signal.

4 Parameters (Internet Wideband Audio Codec). Bitrate: quality vs. bitrate. Low: < 10 kbps (speech in any language). High: excellent for any music signal. Sampling rate: from narrowband (8 kHz) to wideband (24 kHz or more). Complexity: 50 MHz of an x86 core in wideband mode (16 kHz sampling rate). Packet Loss Resilience: minimize error propagation. Delay: < 30 ms. Discontinuous Transmission (DTX): low rate when there is only background noise.

5 Encoder. Sampling rate: 8, 12, 16, 24 kHz. Bitrate: 6-40 kbps (1 bit/sample is good, 1.5 bits/sample is transparent). Packet rate: 20 ms frames, 1-5 frames/packet; bitrate vs. latency/sensitivity. Packet Loss Resilience: use of inter-frame dependencies to detect errors. Complexity: optimizations.
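As a rough check of the figures on this slide, the sketch below turns "bits per sample" into a bitrate and into an average packet payload. The function names are made up for illustration, and header overhead (RTP/UDP/IP) is deliberately ignored.

```python
def bitrate_kbps(sampling_rate_hz, bits_per_sample):
    """Codec bitrate in kbps for a given average number of bits per sample."""
    return sampling_rate_hz * bits_per_sample / 1000.0


def packet_payload_bytes(rate_kbps, frames_per_packet, frame_ms=20):
    """Average payload of one packet in bytes (kbps * ms = bits)."""
    bits = rate_kbps * frames_per_packet * frame_ms
    return bits / 8.0


if __name__ == "__main__":
    for rate_hz in (8000, 12000, 16000, 24000):
        good = bitrate_kbps(rate_hz, 1.0)         # "good" quality
        transparent = bitrate_kbps(rate_hz, 1.5)  # "transparent" quality
        print(rate_hz, "Hz:", good, "kbps good,", transparent, "kbps transparent")
```

At 1 to 1.5 bits/sample the four sampling rates span 8 to 36 kbps, which sits inside the 6-40 kbps range quoted above. Packing more 20 ms frames into a packet amortizes header overhead but adds latency.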

6 Encoder scalability

7 Subjective quality evaluation: MOS (Mean Opinion Score)

8 Encoder (block diagram). Blocks: High-Pass Filter, Voice Activity Detector, Pitch Analysis, Noise Shaping Analysis, Prediction Analysis, LSF Quantizer, LTP Scaling Control, Gains Processor, Prefilter, Noise Shaping Quantization, Range Encoder.

9 Decoder (block diagram). Blocks, in order: Range Decoder, Decode Parameters, Generate Excitation, LTP synthesis, LPC synthesis. Signals: 1) Range-encoded bitstream 2) Coded parameters 3) Pulses and gains 4) Pitch lags and LTP coefficients 5) LPC coefficients 6) Decoded signal
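The last two decoder stages can be illustrated with a minimal sketch: the excitation goes through a long-term (pitch) synthesis filter and then a short-term (LPC) synthesis filter. This is not the SILK implementation; the function names, the floating-point arithmetic and the sign convention of the coefficients are assumptions for illustration.

```python
def ltp_synthesis(excitation, pitch_lag, ltp_coeffs):
    """Long-term synthesis: res[n] = exc[n] + sum_k b[k] * res[n - lag + half - k].

    The taps are centred on the pitch lag; pitch_lag must exceed half the
    number of taps so that only past samples are used.
    """
    half = len(ltp_coeffs) // 2
    res = []
    for n, e in enumerate(excitation):
        acc = e
        for k, b in enumerate(ltp_coeffs):
            idx = n - pitch_lag + half - k
            if 0 <= idx < n:          # samples before the start count as zero
                acc += b * res[idx]
        res.append(acc)
    return res


def lpc_synthesis(residual, lpc_coeffs):
    """Short-term synthesis: out[n] = res[n] + sum_k a[k] * out[n - 1 - k]."""
    out = []
    for n, r in enumerate(residual):
        acc = r
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    excitation = [1.0] + [0.0] * 11       # a single pulse
    residual = ltp_synthesis(excitation, pitch_lag=4, ltp_coeffs=[0.1, 0.6, 0.1])
    decoded = lpc_synthesis(residual, lpc_coeffs=[0.9, -0.2])
    print(decoded)
```

The LTP stage repeats the pulse at multiples of the pitch lag, and the LPC stage then imposes the spectral envelope on that pulse train.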

10 Pitch analysis. Returns a pitch value every 5 ms and the voiced/unvoiced decision. LPC analysis is done with order 16, 12 or 8. Three levels of correlation are used to reduce complexity.
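To make the idea concrete, here is a deliberately simplified, single-stage pitch estimator based on normalized correlation. It is not the three-level search the slide refers to; the search range, the voicing threshold and the `margin` parameter (a small bias toward shorter lags to avoid picking a multiple of the true period) are all assumptions.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0,
                   voicing_threshold=0.5, margin=1e-3):
    """Return (lag_in_samples, correlation, voiced) for one analysis window."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        a = frame[lag:]                  # signal delayed by `lag`
        b = frame[:len(frame) - lag]     # reference segment
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        corr = num / den if den > 0 else 0.0
        # Require a clear improvement so that 2x or 3x the period does not win.
        if corr > best_corr + margin:
            best_lag, best_corr = lag, corr
    voiced = best_corr >= voicing_threshold
    return best_lag, best_corr, voiced


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
    lag, corr, voiced = estimate_pitch(tone, fs)
    print(lag, fs / lag, voiced)   # lag in samples, pitch in Hz, decision
```

A real encoder would run a coarse search on a downsampled signal first and refine only the best candidates, which is where the complexity saving comes from.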

11 Noise shaping analysis. Optimizes some parameters to reduce the effect of noise. Balances quantization noise and bitrate. Spectral shaping of the quantization noise: makes it follow the signal spectrum. De-emphasizes spectral valleys (where noise would be more noticeable). Matches the levels of the decoded speech formants to the original ones. The resulting parameters are applied to the signal in the PREFILTER module.

12 Prediction analysis. It is done differently depending on whether the signal is voiced or unvoiced. Voiced: first, a 5-coefficient long-term prediction (LTP) analysis is performed on 20 ms. The residual is input to an LPC analysis. The LPC coefficients are converted to Line Spectral Frequencies (LSF), which are less sensitive to quantization noise, and quantized.

13 Prediction analysis. It is done differently depending on whether the signal is voiced or unvoiced. Unvoiced: there is no need for LTP analysis. LPC analysis is performed, and the result is transformed to an LSF vector and quantized.
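The LPC step can be sketched with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a generic illustration, not necessarily the estimator SILK uses; it skips windowing, the preceding LTP stage and the LPC-to-LSF conversion, and the function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum_i signal[i] * signal[i + lag] for lag = 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, error) where x_hat[n] = sum_k a[k] * x[n - 1 - k]."""
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        if error <= 0.0:               # silent or perfectly predicted input
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / error                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        error *= (1.0 - k * k)         # remaining prediction error energy
    return a, error


def prediction_residual(signal, a):
    """Residual e[n] = x[n] - x_hat[n], with samples before the start as zero."""
    return [x - sum(a[k] * signal[n - 1 - k]
                    for k in range(len(a)) if n - 1 - k >= 0)
            for n, x in enumerate(signal)]


if __name__ == "__main__":
    x = [0.5 ** n for n in range(32)]   # decaying signal, roughly AR(1)
    coeffs, err = levinson_durbin(autocorrelation(x, 2), order=2)
    print(coeffs, err)
    print(prediction_residual(x, coeffs)[:4])
```

The residual carries much less energy than the input, which is why it is the residual (not the waveform) that gets quantized.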

14 LSF quantization. A codebook method is used, with a non-uniform quantization rate: rarely occurring values are quantized with low distortion but a high number of bits; commonly occurring values are modeled with low error and a low number of bits. The codebook is trained beforehand on a large training set.
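The core of any codebook method is a nearest-neighbour search: the encoder sends only the index of the closest entry. The sketch below shows a plain, single-stage search with an optional per-coefficient weight; the codebook values are invented for the example, and the variable-rate aspect described above is not modelled.

```python
def quantize_vector(vector, codebook, weights=None):
    """Return (index, distortion) of the codebook entry closest to `vector`."""
    if weights is None:
        weights = [1.0] * len(vector)
    best_index, best_dist = -1, float("inf")
    for index, entry in enumerate(codebook):
        # Weighted squared error between the input and this entry.
        dist = sum(w * (v - c) ** 2 for w, v, c in zip(weights, vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


if __name__ == "__main__":
    # Hypothetical 3-dimensional "LSF" codebook (normalized frequencies).
    codebook = [
        [0.10, 0.30, 0.60],
        [0.20, 0.50, 0.80],
        [0.30, 0.60, 0.90],
    ]
    index, dist = quantize_vector([0.22, 0.48, 0.79], codebook)
    print(index, dist)
    print(codebook[index])   # what the decoder reconstructs from the index
```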

15 LTP quantization. It also uses a vector codebook, chosen from 3 possible codebooks (containing 10, 20 and 40 vectors, respectively). For each frame, the best codebook is chosen by minimizing a rate-distortion function.
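A rate-distortion choice between codebooks can be sketched as minimizing `distortion + lam * rate`. In this illustration the rate is taken as log2 of the codebook size (a fixed-length index), which is an assumption; the Lagrange multiplier `lam` and the toy one-dimensional codebooks are also made up.

```python
import math


def nearest(vector, codebook):
    """Index and squared error of the closest entry in one codebook."""
    dists = [sum((v - c) ** 2 for v, c in zip(vector, entry))
             for entry in codebook]
    index = min(range(len(codebook)), key=dists.__getitem__)
    return index, dists[index]


def choose_codebook(vector, codebooks, lam):
    """Return (codebook_id, entry_index) minimizing distortion + lam * rate."""
    best = None
    for cb_id, codebook in enumerate(codebooks):
        index, distortion = nearest(vector, codebook)
        rate_bits = math.log2(len(codebook))   # assumption: fixed-length index
        cost = distortion + lam * rate_bits
        if best is None or cost < best[0]:
            best = (cost, cb_id, index)
    return best[1], best[2]


if __name__ == "__main__":
    small = [[0.0], [1.0]]                  # 1 bit per index
    large = [[0.0], [0.5], [1.0], [1.5]]    # 2 bits per index
    print(choose_codebook([0.5], [small, large], lam=0.1))  # accuracy wins
    print(choose_codebook([0.5], [small, large], lam=1.0))  # bits are expensive
```

A larger `lam` pushes the choice toward the smaller codebook, trading accuracy for bits.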

16 Noise shaping quantization This module joins all outputs from all modules to generate the overall residual that is quantized and sent.

17 Range encoder. It is a data compression method proposed in 1979 (it is now patent-free) and based on arithmetic coding. It uses the probability of occurrence of each pattern to encode those that occur more often with fewer bits. It encodes the following: the voiced/unvoiced decision, the LTP and LPC quantization indices, the residual signal, and several intermediate gains.
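The principle can be shown with a toy arithmetic coder that uses exact fractions. A practical range coder does the same interval narrowing with fixed-width integers and renormalization, which this sketch leaves out; the symbol set and probabilities are invented.

```python
from fractions import Fraction
import math


def build_intervals(probabilities):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals


def encode(symbols, probabilities):
    """Narrow [0, 1) once per symbol; return (low, width) of the final interval."""
    intervals = build_intervals(probabilities)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = intervals[s]
        low += width * s_low
        width *= (s_high - s_low)
    return low, width


def decode(value, count, probabilities):
    """Recover `count` symbols from any value inside the final interval."""
    intervals = build_intervals(probabilities)
    out = []
    for _ in range(count):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)
                break
    return out


def ideal_bits(symbols, probabilities):
    """Information content of the message: sum of -log2 p(symbol)."""
    return sum(-math.log2(probabilities[s]) for s in symbols)


if __name__ == "__main__":
    probs = {"U": Fraction(3, 4), "V": Fraction(1, 4)}   # hypothetical model
    message = list("UUVU")
    low, width = encode(message, probs)
    print(low, width, ideal_bits(message, probs))
    print(decode(low, len(message), probs))
```

The final interval width equals the product of the symbol probabilities, so frequent symbols shrink the interval less and cost fewer bits.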
