Analysis/synthesis coding


TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders and image coders, we have a parametric model of the source. The coder (the analysis part) tries to estimate the model parameters from the signal to be coded. These parameters are sent to the decoder (the synthesis part), which uses them to control the same model and reconstruct the signal. This usually works well for a narrow class of signals for which we have a good model of the source, such as human speech. However, analysis/synthesis coding might not work well for coding general audio or image signals. The decoded signal might not be similar to the original signal in a mean square error sense, but can still sound very much like the original to a human listener.

TSBK06 speech coding p.2/32 Analysis by synthesis A variant of analysis/synthesis coding is analysis by synthesis coding. The coder also contains a decoder, and tries to find the model parameters that give a decoded signal close (in some sense) to the original signal.

TSBK06 speech coding p.3/32 Human speech Sound is generated by forcing air through the vocal cords (located in the larynx). If the vocal cords are tense, they vibrate and generate tones and overtones (voiced sounds). If the vocal cords are relaxed, a noiselike sound is produced (unvoiced sounds). The sound then passes through the laryngeal cavity, the pharynx and the oral and nasal cavities. The tongue, lips and teeth are also used to influence the sound. Everything after the vocal cords (the vocal tract) can be well modelled by a linear filter.

TSBK06 speech coding p.4/32 Human speech (figure)

TSBK06 speech coding p.5/32 Examples of speech sounds (figure)

TSBK06 speech coding p.6/32 Examples of speech sounds (figure)

TSBK06 speech coding p.7/32 Model of speech A simple model of human speech (block diagram: a pulse source and a noise source, a voiced/unvoiced switch, and a filter producing the speech). The speech is seen as a sequence of either voiced or unvoiced sounds. The voiced sounds are modelled as a filtered pulse train, while the unvoiced sounds are modelled as filtered white noise. The parameters of the model are the filter coefficients, the switch between voiced and unvoiced sounds, and the pulse trains.

TSBK06 speech coding p.8/32 Model of speech, cont. The speech signal $y_n$ is modelled as $y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\varepsilon_n$. The coder splits the signal into short segments of typically a few hundred samples (at a sampling frequency of 8 kHz). For each segment the coder estimates whether the sound is voiced or unvoiced. For voiced sounds a suitable pulse train is estimated. The filter parameters $a_i$ and the gain $G$ are estimated. All these parameters are sent to the receiver, which can then decode the sound using the model. The coding is thus a kind of linear predictive coding. One major difference, compared to our earlier description of predictive coding, is that the main part of the bit rate is used to send filter coefficients and not the prediction error.
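
As a rough illustration of this model (not part of the original slides), the following Python sketch synthesizes one segment by driving the all-pole filter $1/A(z)$, $A(z) = 1 - \sum_i a_i z^{-i}$, with either a pulse train or white noise. The filter coefficients, gain, segment length and pitch period are made-up example values.

import numpy as np
from scipy.signal import lfilter

N = 240                          # one segment (30 ms at 8 kHz)
a = np.array([1.3, -0.6, 0.2])   # illustrative filter coefficients a_1..a_M
G = 0.5                          # illustrative gain

def excitation(voiced, pitch_period=40, n=N):
    # pulse train for voiced sounds, white noise for unvoiced sounds
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0
        return e
    return np.random.randn(n)

# y_n = sum_i a_i y_{n-i} + G*eps_n  is filtering G*eps_n with 1/A(z)
A = np.concatenate(([1.0], -a))
voiced_segment = lfilter([1.0], A, G * excitation(True))
unvoiced_segment = lfilter([1.0], A, G * excitation(False))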

TSBK06 speech coding p.9/32 Pulse trains The pulse trains can vary in complexity between different coders. Often the pitch period, corresponding to the fundamental frequency of the sound, is estimated. The simplest pulse trains use pulses of the same amplitude at constant intervals. The pulse train can then be described just by the pitch period and the start position of the first pulse. We can also let the amplitudes and positions of the pulses vary more freely. It is then possible to get a pulse train that fits the signal better, but at the cost of a higher bit rate.

TSBK06 speech coding p.10/32 Pulse trains, examples (figure: three example pulse trains)

TSBK06 speech coding p.11/32 Voiced or unvoiced? Voiced sounds usually have a larger energy (larger amplitude) than unvoiced sounds. Unvoiced sounds usually contain higher frequencies than voiced sounds. One way of determining if a segment is voiced or unvoiced is to compare the signal energy with the energy of the background noise, and to count the number of zero crossings of the signal.
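
A possible decision rule along these lines (illustrative, not from the slides): call the segment voiced if its energy is well above an estimate of the background-noise energy and its zero-crossing rate is low. The thresholds below are arbitrary example values.

import numpy as np

def is_voiced(segment, noise_energy, energy_factor=4.0, max_zc_rate=0.25):
    y = np.asarray(segment, dtype=float)
    energy = np.mean(y ** 2)
    # fraction of consecutive sample pairs where the sign changes
    zc_rate = np.mean(np.abs(np.diff(np.sign(y))) > 0)
    return (energy > energy_factor * noise_energy) and (zc_rate < max_zc_rate)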

TSBK06 speech coding p.12/32 Estimating the pitch period The autocorrelation function $R_{yy}(k)$ can be used to estimate the pitch period $P$. For a periodic signal, the acf has a maximum at $k = P$. Another, better method is to use the average magnitude difference function (AMDF). It is defined by $\mathrm{AMDF}(k) = \frac{1}{N}\sum_{i=k_0+1}^{k_0+N} |y_i - y_{i-k}|$ where $k_0$ depends on which segment we're in and $N$ is the size of the segment. The AMDF will have a minimum where $k$ is equal to the pitch period of the segment. The AMDF can also be used to determine if the segment is voiced or unvoiced. For unvoiced sounds the AMDF will have very shallow minima, not much different from the average value of the AMDF.
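
A small sketch of AMDF-based pitch estimation following the definition above. The lag search range assumes 8 kHz sampling and a pitch of roughly 50-400 Hz, and the "shallow minimum" voicing test uses an arbitrary threshold; none of these numbers come from the slides.

import numpy as np

def amdf(segment, k):
    # average magnitude difference at lag k over one segment
    y = np.asarray(segment, dtype=float)
    return np.mean(np.abs(y[k:] - y[:-k]))

def estimate_pitch(segment, k_min=20, k_max=160):
    lags = np.arange(k_min, min(k_max, len(segment) - 1))
    values = np.array([amdf(segment, k) for k in lags])
    pitch = int(lags[np.argmin(values)])
    # a minimum close to the average AMDF suggests an unvoiced segment
    voiced = values.min() < 0.5 * values.mean()
    return pitch, voiced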

TSBK06 speech coding p.13/32 Example (figure) A segment of 256 samples from a speech signal.

TSBK06 speech coding p.14/32 Example (figure) Estimated autocorrelation function. Gives pitch period 31.

TSBK06 speech coding p.15/32 Example (figure) Estimated AMDF. Gives pitch period 31.

TSBK06 speech coding p.16/32 Estimating filter coefficients We want to find $a_i$ such that the average value of the quadratic error $e_n^2$ is minimized, where $e_n^2 = \left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\varepsilon_n\right)^2$. Minimizing the expected value $E\{e_n^2\}$ by setting $\frac{\partial}{\partial a_j} E\{e_n^2\} = 0$ gives the following equation system: $\sum_{i=1}^{M} a_i E\{y_{n-i} y_{n-j}\} = E\{y_n y_{n-j}\}, \quad j = 1, \ldots, M$. In order to solve this we need to estimate $E\{y_{n-i} y_{n-j}\}$, which can be done either by the autocorrelation method or by the autocovariance method.

TSBK06 speech coding p.17/32 Autocorrelation method We assume that $y_n$ is stationary, which means that $E\{y_{n-i} y_{n-j}\} = R_{yy}(|i-j|)$. In addition, we assume that the signal is 0 outside of the current segment, so that we can estimate the autocorrelation function as $R_{yy}(k) = \sum_{n=n_0+1+k}^{n_0+N} y_n y_{n-k}$

TSBK06 speech coding p.18/32 Autocorrelation method, cont. The equation system can then be written as $R\bar{a} = p$ where $R = \begin{pmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{pmatrix}$ and $\bar{a} = [a_1\ a_2\ \ldots\ a_M]^T$, $p = [R_{yy}(1)\ R_{yy}(2)\ \ldots\ R_{yy}(M)]^T$. Solve for $\bar{a}$.
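
For illustration, here is a direct implementation of the autocorrelation method as a sketch. A real coder would typically use the Levinson-Durbin recursion to exploit the Toeplitz structure; a general solver is used here for clarity.

import numpy as np
from scipy.linalg import toeplitz

def lpc_autocorrelation(segment, M):
    y = np.asarray(segment, dtype=float)
    # R_yy(0..M), with the signal assumed to be zero outside the segment
    r = np.array([np.dot(y[k:], y[:len(y) - k]) for k in range(M + 1)])
    R = toeplitz(r[:M])             # M x M matrix with elements R_yy(|i-j|)
    p = r[1:]                       # [R_yy(1) ... R_yy(M)]
    return np.linalg.solve(R, p)    # a_1 ... a_M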

TSBK06 speech coding p.19/32 Autocovariance method We do not assume that $y_n$ is stationary. We define $c_{ij} = E\{y_{n-i} y_{n-j}\}$, which can be estimated as $c_{ij} = \sum_{n=n_0+1}^{n_0+N} y_{n-i} y_{n-j}$

TSBK06 speech coding p.20/32 Autocovariance method, cont. The equation system can then be written as $C\bar{a} = s$ where $C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1M} \\ c_{21} & c_{22} & \cdots & c_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M1} & c_{M2} & \cdots & c_{MM} \end{pmatrix}$ and $s = [c_{10}\ c_{20}\ \ldots\ c_{M0}]^T$. Solve for $\bar{a}$.
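
A corresponding sketch of the covariance method (again illustrative, not from the slides). It indexes into the full signal y, so the segment start n0 must be at least M samples in, because the method uses history before the segment instead of assuming zeros.

import numpy as np

def lpc_covariance(y, n0, N, M):
    # c_ij estimated as the sum over the segment of y_{n-i} * y_{n-j},
    # without assuming the signal is zero outside the segment
    y = np.asarray(y, dtype=float)
    n = np.arange(n0, n0 + N)
    C = np.empty((M, M))
    s = np.empty(M)
    for i in range(1, M + 1):
        s[i - 1] = np.dot(y[n - i], y[n])                 # c_{i0}
        for j in range(1, M + 1):
            C[i - 1, j - 1] = np.dot(y[n - i], y[n - j])  # c_{ij}
    return np.linalg.solve(C, s)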

TSBK06 speech coding p.21/32 LPC-10 Old American speech coding standard for the rate 2.4 kbit/s. Segments of 180 samples. Pitch period: 60 possible values. 10 filter coefficients for voiced sounds, 4 coefficients for unvoiced sounds. Gives a rather synthetic decoded sound. Not so good with high background noise.

TSBK06 speech coding p.22/32 Long Term Prediction (LTP) Often a predictor is used that utilizes both the most recent samples and samples one pitch period $P$ back in time: $y_n = \sum_{i=1}^{M} a_i y_{n-i} + \sum_{j=1}^{K} \alpha_j y_{n-P-j+1} + G\varepsilon_n$ The part using $\alpha_j$ is called long term prediction and the part using $a_i$ is called short term prediction.
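
The combined predictor can be written out directly. A small illustrative sketch, with the index conventions of the formula above mapped onto 0-based Python arrays:

def ltp_predict(y, n, a, alpha, P):
    # short-term part: sum_{i=1..M} a_i * y_{n-i}
    short_term = sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1))
    # long-term part: sum_{j=1..K} alpha_j * y_{n-P-j+1}
    long_term = sum(alpha[j - 1] * y[n - P - j + 1] for j in range(1, len(alpha) + 1))
    return short_term + long_term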

TSBK06 speech coding p.23/32 RELP Residual Excited Linear Prediction In a RELP coder no choice between voiced and unvoiced sounds is made. The pitch period $P$ and the filter coefficients $a_i$ and $\alpha_j$ are estimated. After inverse filtering we get a residual signal that is lowpass filtered, downsampled (typically by a factor 3 or 4), quantized and sent sample by sample. A RELP coder is thus rather similar to a traditional predictive coder, where the prediction error (the residual signal) is sent. Note that the quantization is outside the predictor loop. This works for the short segments that are used.
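
A rough sketch of the residual path in a RELP coder: inverse filter the segment with $A(z) = 1 - \sum_i a_i z^{-i}$, then lowpass filter and downsample (scipy's decimate does both). The downsampling factor is one of the typical values mentioned above; the long-term predictor is omitted from this sketch.

import numpy as np
from scipy.signal import lfilter, decimate

def relp_residual(segment, a, factor=4):
    A = np.concatenate(([1.0], -np.asarray(a)))
    residual = lfilter(A, [1.0], segment)   # inverse filtering with A(z)
    return decimate(residual, factor)       # lowpass filter + downsample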

TSBK06 speech coding p.24/32 Multi-pulse LPC (MP-LPC) MP-LPC is an analysis by synthesis coder. The coder estimates filter coefficients. The coder then tries to find an optimal pulse train (position and amplitude for a number of pulses) that will be decoded to a signal as close to the original signal as possible. One disadvantage of MP-LPC is that the coding is rather computationally intensive. Used in Skyphone, a system for telephony from airplanes, at the rate 9.6 kbit/s.
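
A much simplified greedy multipulse search in the spirit of MP-LPC, as a sketch only: pulses are placed one at a time so that the synthesized signal approaches the target segment. Real coders also use perceptual weighting and joint reoptimization of the amplitudes, which are omitted here.

import numpy as np
from scipy.signal import lfilter

def mplpc_pulses(target, a, n_pulses=10):
    n = len(target)
    A = np.concatenate(([1.0], -np.asarray(a)))
    # impulse response of the synthesis filter 1/A(z), truncated to the segment
    h = lfilter([1.0], A, np.concatenate(([1.0], np.zeros(n - 1))))
    residual = np.asarray(target, dtype=float).copy()
    excitation = np.zeros(n)
    for _ in range(n_pulses):
        # for each candidate position, the best amplitude is corr/energy;
        # pick the position giving the largest error reduction corr^2/energy
        corr = np.array([np.dot(residual[p:], h[:n - p]) for p in range(n)])
        energy = np.array([np.dot(h[:n - p], h[:n - p]) for p in range(n)])
        pos = int(np.argmax(corr ** 2 / energy))
        amp = corr[pos] / energy[pos]
        excitation[pos] += amp
        residual[pos:] -= amp * h[:n - pos]
    return excitation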

TSBK06 speech coding p.25/32 Example, MP-LPC (figure) A segment of 200 samples from a speech signal.

TSBK06 speech coding p.26/32 Example, MP-LPC We adapt a 5-coefficient filter to the signal using the autocorrelation method. The filter coefficients (before quantization) are ā = (1.5373, 0.2515, 0.2400, 0.1754, 0.0912).

TSBK06 speech coding p.27/32 Example, MP-LPC A pulse train with ten pulses is optimized so that the decoded signal is as close to the original signal as possible. (figure: the optimized pulse train)

TSBK06 speech coding p.28/32 Example, MP-LPC (figure) Decoded signal.

TSBK06 speech coding p.29/32 Example, MP-LPC (figure) Original signal and decoded signal.

TSBK06 speech coding p.30/32 RPE-LTP Regular Pulse Excitation with Long Term Prediction The first coding method used in the GSM system. It has since been replaced by other methods. Can be seen as a hybrid between RELP and MP-LPC. The coder tries to find a pulse train that is decoded to a signal as close to the original signal as possible. The pulses are limited to be located in a regular pattern. The coder uses the rate 13 kbit/s. Including error correction we get the total rate 22.8 kbit/s.

TSBK06 speech coding p.31/32 Code Excited Linear Prediction (CELP) Analysis by synthesis. The coder estimates filter coefficients, and then tries to find an excitation signal from a codebook that is decoded to a signal close to the original signal. It is thus a form of vector quantization, often of the gain-shape type. What is sent to the receiver is the filter parameters, the index in the codebook and the gain parameters. Often a combination of a fixed and an adaptive codebook is used. There are variants where the filter parameters are estimated from the previous segment; since the decoder also has access to those old samples, only the index data needs to be transmitted.
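
A bare-bones sketch of the CELP search described above: for a fixed synthesis filter, every codebook vector is filtered, the optimal gain is computed in closed form, and the index/gain pair giving the smallest squared error is kept. The adaptive codebook and perceptual weighting used by real CELP coders are omitted.

import numpy as np
from scipy.signal import lfilter

def celp_search(target, a, codebook):
    A = np.concatenate(([1.0], -np.asarray(a)))
    best_index, best_gain, best_err = -1, 0.0, np.inf
    for index, code in enumerate(codebook):   # codebook: sequence of excitation vectors
        synth = lfilter([1.0], A, code)
        gain = np.dot(target, synth) / np.dot(synth, synth)
        err = np.sum((target - gain * synth) ** 2)
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain              # transmitted along with the filter parameters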

TSBK06 speech coding p.32/32 CELP in GSM Enhanced Full Rate: Algebraic CELP, data rate 12.2 kbit/s. Adaptive Multi-Rate: Algebraic CELP, data rate between 4.75 kbit/s and 12.2 kbit/s (in 8 steps). The coder tries to adapt to the channel quality. If the channel is bad the speech coder will use a low rate, and then many bits are used for error correction. For better channels fewer bits are needed for error correction and the speech coder can then use a higher rate. The channel rate is either 22.8 kbit/s or 11.4 kbit/s (half rate channel).