Chapter IV THEORY OF CELP CODING


4.1 Introduction

Waveform coders fail to produce high quality speech at bit rates lower than 16 kbps. Source coders, such as LPC vocoders, operate at rates as low as 2 kbps but fail to provide speech quality appropriate for commercial telephone applications over wireline as well as wireless media. Hybrid coders based on Analysis by Synthesis (AbS) speech coding produce toll quality speech at rates below 10 kbps. The codec chosen for this study is a time domain hybrid coder based on the Algebraic Code Excited Linear Prediction (ACELP) architecture. Its basic structure is that of CELP, with the secondary excitation searched from a memoryless algebraic codebook.

In this chapter the basic theory of the standard CELP based speech coding algorithm is described. Commencing with generalized AbS coding, the analytical tools are briefly discussed. In order to maintain good quality in the reconstructed speech while keeping the computational complexity low and the coding spectrally efficient, various modifications are possible in the implementation of the different sections of the CELP based speech coding algorithm.

4.2 Generalized AbS-LPC Speech Coding

In AbS-LPC coding schemes [6] [46], the locally synthesized signal is compared with the original speech signal, and the coder parameters are selected so as to produce the minimum mean square error between the original speech signal and the reconstructed speech signal. The performance of this scheme is better because a closed loop optimization procedure is involved in the parameter estimation. The basic structure of the
AbS-LPC scheme [47] is illustrated in Figure 4.1. Functionally, the entire AbS-LPC scheme can be divided into three sections:

(i) Time varying filter
(ii) Excitation signal generation
(iii) Error minimization procedure

Figure No. 4.1: Generalized AbS-LPC scheme (coder: original speech, excitation signal generator, time varying filter, error minimization; decoder: excitation signal generator, time varying filter, synthesized speech)

4.2.1 Time Varying Filters

The time varying filter in the model is a combination of two linear predictors, namely the short term predictor (STP, or LPC filter) and the long term predictor (LTP, or pitch filter). The short term prediction filter exploits the correlation between adjacent speech samples. The long term predictor removes the correlation between distant samples, normally one pitch period or a multiple of the pitch period apart. The STP filter is generally implemented as a linear prediction time
varying filter, whereas the LTP is generally implemented as an adaptive codebook in the excitation synthesis and search.

4.2.2 Excitation Signal Generator

The excitation signal is the input to the time varying filter (normally the LTP) and is the most important part of the AbS scheme. The various AbS-LPC schemes are distinguished by the way in which the excitation signal vector is represented. In CELP based speech coding algorithms, the excitation signal is chosen from a predefined codebook. The excitation signal vector can be selected from a single codebook, or it may be the sum of two sub-vectors from two different excitation sources. The majority of the reported AbS-LPC schemes use two codebooks, one fixed and one adaptive. The spectrally efficient speech coding algorithm developed in this study uses a fixed algebraic codebook for the fixed excitation and a pitch adaptive codebook to generate the excitation for the LP synthesis.

4.2.3 Error Minimization Procedure

Minimization is possible over several error criteria, such as the absolute error, the maximum error and the mean square error; the most commonly used criterion is the minimum mean square error (MMSE). The mean square error between two signals is

\[ \mathrm{MSE} = \frac{1}{N} \sum_{n=0}^{N-1} \left( s(n) - \hat{s}(n) \right)^2 \tag{4.1} \]

where s(n) is the original speech, \hat{s}(n) is the synthesized speech and N is the number of samples. A perceptually weighted mean square error criterion can also be used as the criterion in the error analysis.
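As an illustration of the closed loop selection described in Section 4.2.3, the following minimal sketch computes the mean square error of equation (4.1) and picks the candidate synthesized signal with the smallest error. The function names `mean_square_error` and `select_min_mse` are hypothetical and are not taken from the codec developed in this study; no perceptual weighting is applied.

```python
def mean_square_error(original, synthesized):
    """Equation (4.1): average of the squared sample differences."""
    if len(original) != len(synthesized):
        raise ValueError("signals must have the same length")
    n_samples = len(original)
    return sum((s - s_hat) ** 2 for s, s_hat in zip(original, synthesized)) / n_samples


def select_min_mse(target, candidates):
    """Return (index, mse) of the candidate signal closest to the target."""
    errors = [mean_square_error(target, c) for c in candidates]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


if __name__ == "__main__":
    target = [1.0, 0.0, -1.0, 0.0]
    candidates = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, -0.5, 0.0], [1.0, 1.0, 1.0, 1.0]]
    print(select_min_mse(target, candidates))
```

In an actual AbS coder each candidate would be the output of the time varying filter for one trial excitation, rather than a stored signal as in this toy example.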

4.2.4 Types Of AbS Speech Coders

There are various implementations of speech coding algorithms based on analysis by synthesis speech coding [48], as listed below:

SELP - Self excited linear predictor
MPLPC - Multi pulse excited linear predictor
RPE-LPC - Regular pulse excited linear predictor
CELP - Code excited linear predictor

These differ mainly in the type of excitation used in the AbS scheme. Within each group, however, there can be internal variations in the design and implementation of the coding algorithm. The theoretical details of the CELP based speech coding algorithm are discussed with specific emphasis on Algebraic CELP, as it is the technique used in the present study.

4.3 CELP: Theoretical Aspects

The theoretical aspects of the CELP based implementation of the AbS speech coding scheme are presented as follows.

4.3.1 Basic Principle

The redundancies in the speech signal are almost entirely removed by the short term and long term prediction, and the residual has very little correlation left in it. An excitation that synthesizes the speech is then searched for in the fixed codebook, yielding the codebook index and gain. The optimum codebook index is selected on the basis of the MMSE between the locally synthesized speech and the original speech signal. Atal and Schroeder first proposed CELP in 1984 [6], but only recently has CELP received attention as a speech coding
algorithm for spectrally efficient speech coding. The standard model of CELP is illustrated in Figure 4.2.

Figure No. 4.2: Block diagram of standard CELP coding algorithm

4.3.2 Operation Of The CELP Algorithm

The illustrated CELP operates as follows:

1. The original speech signal is partitioned into frames of 10 ms to 20 ms and LP analysis is performed. The LP model parameters are estimated using one of the various LP analysis methods. The memory of the STP is flushed out before further processing.

2. The LTP analysis is then performed over the target signal, which depends on the method used. The target signal is generally the LP residual obtained by LP inverse
filtering in the open loop method (OLM) or modified open loop method (MOLM), whereas the original speech is used as the target signal in the closed loop method (CLM) [22]. The pitch delay and pitch gain are the two LTP model parameters estimated in the LTP analysis.

3. The new target for the fixed codebook is then obtained by considering the STP and LTP contributions from the original speech signal. The secondary excitation is then determined by an exhaustive search of the fixed codebook, the selection criterion being the MMSE. The codebook index and codebook gain are the selected parameters of the fixed codebook.

4. The decoding algorithm for CELP is depicted in Figure 4.3. At the decoder the excitation is constructed from the LTP parameters and the codebook parameters. The synthesized excitation is then fed to the LP synthesis filter. The excitation is usually updated on submultiples of the LP analysis frame.

Figure No. 4.3: Block diagram of standard CELP decoding algorithm
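The decoder operation in step 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the decoder implemented in this study: the pitch delay is an integer, the synthesis filter is written as s(n) = e(n) + sum_k a_k s(n-k), and the names `adaptive_codebook_vector`, `lp_synthesis` and `decode_subframe` are hypothetical.

```python
def adaptive_codebook_vector(past_excitation, pitch_delay, length):
    """LTP contribution: repeat the excitation from `pitch_delay` samples ago."""
    if not 1 <= pitch_delay <= len(past_excitation):
        raise ValueError("pitch delay must lie within the excitation history")
    buf = list(past_excitation)
    start = len(buf)
    for n in range(length):
        # For delays shorter than the subframe this reuses samples just generated.
        buf.append(buf[start + n - pitch_delay])
    return buf[start:]


def lp_synthesis(excitation, lpc, memory):
    """All-pole filter (assumed sign convention): s(n) = e(n) + sum_k lpc[k-1] * s(n-k)."""
    history = list(memory)  # most recent output sample last, len(memory) >= len(lpc)
    out = []
    for e in excitation:
        s = e + sum(a * history[-k] for k, a in enumerate(lpc, start=1))
        history.append(s)
        out.append(s)
    return out, history[-len(lpc):]


def decode_subframe(past_excitation, pitch_delay, pitch_gain,
                    code_vector, code_gain, lpc, memory):
    """Build the excitation from LTP and fixed codebook parameters, then synthesize."""
    adaptive = adaptive_codebook_vector(past_excitation, pitch_delay, len(code_vector))
    excitation = [pitch_gain * v + code_gain * c for v, c in zip(adaptive, code_vector)]
    speech, new_memory = lp_synthesis(excitation, lpc, memory)
    return speech, list(past_excitation) + excitation, new_memory


if __name__ == "__main__":
    speech, history, mem = decode_subframe(
        [0.0, 0.0, 1.0, 0.0], 2, 0.5, [0, 1, 0, 0], 2.0, [0.9], [0.0])
    print(speech)
```

The function returns the extended excitation history so that the next subframe can draw its adaptive codebook vector from it, which reflects the subframe-wise excitation update mentioned in step 4.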

4.3.3 Secondary Excitation Codebook

The vectors contained in the codebook are a very important part of CELP based speech coding algorithms. They are used to generate the excitation for the time varying filter, which synthesizes the speech at the decoder end. The contribution of the secondary excitation is more useful during the unvoiced or inactive portions of the speech, since for the voiced portions the LTP provides the larger contribution. The population of the codebook with excitation vectors and the search procedure over those vectors are the two most important issues in the secondary excitation of CELP based speech coding. The requirements of quality, lower search complexity and reduced memory for storing the codebook excitation vectors have resulted in different types of secondary excitation codebooks. A lot of research has focused on reducing the complexity of the speech coding algorithm by using different codebook architectures and efficient search procedures. As a consequence, a variety of codebook structures has been developed. Some of the codebooks used are:

1. Sparse codebook
2. Ternary codebook
3. Overlapping codebook
4. Binary pulse excited codebook
5. Algebraic codebook

Secondary excitation codebooks can be searched faster if they are structured; consequently, CELP based coding uses structured codebooks. There are three types of structured codebooks:

1. Sparse codebook
2. Ternary codebook
3. Algebraic codebook

4.3.3.1 Sparse codebook

A zero mean, unit variance Gaussian random process is used to populate this type of codebook. Samples are set to zero whenever their absolute values are less than some predefined threshold. This type of code vector is able to produce natural sounding reconstructed speech. The design of the codebook, the larger search complexity and the large storage needed are the limitations in the use of this codebook.

4.3.3.2 Ternary codebook

A ternary excitation codebook vector is a sparse excitation codebook vector in which each nonzero value is replaced by either -1 (value < 0) or +1 (value > 0). This results in a code vector consisting of only three possible values. The computational complexity is reduced because multiplications reduce to additions, as the magnitude of each code vector element is either zero or one.

4.3.3.3 Algebraic codebook

This codebook uses algebraic codes; the excitation vectors are derived from interlaced permutation codes (IPC). Earlier algebraic codebook schemes used binary codes to populate the codebook vectors. In the IPC, each vector contains a few nonzero pulses with predefined sets of positions, and the pulses are allowed to take a fixed amplitude, either +1 or -1. Each pulse has a set of possible positions, distinct from the positions of the other pulses. The excitation code vector is determined by the positions and amplitudes of the nonzero pulses. This codebook structure has several merits. Firstly, it does not require storage at the decoder or encoder, as the codebook index defines the code vector completely. Secondly, it is inherently robust against channel errors. Finally, and most importantly, the algebraic codebook offers better search efficiency.
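The three structured codebooks above can be sketched as follows, together with an exhaustive MMSE search over a small algebraic codebook. This is an illustrative sketch only. The track layout, the vector length and all function names are hypothetical and do not correspond to any standardized codec or to the codebook developed in this study. The search assumes the usual equivalent form of the MMSE criterion: with the gain chosen optimally, minimizing the error is the same as maximizing the squared correlation between the target and the filtered code vector divided by the energy of the filtered code vector.

```python
import itertools
import random


def sparse_vector(length, threshold, rng):
    """Gaussian samples, centre-clipped: values with |x| < threshold become 0."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return [x if abs(x) >= threshold else 0.0 for x in samples]


def ternary_vector(sparse):
    """Replace each nonzero value by its sign, leaving values in {-1, 0, +1}."""
    return [0 if x == 0 else (1 if x > 0 else -1) for x in sparse]


def algebraic_vector(length, positions, signs):
    """Place one +1/-1 pulse per track; nothing is stored, the index defines the vector."""
    vec = [0] * length
    for pos, sign in zip(positions, signs):
        vec[pos] = sign
    return vec


def filter_vector(code_vector, impulse_response):
    """Zero-state synthesis filtering, i.e. multiplication by a lower triangular matrix."""
    return [sum(impulse_response[i - j] * code_vector[j] for j in range(i + 1))
            for i in range(len(code_vector))]


def search_algebraic(target, impulse_response, tracks):
    """Try every position/sign combination; return (positions, signs, gain)."""
    best, best_score = None, -1.0
    for positions in itertools.product(*tracks):
        for signs in itertools.product((1, -1), repeat=len(tracks)):
            y = filter_vector(algebraic_vector(len(target), positions, signs),
                              impulse_response)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            corr = sum(t * v for t, v in zip(target, y))
            score = corr * corr / energy  # larger score means smaller residual error
            if score > best_score:
                best, best_score = (positions, signs, corr / energy), score
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    print(ternary_vector(sparse_vector(8, 1.0, rng)))
    h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]  # assumed impulse response
    tracks = [(0, 2, 4, 6), (1, 3, 5, 7)]          # hypothetical pulse tracks
    target = [2.0 * v for v in filter_vector(algebraic_vector(8, (2, 5), (1, -1)), h)]
    print(search_algebraic(target, h, tracks))
```

A code vector and its negation give the same score, since the sign can be absorbed into the gain, so only the product of gain and pulse sign is meaningful in the result. Practical algebraic codebook searches avoid this brute force enumeration, which is shown here only to make the selection criterion concrete.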

4.3.4 Codebook Search

Most of the computational complexity results from the exhaustive search of the codebook vectors. To find the optimum code vector out of the entire codebook, an exhaustive search of the codebook is performed. The search criterion is the minimum mean square error between the synthesized and the original speech. The mean square error can be minimized by maximizing the term T_k given by equation (4.2), where c_k is the code vector and ε_k is the energy. This can be expressed in an alternative form as equation (4.3), where X is the target vector and H is the lower triangular matrix of the impulse response of the STP synthesis filter.

4.4 CELP Implementation Issues

From the above discussion it is clear that the computation can be broken down into three blocks:

(i) LPC analysis or STP
(ii) Pitch analysis or LTP
(iii) Codebook search

The LPC analysis and the LTP analysis have already been explained in Chapter II and need no repetition here. The first issue in CELP is the complexity resulting from the exhaustive search of the fixed codebook (time complexity) and the storage of the code vectors (space complexity). A lot of research has focused on reducing the
complexity of the speech coding algorithm by using different codebook architectures and efficient search procedures. As a consequence, a variety of codebook structures has been developed.

The second major issue in CELP based speech coding is the efficient and transparent quantization of the LSF parameters, so as to encode the LP parameters into as few bits as possible while at the same time optimizing the computational complexity. The quantization issues of the LSF parameters have been discussed in Chapter III.

The last issue is the search complexity of the pitch analysis, or adaptive codebook search. An exhaustive search over the entire range of the pitch delay has to be carried out in order to estimate the pitch delay. Much attention is given to efficient pitch search algorithms, as the search is computationally very complex. A pre-selection based pitch lag search technique has been implemented in this work.

4.5 Performance Evaluation Of The Speech Coders

A speech coding algorithm is evaluated on the basis of its bit rate, the quality of the reconstructed speech, the complexity of the algorithm, the algorithmic delay and the robustness to channel errors. In general, high quality speech coding at a low bit rate is achieved by algorithms of high complexity and hence longer algorithmic delay. The quality of the reconstructed speech has to be evaluated in clean speech conditions as well as with speech corrupted by background noise. Moreover, in some applications the performance of the speech coding algorithm has to be checked for non speech signals such as DTMF (dual tone multifrequency) and for codec performance in tandem [17].

For digital communication of speech, the quality is classified into four general categories:

Broadcast
Network or toll
Communication
Synthetic

Broadcast quality refers to high quality "commentary" speech, generally achieved at rates above 64 kbps. Toll quality refers to quality comparable to that of classical analog speech communication (200 Hz to 3300 Hz). Toll quality can be achieved at the mid range of data rates. Communication quality implies high intelligibility; the speech may be slightly degraded in quality but is natural and allows speaker recognition. Communication quality can be achieved at rates above 4.8 kbps. Synthetic quality is intelligible but can be unnatural and lack speaker recognizability. Speech coders operating below 4.8 kbps can generate synthetic quality. The quality of the reconstructed speech, in terms of these four classes, can be quantified by either an objective measure or a subjective measure.

4.5.1 Objective measure

The signal to noise ratio (SNR) is one of the most popular and common objective measures for evaluating the quality performance of a compression algorithm. It is a long term measure of the accuracy of the reconstructed speech. SNR is the ratio of the average power of a speech signal of length N to the power of the reconstruction error. It can be expressed as

\[ \mathrm{SNR} = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left( s(n) - \hat{s}(n) \right)^2} \tag{4.4} \]

where \hat{s}(n) is the reconstructed speech and s(n) is the original speech. Temporal variation in the quality of the reconstructed speech can be better evaluated by the segmental SNR (SEGSNR), which is given by

\[ \mathrm{SEGSNR} = \frac{10}{L} \sum_{i=0}^{L-1} \log_{10} \frac{\sum_{n=0}^{N-1} s^2(iN+n)}{\sum_{n=0}^{N-1} \left( s(iN+n) - \hat{s}(iN+n) \right)^2} \tag{4.5} \]

where N is the length of a segment and L is the number of segments. Since the averaging takes place after the logarithm, SEGSNR penalizes more heavily those speech coding algorithms whose performance varies over time.

4.5.2 Subjective Measure

The objective measures discussed above are often sensitive to both gain and delay variations and do not account for the perceptual properties of the ear. The selection of most low and medium bit rate coders is determined by perceptual criteria; therefore subjective evaluation [49] [50] is required. There are a number of ways to evaluate performance subjectively, but the most popular method is the mean opinion score (MOS). The MOS measure is widely used to quantify the subjective quality of the speech reconstructed by the coding algorithm, and of the original speech as well. A MOS test usually involves a number of listeners, who are instructed to rate the quality of the speech on a five level scale, as given in Table 4.1.

MOS scale    Speech quality
1            Bad
2            Poor
3            Fair
4            Good
5            Excellent

Table No. 4.1: Mean opinion scores

The MOS rating is obtained by averaging the scores given by the individual listeners. The MOS range relates to speech quality as follows:

MOS from 4.0 to 4.5 implies network quality.
MOS from 3.5 to 4.0 implies communication quality.
MOS from 2.5 to 3.5 implies synthetic quality.
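Returning to the objective measures of Section 4.5.1, the following minimal sketch computes the SNR as a plain energy ratio and the segmental SNR as the average of the per-segment ratios in dB. The function names are hypothetical, and the sketch assumes that no segment is silent or perfectly reconstructed, since either case would need separate handling to avoid a zero in the logarithm or in the denominator.

```python
import math


def snr(original, reconstructed):
    """Signal energy divided by reconstruction error energy (a plain ratio, not dB)."""
    signal = sum(s * s for s in original)
    noise = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return signal / noise


def segmental_snr(original, reconstructed, segment_length):
    """Mean over whole segments of 10*log10(segment SNR), in dB."""
    n_segments = len(original) // segment_length
    total = 0.0
    for i in range(n_segments):
        lo, hi = i * segment_length, (i + 1) * segment_length
        total += 10.0 * math.log10(snr(original[lo:hi], reconstructed[lo:hi]))
    return total / n_segments


if __name__ == "__main__":
    original = [2.0, 2.0, 4.0, 4.0]
    reconstructed = [1.0, 1.0, 3.0, 3.0]
    print(10.0 * math.log10(snr(original, reconstructed)))
    print(segmental_snr(original, reconstructed, 2))
```

In this toy input the error is the same in both segments but the signal level is not, so the quiet segment has a lower SNR. Because the average is taken after the logarithm, the segmental figure comes out below the long term SNR expressed in dB, which is the penalty for variable performance noted in Section 4.5.1.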