Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP

Monika S. Yadav
Vidarbha Institute of Technology, Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, India
monika.yadav@rediffmail.com

Abstract

Speech coding has been a major issue in the area of digital speech processing. Speech coding is the application of data compression to digital audio signals containing speech. It represents the speech signal in a more compressed form, which can then be transmitted with fewer binary digits. Unlimited channel bandwidth is not available each time a signal is sent, which is why speech signals must be coded and compressed. Speech compression aims to make the best use of channel capacity at a low bit rate and with the highest possible quality. The Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) algorithm operates at a bit rate of 8 kbps on discrete speech samples taken at a rate of 8000 samples per second. G.729 processes speech in 10 ms frames: for each frame, the input speech signal is analyzed to extract the parameters, and these parameters are then coded. G.729 is one of the most widely used ITU-T standards for speech compression. Speech compression is applied in long-distance communication, high-quality speech storage, and message encryption. Speech coding is a lossy type of coding, so the output signal does not sound exactly like the input. The speech coding techniques discussed here include linear predictive coding, waveform coding, and code excited linear predictive coding.

Keywords: Speech coding, Linear Predictive Coding (LPC), Waveform coding, Conjugate Structure Algebraic Code Excited Linear Predictive Coding (CS-ACELP).

I. Introduction

Speech coding is the process of representing a voice signal for efficient transmission or storage. The resulting codes are sent over both band-limited wired channels and wireless channels.
The aim of speech coding is to represent the samples of a speech signal in a compact form, using fewer code symbols without degrading the quality of the speech signal. Speech coding is very important in cellular and mobile communication. It has applications in Voice over Internet Protocol (VoIP), videoconferencing, electronic toys, archiving, Digital Simultaneous Voice and Data (DSVD), and numerous computer-based gaming and multimedia applications. Most speech applications require minimal coding delay, because long coding delays hinder the flow of a conversation. A speech coder converts a digitized speech signal into a coded representation and transmits it in the form of frames. At the receiving end, the speech decoder receives the coded frames and synthesizes the reconstructed speech signal. The decoded speech should be audibly indistinguishable from the original speech signal.

G.729 (8 kbps) is one of the best-known ITU-T standards for speech compression. The recommendation is based on the Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) algorithm, operating at a bit rate of 8 kbps on discrete speech samples taken at a rate of 8000 samples per second. G.729 processes speech in 10 ms frames: for each frame, the input speech signal is analyzed to extract the parameters, and these parameters are then coded. G.729 is a low-complexity continuous data transmission scheme for VoIP applications and provides good synthesized speech quality at a low bit rate.

IJESAT May-Jun 2016 143
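The figures quoted above (8 kbps, 8000 samples per second, 10 ms frames) fix the bit budget of every frame. The following is a minimal sketch of that arithmetic; the function name `frame_budget` is hypothetical, and the 16-bit linear PCM reference is the input format the encoder description assumes.

```python
def frame_budget(sample_rate_hz=8000, bit_rate_bps=8000, frame_ms=10, pcm_bits=16):
    """Return the per-frame sample and bit counts for a fixed-rate frame-based coder."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000   # samples analysed per frame
    bits_per_frame = bit_rate_bps * frame_ms // 1000        # coded bits sent per frame
    pcm_bits_per_frame = samples_per_frame * pcm_bits       # uncompressed size of the frame
    return {
        "samples_per_frame": samples_per_frame,
        "bits_per_frame": bits_per_frame,
        "pcm_bits_per_frame": pcm_bits_per_frame,
        "compression_ratio": pcm_bits_per_frame / bits_per_frame,
        "bits_per_sample": bits_per_frame / samples_per_frame,
    }


if __name__ == "__main__":
    for name, value in frame_budget().items():
        print(f"{name}: {value}")
```

With the default arguments, each 10 ms frame holds 80 samples and is coded with 80 bits, that is, one coded bit per input sample on average.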

CS-ACELP can only be used for human voice (because of the model it uses) and is relatively complex.

1. Speech coding techniques

Speech coding techniques are mainly of two types: lossless and lossy. With lossy coding, the reconstructed speech signal differs perceptually from the original speech signal, whereas with lossless coding the reconstructed signal at the decoder has exactly the same shape as the input speech signal. Most speech coding techniques are lossy, because lossy coding removes information that is irrelevant from the point of view of perceptual quality. Speech coders are classified by the bit rate at which they produce output of reasonable quality and by the type of coding technique used to code the speech signal.

2. Waveform coding

Waveform coding is the simplest technique for speech coding. Waveform coders analyze, code and reconstruct the original signal sample by sample. They aim to reproduce the exact shape of the speech waveform, without considering the nature of the human speech production and delivery system. The most commonly used waveform coding algorithms are uniform 16-bit PCM, companded 8-bit PCM and ADPCM. Waveform coding is explored in both the time and the frequency domain.

3. Differential Pulse Code Modulation

Differential PCM (DPCM) calculates the difference between successive input samples and transmits this small difference signal instead of the entire input sample. Since the difference between input samples is smaller than an entire input sample, the number of bits required for transmission is reduced. Using DPCM can reduce the bit rate of voice transmission down to 48 kbps. The input signal is sampled and modulated.
The sampled input signal is stored in a predictor. The predictor takes the stored sample and sends it through a differentiator. The differentiator compares the previous sample with the current sample and sends the difference to the quantizing and coding phase of PCM (this phase can use uniform quantizing or companding with A-law or μ-law). After quantizing and coding, the difference signal is transmitted to its final destination. At the receiving end of the network, everything is reversed. First the difference signal is dequantized. Then this difference signal is added to a sample stored in a predictor and sent to a low-pass filter that reconstructs the original input signal. DPCM has some problems with voice quality. To solve them, adaptive DPCM was developed.

4. Linear Predictive Coding

Linear Predictive Coding (LPC) is a powerful, good-quality, low-bit-rate speech analysis technique for encoding a speech signal. The source-filter model used in LPC is also known as the linear predictive coding model. It has two main components: LPC analysis (encoding) and LPC synthesis (decoding). The goal of LPC analysis is to estimate whether the speech signal is voiced or unvoiced, to find the pitch of each frame, and to compute the parameters needed to build the source-filter model. These parameters are transmitted to the receiver, which carries out LPC synthesis using the received parameters.

5. Code Excited Linear Prediction

The basic principle that all speech coders exploit is the fact that speech signals are highly correlated waveforms. Speech can be represented using an autoregressive (AR) model. CELP, along with its variants such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, is currently the most widely used speech coding algorithm. It is also used in MPEG-4 audio speech coding.

II. CS-ACELP DESCRIPTION

A. Encoder

The CS-ACELP coder processes input signals on a frame-by-frame and subframe-by-subframe basis. The frame length is 10 ms and consists of two 5 ms subframes. The algorithm uses vector quantization: the gains of the adaptive codebook and the fixed codebook are vector quantized with a conjugate-structure codebook. The 8 kbps core speech coder is derived from the G.729 coder [2], and the coder is based on the Code-Excited Linear Prediction (CELP) coding model, operating on speech frames using the analysis-by-synthesis method. The encoding principle of CS-ACELP is shown in Fig. 1. The encoding stages of CS-ACELP mainly contain six blocks.

1. Pre Processing

The pre-processing block contains two stages: scaling and high-pass filtering. The input to the speech encoder is assumed to be a 16-bit PCM signal, which undergoes combined scaling and high-pass filtering. Scaling means dividing the input signal by a factor of two to avoid the possibility of overflows in the fixed-point implementation of the coder. For high-pass filtering, a second-order pole/zero filter with a cut-off frequency of 140 Hz is used. The scaling and the high-pass filtering are combined, and the resulting filter is given by

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}}$$

The input signal filtered through $H_{h1}(z)$ is referred to as s(n) and is used in all subsequent coder operations.

2. LP Analysis

Linear prediction (LP) is the most frequently used technique for speech analysis. The LP analysis block is shown in Fig. 3. Reflection coefficients are obtained as a by-product of the Levinson-Durbin algorithm in LP analysis. The short-term analysis and synthesis are based on a 10th-order LP filter. The LP synthesis filter is defined as:

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$$

The LSP parameters are quantized using predictive two-stage quantization.

Fig. 1. LP ANALYSIS BLOCK

The interpolated quantized and unquantized filters are converted back to LP filter coefficients (to construct the synthesis and weighting filters for each subframe).
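The two encoder stages described above, pre-processing and LP analysis, can be sketched in a few lines. This is an illustrative floating-point sketch, not the fixed-point reference implementation: the function names (`preprocess`, `autocorrelation`, `levinson_durbin`, `lp_analysis`) are hypothetical, the filter coefficients are the ones published in ITU-T G.729 for the combined scaling and 140 Hz high-pass filter, and a plain Hamming window stands in for the standard's own windowing and lag-windowing steps, which are omitted here.

```python
import math

# Combined scaling / 140 Hz high-pass filter coefficients (as published in ITU-T G.729).
B = (0.46363718, -0.92724705, 0.46363718)   # numerator
A = (1.0, -1.9059465, 0.9114024)            # denominator


def preprocess(pcm):
    """Second-order IIR filter: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in pcm:
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + sum_i a_i z^-i.

    Returns (a, reflection_coefficients, residual_energy); the reflection
    coefficients fall out of the recursion as a by-product.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, refl, err


def lp_analysis(frame, order=10):
    """10th-order LP analysis of one frame (Hamming window is a simplifying assumption)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

A typical use would be `a, refl, err = lp_analysis(preprocess(pcm_frame))`, where `pcm_frame` is a list of 16-bit PCM samples. The filter halves the signal level at high frequencies and strongly attenuates DC, which matches the combined scaling and high-pass behaviour described in the text.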

Fig. 2. CS-ACELP ALGORITHM

An update of the states of the synthesis and weighting filters is needed to compute the target signal in the next subframe. After the two gains are quantized, the excitation signal u(n) in the present subframe is obtained using:

$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \qquad n = 0, \ldots, 39$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantized adaptive and fixed-codebook gains, respectively, v(n) is the adaptive codebook vector (interpolated past excitation), and c(n) is the fixed-codebook vector including harmonic enhancement.

The states of the filters can be updated by filtering the signal r(n) - u(n) (the difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. This would require three filter operations. A simpler approach, which requires only one filter operation, is as follows. The locally reconstructed speech $\hat{s}(n)$ is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input r(n) - u(n) is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/\hat{A}(z)$ are given by e(n), n = 30,...,39. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal e(n) through this filter to find the perceptually weighted error ew(n). However, the signal ew(n) can be equivalently found by:

$$e_w(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n)$$

Since the signals x(n), y(n) and z(n) (the target signal and the filtered adaptive and fixed-codebook vectors) are available, the states of the weighting filter are updated by computing ew(n) as in this equation for n = 30,...,39. This saves two filter operations.

III. DECODER

The decoder principle is shown in Figure 2 (b). First, the parameter indices are extracted from the received bit stream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe.

Figure 3. Decoding principle of CS-ACELP Codec

IV. METHOD

The CS-ACELP algorithm was simulated using MATLAB R2010a. MATLAB supports the import and export of data in various file formats. The objective measures of CS-ACELP, namely segmental SNR (SNRseg), log-likelihood ratio (LLR), weighted spectral slope (WSS) and Perceptual Evaluation of Speech Quality (PESQ), were also calculated using MATLAB. Segmental SNR is defined as the average of the SNR values over segments with speech activity. The LLR compares the LPC vector of the original speech signal with that of the reconstructed speech. The weighted spectral slope (WSS) distance measure is a direct spectral distance measure. It is based on a comparison of smoothed spectra from the clean and distorted speech samples. PESQ is a family of standards comprising a test methodology for automated assessment of speech quality as experienced by a user of a telephony system. PESQ values range from -0.5 to 4.5; higher PESQ values indicate better quality.

The Praat software package was used to analyze the output speech files. Praat consists of two parts, Praat Objects and Praat Picture. The Praat Objects window is used to open existing sound files or create new ones. Spectrogram analysis of the input and output speech of CS-ACELP was performed. The time-domain representation of the speech signal and information on pitch, intensity and formants were extracted. The formant frequencies F1, F2, F3 and F4 and their bandwidths were also extracted.

V. RESULT

This section describes the results of the CS-ACELP algorithm obtained using MATLAB R2010a. The file handel.wav is used as the test signal for the coder implementation, and handeldec is the reconstructed signal. First the file compression is performed, then the analysis details and objective measurements are displayed. The objective measurements are shown in Table 3.
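The segmental SNR defined in the Method section can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB code used for the reported measurements: the function name `segmental_snr`, the 80-sample segment length and the simple energy threshold used as a stand-in for speech-activity detection are all hypothetical choices.

```python
import math


def segmental_snr(reference, decoded, frame_len=80, energy_floor=1e-6):
    """Average per-segment SNR in dB over segments that contain signal energy.

    reference, decoded: equal-rate sample sequences (the shorter length is used).
    frame_len: samples per segment (assumed here: one 10 ms frame at 8 kHz).
    energy_floor: mean-square threshold below which a segment counts as silence.
    """
    n = min(len(reference), len(decoded))
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        if signal_energy / frame_len < energy_floor:
            continue  # skip segments without speech activity
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        noise_energy = max(noise_energy, 1e-12)  # guard against log of zero
        snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    return sum(snrs) / len(snrs) if snrs else float("nan")


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * i / 16.0) for i in range(160)]
    attenuated = [0.9 * x for x in tone]
    print(round(segmental_snr(tone, attenuated), 3))
```

A negative value means that, on average, the energy of the sample-by-sample error exceeds the energy of the reference signal in the active segments.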
From the PESQ values of the speech signals, the coder is judged to work with good quality. From the Praat analysis of the input and output of the CS-ACELP coder, the sound pressure, or amplitude of the sound waves in pascals, is shown in Table 1. The very slight differences in the minimum, maximum, root mean square (RMS) and mean values are representative of the good output quality of the vocoder. The formant frequencies and corresponding bandwidths are listed in Table 2. The combined time-domain representation of the input and output sound waves is shown in Fig. 6. From the time-domain representation, it can be concluded that the overall shape has been preserved, but peaks have been clipped in some portions. The waveforms of intensity, pitch and formants of the sound files are shown in Figs. 4 to 6.

Table 1. CS-ACELP CODER AMPLITUDE (SOUND PRESSURE)

Amplitude (in Pa)    Input (handel.wav)    Output (handeldec.wav)
minimum              -0.799                -0.863
maximum              0.799                 0.930
mean                 2.49x                 5.05x
rms                  0.1962                0.124

Table 2. CS-ACELP CODER FORMANTS AND BANDWIDTH

Parameter (in Hz)    handel.wav    handeldec.wav
F1                   593.106       470.96
F2                   1503.03       1050.91
F3                   2281.968      2482.303
F4                   2988.78       3027.77
BW1                  381.71        136.74
BW2                  269.57        1050.91
BW3                  205.48        2482.30
BW4                  206.714       280.97

Table 3. OBJECTIVE MEASUREMENTS

File name      SNRseg    WSS      LLR      PESQ
handel.wav     -1.81     47.75    0.309    2.36
male.wav       -1.66     42.58    0.430    2.27
child.wav      -1.97     89.80    0.530    2.11
female.wav     -0.90     80.40    0.670    2.17

[Figure: (a) Original sound file; (b) CS-ACELP output file]

Fig. 4. Intensity waveforms: (a) Original sound file; (b) CS-ACELP output file

Fig. 5. Pitch waveforms

[Figure panels: (a) Original sound file; (b) CS-ACELP output file]

Fig. 6. Formants of speech

VI. CONCLUSION

This paper describes the implementation and analysis of an efficient algorithm for providing secured speech transmission for various applications with different speech inputs. The Praat tools have proven very handy for speech file analysis. The formant, pitch and intensity graphs of the output files are clearly very similar to those of the input waveform. From the experimental results, it is evident that the algorithm yields good compression and obtains very good perceptual quality.

REFERENCE:

[1] Nimisha Susan Jacob and Ancy S. Anselam, "Performance Analysis of CS-ACELP Speech Coder," IJEAT, ISSN: 2249-8958, Issue 5, June 2015.
[2] "An Efficient Algebraic Codebook Search for G.729 Speech Codec," IEEE, 19 June 2014.

[3] Koji Seto and Tokunbo Ogunfunmi, "Scalable Wideband Speech Coding for IP Networks," Dec. 2012.
[4] T. Ogunfunmi and M. J. Narasimha, "Speech over VoIP Networks: Advanced Signal Processing and System Implementation," IEEE Circuits and Systems Magazine, vol. 12, no. 2, pp. 35-55, 2012.
[5] K. Seto and T. Ogunfunmi, "Scalable Multi-Rate iLBC," Proceedings of IEEE International Symposium on Circuits and Systems, 2012.
[6] G. Madre et al., "Design of a variable rate algorithm for CS-ACELP coder," IEEE, 2003.
[7] A. Johnston, S. Donovan, R. Sparks, C. Cunningham, and K. Summers, "Session Initiation Protocol (SIP) Public Switched Telephone Network (PSTN) call flows," RFC 3666, December 2003.
[8] ITU-T Recommendation P.862 Amendment 1, "Source code for reference implementation and conformance tests," March 2003.
[9] ITU-T G.729: Coding of speech at 8 kb/s using CS-ACELP. R. Salami et al., "Description of the proposed ITU-T 8 kb/s speech coding standard," in Proc. IEEE Workshop on Speech Coding.