Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP

Monika S. Yadav
Vidarbha Institute of Technology, Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, India
monika.yadav@rediffmail.com

Abstract

Speech coding has been a major issue in the area of digital speech processing. Speech coding is the application of data compression to digital audio signals containing speech: the speech signal is put into a more compressed form, which can then be transmitted with a smaller number of binary digits. Unlimited channel bandwidth is not available each time we send a signal, which motivates coding and compressing speech signals. Speech compression aims to compress the speech signal to attain maximum channel capacity with a lower bit rate and the highest quality. The Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) algorithm operates at a bit rate of 8 kbps for discrete speech samples sampled at a rate of 8000 samples per second. G.729 operates on every 10 ms speech frame: the input speech signal is analyzed to extract parameters, and these parameters are then coded. G.729 is one of the most widely used ITU-T standards for speech compression. Speech compression is applied in long-distance communication, high-quality speech storage, and message encryption. Speech coding is a lossy type of coding, and hence the output signal does not sound exactly like the input. The speech coding techniques discussed here are linear predictive coding, waveform coding, code excited linear predictive coding, etc.

Keywords: Speech coding, Linear Predictive Coding (LPC), Waveform coding, Conjugate Structure Algebraic Code Excited Linear Predictive Coding (CS-ACELP).

I. Introduction

Speech coding is the process of representing a voice signal in a form suitable for efficient transmission or storage. These codes are sent over both band-limited wired and wireless channels.
The aim of speech coding is to represent the samples of a speech signal in a compact form, using fewer code symbols, without degrading the quality of the speech signal. Speech coding is very important in cellular and mobile communication. It has applications in Voice over Internet Protocol (VoIP), videoconferencing, electronic toys, archiving, Digital Simultaneous Voice and Data (DSVD), and numerous computer-based gaming and multimedia applications. Most speech applications require minimum coding delay in order to avoid hindering the flow of conversation. A speech coder converts a digitized speech signal into a coded representation and transmits it in the form of frames. At the receiving end, the speech decoder receives the coded frames and synthesizes a reconstructed speech signal. The decoded speech should ideally be audibly indistinguishable from the original speech signal. G.729 (8 kbps) is one of the best-known ITU-T standards for speech compression. The recommendation is based on the Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) algorithm, operating at a bit rate of 8 kbps for discrete speech samples sampled at a rate of 8000 samples per second. G.729 operates on every 10 ms speech frame: the input speech signal is analyzed to extract parameters, and these parameters are then coded. G.729 is a low-complexity continuous data transmission scheme for VoIP applications and provides good synthesized speech quality at a low bit rate. IJESAT May-Jun 2016 143
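The rate figures quoted above fix the frame budget by simple arithmetic: at 8 kbps, a 10 ms frame carries 80 bits, while the same 10 ms of 16-bit PCM at 8000 samples per second occupies 1280 bits, a 16:1 reduction. A minimal sketch of this arithmetic (the function name and defaults are illustrative, not part of the standard):

```python
# Frame bit-budget arithmetic implied by the G.729 figures quoted above
# (8 kbit/s rate, 10 ms frames, 8000 samples/s input). A sanity check,
# not part of the standard itself.

def g729_frame_budget(bit_rate=8000, frame_ms=10, sample_rate=8000, pcm_bits=16):
    samples_per_frame = sample_rate * frame_ms // 1000   # 80 samples per frame
    bits_per_frame = bit_rate * frame_ms // 1000         # 80 coded bits per frame
    pcm_bits_per_frame = samples_per_frame * pcm_bits    # 1280 bits uncompressed
    compression_ratio = pcm_bits_per_frame / bits_per_frame
    return samples_per_frame, bits_per_frame, compression_ratio

print(g729_frame_budget())  # (80, 80, 16.0)
```

The 80-bit frame is exactly the budget that the quantized LP, pitch, codebook, and gain parameters must share.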
CS-ACELP can only be used for human voice (due to the model used) and is relatively complex.

1. Speech coding techniques

Speech coding techniques are of two main types: 1. lossless and 2. lossy coding methods. With a lossy coding technique the reconstructed speech signal is perceptually different from the original speech signal, whereas with a lossless coding technique the reconstructed signal at the decoder end has exactly the same shape as the input speech signal. Most speech coding techniques are lossy, because they remove information that is irrelevant from the perceptual quality point of view. Speech coders are classified based on the bit rate at which they produce output of reasonable quality and on the type of coding technique used.

2. Waveform coding

Waveform coding is the simplest technique for speech coding. Waveform coders analyze, code and reconstruct the original signal sample by sample. They reproduce the exact shape of the speech signal waveform, without considering the nature of the human speech production and delivery system. The most commonly used waveform coding algorithms are uniform 16-bit PCM, companded 8-bit PCM and ADPCM. Waveform coding is explored in both the time and frequency domains.

3. Differential Pulse Code Modulation

Differential PCM (DPCM) is designed to calculate the difference between successive samples and then transmit this small difference signal instead of the entire input sample. Since the difference between input samples is smaller than an entire input sample, the number of bits required for transmission is reduced. Using DPCM can reduce the bit rate of voice transmission down to 48 kbps. The input signal is sampled and modulated.
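The difference-transmission idea can be sketched as a minimal first-order DPCM loop. This is an illustrative toy (the step size, integer test signal, and simple previous-sample predictor are assumptions for clarity), not any particular standard's quantizer:

```python
# Minimal first-order DPCM sketch: the predictor is just the previously
# reconstructed sample, and the difference is coarsely uniform-quantized.
# Step size and predictor are illustrative choices, not a specific standard.

def dpcm_encode(samples, step=4):
    codes, pred = [], 0
    for s in samples:
        diff = s - pred                  # small difference signal
        code = round(diff / step)        # uniform quantization of the difference
        codes.append(code)
        pred += code * step              # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=4):
    out, pred = [], 0
    for code in codes:
        pred += code * step              # add the dequantized difference
        out.append(pred)
    return out

signal = [0, 3, 8, 12, 9, 5]
decoded = dpcm_decode(dpcm_encode(signal))
```

Because the encoder predicts from its own reconstruction rather than from the raw input, quantization errors do not accumulate: each decoded sample stays within one quantizer step of the input.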
The sampled input signal is stored in a predictor. The predictor takes the stored sample and sends it through a differentiator, which compares the previous sample with the current sample and passes the difference to the quantizing and coding phase of PCM (this phase can use uniform quantizing or companding with A-law or u-law). After quantizing and coding, the difference signal is transmitted to its final destination. At the receiving end of the network, everything is reversed: first the difference signal is dequantized, then this difference is added to a sample stored in a predictor and sent to a low-pass filter that reconstructs the original input signal. DPCM faces some voice-quality problems; to address them, adaptive DPCM was developed.

4. Linear Predictive Coding

Linear Predictive Coding (LPC) is a powerful, good-quality, low bit rate speech analysis technique for encoding a speech signal. The source-filter model used in LPC is also known as the linear predictive coding model. It has two main components: LPC analysis (encoding) and LPC synthesis (decoding). The goal of LPC analysis is to estimate whether the speech signal is voiced or unvoiced, to find the pitch of each frame, and to estimate the parameters needed to build the source-filter model. These parameters are transmitted to the receiver, which carries out LPC synthesis using the received parameters.

5. Code Excited Linear Prediction

The basic principle that all speech coders exploit is the fact that speech signals are highly correlated waveforms, so speech can be represented using an autoregressive (AR) model. Code Excited Linear Prediction (CELP) builds on this model. Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 audio speech coding.

II. CS-ACELP DESCRIPTION

A. Encoder
The CS-ACELP coder processes input signals on a frame-by-frame and sub frame-by-sub frame basis. The frame length is 10 ms and consists of two 5 ms sub frames. The algorithm utilizes vector quantization: both the adaptive codebook and the fixed codebook are vector quantized to form the conjugate structure. The 8 kbps core speech coder is derived from the G.729 coder [2] and is based on the Code-Excited Linear Prediction (CELP) coding model, operating on speech frames using an analysis-by-synthesis method. The encoding principle of CS-ACELP is shown in Fig. 2. The encoding stage of CS-ACELP mainly contains six blocks.

1. Pre Processing

The preprocessing block contains two stages: scaling and high-pass filtering. The input to the speech encoder is assumed to be a 16-bit PCM signal, which undergoes combined scaling and high-pass filtering. Scaling means dividing the input signal by a factor of two to avoid the possibility of overflows in the fixed-point implementation of the coder. For high-pass filtering, a second-order pole/zero filter with a cut-off frequency of 140 Hz is used. The scaling and high-pass filtering are combined, and the resulting filter is given by

Hh1(z) = (0.46363718 - 0.92724705 z^-1 + 0.46363718 z^-2) / (1 - 1.9059465 z^-1 + 0.9114024 z^-2)

The input signal filtered through Hh1(z) is referred to as s(n) and is used in all subsequent coder operations.

2. LP Analysis

The linear prediction (LP) technique, taking advantage of the order of the linear prediction filter, is the most frequently used technique for speech analysis. The LP analysis block is shown in Fig. 1. Reflection coefficients are obtained as a by-product of the Levinson-Durbin algorithm in LP analysis. The short-term analysis and synthesis are based on a 10th-order LP filter. The LP synthesis filter is defined as

1/A(z) = 1 / (1 + a_1 z^-1 + a_2 z^-2 + ... + a_10 z^-10)

The LSP parameters are quantized using predictive two-stage vector quantization.

Fig. 1.
LP Analysis Block

The interpolated quantized and unquantized filters are converted back to LP filter coefficients, to construct the synthesis and weighting filters for each sub frame.
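The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a generic textbook version (sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p), omitting the windowing, lag-windowing and bandwidth expansion that G.729 itself applies; the reflection coefficients fall out as a by-product, as the text notes:

```python
# Levinson-Durbin recursion: solves for the LP coefficients a[1..p] from
# the autocorrelations r[0..p]. The reflection coefficients k_i are a
# by-product. Windowing and bandwidth expansion used by G.729 proper are
# omitted in this sketch.

def levinson_durbin(r, order):
    a = [0.0] * (order + 1)   # a[0] is implicitly 1
    k_list = []
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        k_list.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):               # symmetric coefficient update
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)                # error shrinks each order
    return a[1:], k_list, err
```

For autocorrelations of a first-order process, e.g. r = [1.0, 0.9, 0.81], the recursion yields a_1 close to -0.9 and a second reflection coefficient near zero, as expected for a model that only needs one tap.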
An update of the states of the synthesis and weighting filters is needed to compute the target signal in the next sub frame. After the two gains are quantized, the excitation signal u(n) in the present sub frame is obtained using

u(n) = g_p v(n) + g_c c(n),  n = 0, ..., 39,

where g_p and g_c are the quantized adaptive- and fixed-codebook gains, respectively, v(n) is the adaptive-codebook vector (interpolated past excitation), and c(n) is the fixed-codebook vector including harmonic enhancement. The states of the filters can be updated by filtering the signal r(n) - u(n) (the difference between residual and excitation) through the filters 1/A(z) and A(z/γ1)/A(z/γ2) for the 40-sample sub frame and saving the states of the filters. This would require three filter operations. A simpler approach, which requires only one filter operation, is as follows. The locally reconstructed speech s'(n) is computed by filtering the excitation signal through 1/A(z). The output of the filter due to the input r(n) - u(n) is equivalent to the error e(n) = s(n) - s'(n). So the states of the synthesis filter 1/A(z) are given by e(n), n = 30, ..., 39. Updating the states of the filter A(z/γ1)/A(z/γ2) can be done by filtering the error signal e(n) through this filter to find the perceptually weighted error ew(n). However, ew(n) can be equivalently found from

ew(n) = x(n) - g_p y(n) - g_c z(n)

Since the signals x(n), y(n) and z(n) are available, the states of the weighting filter are updated by computing ew(n) as in this equation for n = 30, ..., 39. This saves two filter operations.

Fig. 2. CS-ACELP Algorithm

III. DECODER

The decoder principle is shown in Fig. 3. First, the parameter indices are extracted from the received bit stream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive- and fixed-codebook gains. The LSP coefficients are interpolated and
converted to LP filter coefficients for each sub frame.

Figure 3. Decoding principle of the CS-ACELP codec

IV. METHOD

The CS-ACELP algorithm was simulated using MATLAB R2010a. The MATLAB application supports the import and export of data in various file formats. The objective measures Segmental SNR (SNRseg), Log Likelihood Ratio (LLR), Weighted Spectral Slope (WSS) and Perceptual Evaluation of Speech Quality (PESQ) of CS-ACELP were also calculated using MATLAB. Segmental SNR is defined as the average of SNR values over segments with speech activity. LLR compares the LPC vector of the original speech signal with that of the reconstructed speech. The Weighted Spectral Slope (WSS) distance measure is a direct spectral distance measure, based on a comparison of smoothed spectra from the clean and distorted speech samples. PESQ is a family of standards comprising a test methodology for automated assessment of the speech quality as experienced by a user of a telephony system. PESQ values range from 0.5 to 4.5, with higher values indicating better quality. For analysis of the output speech files, the Praat software package was used. Praat consists of two parts, Praat Objects and Praat Picture. Spectrogram analysis of the input and output speech of CS-ACELP was performed. The time-domain representation of the speech signal and information regarding pitch, intensity and formants were extracted, along with the formant frequencies F1-F4 and their bandwidths. The Praat Objects window is used to open existing sound files or create new ones.

V. RESULT

This section describes the results of the CS-ACELP algorithm obtained using MATLAB R2010a. A handel.wav file is used as a test signal for the coder implementation, and handeldec.wav is the reconstructed signal. First the file compression results are presented, then the analysis details and objective measurements. The objective measurements are shown in Table 3.
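Of the four objective measures, segmental SNR is the simplest to state precisely. A minimal sketch follows (the frame length, clamping limits and silent-frame test are common illustrative choices, not the exact configuration used in these experiments):

```python
# Segmental SNR sketch: average the per-frame SNR over frames with signal
# activity, as described in the method section. Frame length and clamping
# limits are illustrative choices.
import math

def segmental_snr(clean, decoded, frame=80, floor_db=-10.0, ceil_db=35.0):
    snrs = []
    for i in range(0, len(clean) - frame + 1, frame):
        sig = sum(x * x for x in clean[i:i + frame])
        err = sum((x - y) ** 2
                  for x, y in zip(clean[i:i + frame], decoded[i:i + frame]))
        if sig <= 0.0:
            continue                                  # skip silent frames
        snr = 10.0 * math.log10(sig / (err + 1e-12))  # per-frame SNR in dB
        snrs.append(min(max(snr, floor_db), ceil_db)) # clamp extreme frames
    return sum(snrs) / len(snrs) if snrs else 0.0
```

Clamping each frame's SNR (here to [-10, 35] dB) keeps silent or perfectly reconstructed frames from dominating the average, which is why segmental SNR tracks perceived quality better than a single global SNR.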
From the PESQ values of the speech signals, the coder is seen to work with good quality. From the Praat analysis of the input and output of the CS-ACELP coder, the sound pressure (amplitude of the sound waves, in Pascal) is shown in Table 1. The very slight differences in the minimum, maximum, root mean square (RMS) and mean values are representative of the good quality of the vocoder output. The formant frequencies and corresponding bandwidths are listed in Table 2. The combined time-domain representation of the input and output sound waves is shown in Fig. 6. From the time-domain representation, it can be concluded that the overall shape has been preserved, but peaks have been clipped at some portions. The waveforms of intensity, pitch and formants of the sound files are shown in Figs. 4 and 5.

Table 1. CS-ACELP Coder Amplitude (Sound Pressure)

Amplitude (Pa)   Input (handel.wav)   Output (handeldec.wav)
minimum          -0.799               -0.863
maximum          0.799                0.930
mean             2.49x                5.05x
rms              0.1962               0.124

Table 2. CS-ACELP Coder Formants and Bandwidths

Parameter (Hz)   handel.wav   handeldec.wav
F1               593.106      470.96
F2               1503.03      1050.91
F3               2281.968     2482.303
F4               2988.78      3027.77
BW1              381.71       136.74
BW2              269.57       1050.91
BW3              205.48       2482.30
BW4              206.714      280.97

Table 3. Objective Measurements

File name     SNRseg   WSS     LLR     PESQ
handel.wav    -1.81    47.75   0.309   2.36
male.wav      -1.66    42.58   0.430   2.27
child.wav     -1.97    89.80   0.530   2.11
female.wav    -0.90    80.40   0.670   2.17

(a) Original sound file  (b) CS-ACELP output file
Fig. 4. Intensity waveforms: (a) original sound file; (b) CS-ACELP output file
Fig. 5. Pitch waveforms: (a) original sound file; (b) CS-ACELP output file

Fig. 6. Formants of speech

VI. CONCLUSION

This paper has described the implementation and analysis of an efficient algorithm for providing secure speech transmission for various applications with different speech inputs. The Praat tools proved very handy for speech-file analysis. The formant, pitch and intensity graphs of the output files show very close similarity to those of the input waveforms. From the experimental results, it is evident that the algorithm yields good compression and very good perceptual quality.

REFERENCES:

[1] Nimisha Susan Jacob and Ancy S. Anselam, "Performance Analysis of CS-ACELP Speech Coder," International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Issue 5, June 2015.
[2] "An Efficient Algebraic Codebook Search for G.729 Speech Codec," IEEE, 19 June 2014.
[3] K. Seto and T. Ogunfunmi, "Scalable Wideband Speech Coding for IP Networks," Dec. 2012.
[4] T. Ogunfunmi and M. J. Narasimha, "Speech over VoIP Networks: Advanced Signal Processing and System Implementation," IEEE Circuits and Systems Magazine, vol. 12, no. 2, pp. 35-55, 2012.
[5] K. Seto and T. Ogunfunmi, "Scalable Multi-Rate iLBC," in Proc. IEEE International Symposium on Circuits and Systems, 2012.
[6] G. Madre et al., "Design of a Variable Rate Algorithm for the CS-ACELP Coder," IEEE, 2003.
[7] A. Johnston, S. Donovan, R. Sparks, C. Cunningham, and K. Summers, "Session Initiation Protocol (SIP) Public Switched Telephone Network (PSTN) Call Flows," RFC 3666, December 2003.
[8] ITU-T Recommendation P.862 Amendment 1, "Source Code for Reference Implementation and Conformance Tests," March 2003.
[9] ITU-T Recommendation G.729, "Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP)."
[10] R. Salami et al., "Description of the Proposed ITU-T 8 kb/s Speech Coding Standard," in Proc. IEEE Workshop on Speech Coding.