STATE OF THE ART AND TRENDS IN SPEECH CODING


Philips J. Res. 49 (1995)

by R.J. SLUIJTER, F. WUPPERMANN, R. TAORI and E. KATHMANN
Philips Research Laboratories, Prof. Holstlaan, AA Eindhoven, The Netherlands

Abstract

An introductory review of some basic speech coding techniques covers the most important properties of speech production and hearing, the ubiquitous techniques of quantization and linear prediction, and a recital of the most important measures of coding performance. In the survey that follows, several standardized speech coding systems reflecting the state of the art in speech coding are discussed in terms of coding method, bit rate, performance, complexity and typical application areas. Major future trends are indicated on the basis of expected future standards. The paper, which primarily deals with narrowband speech coding systems, is concluded by a review of the state of affairs and an outline of the future trends in the area of wideband speech coding.

Keywords: speech; source coding; state of the art; future trends; standards; narrowband; wideband.

1. Introduction

Speech coding is the conversion of an analog speech signal into a digital signal. This signal is transmitted to a remote decoder or stored in a memory for later decoding. The decoder reproduces the original analog signal as well as possible. The purpose of digitization is to enhance the fidelity of transmission or to allow the use of digital memory for storage purposes. Sometimes, the signal is digitized just to allow the signal to be processed in a digital way, which can be more accurate and reliable than analog processing. Speech coding has already been used in professional transmission equipment for the public switched telephone network (PSTN) and business communication networks for some decades. More recently, however, there has been remarkable growth in the use of speech coding systems.
For example, speech coding is applied in public mobile telephone systems, private mobile radio, conference-hall systems, videophone systems and cordless telephone products. Today,

we also find speech coders in temporary storage applications, such as voice mail systems, digital telephone answering machines, dictation systems and pocket memos, and even some personal computers provide the possibility to store speech. Another application area, which is growing with the availability of high capacity, low cost digital read-only memory (ROM), is that of voice response ('canned speech') systems. Voice response systems are used in car navigation systems, public-address equipment, portable guidance products for use in museums and big exhibitions, and toys, amongst others. It is evident that in all the aforementioned application areas, the transmission or storage media involved should be used as efficiently as possible. So, the speech coder should yield a bit rate as low as possible. Over the years, speech coding systems have been proposed for various applications. The most important speech coding systems will be surveyed in Sections 3 and 4. Before we commence this discussion, some important basics of speech coding are reviewed.

2. Basics of speech coding

The physiology of the speech organ and the psycho-acoustics of hearing are important foundations of many speech coders. Although precise modelling of speech production and hearing is still in a state of research, gross characterizations suffice to serve the purpose of designing effective speech coders.

2.1. Speech production and hearing

The human speech production mechanism can be characterized rather simply by the famous source-filter model [1, 2], as shown in Fig. 1. Here, the source is either modelled as a quasi-periodic pulse source for voiced sounds (V), or a white noise source for unvoiced sounds (U).

Fig. 1. The source-filter model of speech production.

A gain factor (g) controls

the intensity of the produced sound. The source excites a filter (F), which represents the vocal tract, consisting of the throat, mouth and nasal cavities. During the generation of voiced sounds, the air expelled by the lungs causes the vocal cords to vibrate with a certain periodicity. This periodicity (pitch) varies with time and is represented by T in Fig. 1. In speech, T may vary between 2 and 20 ms, although the variations do not usually exceed two octaves for a single speaker. During the generation of unvoiced sounds the vocal cords do not vibrate and hence there is no periodicity associated with the source. In the cavities of the vocal tract, three to five resonances known as formants (see also Fig. 9) may originate. Depending on the movements of the articulators (lips, jaws, tongue and velum), these resonances vary with time. The rate of change of the articulators, including the vocal cords, is limited by the musculature that operates them, and the associated time constant is in the order of 20 ms. If the speech signal is considered stationary over this time duration, it can be represented fairly accurately on the basis of just a handful of parameters describing the model. Using these parameters, it is possible to reconstruct a perceptually similar copy of the original speech signal with a very low bit rate. In perception, some features of the speech signal are quite irrelevant.

Fig. 2. Stylized visualization of the time-frequency dependent phenomenon of auditory masking, for a periodic impulse sequence.

Phase relations between the signal components and minor variations in pitch are but

two examples of irrelevancy. Also, it is possible to replace unvoiced sounds by free-running artificial noise, if only the original shape of the energy density spectrum is retained. Yet another phenomenon of perception is masking [3]. It conceals weaker signal components in the neighbourhood of relatively stronger signal components, both in time and in frequency. One of the most instructive visualizations of this time-frequency dependent phenomenon is given in Fig. 2. It shows stylized masking boundaries for a time signal consisting of periodic pulses with a period T, having frequency harmonics at multiples of 1/T. All additional sound components with amplitudes 'below the roofs' are inaudible [4]. In some speech coders this phenomenon is exploited by controlling quantization noise and coding distortions in such a way that their audibility is reduced, or suppressed completely. A recent treatise of this subject can be found in Ref. [5].

2.2. Quantization

First of all, the speech signal is assumed to be properly sampled using anti-aliasing filtering. The sampling rate is 8 kHz in narrowband coding and 16 kHz in wideband coding. In reproducing the analog speech signal, an appropriate reconstruction filter is required. Various ways of quantizing speech samples are described next, although the principles are applicable to any other type of variables or speech parameters [2, 6]. A uniform quantizer has equally spaced quantization levels. A possible input-output characteristic of such a quantizer is shown in Fig. 3a. If a sampled signal is quantized, quantization noise is introduced and a signal to noise ratio (SNR) can be determined by means of the ratio of the signal energy and the quantization noise energy, both measured over the same time. If the quantizer is used for a signal that occupies the full scale, a certain SNR is

Fig. 3.
Quantization characteristics showing signal level s vs quantized level s_q, of (a) a uniform quantizer, (b) a non-uniform quantizer and (c) an adaptive quantizer, for a certain signal level l and a larger signal level.

obtained. For lower signal levels, the SNR decreases. Figure 4 shows the SNR as a function of the signal level for a sinusoid, using 256 quantization levels, which can be represented by an 8-bit code. For speech signals, in which the signal level varies over a large dynamic range of 30 dB or more (see Fig. 10), a 12-bit (4096 levels) quantizer is needed to obtain satisfactory performance, as in telephony, for instance.

In a non-uniform quantizer the spacing between the quantization levels is not equal, as shown in Fig. 3b. For example, the 8-bit logarithmic quantization scheme used in PCM (see Sec. 3.1) renders a much smaller SNR-dependence on the signal level. In Fig. 4 the SNR vs the signal level of a sinusoid is sketched. The basic idea behind logarithmic quantization is that if the input s and its compressed version c are related by c = ln(1 + s), then the difference quotient is given by Δc = Δs/(1 + s). If c is uniformly quantized and if in addition s >> 1, then Δc = constant, and hence Δs/s ≈ constant. This gives an approximately constant relative quantization error and, consequently, a constant SNR. By proper expansion of the quantized signal a total input-output characteristic, as shown in Fig. 3b, is obtained. In general, the quantizer can be optimized by adapting the distribution of the quantization levels to the signal statistics. This kind of quantizer is known as a Max-Lloyd quantizer [7].

Fig. 4. SNR vs relative signal level for the three different scalar quantizers.
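The log-companding idea described above can be illustrated with a small sketch. The μ-law characteristic below is a continuous idealization (the standardized A-law and μ-law of PCM use segmented approximations of such a curve), so this is a toy model of the principle, not the standardized quantizer; the constant MU = 255 is the value conventionally paired with 8-bit coding.

```python
import math

MU = 255.0  # companding constant conventionally used with 8-bit PCM

def mu_compress(s):
    """Map s in [-1, 1] to c in [-1, 1]; quasi-logarithmic for large |s|."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)

def mu_expand(c):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(c) * math.log1p(MU)) / MU, c)

def quantize_uniform(c, bits=8):
    """Uniform quantizer on [-1, 1] with 2**bits - 1 steps."""
    step = 2.0 / (2 ** bits - 1)
    return round(c / step) * step

def log_pcm(s, bits=8):
    """Compress, quantize uniformly, expand: a non-uniform quantizer."""
    return mu_expand(quantize_uniform(mu_compress(s), bits))
```

Because the compression stage makes the effective step size roughly proportional to the signal magnitude, the relative quantization error, and hence the SNR, stays roughly constant over a wide range of input levels, which is exactly the flattening of the SNR curve sketched in Fig. 4.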

In an adaptive quantizer a small dependence of the SNR on the signal level is obtained in an alternative way. In this case the quantization step is adapted to the signal level in such a way that full load of the quantization characteristic is pursued, see Fig. 3c.

Fig. 5. Vector quantization system.

The adaptation can be achieved in a forward or backward

way. In forward adaptation, the input signal level is measured and the quantization characteristic is controlled accordingly, requiring separate transmission of the level parameter for decoding purposes. In backward adaptation the signal level is estimated from the quantized signal, and since the same quantized signal will be available to the decoder, no side information needs to be transmitted. For the sake of comparison, the SNR of an 8-bit backward adaptive quantizer, again for a sinusoidal input signal, is also shown in Fig. 4. A parameter to be chosen in both forward and backward adaptation is the speed of adaptation, which should be tuned to the rate of change of the envelope of the signal. For speech, the corresponding time constant is in the order of 10 ms, preferably with a faster 'attack' time and a slower 'decay' time. Sometimes, the level of the signal is estimated on the basis of a fixed number of samples, known as 'block' adaptation.

In a vector quantizer, a block of N samples s[n], which can be interpreted as an N-dimensional vector, is quantized as a whole [8]. For this purpose, a codebook containing a set of vectors of the same dimensions is used. These vectors are approximations to the expected set of possible input vectors, as shown in Fig. 5. The current input vector is compared to all vectors v_l[n], l = 1, 2, ..., L in the codebook and the error sequences, e_l[n] = s[n] - v_l[n], are evaluated to find the best matching vector. This vector can be represented by a log2(L)-bit index, and in a remote decoder, which contains the same codebook, the chosen vector

Fig. 6. Graphical representation of vector quantization.

can be retrieved using this index. A workable matching criterion is the mean square error (MSE). Such matching criteria weight the individual quantization errors, |e_l[n]|^2, equally. Sometimes, however, it may be better to give them unequal weights in order to make certain contributions to the matching criterion less important than others. The optimum codebook contents in a certain application can be obtained by training. For this purpose, many different vectors are applied to the system and they are clustered to form cells in the N-dimensional space, as shown in Fig. 6 for the simple 2-dimensional case. The centroid of each cell is actually stored as a vector in the codebook. A popular training algorithm is the LBG algorithm [9]. Alternatively, it is also possible to construct the contents of the codebook, if the statistics of the signal to be quantized are sufficiently known. In actual operation, the vector quantizer will allocate an input vector s to the centroid of the cell in which it is located. A major difficulty in vector quantization is managing the size of the codebook. This arises from the fact that the codebook size required for acceptable performance gives rise to unmanageable computational complexity. In Fig. 5, for instance, the codebook contains segments of speech. It is evident that the codebook size will be huge if it has to contain all possible sounds, even with slightly different pitches and levels. Therefore, vector quantization is almost exclusively applied to decorrelated and normalized signals. An important decorrelation technique is linear prediction.

2.3. Linear prediction analysis

Linear prediction (LP), or linear predictive coding (LPC), is the prediction of the current speech sample s[n] on the basis of a linear combination of

Fig. 7. Linear prediction: inverse filter.

previous speech samples s[n - i], i = 1, 2, ..., M. The network providing the linear combination of previous samples is called the predictor (Fig. 7), where T stands for a sampling period delay. The prediction error, or prediction residual, e[n] can thus be represented by:

e[n] = s[n] - Σ a_i s[n - i],  summed over i = 1, ..., M,   (1)

in which the a_i are the prediction coefficients, or a-parameters, and M is the order of the predictor. Minimization of the total energy, E_t, of the prediction error over a certain interval {n_0, n_1}:

E_t = Σ e^2[n],  summed over n = n_0, ..., n_1,   (2)

with respect to the coefficients a_i, results in very attractive properties of the associated system A(z) = E(z)/S(z). First of all, the total squared error criterion produces a set of linear equations which can readily be solved. We see that E_t depends quadratically on the a-parameters. Setting the partial derivatives of E_t with respect to each a_i to zero yields a set of M equations. Solving these M equations according to this method, which is known as the covariance method, yields the optimum a-parameters [2]. The minimization interval {n_0, n_1} is chosen in such a way that in this interval the a-parameters may be assumed to be stationary. A common choice is 20 ms, again. The performance of prediction is expressed in terms of the 'prediction gain', defined as the ratio of the signal energy in the minimization interval and E_t. By extending the minimization interval {n_0, n_1} to {-∞, ∞} and applying a finite-duration window of, say, 20 ms, to s[n], the widely used autocorrelation method is obtained. The equations become:

P a = p,   (3)

in which the elements of the M x M matrix P are autocorrelation coefficients:

p_ij = Σ x[n] x[n + |i - j|],  summed over n = 1, ..., N_w - |i - j|,   (4)

where N_w is the length of the window, x[n] are the windowed speech samples, and a and p are M x 1 vectors having the elements a_i and p_i0, respectively. This system of equations can be solved efficiently by the well known Levinson-Durbin recursion [2].
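The autocorrelation method and the Levinson-Durbin recursion can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions (a rectangular window, and function names of our own choosing), not code from the paper:

```python
def autocorr(x, max_lag):
    """Autocorrelation of a windowed segment x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations P a = p for the
    a-parameters via the Levinson-Durbin recursion."""
    a = [0.0] * (order + 1)   # a[1..order] are the prediction coefficients
    err = r[0]                # residual energy, updated each iteration
    for m in range(1, order + 1):
        # Reflection coefficient for order m
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err         # coefficients a_1..a_M and final error energy
```

For a first-order exponentially decaying input, the recursion recovers the decay factor as a_1 and leaves the higher coefficients near zero, illustrating that the predictor whitens the input.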
Yet another approach makes use of the fact that the partial derivative of the

total square prediction error with respect to a_i can be written as:

∂E_t/∂a_i = -2 Σ e[n] s[n - i],  summed over n = n_0, ..., n_1.   (5)

This can be interpreted as a cross-correlation between the input and output sequences of A(z). So, we see that if this partial derivative is set to zero as before, A(z) works as a decorrelator for the speech signal. Some speech coding systems (e.g. ADPCM, see Sec. 3.2.1) are based on predictors in which the prediction coefficients are controlled in such a way that this cross-correlation is adaptively driven to zero, for each i, 1 <= i <= M. Secondly, the total square error criterion provides maximum spectral flatness of the prediction residual e[n], at least for the autocorrelation method [10]. This means that the transfer function of A(z) is approximately inverse to the spectral envelope of its input signal, if it has enough coefficients, and that the spectral shape of the input signal is represented by the prediction coefficients. The system A(z) is referred to as the inverse filter. Consequently, on the basis of the total square error criterion, LP provides a useful synthesis structure. The function 1/A(z) represents the spectral envelope of the speech segment under consideration. The direct-form structure of the network with transfer function 1/A(z) is shown in Fig. 8. It is referred to as the synthesis filter. On the basis of the foregoing, we are able to make a good estimate of the prediction order M. Since a second-order function is required to create a single formant, and since there are three to five formants in speech, six to ten coefficients are needed to realize the required formant structure. The predictors in modern narrowband speech coders are mostly equipped with M = 10 coefficients. In

Fig. 8. Linear prediction: synthesis filter.

actual operation, not all coefficients are devoted to formants, but some of them may represent global spectral inclination. Figure 9 shows an example of the amplitude spectrum of a 20 ms voiced speech segment and the transfer function of an associated 10th-order synthesis filter.

Fig. 9. Linear prediction: (I) amplitude spectrum of a 20 ms voiced speech segment (note the pitch harmonics); (II) transfer function of the associated synthesis filter with four formants and a spectral decay.

Quantizing the a-parameters directly is not very efficient. Usually, the a-parameters are first converted into another form, namely log-area-ratios (LARs) [2] or line spectral pairs (LSPs) [11], and then quantized. The quantization schemes obtained in this way are not unique, but depend somewhat on the design. Using LARs usually results in a total of about 40 bits to obtain a perceptual equivalent of the unquantized synthesis filter. In the case of LSPs, the same can be obtained using about 34 bits. A lower bit rate can be obtained if one opts for vector quantization, at the cost of increased computational complexity, in which case about 24 bits are sufficient [12]. Applying linear prediction in the way discussed above, also referred to as short-term prediction, effectively describes the formant structure of a speech segment, but leaves pitch-related long-term correlation in its residual. This is shown in the example of Fig. 10. The upper trace in this figure shows about 200 ms of a transition from a voiced to an unvoiced portion of speech. The second trace shows the prediction residual of a 10th-order inverse filter, updated every 20 ms, which clearly demonstrates the presence of the long-term

correlation in the form of periodic pitch pulses. This periodicity can be removed by a long-term predictor (LTP), or pitch predictor (Fig. 11).

Fig. 10. Linear prediction: upper trace: a portion of speech of about 200 ms (note the quick level drop of about 30 dB in the voiced to unvoiced transition); middle trace: the short-term prediction error e; lower trace: the long-term prediction error E.

The short-term prediction residual e[n] is delayed, multiplied by a pitch prediction coefficient a_p and subtracted from e[n], resulting in the long-term prediction

residual E[n].

Fig. 11. Long-term prediction: analysis and synthesis filters.

The time lag, usually constrained to the range of pitch in speech, and a_p are optimized on the basis of minimizing the energy of E[n], in the same way as in short-term prediction. The transfer function of such a system will be referred to as P(z). The third trace in Fig. 10 shows the resulting E[n], in which a significant reduction in the dynamic range, and hence an increase in prediction gain, is observed. It is on this decorrelated signal E[n] that vector quantization is normally performed. The network realizing the inverse function 1/P(z), which restores e'[n] from E'[n], is also shown in Fig. 11.

2.4. Measures of coder performance

Speech quality is difficult to define since subjective issues like naturalness, intelligibility, noise, etc., are involved [13]. One objective measure is the SNR, but it only correlates well with subjective quality if it concerns relatively low-level noise and distortions. Better correlation is obtained on the basis of the segmental SNR, where the SNR is measured over short stationary segments of typically 20 ms, and averaged. Intervals of silence have to be excluded then, because they can render bad SNRs which are perceptually not relevant. More sophisticated objective measures, such as spectral distance measures, are being investigated and some of them even include masking models [14]. By and large, the performance of objective measures is improving and they will play an important role in the future. A method for the subjective assessment of speech quality, which has already been in use for decades, is the Mean Opinion Score (MOS) test [15]. A large number of listeners are asked to assess the quality of randomly sequenced utterances, using a 1-5 scale in terms of: bad, poor, fair, good and excellent, respectively. After statistical processing of the results, a MOS number is obtained.
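The segmental SNR described above can be sketched as follows. The 20 ms segment length (160 samples) assumes 8 kHz sampling, and the -40 dB silence threshold is an arbitrary choice of this sketch, used to exclude the perceptually irrelevant silent intervals:

```python
import math

def segmental_snr(ref, deg, seg_len=160, silence_db=-40.0):
    """Average per-segment SNR in dB between a reference signal and a
    degraded (coded) version; near-silent segments are excluded."""
    peak = max(abs(x) for x in ref) or 1.0
    snrs = []
    for i in range(0, len(ref) - seg_len + 1, seg_len):
        s = ref[i:i + seg_len]
        es = sum(x * x for x in s)                    # segment signal energy
        # Skip segments more than `silence_db` below full-scale energy
        if es <= (peak ** 2) * seg_len * 10 ** (silence_db / 10.0):
            continue
        en = sum((x - y) ** 2 for x, y in zip(s, deg[i:i + seg_len]))
        if en > 0:
            snrs.append(10.0 * math.log10(es / en))
    return sum(snrs) / len(snrs) if snrs else float("inf")
```

Averaging the per-segment values in dB, rather than pooling all energies globally, is what keeps loud segments from dominating the measure.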
Some of the speech material used may deliberately be contaminated by background noise or transmission errors, for example, so that these issues are also included in the MOS. In order to normalize the results, speech corrupted by what is called the Modulated Noise Reference Unit (MNRU)

can be included in the tests [16]. Several other application-dependent subjective measures exist, for instance, the Diagnostic Acceptability Measure (DAM) [17] and the Diagnostic Rhyme Test (DRT) [18]. While the former has a more elaborate scale than the MOS, the latter aims at measuring intelligibility alone.

The complexity of a speech coding system is of the order of magnitude of the computing capacity of a modern digital signal processor (DSP). Systems with a high complexity will require more DSPs, and systems with a low complexity can be realized using only a part of the computing capacity of a DSP. In general, a lower bit rate or higher speech quality will require a higher complexity. Sometimes, the distribution of the complexity over the coder and the decoder plays a role, such as in voice response systems. In this case, it is important to keep the decoder as simple as possible, while this is not a prime requirement for the encoder.

The delay from the input of the encoder to the output of the decoder is an issue in full-duplex communications, such as in telephony, because it may cause disturbing echoes. Sometimes, it is even necessary to employ expensive echo-cancellers, in which case minimization of the delay still helps to reduce their costs. The delay requirements imposed on a speech coding system, often specified in terms of intrinsic 'algorithmic delay' and hardware-dependent 'implementation delay', depend on the specific application, and vary from five to some tens of milliseconds. In half-duplex communications, in which the communication channel is used in one direction at a time, the delay is not so much of an issue.
In the assessment of the robustness of a speech coding system, the sensitivity to background noise (such as car noise picked up by a car telephone), the effect of tandeming coding-decoding systems in a network, the transparency of the system for non-speech signals (such as signalling tones, data signals, fax signals, or music signals), and even the sensitivity to the absence of low frequencies in the input speech (as in telephone speech) may play a role. However, the most important robustness measure is often the sensitivity to transmission errors. Transmission errors cause erroneous decoding. The decoder itself must be designed such that errors are perceptually minimized. Usual techniques for this purpose are minimization of error propagation in the decoding process, and minimization of the perceptual difference in the case of single (isolated) bit errors with the help of Gray-coding techniques [6], and the like. If error detection can be applied, erroneous segments can be muted, or better, be substituted on the basis of interpolation or extrapolation. On the PSTN, error rates in the order of 10^-3 are to be expected and the above approaches can, in general, handle these error rates. On mobile networks, however, error rates

of several percent can be expected, in which case error correction techniques are required. Also in storage applications, if cheap error-prone memories are used, the error behaviour has to be taken into account.

3. Narrowband coding systems

Figure 12 depicts the state of the art in narrowband speech coding and the expected future trend (dashed line). The state of the art is indicated by the estimated MOS scores of 9 representative coding standards at various bit rates. They are often classified as I: simple waveform coders, which are basically quantizers; II: advanced waveform coders, characterized by the application of adaptive prediction; and III: vocoders, characterized solely by parameter coding, and consequently, by the absence of any waveform matching. Table I summarizes the main characteristics of these systems. In the following subsections the standard systems and their performances will be considered in more detail, and this section will be concluded with a review of the expected future trend.

Fig. 12. Speech quality in MOS vs bit rate of nine standardized narrowband speech coders, representative of the state of the art and the (future) trend, indicated by the dashed line.

3.1. PCM coders

The first large scale application of PCM was, and still is, in telephony. In 1972 PCM was standardized in two forms, namely the European A-law and the American μ-law [19]. These coders, which are essentially non-uniform

TABLE I

System      Bit rate (kbit/s)  Standard   Year  Application
1 PCM       64                 CCITT      1972  PSTN quality
2 ADPCM     32                 CCITT      1984  PSTN quality
3 LD-CELP   16                 CCITT      1992  PSTN quality
4 RPE-LTP   13                 ETSI       1988  mobile & storage
5 VSELP     8                  CTIA       1989  mobile & storage
6 IMBE      4.15               INMARSAT   1990  mobile (satellite)
7 CELP      4.8                US-DoD     1989  secure voice (military)
8 CVSD      16                 US-DoD     1973  secure voice (military)
9 LPC-10E   2.4                US-DoD     1975  secure voice (military)

PCM: Pulse Code Modulation; ADPCM: Adaptive Differential PCM; LD-CELP: Low Delay-Code Excited Linear Prediction; RPE-LTP: Regular Pulse Excitation-Long Term Prediction; VSELP: Vector Sum Excited Linear Prediction; IMBE: Improved Multi-Band Excitation; CVSD: Continuous Variable Slope Delta modulation; LPC: Linear Predictive Coding; CCITT: International Telegraph and Telephone Consultative Committee (now ITU-T); ITU-T: International Telecommunications Union-Telecommunications section; ETSI: European Telecommunications Standardization Institute; CTIA: Cellular Telecommunications Industries Association; INMARSAT: International Maritime Satellite organisation; US-DoD: United States Department of Defense; PSTN: Public Switched Telephone Network; MOS: Mean Opinion Score.

quantizers, can each be considered as an analog, basically logarithmic, compression characteristic followed by an 8-bit uniform quantizer. The performances of the quantizers are very similar. The SNR of both quantizers amounts to approximately 38 dB over a dynamic range of about 30 dB (recall Fig. 4). PCM is quite insensitive to different statistical properties

of the input signal, so it is very transparent, and it is robust in all other respects. The bit rate, 64 kbit/s, is quite high, but PCM is simple, and it has no intrinsic delay. The MOS is a little over 4, as indicated in Fig. 12. The fact that it is not higher than 4 is mainly due to the narrow bandwidth of telephone speech.

3.2. Differential coders

A differential coder is characterized by the fact that the difference between the original speech sample and its predicted value is quantized, rather than the original sample itself. Since this essentially means the quantization of the prediction error, an improvement in SNR over PCM is obtained. This improvement is approximately equal to the prediction gain. If e[n] in Fig. 7 is quantized by a B-bit quantizer Q, a differential coding system is obtained which is sometimes referred to as D*PCM (D for differential) [6]. The appropriate decoder would then take the form of Fig. 8. Observing the output signal s'[n] reveals that it can take many more than 2^B possible values, since it consists of the predicted value of s'[n] plus the quantized e'[n], and it was e'[n] that was quantized using 2^B levels. Here, a disadvantage of D*PCM is encountered. Because the predicted value of s'[n] itself contains quantization noise and the quantized e'[n] is added to it, an accumulation of quantization errors occurs. Generally speaking, the quantization noise is spectrally shaped by the decoder. Especially low-order predictors will have an integrating character due to the decaying spectrum of speech signals, and the low-frequency content of the quantization noise will be emphasized. This gives rise to hoarse-sounding quantization noise. An important improvement is obtained if the coder is rearranged according to Fig. 13 (DPCM). In the coder, the predictor works on the locally decoded speech samples s_q[n] instead of the original speech samples s[n].
Modelling the quantizer as an additive noise source, simple analysis shows that the noise at the output of the decoder has not undergone any spectral shaping. This kind of quantization noise sounds more pleasant. The quantization noise of a DPCM coder can be classified into two categories. The first category concerns fine quantization noise, often referred to as granular noise. The second category concerns gross quantization errors caused by what is called slope overload. Slope overload may occur when a steep slope in the input signal cannot be predicted from previous samples, either because the predictor is too simple (fixed, low order), or because an unpredictable innovation in the speech signal takes place. The performance of a DPCM coder can be improved further by making the predictor as well

as the quantizer adaptive (dashed lines in Fig. 13).

Fig. 13. DPCM coder and decoder; the dashed arrows indicate adaptation (ADPCM).

These measures help to reduce both granular noise and slope overload. This variant is called adaptive DPCM (ADPCM).

3.2.1. ADPCM

The ADPCM system according to the CCITT standard G.726 [20], system 2 in Fig. 12, incorporates a 4-bit non-uniform backward adaptive quantizer and a backward adaptive predictor, so that no side information needs to be transmitted. The backward adaptive predictor is controlled by the quantized prediction error e_q[n] and the quantized speech signal s_q[n], in such a way that the cross-correlation between these two signals is adaptively driven to zero, as explained in Sec. 2.3. Both signals are also available in the decoder. The system has a bit rate of 32 kbit/s and its MOS is similar to that of 64 kbit/s PCM. It is used on the PSTN and in DECT (Digital European Cordless Telephone, another ETSI standard), without additional error protection bits. It is more complex than PCM, but single-chip realizations are readily available on the market. It has no intrinsic delay and it is very robust in all other respects. For non-speech signals, such as data signals, however, special provisions are incorporated to detect them and to control the settings of the system accordingly.

3.2.2. Delta modulation

In delta modulation (DM), a one-bit quantizer is used and the sampling frequency is increased [6]. The feedback loop in the coder consists of a simple, fixed, integrating network. Many variants have been proposed, most of them differing in the way the quantizer is made adaptive. One of them is the

19 State of the art and trends in speech coding backward adaptive 'continuous variable slope' DM (CVSD) [6, 21].1t has also been included in the survey of Fig. 12 (system 8). DM has only limited application, mainly military, and in NASA's space shuttle [22].The main features of DM are that it is very simple and extremely robust against transmission errors Analysis-by-synthesis coders The class of analysis-by-synthesis coders under consideration is based on LPC synthesis. Figure 14 shows the generic structure of such coding systems Generic strue ture The speech signal is split up into segments of typically 20 ms, and on each segment LP analysis is performed. A local decoder, consisting of an adaptive LP synthesis filter I/A(z), is excited by an excitation generator to obtain an estimate, s'[n], of the speech signal srn]. The excitation generator can generate only a limited number ofapproximations, xf[n], 1= 1,2,... L, to the prediction residual, so that log2l bits are needed to inform the decoder which particular excitation sequence to use. The error sequence, s[n]- san], is evaluated over an interval of typically 5ms on a mean-square basis. The excitation signal xf[n] is chosen such that, given the L degrees of freedom of the excitation signal, a minimum mean square error (MMSE) is obtained. In speech signals, the excitation candidate that delivers MMSE is not necessarily the candidate that delivers the best perceptual result.. In order to make the error criterion perceptually relevant, 'noise shaping' is introduced. One can conclude from the auditory masking model that more distortion s ; ~ LPC parameters , I I I I I Excitation ~ 1 I I Generator A(z) I I I I 1 ~--!:Q.c.ru.I>-gç-o-li-g!:.---J il A(z) A(zly) _ e Fig. 14. Generic structure of LPC-based analysis-by-synthesis coding systems. Philip. Journalof Research Vol. 49 No

20 R.J. Sluijter et al. can be tolerated in the formant regions. Accordingly a filter is designed which provides this weighting and is generally referred to as the weighting filter. This filter has the form A(z)/ A (zh) in which 0 < 'Y < 1with a typical value ofo.8. The effect of introducing such a parameter is to increase the bandwidth of the formants with respect to those of 1/ A(z). In this way, the formants are partially suppressed so that they have less weight in el[n], which results in the toleration of relatively larger distortion in the formant regions. The MMSE procedure works with an invariable part and an innovation part. The invariable part eo[n] is that part of el[n] which is not influenced by the excitation signal in the subsegment under consideration, so it does not depend on I. It consists of the hangovers of the synthesis filter and the weighting filter of previous subsegments and the contribution of srn] in the current segment. The innovation part ul[n] consists of the convolution of xl[n] with the impulse response h[n] of the cascade of the synthesis and weighting filters, so that the mean square error El is given by: 1 N-I 2 1 N-I 2 EI = N L el [n] = N L (eo[n]- u![n]), n=o n=o where N is the length of the sub segment. The minimum error El, I = 1,2, 3...,L, indicates the best excitation sequence in the weighted MMSE sense. There are three main variants of analysis-by-synthesis coders which basically differ only in the type of excitation function: code excited linear predictive (CELP) coders, multi-pulse excited (MPE) coders and regular-pulse excited (RPE) coders, which will be considered in more detail in the following. (6) Code excitation A CELP coder [23] is basically a vector quantizer operating on the decorrelated and normalized speech signal, with a weighted matching criterion. The codebook used contains rms-normalized approximations to the LTP residual E[n]. Accordingly, the excitation generator in Fig. 
14 consists of a codebook, a gain factor, and an LTP synthesis filter. Figure 15shows the structure ofsuch a code-excitation generator in which the LTP synthesizer has been modified to what is now called an adaptive codebook. Ifthe associated lag exceedsthe subsegment duration, it operates as a usual LTP synthesis filter 1/ P(z). Otherwise, the number of samples in the delay line spanned by the lag is repeated until the subsegment is completed. This has the advantage that the computation of the gain factor gp (pitch prediction coefficient) of the adaptive codebook is straightforward [24].The range of the lag is normally ms, comprising 474 Philip. Journal of Research Vol.49 No

21 State of the art and trends in speech coding Lag Optional repeat x Fixed Codebook Fig. IS. Architecture of the CELP excitation generator. 128 integer values. Enhanced performance is obtained if the lag is allowed to have subsample resolution [25]. As a rule, not more than 256 lag values are used, distributed non-uniformly; for smalllags this distribution is more dense, never exceeding a virtual oversampling factor of 8, and for large lags only integer values are used. The main difficulty in the design of CELP coders is to keep the complexity manageable. In order to avoid the complexity of a joint search procedure for the best match, the adaptive codebook and the fixed codebook are searched sequentially. This means that in the first search the contribution of the fixed codebook is zero. Alllags are assessed, and for each lag an optimum gain gp,l is computed. The optimum gain is obtained by substituting ul[n] of eq. (6) by gp,iuan] and setting the partial derivative of El, with respect to gp,l, to zero. The search procedure selects that lag I, for which El is minimum. Next, the effect of the selected vector is incorporated into a new invariable part and the same procedure is now repeated for the fixed codebook. The loss in performance due to this approach is negligible, despite its suboptimality. Further reductions in complexity are necessary for one-dsp realizations. Many CELP variants, all aiming at reduced complexity of the adaptive-codebook and fixed-codebook search procedures, have been proposed in order to meet this goal. The required degree of sophistication of the DSP varies, however. The robustness of CELP coders shows some relation with the bit rate. The lower the bit rate, the more speech specific the Philips Journal of Research Vol.49 No

22 R.J. Sluijter et al. system, and the more information is carried per bit, causing increased error sensitivity. In the DoD-CELP coder (system 7 in Fig. 12) a ternary valued, sparsely populated random codebook (only 25% of the vector elements are nonzero) is used, resulting in a reduced complexity of the fixed codebook search [26]. The adaptive codebook search is split up in a hierarchical way, by first searching integer valued lags followed by searching only neighbouring noninteger values. The relatively low bit rate is obtained mainly by the use of long segments and subsegments (30ms and 7.5ms, respectively) and the low excitation rate* of 0.3 bit/sample, which are also the reasons for the relatively low quality (MOS ~ 3). The long segments also cause the relatively high intrinsic delay of 45 ms. The bit rate of the system is 4.8 kbit/s, including 133 bit/s for error correction. In the VSELP coder (system 5 in Fig. 12), two fixed codebooks are used which are searched in a sequential way and each codebook is constructed by the sum of 7 basisveetors [27]. The excitation code of each codebook consists of 7 bits, being the signs of the basisvectors, so that 128 combinations can be generated. This approach saves the convolution with h[n] for each codebook vector, because it can be precalculated once per basisvector, and the allocations of signs and summations can be done afterwards. The adaptive codebook has no subsample resolution yet. The segments are 20 ms with 5 ms subsegments. Despite the shorter segments (compared to the DoD-CELP), the intrinsic delay is still about 40 ms. This is due to the particular segmentation arrangement. The excitation rate is bit/sample and the estimated MOS is about 3.5. The bit rate of the coder is 7.95 kbit/s and it has been standardized (International Standard IS-54) by the North American CTIA for digital mobile telephony. The standard prescribes a gross channel rate of 13 kbit/s, the difference being devoted to error protection. 
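The weighted MMSE selection of eq. (6), together with the closed-form optimum gain obtained by setting the partial derivative of E_l to zero, can be sketched as below. The sketch assumes the filtered candidates u_l[n] (each candidate already convolved with h[n]) are given; all names and signal values are illustrative and not taken from any of the standardized coders above.

```python
# For each candidate u_l, the gain g minimizing E_l follows from dE_l/dg = 0:
#   g = sum(e0[n] * u_l[n]) / sum(u_l[n]^2).
# The search keeps the candidate (e.g. the lag of the adaptive codebook)
# whose gain-scaled version leaves the smallest mean-square error.

def search_excitation(e0, filtered_candidates):
    best = None                                   # (index, gain, error)
    for l, u in enumerate(filtered_candidates):
        energy = sum(v * v for v in u)
        if energy == 0.0:
            continue                              # all-zero candidate cannot match
        g = sum(a * b for a, b in zip(e0, u)) / energy
        err = sum((a - g * b) ** 2 for a, b in zip(e0, u)) / len(e0)
        if best is None or err < best[2]:
            best = (l, g, err)
    return best

# A candidate proportional to e0 is matched exactly once scaled by its gain:
best = search_excitation([2.0, 4.0, 6.0], [[1.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
```

In the sequential procedure described above, this search would run once over the adaptive codebook (fixed-codebook contribution zero), after which the chosen, gain-scaled vector is folded into a new invariable part e0 and the same routine runs again over the fixed codebook.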
At the time of standardization, the VSELP coder was the best existing CELP variant which could be realized in a single DSP.

The LD-CELP coder has an architecture as shown in Fig. 16 [28, 29]. Its low intrinsic delay is made possible by the use of backward-adaptive LPC analysis and gain control, and the short duration, only five sampling periods, of the subsegments. The coder does not incorporate LTP, but an extremely high order of 50 is used in the LPC. This makes the coder less speech-specific and, consequently, it is very transparent, even for music signals.

*) The excitation rate is another indicator related to speech quality. It is defined here as the ratio of the number of bits allocated to the excitation code, apart from absolute gain factors, and the number of samples, both in a subsegment.
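Applying the footnote's definition to the DoD-CELP figures quoted earlier reproduces the 0.3 bit/sample value. The 8 kHz sampling rate and the 18 excitation-code bits per subsegment used here are assumptions chosen to be consistent with that figure, not numbers taken from the standard's actual bit allocation.

```python
# Excitation rate per the footnote: excitation-code bits per subsegment
# (absolute gain bits excluded) divided by samples per subsegment.
# A 7.5 ms subsegment at an assumed 8 kHz sampling rate holds 60 samples.

def excitation_rate(excitation_bits, subsegment_ms, fs_hz=8000):
    samples = fs_hz * subsegment_ms / 1000.0
    return excitation_bits / samples

rate = excitation_rate(18, 7.5)   # assumed 18 bits over 60 samples
```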

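The backward gain control mentioned above derives the excitation gain from previously quantized excitation vectors, which the decoder has as well, so no gain information is transmitted. The simple log-domain recursion below is purely illustrative of that idea; it is not G.728's actual gain predictor, and the leakage and floor constants are assumptions.

```python
import math

# Backward gain adaptation sketch: the gain estimate (in dB) leaks toward the
# level of the most recent *quantized* excitation vector. Encoder and decoder
# run the same recursion on the same data and so stay in sync.

def backward_gain_db(prev_gain_db, last_quantized_vector, leak=0.9, floor_db=-30.0):
    energy = sum(v * v for v in last_quantized_vector) / len(last_quantized_vector)
    level_db = 10.0 * math.log10(max(energy, 1e-10))
    return max(floor_db, leak * prev_gain_db + (1.0 - leak) * level_db)
```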
Fig. 16. Architecture of the low-delay CELP.

Despite the backward adaptation, the system can handle error rates up to 10^-2, which matches perfectly the intended application on the PSTN. The trained codebook contains a 7-bit 'shape' codebook (128 excitation vectors) and a 3-bit 'gain' codebook, including a sign bit, to control the backward adaptive gain control process. Its excitation rate is 1.6 bit/sample while the total bit rate is 16 kbit/s. The speech quality is the same as the quality of system 2. The coder is very complex, though implementations on a single (sophisticated) DSP already exist.

Pulse excitation

In an MPE coder the excitation signal x[n] of Fig. 14 consists of a few pulses per subsegment, as depicted in Fig. 17a. If the locations of these pulses are known, the amplitudes can be calculated with the help of eq. (6), again by setting the partial derivatives of E_l with respect to these pulse amplitudes to zero, and solving the resulting set of equations. For each l, the pulses have different locations. If, for example, 10 pulses are to be located in a subsegment of 5 ms, there are about 10^9 possible excitation vectors. This means that 10^9 sets of equations have to be solved and just as many error measures have to be computed in order to select the minimum E_l. This is far too complex to be handled even by several DSPs. The usual approach is, therefore, a suboptimal sequential search, pulse by pulse [30].

Fig. 17. Examples of the excitation signal x for (a) multi-pulse excitation and (b) regular-pulse excitation.

Although MPE can yield good speech quality, it cannot compete with CELP because a relatively high number of bits is needed for coding the pulse positions. It has not been standardized for any application, but it has been the basis of RPE.

In an RPE coder the excitation pulses are placed regularly according to a downsampling scheme, as shown in Fig. 17b [31]. If the downsampling factor is 3, for instance, there are only 3 pulse-position grids according to the 3 possible phases of downsampling. Now, only 3 sets of equations have to be solved and 3 values of E_l determined, unveiling the grid position with the lowest E_l. Such a system has been the basis of the ETSI standard for the GSM (Global* System for Mobile) digital cellular telephone network. Because RPE in its original form was still too complex for a commercially attractive implementation at the time the standard was being developed, a simplified version has been standardized [32, 33].

*) Although the ETSI is a European organization, the GSM system is being increasingly adopted on the global scale.

In the RPE-LTP coder, as the GSM system is technically called (system 4 in Fig. 12), the speech signal is first fed through an inverse filter, using 20 ms segments, and subsequently a pitch prediction residual e[n] is computed by an adaptive LTP on the basis of 5 ms subsegments (Fig. 18). For each subsegment of e[n], the RPE coder generates candidate excitation sequences on the basis of a downsampling factor of three, and selects the one with the lowest E_l. Prior to transmission, the RPE pulses are quantized using forward block-adaptive, 3-bit uniform quantization, with a single amplitude parameter per subsegment.

Fig. 18. Architecture of the RPE-LTP coder (full rate GSM).

The bit rate of the coder is 13 kbit/s, and its intrinsic delay is 20 ms. The gross bit rate on the radio channel is 22.8 kbit/s, so 9.8 kbit/s is used for error protection. This generous protection makes the system very reliable on the adverse radio channel. The excitation rate is 1.2 bit/sample, but the quality cannot be compared to that of the LD-CELP. The average MOS test result, which includes speech utterances exposed to background noise, tandeming and channel error rates up to 30%, is about 3.6. It is the straightforward structure of the system that makes it quite transparent, even for signalling tones. Data signals, however, have to be transmitted separately in the GSM system. The complexity is low, enabling the complete digital baseband processing of a coder and decoder, including voice activity detection, channel coding and decoding and additional channel control, to fit in a single DSP of medium sophistication.

Vocoders

At least one vocoder already existed in 1936 [34], even before PCM was invented by Reeves in 1938. A vocoder is based on the source-filter model of speech production (recall Fig. 1). In the encoder, the model parameters of a speech segment, i.e. the pitch period, the voiced/unvoiced parameter, the gain and the filter parameters, are analysed from segments of typically 20 ms duration, and encoded. The decoder consists of a replica of the model that is controlled by the decoded parameters. As argued in Sec. 2.1, this approach results in very low bit rates. The success of this approach depends on the accuracy and the perceptual relevance of the underlying model. The LPC-10E vocoder of system 9 in Fig. 12 is characterized by the use of an LPC synthesis filter with 10 coefficients [35]. Its bit rate is 2.4 kbit/s.
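The source-filter decoder just described can be sketched as a pitch-period impulse train (voiced) or a white-noise source (unvoiced), scaled by the gain and driving an all-pole LPC synthesis filter. This is a minimal illustration of the model, not the LPC-10E algorithm; the sign convention A(z) = 1 - sum a_k z^-k and all parameter values are assumptions.

```python
import random

def synthesize(n, lpc, gain, voiced, pitch_period):
    """Generate n samples from the source-filter model; lpc = [a1..ap]."""
    mem = [0.0] * len(lpc)                        # filter memory (past outputs)
    out = []
    for i in range(n):
        if voiced:
            e = gain if i % pitch_period == 0 else 0.0   # impulse train source
        else:
            e = gain * random.gauss(0.0, 1.0)            # noise source
        s = e + sum(a * m for a, m in zip(lpc, mem))     # all-pole filter 1/A(z)
        mem = [s] + mem[:-1]
        out.append(s)
    return out

# Voiced case with one first-order coefficient: an exponentially decaying
# response is repeated every pitch period.
frame = synthesize(4, [0.5], 1.0, True, 4)
```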
This kind of vocoder is known for its poor, synthetic speech quality, mainly caused by the incompleteness and oversimplification of the model used, especially in the excitation part. However, interesting new developments are going on in this field, as announced by system 6 in Fig. 12.

The IMBE coder uses spectral analysis based on the Fourier transform of 20 ms segments [36]. The spectrum is divided into a number of bands, and for each band a voiced/unvoiced parameter is determined. In voiced bands, only the amplitudes of pitch harmonics are retained. For unvoiced bands only one amplitude parameter per band is used. The pitch is represented by a single parameter per segment and heavy tracking is applied over several segments, being the main cause of an intrinsic delay of almost 80 ms. This makes the system practically suitable for half-duplex communication only. The bit rate of the INMARSAT-M system is 4.15 kbit/s and the gross bit rate on the satellite link is 6.4 kbit/s. Although the system is very speech-specific and consequently not at all transparent, it can cope with background noise quite well. Its quality exceeds that of the DoD-CELP, which is quite an achievement because this seems to be the first vocoder that outperforms a CELP at that bit rate, while it has a relatively modest complexity.

Future trends

One standard underway is the half-rate GSM standard, expected to be adopted by the ETSI in 1995 [37]. The half-rate GSM channel has a bit rate of 11.4 kbit/s and the net bit rate of the speech coder is 5.6 kbit/s, so 5.8 kbit/s is used for error protection. It incorporates a VSELP-based system with 20 ms segments and 5 ms subsegments and a subsample-resolution adaptive codebook. The excitation rate is 0.35 bit/sample. The performance of this system with respect to speech quality and robustness approaches that of the full-rate system. The conclusion which can be drawn is that at these bit rates the quality indicated by the dashed line in Fig. 12 (the trend) cannot yet be reached.

Another standard is being considered by an ETSI Study Group which is currently looking at the possibility of enhanced-quality full-rate speech coding. Meanwhile, an interesting development is going on in the ITU-T standardization process of an 8 kbit/s coder for the Future Public Land Mobile Telecommunication System (FPLMTS).
This standard is expected to be launched in the near future. Two candidates are involved, one with a 'conjugate structured' fixed codebook (CS-CELP) and the other with an 'algebraic' fixed codebook (ACELP) [38, 39]. Both coders have an intrinsic delay of about 15 ms. Vector quantization is applied to the short-term prediction parameters, requiring less than 20 bits for their representation. This enables a 10 ms update rate without increasing the bit rate as compared to 20 ms segments and 40-bit representation. The speech quality of these coders outperforms that of the reference (system 2 of Fig. 12), even in the case of transmission errors up to 1%. The complexity is in the order of magnitude of one DSP again. The final system is expected to be a combination of the proposed candidates. This system will make all other existing speech coders at bit rates


Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

Voice mail and office automation

Voice mail and office automation Voice mail and office automation by DOUGLAS L. HOGAN SPARTA, Incorporated McLean, Virginia ABSTRACT Contrary to expectations of a few years ago, voice mail or voice messaging technology has rapidly outpaced

More information

CHAPTER 5. Digitized Audio Telemetry Standard. Table of Contents

CHAPTER 5. Digitized Audio Telemetry Standard. Table of Contents CHAPTER 5 Digitized Audio Telemetry Standard Table of Contents Chapter 5. Digitized Audio Telemetry Standard... 5-1 5.1 General... 5-1 5.2 Definitions... 5-1 5.3 Signal Source... 5-1 5.4 Encoding/Decoding

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

Digital Communication (650533) CH 3 Pulse Modulation

Digital Communication (650533) CH 3 Pulse Modulation Philadelphia University/Faculty of Engineering Communication and Electronics Engineering Digital Communication (650533) CH 3 Pulse Modulation Instructor: Eng. Nada Khatib Website: http://www.philadelphia.edu.jo/academics/nkhatib/

More information

3GPP TS V5.0.0 ( )

3GPP TS V5.0.0 ( ) TS 26.171 V5.0.0 (2001-03) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Speech Codec speech processing functions; AMR Wideband

More information

NOVEL PITCH DETECTION ALGORITHM WITH APPLICATION TO SPEECH CODING

NOVEL PITCH DETECTION ALGORITHM WITH APPLICATION TO SPEECH CODING NOVEL PITCH DETECTION ALGORITHM WITH APPLICATION TO SPEECH CODING A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of

More information

CHAPTER 3 Syllabus (2006 scheme syllabus) Differential pulse code modulation DPCM transmitter

CHAPTER 3 Syllabus (2006 scheme syllabus) Differential pulse code modulation DPCM transmitter CHAPTER 3 Syllabus 1) DPCM 2) DM 3) Base band shaping for data tranmission 4) Discrete PAM signals 5) Power spectra of discrete PAM signal. 6) Applications (2006 scheme syllabus) Differential pulse code

More information

ON-LINE LABORATORIES FOR SPEECH AND IMAGE PROCESSING AND FOR COMMUNICATION SYSTEMS USING J-DSP

ON-LINE LABORATORIES FOR SPEECH AND IMAGE PROCESSING AND FOR COMMUNICATION SYSTEMS USING J-DSP ON-LINE LABORATORIES FOR SPEECH AND IMAGE PROCESSING AND FOR COMMUNICATION SYSTEMS USING J-DSP A. Spanias, V. Atti, Y. Ko, T. Thrasyvoulou, M.Yasin, M. Zaman, T. Duman, L. Karam, A. Papandreou, K. Tsakalis

More information

Pulse Code Modulation

Pulse Code Modulation Pulse Code Modulation EE 44 Spring Semester Lecture 9 Analog signal Pulse Amplitude Modulation Pulse Width Modulation Pulse Position Modulation Pulse Code Modulation (3-bit coding) 1 Advantages of Digital

More information

Page 0 of 23. MELP Vocoder

Page 0 of 23. MELP Vocoder Page 0 of 23 MELP Vocoder Outline Introduction MELP Vocoder Features Algorithm Description Parameters & Comparison Page 1 of 23 Introduction Traditional pitched-excited LPC vocoders use either a periodic

More information

1.Explain the principle and characteristics of a matched filter. Hence derive the expression for its frequency response function.

1.Explain the principle and characteristics of a matched filter. Hence derive the expression for its frequency response function. 1.Explain the principle and characteristics of a matched filter. Hence derive the expression for its frequency response function. Matched-Filter Receiver: A network whose frequency-response function maximizes

More information

Fundamental Frequency Detection

Fundamental Frequency Detection Fundamental Frequency Detection Jan Černocký, Valentina Hubeika {cernocky ihubeika}@fit.vutbr.cz DCGM FIT BUT Brno Fundamental Frequency Detection Jan Černocký, Valentina Hubeika, DCGM FIT BUT Brno 1/37

More information

Wideband Speech Coding & Its Application

Wideband Speech Coding & Its Application Wideband Speech Coding & Its Application Apeksha B. landge. M.E. [student] Aditya Engineering College Beed Prof. Amir Lodhi. Guide & HOD, Aditya Engineering College Beed ABSTRACT: Increasing the bandwidth

More information

REAL-TIME IMPLEMENTATION OF A VARIABLE RATE CELP SPEECH CODEC

REAL-TIME IMPLEMENTATION OF A VARIABLE RATE CELP SPEECH CODEC REAL-TIME IMPLEMENTATION OF A VARIABLE RATE CELP SPEECH CODEC Robert Zopf B.A.Sc. Simon Fraser University, 1993 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF

More information

International Journal of Advanced Engineering Technology E-ISSN

International Journal of Advanced Engineering Technology E-ISSN Research Article ARCHITECTURAL STUDY, IMPLEMENTATION AND OBJECTIVE EVALUATION OF CODE EXCITED LINEAR PREDICTION BASED GSM AMR 06.90 SPEECH CODER USING MATLAB Bhatt Ninad S. 1 *, Kosta Yogesh P. 2 Address

More information

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 ECE 556 BASICS OF DIGITAL SPEECH PROCESSING Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 Analog Sound to Digital Sound Characteristics of Sound Amplitude Wavelength (w) Frequency ( ) Timbre

More information

Waveform Coding Algorithms: An Overview

Waveform Coding Algorithms: An Overview August 24, 2012 Waveform Coding Algorithms: An Overview RWTH Aachen University Compression Algorithms Seminar Report Summer Semester 2012 Adel Zaalouk - 300374 Aachen, Germany Contents 1 An Introduction

More information

COMP 546, Winter 2017 lecture 20 - sound 2

COMP 546, Winter 2017 lecture 20 - sound 2 Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering

More information

DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD

DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD NOT MEASUREMENT SENSITIVE 20 December 1999 DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD ANALOG-TO-DIGITAL CONVERSION OF VOICE BY 2,400 BIT/SECOND MIXED EXCITATION LINEAR PREDICTION (MELP)

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Downloaded from 1

Downloaded from  1 VII SEMESTER FINAL EXAMINATION-2004 Attempt ALL questions. Q. [1] How does Digital communication System differ from Analog systems? Draw functional block diagram of DCS and explain the significance of

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL

A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL 9th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, -7 SEPTEMBER 7 A CLOSER LOOK AT THE REPRESENTATION OF INTERAURAL DIFFERENCES IN A BINAURAL MODEL PACS: PACS:. Pn Nicolas Le Goff ; Armin Kohlrausch ; Jeroen

More information

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Digital Audio. Lecture-6

Digital Audio. Lecture-6 Digital Audio Lecture-6 Topics today Digitization of sound PCM Lossless predictive coding 2 Sound Sound is a pressure wave, taking continuous values Increase / decrease in pressure can be measured in amplitude,

More information

Adaptive Filters Linear Prediction

Adaptive Filters Linear Prediction Adaptive Filters Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory Slide 1 Contents

More information

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,

More information

Chapter 2: Digitization of Sound

Chapter 2: Digitization of Sound Chapter 2: Digitization of Sound Acoustics pressure waves are converted to electrical signals by use of a microphone. The output signal from the microphone is an analog signal, i.e., a continuous-valued

More information

CODING TECHNIQUES FOR ANALOG SOURCES

CODING TECHNIQUES FOR ANALOG SOURCES CODING TECHNIQUES FOR ANALOG SOURCES Prof.Pratik Tawde Lecturer, Electronics and Telecommunication Department, Vidyalankar Polytechnic, Wadala (India) ABSTRACT Image Compression is a process of removing

More information

UNIVERSITY OF SURREY LIBRARY

UNIVERSITY OF SURREY LIBRARY 7385001 UNIVERSITY OF SURREY LIBRARY All rights reserved I N F O R M A T I O N T O A L L U S E R S T h e q u a l i t y o f t h i s r e p r o d u c t i o n is d e p e n d e n t u p o n t h e q u a l i t

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

General outline of HF digital radiotelephone systems

General outline of HF digital radiotelephone systems Rec. ITU-R F.111-1 1 RECOMMENDATION ITU-R F.111-1* DIGITIZED SPEECH TRANSMISSIONS FOR SYSTEMS OPERATING BELOW ABOUT 30 MHz (Question ITU-R 164/9) Rec. ITU-R F.111-1 (1994-1995) The ITU Radiocommunication

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Linguistic Phonetics. Spectral Analysis

Linguistic Phonetics. Spectral Analysis 24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There

More information

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of

COMPRESSIVE SAMPLING OF SPEECH SIGNALS. Mona Hussein Ramadan. BS, Sebha University, Submitted to the Graduate Faculty of COMPRESSIVE SAMPLING OF SPEECH SIGNALS by Mona Hussein Ramadan BS, Sebha University, 25 Submitted to the Graduate Faculty of Swanson School of Engineering in partial fulfillment of the requirements for

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal

More information

A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS

A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS Mark W. Chamberlain Harris Corporation, RF Communications Division 1680 University Avenue Rochester, New York 14610 ABSTRACT The U.S. government has developed

More information

New Features of IEEE Std Digitizing Waveform Recorders

New Features of IEEE Std Digitizing Waveform Recorders New Features of IEEE Std 1057-2007 Digitizing Waveform Recorders William B. Boyer 1, Thomas E. Linnenbrink 2, Jerome Blair 3, 1 Chair, Subcommittee on Digital Waveform Recorders Sandia National Laboratories

More information

(Refer Slide Time: 3:11)

(Refer Slide Time: 3:11) Digital Communication. Professor Surendra Prasad. Department of Electrical Engineering. Indian Institute of Technology, Delhi. Lecture-2. Digital Representation of Analog Signals: Delta Modulation. Professor:

More information

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi International Journal on Electrical Engineering and Informatics - Volume 3, Number 2, 211 Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms Armein Z. R. Langi ITB Research

More information

IMPLEMENTATION OF G.726 ITU-T VOCODER ON A SINGLE CHIP USING VHDL

IMPLEMENTATION OF G.726 ITU-T VOCODER ON A SINGLE CHIP USING VHDL IMPLEMENTATION OF G.726 ITU-T VOCODER ON A SINGLE CHIP USING VHDL G.Murugesan N. Ramadass Dr.J.Raja paul Perinbum School of ECE Anna University Chennai-600 025 Gm1gm@rediffmail.com ramadassn@yahoo.com

More information

Introduction to Speech Coding. Nimrod Peleg Update: Oct. 2009

Introduction to Speech Coding. Nimrod Peleg Update: Oct. 2009 Introduction to Speech Coding Nimrod Peleg Update: Oct. 2009 Goals and Tradeoffs Reduce bitrate while preserving needed quality Tradeoffs: Quality (Broadcast, Toll, Communication, Synthetic) Bit Rate Complexity

More information

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 03 Quantization, PCM and Delta Modulation Hello everyone, today we will

More information

Voice Transmission --Basic Concepts--

Voice Transmission --Basic Concepts-- Voice Transmission --Basic Concepts-- Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics: Amplitude Frequency Phase Telephone Handset (has 2-parts) 2 1. Transmitter

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Signal Processing in Acoustics Session 2pSP: Acoustic Signal Processing

More information

Voice Codec for Floating Point Processor. Hans Engström & Johan Ross

Voice Codec for Floating Point Processor. Hans Engström & Johan Ross Voice Codec for Floating Point Processor Hans Engström & Johan Ross LiTH-ISY-EX--08/3782--SE Linköping 2008 Voice Codec for Floating Point Processor Master Thesis In Electronics Design, Dept. Of Electrical

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information