The Channel Vocoder (analyzer):


Maurice Pierce
6 years ago

1 Vocoders

2 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz. Typically, linear-phase FIR filters are used. The output of each filter is rectified and lowpass filtered. The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract. In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
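The analysis chain for a single channel (bandpass filter, rectifier, lowpass filter) can be sketched in a few lines. This is a minimal illustration, not the design used in any particular vocoder: the windowed-sinc FIR design, the tap counts, the function names, and the 25 Hz envelope cutoff are all assumptions chosen for the example.

```python
import math

def fir_lowpass(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc (Hamming) linear-phase FIR lowpass with unity DC gain."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_bandpass(low_hz, high_hz, fs, num_taps=101):
    """Bandpass built as the difference of two lowpass designs (still linear phase)."""
    hi = fir_lowpass(high_hz, fs, num_taps)
    lo = fir_lowpass(low_hz, fs, num_taps)
    return [a - b for a, b in zip(hi, lo)]

def fir_filter(taps, x):
    """Direct-form FIR convolution; output has the same length as the input."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * x[n - k]
        out.append(acc)
    return out

def channel_envelope(x, fs, low_hz, high_hz, env_cutoff_hz=25.0):
    """One analyzer channel: bandpass, full-wave rectify, then lowpass.

    env_cutoff_hz is a hypothetical value; with a short FIR the actual
    smoothing bandwidth is set mostly by the tap count.
    """
    band = fir_filter(fir_bandpass(low_hz, high_hz, fs), x)
    rectified = [abs(v) for v in band]
    return fir_filter(fir_lowpass(env_cutoff_hz, fs, 201), rectified)
```

A full analyzer would run one such channel per band and sample each envelope once per frame before quantization.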

3 The Channel Vocoder (analyzer block diagram): [Figure: the input S(n) feeds a bank of parallel branches, each consisting of Bandpass Filter → Rectifier → Lowpass Filter → A/D Converter. The branch outputs, together with the outputs of a Voicing detector and a Pitch detector, go to the Encoder and then To Channel.]

4 The Channel Vocoder (synthesizer): Linear-phase FIR filters covering 0-4 kHz, each having a bandwidth between 100 Hz and 300 Hz. 20-ms frames, i.e. the spectral magnitudes change at 50 Hz. LPF bandwidth: Hz. Sampling rate of the output of the filters: 50 Hz.

5 The Channel Vocoder (synthesizer): Bit rate: 1 bit for the voicing detector; 6 bits for the pitch period; 16 channels, each coded with 3-4 bits and updated 50 times per second. Then the total bit rate is bps. Further reductions to 1200 bps can be achieved by exploiting frequency correlations of the spectrum magnitudes.
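The bit budget on this slide can be worked through as simple arithmetic. The sketch below assumes that the voicing bit and the pitch bits are sent once per 20-ms frame, like the channel magnitudes; the slide does not state this, and the function name is hypothetical.

```python
def channel_vocoder_bit_rate(channels=16, bits_per_channel=3,
                             pitch_bits=6, voicing_bits=1,
                             frames_per_second=50):
    """Return (spectral_bps, side_info_bps, total_bps) for one configuration."""
    spectral = channels * bits_per_channel * frames_per_second
    # Assumption: pitch and voicing are updated once per frame as well.
    side_info = (pitch_bits + voicing_bits) * frames_per_second
    return spectral, side_info, spectral + side_info

for bits in (3, 4):
    print(bits, channel_vocoder_bit_rate(bits_per_channel=bits))
```

Under these assumptions the spectral magnitudes alone take 2400 bps at 3 bits per channel and 3200 bps at 4 bits per channel, and the pitch and voicing information adds 350 bps.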

6 The Channel Vocoder (synthesizer): At the receiver the signal samples are passed through D/A converters. The outputs of the D/As are multiplied by the voiced or unvoiced signal sources. The resulting signals are passed through bandpass filters. The outputs of the bandpass filters are summed to form the synthesized speech signal.

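7 The Channel Vocoder (synthesizer block diagram): [Figure: From Channel → Decoder → one D/A Converter per channel. Each D/A output scales the excitation and drives a Bandpass Filter; the bandpass outputs are summed to give the Output speech. The excitation comes from a Switch, controlled by the Voicing Information, that selects either a Pulse generator (driven by the Pitch period) or a Random Noise generator.]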

8 The Phase Vocoder : The phase vocoder is similar to the channel vocoder. However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter. By coding and transmitting the phase derivative, this vocoder destroys the phase information. 8

9 The Phase Vocoder (analyzer block diagram, kth channel): [Figure: S(n) is multiplied by cos ω_k n and by sin ω_k n; each product passes through a Lowpass Filter, giving a_k(n) and b_k(n). Both signals and their Differentiator outputs feed the block "Compute Short-term Magnitude and Phase Derivative". The short-term magnitude and the short-term phase derivative each pass through a Decimator to the Encoder and then To Channel.]

10 The Phase Vocoder (synthesizer block diagram, kth channel): [Figure: From Channel → Decoder. The decimated short-term phase derivative passes through an Integrator and then Cos and Sin blocks. These outputs and the decimated short-term amplitude pass through Interpolators and are modulated by cos ω_k n and sin ω_k n.]

11 The Phase Vocoder : LPF bandwidth: 50 Hz Demodulation separation: 100 Hz Number of filters: Sampling rate of spectrum magnitude and phase derivative: samples per second Spectral magnitude is coded using PCM or DPCM Phase derivative is coded linearly using 2-3 bits The resulting bit rate is 7200 bps 11

12 The Formant Vocoder : The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech. It is this information plus the pitch period that is encoded and transmitted to the receiver. 12

13 The Formant Vocoder: Example of formants. (a) The spectrogram of the utterance "day one", showing the pitch and the harmonic structure of speech. (b) A zoomed spectrogram of the fundamental and the second harmonic.

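14 The Formant Vocoder (analyzer block diagram): [Figure: the Input Speech is analyzed in three formant branches F3, F2, F1, which output the pairs (F3, B3), (F2, B2), (F1, B1), and in a Pitch and V/U Decoder block, which outputs V/U and F0.] Fk: the frequency of the kth formant. Bk: the bandwidth of the kth formant.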

15 The Formant Vocoder (synthesizer block diagram): [Figure: the parameters (F3, B3), (F2, B2), (F1, B1) control the formant stages F3, F2, F1; V/U and F0 control the Excitation Signal that drives them.]

16 Linear Predictive Coding: The objective of LP analysis is to estimate the parameters of an all-pole model for the vocal tract. Several methods have been devised for generating the excitation sequence for speech synthesis. Various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.
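One common way to estimate the all-pole model parameters is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method they have in mind, so the sketch below is only an illustration; it assumes the predictor convention s[n] ≈ a[1] s[n-1] + ... + a[M] s[n-M], and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Short-term autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err): a[i-1] is the predictor coefficient for s[n-i],
    and err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For a frame of speech, `levinson_durbin(autocorrelation(frame, 10), 10)` would give a 10th-order predictor.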

17 LPC-10: This method is called LPC-10 because 10 coefficients are typically employed. LPC-10 partitions the speech into 180-sample frames. Pitch and voicing decisions are determined using the AMDF and zero-crossing measures.
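The two measures named on this slide can be sketched as follows. This is a simplified illustration rather than the LPC-10 procedure itself: the lag range, the tie-breaking tolerance, and the function names are assumptions made for the example.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and its lagged copy."""
    n = len(frame) - lag
    return sum(abs(frame[i + lag] - frame[i]) for i in range(n)) / n

def estimate_pitch_lag(frame, min_lag=20, max_lag=120, tolerance=1e-9):
    """Pick the lag with the deepest AMDF valley.

    Among near-ties the shortest lag wins, so that a multiple of the true
    period is not chosen. The lag range and tolerance are hypothetical.
    """
    scores = {lag: amdf(frame, lag) for lag in range(min_lag, max_lag + 1)}
    best = min(scores.values())
    return min(lag for lag, value in scores.items() if value <= best + tolerance)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

A voiced frame typically shows a deep AMDF valley at the pitch lag and a low zero-crossing rate, while an unvoiced frame shows neither; a practical detector would combine these with thresholds.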

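18 A General Discrete-Time Model for Speech Production: [Figure: in the voiced path, a DT impulse generator controlled by the Pitch drives the glottal filter G(z) and a Gain, giving the voiced volume velocity U(n). In the unvoiced path, an uncorrelated noise generator drives a Gain. A V/U switch selects one path, which drives the vocal tract filter H(z) followed by the filter R(z) (labelled "LP Filter") to produce the speech signal s(n).]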

19 Linear Prediction: Determining the prediction order

20 Linear Prediction: Determining the prediction order

21 Linear Prediction: Determining the prediction order. Prediction gain: $PG = 10 \log_{10} \left( \dfrac{\sum_{n=m-M+1}^{m} s^2[n]}{\sum_{n=m-M+1}^{m} e^2[n]} \right)$
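The prediction gain on this slide is the ratio of signal energy to prediction-error energy over the analysis window, expressed in decibels. A minimal sketch, assuming `s` and `e` hold the speech samples and the prediction-error samples of the same window (the function name is hypothetical):

```python
import math

def prediction_gain_db(s, e):
    """10*log10 of signal energy over prediction-error energy for one window."""
    signal_energy = sum(v * v for v in s)
    error_energy = sum(v * v for v in e)
    return 10.0 * math.log10(signal_energy / error_energy)
```

Plotting this value against the predictor order is one way to choose the order: the gain rises quickly at first and then flattens once additional coefficients stop reducing the error.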

22 Linear Prediction: Example, M=4 and M=10

23 Linear Prediction: Example, M=2, M=10, and M=54

24 Linear Prediction: The idea of long-term linear prediction, M=10 and M=50

25 Linear Prediction: Long-term linear prediction

26 LPC10 Vocoder: General specifications of LPC10

27 LPC10 Vocoder: Encoder. [Figure labels: PCM, LPC, LPC, LPC, Bit Encoder]

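28 Pitch period detection. Autocorrelation: $R[l,m] = \sum_{n=m-N+1}^{m} s[n]\, s[n-l]$. Magnitude difference function: $MDF[l,m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|$. [A third expression on the slide, involving s[n], a coefficient b, and a sum over n = m-N+1, ..., m, is not recoverable.]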

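29 LPC10 Vocoder: MDF, T = 20, 21, ..., 39, 40, 42, ..., 80, 84, ..., 154

30 LPC10 Vocoder: Encoder, LPC, RC

31 LPC10 Vocoder: Speech synthesis. [Figure: in the encoder section, the original signal is used to determine whether the frame is voiced or unvoiced, to determine the pitch period (voiced frames only), and to compute the signal gain. A V/U switch selects either an impulse train with period equal to the pitch period or random noise; the result is scaled by the gain G to produce the synthesized speech.]

32 LPC10 Vocoder: Limitations, AR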

33 Residual Excited LP Vocoder: Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as done in the case of DPCM. In one method, the LPC model and excitation parameters are estimated from a frame of speech.

34 Residual Excited LP Vocoder: The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error. The residual error is quantized, coded, and transmitted to the receiver. At the receiver the signal is synthesized by adding the residual error to the signal generated from the model.

35 Residual Excited LP Vocoder: The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate. In the synthesizer, it is rectified and spectrum flattened (using an HPF); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model. The RELP vocoder provides communication-quality speech at about 9600 bps.

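36 RELP Analyzer (type 1): [Figure: S(n) → Buffer and window → f(n; m). The stLP analysis yields the LP parameters {â(i; m)} and the excitation parameters (gain estimate Θ̂_0, V/U decision, pitch estimate P̂), which drive an LP synthesis model. The model output is subtracted from f(n; m) to give the residual error e(n; m). The residual error, the excitation parameters, and the LP parameters go to the Encoder and then To Channel.]

37 RELP Analyzer (type 2): [Figure: S(n) → Buffer and window → f(n; m) → Inverse Filter Â(z; m) → prediction residual → Lowpass Filter → Decimator → DFT → Encoder → To Channel. The stLP analysis supplies the LP parameters {â(i; m)} to the inverse filter and to the encoder.]

38 Synthesizer for a RELP vocoder: [Figure: From Channel → Decoder → Buffer and Controller. The residual passes through an Interpolator, a Rectifier, and a Highpass Filter to form the Excitation of the LP synthesizer; the LP model parameter updates go from the controller to the LP synthesizer.]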

39 Multipulse LPC Vocoder: RELP needs to regenerate the high-frequency components at the decoder, which gives only a crude approximation of the high frequencies. The multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal system filter.

40 Multipulse LPC Vocoder: The information concerning the excitation sequence includes the locations of the pulses, an overall scale factor corresponding to the largest pulse amplitude, and the pulse amplitudes relative to the overall scale factor. The scale factor is logarithmically quantized into 6 bits. The amplitudes are linearly quantized into 4 bits. The pulse locations are encoded using a differential coding scheme. The excitation parameters are updated every 5 msec. The LPC vocal-tract parameters and the pitch period are updated every 20 msec. The bit rate is 9600 bps.

41 Analysis-by-synthesis coder: A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter. The synthetic speech is compared with the original speech. The residual error signal is weighted perceptually by a filter $W(z) = \dfrac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \dfrac{\hat{A}(z)}{\hat{A}(z/c)}$
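A perceptual weighting filter of the form W(z) = Â(z)/Â(z/c) can be applied directly from the LP coefficients, because replacing z by z/c only scales the ith coefficient by c^i. The sketch below assumes the convention Â(z) = 1 - Σ a_i z^{-i} and a hypothetical default c = 0.8; neither is stated on the slide.

```python
def weight_error(error, a, c=0.8):
    """Filter `error` through W(z) = A(z) / A(z/c).

    a[i-1] is the predictor coefficient for lag i, with the assumed
    convention A(z) = 1 - sum_i a[i-1] * z**(-i).
    """
    y = []
    for n in range(len(error)):
        acc = error[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * error[n - i]       # numerator A(z)
                acc += ai * (c ** i) * y[n - i]  # denominator A(z/c)
        y.append(acc)
    return y
```

With c = 1 the filter reduces to W(z) = 1 (no weighting), and with c = 0 it reduces to the inverse filter Â(z); values in between de-emphasize the error near the formant peaks, where it is masked by the speech itself.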

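42 Obtaining the multipulse excitation (analysis-by-synthesis method): [Figure: Input speech s(n) → Buffer and LP analysis → f(n; m). The Multipulse Excitation generator drives the Pitch Synthesis filter Θ_p(z), which uses the pitch estimate P̂, followed by the LP Synthesis filter, giving f̂(n; m). The difference f(n; m) - f̂(n; m) passes through the Perceptual Weighting filter W(z), and the Error minimization block adjusts the excitation generator.]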

43 Code Excited LP: CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences. The bit rate of the CELP is 4800 bps.

44 CELP (analysis-by-synthesis coder): [Figure: Speech samples → Buffer and LP analysis, which supplies the LP parameters and the side information. A sequence from the Gaussian Excitation codebook is scaled by a Gain and passed through the Pitch Synthesis filter and the Spectral Envelope (LP) Synthesis filter. The difference from the speech samples passes through the Perceptual Weighting Filter W(z) and a Compute Energy (square and sum) block, which selects the Index of the Excitation sequence.]

45 Analysis-by-synthesis coder: This weighted error is squared and summed over a subframe block to give the error energy. By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.

46 Analysis-by-synthesis coder: The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy for the block of samples.
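The exhaustive search with a per-codeword optimal gain can be sketched as below. To keep it short, the sketch makes several simplifying assumptions that are not from the slides: the target is taken to be already perceptually weighted, the pitch filter is omitted, the synthesis filter starts from zero state, and all function names are hypothetical.

```python
import random

def synthesize(excitation, a):
    """All-pole LP synthesis, y[n] = x[n] + sum_i a[i-1] * y[n-i], zero initial state."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error_energy) of the best codeword.

    For each filtered codeword y the least-squares gain is <t, y> / <y, y>,
    which leaves an error energy of <t, t> - <t, y>**2 / <y, y>.
    """
    target_energy = sum(t * t for t in target)
    best = None
    for index, codeword in enumerate(codebook):
        y = synthesize(codeword, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, corr / energy, error)
    return best

def gaussian_codebook(size, length, seed=0):
    """Zero-mean, unit-variance Gaussian sequences (illustrative codebook)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]
```

Only the winning index and the quantized gain need to be transmitted, since the decoder holds the same codebook.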

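47 CELP (synthesizer): [Figure: From Channel → decoder → Buffer and controller. The controller selects a sequence from the Gaussian Excitation codebook, which drives the Pitch Synthesis filter followed by the LP Synthesis filter; the controller also supplies the LP parameters, gain, and pitch estimate updates.]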

48 CELP synthesizer: A cascade of two all-pole filters with coefficients that are updated periodically. The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech. This filter has the form $\Theta_p(z) = \dfrac{1}{1 - b z^{-p}}$
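Assuming the long-delay pitch filter has the one-tap all-pole form 1/(1 - b z^{-p}), it reduces to a single feedback term delayed by p samples. A minimal sketch with zero initial state (the function name is hypothetical):

```python
def pitch_synthesis(excitation, b, p):
    """Apply y[n] = x[n] + b * y[n - p], starting from zero state."""
    y = []
    for n, x in enumerate(excitation):
        feedback = b * y[n - p] if n >= p else 0.0
        y.append(x + feedback)
    return y

# A single pulse is repeated every p samples with geometrically decaying amplitude.
print(pitch_synthesis([1.0] + [0.0] * 7, b=0.5, p=3))
```

This is how a noise-like codebook sequence acquires the periodicity of voiced speech: each excitation sample is echoed one pitch period later, scaled by b.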

49 CELP: The parameters of the filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 msec. The second filter is a short-delay all-pole (vocal-tract) filter whose coefficients are determined every 10-20 msec.

50 Example: The sampling frequency is 8 kHz, and the pitch estimation and the excitation sequence selection are performed on subframe blocks of 5 msec. This gives 40 samples per 5-msec subframe, so the excitation sequence consists of 40 samples.

51 Example: A codebook of 1024 sequences gives good-quality speech. For such a codebook size, we require 10 bits to send the codebook index. Hence the bit rate is reduced by a factor of 4. The transmission of the pitch predictor parameters and the spectral predictor brings the bit rate to about 4800 bps.
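The numbers in this example can be checked with a few lines of arithmetic. The sketch below only covers the codebook index; the reading that the "factor of 4" compares 10 bits per 40 samples against one bit per sample is an assumption, as is the function name.

```python
def excitation_index_budget(fs_hz=8000, subframe_ms=5, codebook_size=1024):
    """Return (samples_per_subframe, index_bits, index_bps, bits_per_sample)."""
    samples = fs_hz * subframe_ms // 1000
    index_bits = (codebook_size - 1).bit_length()  # bits needed to address the codebook
    index_bps = index_bits * 1000 // subframe_ms
    return samples, index_bits, index_bps, index_bits / samples

print(excitation_index_budget())
```

With the values from the slides this gives 40 samples, a 10-bit index, 2000 bps for the index, and 0.25 bit per sample; the rest of the roughly 4800 bps goes to the pitch and spectral predictor parameters.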

52 Low-delay CELP coder: CELP has been used to achieve toll-quality speech at bps with low delay. Although other types of vocoders produce high-quality speech at bps, these vocoders buffer 10-20 msec of speech samples.

53 Low-delay CELP coder: The one-way delay is of the order of msec. With a modification of CELP, it is possible to reduce the one-way delay to about 2 ms. Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.

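54 Low-delay CELP coder: [Figure: Input Speech s(n) → Buffer and window → f(n; m). A vector from the Excitation Vector quantizer codebook is scaled by a Gain and passed through the high-order LP Synthesis filter, giving f̂(n; m), which is subtracted from f(n; m). The error passes through the Perceptual Weighting Filter W(z) to the Error minimization block, which selects the codebook entry. Gain adaptation and Predictor adaptation blocks update the gain and the synthesis filter.]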

55 Low-delay CELP coder: The pitch predictor used in the conventional forward-adaptive coder is eliminated. To compensate for the loss of pitch information, the LPC predictor order is increased significantly, to an order of 50.

56 Low-delay CELP coder: The LPC coefficients are updated more frequently, every 2.5 ms. The 5-sample excitation vector corresponds to an excitation block duration of 0.625 msec at an 8-kHz sampling rate.

57 Low-delay CELP coder: The logarithm of the excitation gain is adapted every subframe excitation block by employing a 10th-order adaptive linear predictor in the logarithmic scale. The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation signal blocks.

58 Low-delay CELP coder: The perceptual weighting filter is also 10th order and is updated once every four blocks by employing an LPC analysis on frames of the input speech signal of duration 2.5 msec. The excitation codebook in the low-delay CELP is also modified compared to conventional CELP: a 10-bit excitation codebook is employed.
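The block duration and the excitation bit rate follow from the figures quoted for the low-delay coder: 5-sample vectors, 8-kHz sampling, and a 10-bit codebook. A short worked check, assuming the codebook index is the only information sent per block (the backward-adaptive predictor and gain need no side information); the function name is hypothetical.

```python
def ld_celp_excitation_rate(fs_hz=8000, vector_samples=5, index_bits=10):
    """Return (block duration in ms, bit rate in bps) for the excitation index."""
    block_ms = 1000.0 * vector_samples / fs_hz
    rate_bps = index_bits * fs_hz / vector_samples
    return block_ms, rate_bps

print(ld_celp_excitation_rate())
```

Under that assumption each block lasts 0.625 ms and the index stream runs at 16000 bps, and four such blocks make up the 2.5-ms update interval mentioned on the slides.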

59 Vector Sum Excited LP: The VSELP coder and decoder basically differ in the method by which the excitation sequence is formed. In the next block diagram of the VSELP, there are three excitation sources. One excitation is obtained from the pitch period state. The other two excitation sources are obtained from two codebooks.

60 VSELP Decoder: [Figure: three excitation sources (the Long-term Filter state and two Codebooks), each with its own gain, are summed and passed through the Pitch synthesis filter, the Spectral envelope (LP) synthesis filter, and the Spectral post filter to produce the Synthetic Speech.]

61 VSELP Decoder: The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 ms. The coefficients are updated in each 5-ms frame by interpolation. The excitation parameters are also updated every 5 ms.

62 VSELP Decoder: There are 128 codewords in each of the two codebooks. The codewords are constructed from two sets of seven basis codewords by forming linear combinations of the seven basis codewords. The long-term filter state is also a codebook with 128 codeword sequences.
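One way seven basis codewords can yield exactly 128 codewords is to restrict each combination coefficient to +1 or -1, which gives 2^7 = 128 sign patterns. The slide only says "linear combinations", so the ±1 restriction in this sketch is an assumption, and the function name is hypothetical.

```python
from itertools import product

def vselp_codebook(basis):
    """Build all +/-1 combinations of the basis vectors (2**len(basis) codewords)."""
    length = len(basis[0])
    codebook = []
    for signs in product((-1.0, 1.0), repeat=len(basis)):
        codeword = [sum(s * v[n] for s, v in zip(signs, basis)) for n in range(length)]
        codebook.append(codeword)
    return codebook
```

With this structure the 7-bit codeword index is just the sign pattern, and flipping every sign negates the codeword, so only the basis vectors need to be stored and filtered during the search.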

63 VSELP Decoder: In each 5-msec frame, the codewords from this codebook are filtered through the speech system filter Θ̂(z) and correlated with the input speech sequence. The filtered codeword is used to update the history, and the lag is transmitted to the decoder.

64 VSELP Decoder: Thus the update occurs by appending the best-filtered codeword to the history codebook. The oldest sample in the history array is discarded. The result is that the long-term state becomes an adaptive codebook.

65 VSELP Decoder: The three excitation sequences are selected sequentially from each of the three codebooks. Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error. Once the codewords have been selected, the three gain parameters are optimized.

66 VSELP Decoder: Joint gain optimization is sequentially accomplished by orthogonalizing each weighted codeword vector prior to the codebook search. These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-ms frame.

67 Vector Sum Excited LP: The bit rate of the VSELP is about 8000 bps. Bit allocations for 8000-bps VSELP:
Parameters | Bits/5-ms frame | Bits/20 ms
10 LPC coefficients | - | 38
Average speech energy | - | 5
Excitation codewords from two VSELP codebooks | |
Gain parameters | 8 | 32
Lag of pitch filter | 7 | 28
Total | |
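The allocation can be tallied from figures given elsewhere in these slides. The sketch assumes that each of the two 128-entry VSELP codebooks needs a 7-bit index per 5-ms frame; that entry is missing from the table as extracted, so the resulting totals are a derived estimate and not values read from the table. The function name is hypothetical.

```python
def vselp_bit_budget(codebooks=2, codewords_per_book=128,
                     gain_bits=8, lag_bits=7,
                     lpc_bits=38, energy_bits=5,
                     subframes_per_frame=4, frames_per_second=50):
    """Return (bits per 5-ms frame, bits per 20 ms, bits per second)."""
    # Assumption: one index per codebook per 5-ms frame.
    index_bits = codebooks * (codewords_per_book - 1).bit_length()
    per_subframe = index_bits + gain_bits + lag_bits
    per_frame = lpc_bits + energy_bits + subframes_per_frame * per_subframe
    return per_subframe, per_frame, per_frame * frames_per_second

print(vselp_bit_budget())
```

Under that assumption the tally is 29 bits per 5-ms frame, 159 bits per 20 ms, and 7950 bps, which agrees with the "about 8000 bps" stated on this slide.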

68 VSELP Decoder: Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter; this post filter is a pole-zero filter of the form $W(z) = \dfrac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \dfrac{\hat{A}(z)}{\hat{A}(z/c)}$

69 DEMO: Speech Codec. Test material: Male Speaker, Female Speaker, Music. Versions: Original Speech/Music (16-bit, sampled at 8 kHz); FS-1015 (LPC-10e, 2.4 kb/s); FS-1016 (CELP, 4.8 kb/s); IS-54 (VSELP, 7.95 kb/s); G.721 (32 kb/s ADPCM).

70 Standard Voice Algorithms
G.711: The most widely used digital representation of voice signals is G.711, or PCM (Pulse Code Modulation). This codec represents a 4-kHz band-limited voice signal sampled at 8 kHz using 8 bits per sample with A-law or μ-law coding.
G.726: The G.726 codec encodes a 64-kbps A-law or μ-law PCM signal at one of four bit rate options, ranging from 2 bits per sample to 5 bits per sample. The algorithm is based on Adaptive Differential Pulse Code Modulation (ADPCM) with a 1-sample backward prediction scheme.

71 G.728: The G.728 algorithm compresses PCM-coded voice signals to a bit rate of 16 kbps. This algorithm is based on a strong backward prediction scheme and is considered by far one of the most complex voice algorithms produced by the ITU standards organization.
G.729: For compression of voice signals at 8 kbps, the G.729 algorithm offers toll quality with built-in algorithmic delays of less than 15 msec. Additional features described in the G.729 Annex provide VAD and Comfort Noise Generation functionality to enhance the quality and reduce the overall bit rate.
G: The most widely used algorithm for band-limited channels, such as VoIP and video conferencing, is that of G. The algorithm has two operating bit rates, 6.3 kbps and 5.3 kbps. Although the delay is not as low as that of the other ITU standards, its quality is near toll quality for the given low bit rates, making it very efficient in bit usage.

72 GSM AMR: The latest GSM standard is the multi-rate Adaptive Code Excited Linear Prediction codec, which provides compression in the range of 4.75 to 12.2 kbps. In total the codec provides 12 bit rates that cover the half-rate to full-rate channel capacity.
GSM FR: The first digital codec used in a mobile environment is the GSM Full Rate vocoder. The codec compresses 13-bit PCM samples to a rate of 13 kbps. The algorithm is based on a very simple Regular Pulse Excited Linear Predictive Coding technique.
GSM HR: To increase capacity, the GSM committee decided on a lower bit rate of 5.6 kbps for the voice channel. The algorithm is based on Vector Sum Excited Linear Prediction (VSELP) and is computationally as complex as other low-bit-rate algorithms.
