Chapter 6 Basics of Digital Audio


Chapter 6 Basics of Digital Audio
6.1 Digitization of Sound
6.2 MIDI: Musical Instrument Digital Interface
6.3 Quantization and Transmission of Audio

6.3 Quantization and Transmission of Audio

Coding of Audio: Quantization and transformation of data are collectively known as coding of the data. a) For audio, the μ-law technique for companding audio signals is usually combined with an algorithm that exploits the temporal redundancy present in audio signals. b) Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (differences, now) into a much smaller range.

c) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values. In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.

Pulse Code Modulation

The basic techniques for creating digital signals from analog signals are sampling and quantization. Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels.

Fig.: Sampling and quantization. (a) Sampling; (b) Quantization.

a) The set of interval boundaries is called the decision boundaries, and the representative values are called reconstruction levels. b) The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping. c) The representative values that are the output values from a quantizer form a decoder mapping. d) Finally, we may wish to compress the data, by assigning a bit stream that uses fewer bits for the most prevalent signal values.

Every compression scheme has three stages: A. Transformation. The input data is transformed to a new representation that is easier or more efficient to compress. B. Loss. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal. C. Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding.

For audio signals, we first consider PCM for digitization. This leads to Lossless Predictive Coding as well as the DPCM scheme; both methods use differential coding. As well, we look at the adaptive version, ADPCM, which can provide better compression.

PCM in Speech Compression

Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist rate would dictate a sampling rate of 20 kHz. (a) Using uniform quantization without companding, the minimum sample size we could get away with would likely be about 12 bits. Hence for mono speech transmission the bit-rate would be 240 kbps. (b) With companding, we can reduce the sample size down to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps. (c) However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and companding reduces the bit-rate to 64 kbps.

However, there are two small wrinkles we must also address: 1. Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise. Therefore, we should remove this high-frequency content from the analog input signal. This is done using a band-limiting filter that blocks out high, as well as very low, frequencies. Once we arrive at a pulse signal, we must still perform D/A conversion and then construct a final output analog signal. But, effectively, the signal we arrive at is the staircase shown in Fig. 6.13(b).

Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering.

2. A discontinuous signal contains not just frequency components due to the original signal, but also a theoretically infinite set of higher-frequency components: (a) This result is from the theory of Fourier analysis, in signal processing. (b) These higher frequencies are extraneous. (c) Therefore the output of the digital-to-analog converter goes to a low-pass filter that allows only frequencies up to the original maximum to be retained.

The complete scheme for encoding and decoding telephony signals is shown as a schematic in Fig. 6.14. As a result of the low-pass filtering, the output becomes smoothed; Fig. 6.13(c) above showed this effect.

Fig. 6.14: PCM signal encoding and decoding (G.711).

Differential Coding of Audio

Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store. (a) If a time-dependent signal has some consistency over time ("temporal redundancy"), the difference signal, formed by subtracting the previous sample from the current one, will have a more peaked histogram, with a maximum around zero.

(b) For example, as an extreme case the histogram for a linear ramp signal that has constant slope is flat, whereas the histogram for the derivative of the signal (i.e., the differences, from sampling point to sampling point) consists of a spike at the slope value. (c) So if we then go on to assign bit-string codewords to differences, we can assign short codes to prevalent values and long codewords to rarely occurring ones.

Lossless Predictive Coding

Predictive coding simply means transmitting differences: predict the next sample as being equal to the current sample, and send not the sample itself but the difference between the actual and the predicted value. (a) Predictive coding consists of finding differences, and transmitting these using a PCM system. (b) Note that differences of integers will be integers. Denote the integer input signal as the set of values f_n. Then we predict values f̂_n as simply the previous value, and define the error e_n as the difference between the actual and the predicted signal:

f̂_n = f_{n-1}
e_n = f_n - f̂_n

(c) But it is often the case that some function of a few of the previous values, f_{n-1}, f_{n-2}, f_{n-3}, etc., provides a better prediction. Typically, a linear predictor function is used:

f̂_n = Σ_{k=1}^{4} a_{n-k} f_{n-k}

The idea of forming differences is to make the histogram of sample values more peaked. (a) For example, Fig. 6.15(a) plots 1 second of sampled speech at 8 kHz, with magnitude resolution of 8 bits per sample. (b) A histogram of these values is actually centered around zero, as in Fig. 6.15(b). (c) Fig. 6.15(c) shows the histogram for corresponding speech signal differences: difference values are much more clustered around zero than are sample values themselves. (d) As a result, a method that assigns short codewords to frequently occurring symbols will assign a short code to zero and do rather well: such a coding scheme will much more efficiently code sample differences than samples themselves.

Fig. 6.15: Differencing concentrates the histogram. (a): Digital speech signal. (b): Histogram of digital speech signal values. (c): Histogram of digital speech signal differences.

One problem: suppose our integer sample values are in the range 0..255. Then differences could be as much as -255..255: we have increased our dynamic range (ratio of maximum to minimum) by a factor of two, and so need more bits to transmit some differences. (a) A clever solution for this: define two new codes, denoted SU and SD, standing for Shift-Up and Shift-Down. Some special code values will be reserved for these. (b) Then we can use codewords for only a limited set of signal differences, say only the range -15..16. Differences which lie in the limited range can be coded as is. But with the extra two values for SU, SD, a value outside the range -15..16 can be transmitted as a series of shifts, followed by a value that is indeed inside the range -15..16. (c) For example, 100 is transmitted as: SU, SU, SU, 4, where (the codes for) SU and for 4 are what are transmitted (or stored).

In lossless predictive coding, the decoder produces the same signals as the original. As a simple example, suppose we devise a predictor for f̂_n as follows (trunc denotes truncating integer division):

f̂_n = trunc[(f_{n-1} + f_{n-2}) / 2]
e_n = f_n - f̂_n

Let's consider an explicit example. Suppose we wish to code the sequence f_1, f_2, f_3, f_4, f_5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value f_0, equal to f_1 = 21, and first transmit this initial value, uncoded:

f̂_2 = trunc[(f_1 + f_0)/2] = 21, e_2 = 22 - 21 = 1;
f̂_3 = trunc[(f_2 + f_1)/2] = trunc[(22 + 21)/2] = 21, e_3 = 27 - 21 = 6;
f̂_4 = trunc[(f_3 + f_2)/2] = trunc[(27 + 22)/2] = 24, e_4 = 25 - 24 = 1;
f̂_5 = trunc[(f_4 + f_3)/2] = trunc[(25 + 27)/2] = 26, e_5 = 22 - 26 = -4

The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. Fig. 6.16 shows a typical schematic diagram used to encapsulate this type of system:

Fig. 6.16: Schematic diagram for Predictive Coding encoder and decoder.

DPCM

Differential PCM is exactly the same as Predictive Coding, except that it incorporates a quantizer step. (a) Our nomenclature: the original signal values are f_n; the predicted signal is f̂_n; the quantized, reconstructed signal is f̃_n.

(b) DPCM: form the prediction; form an error e_n by subtracting the prediction from the actual signal; then quantize the error to a quantized version, ẽ_n. The set of equations that describe DPCM are as follows:

f̂_n = function_of(f̃_{n-1}, f̃_{n-2}, f̃_{n-3}, ...)
e_n = f_n - f̂_n
ẽ_n = Q[e_n]
transmit codeword(ẽ_n)
reconstruct: f̃_n = f̂_n + ẽ_n

Then codewords for quantized error values ẽ_n are produced using entropy coding, e.g. Huffman coding.

(c) The main effect of the coder-decoder process is to produce reconstructed, quantized signal values f̃_n = f̂_n + ẽ_n. (d) The distortion is the average squared error:

D = Σ_{n=1}^{N} (f̃_n - f_n)² / N

One often plots distortion versus the number of bit-levels used. One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is the Lloyd-Max quantizer, which is based on a least-squares minimization of the error term. A Lloyd-Max quantizer will do better (have less distortion) than a uniform quantizer.

For speech, we could modify quantization steps adaptively by estimating the mean and variance of a patch of signal values, and shifting quantization steps accordingly, for every block of signal values. That is, starting at time i we could take a block of N values f_n and try to minimize the quantization error:

min Σ_{n=i}^{i+N-1} (f_n - Q[f_n])²

Since signal differences are very peaked, we could model them using a Laplacian probability distribution function, which is strongly peaked at zero: it looks like

l(x) = (1/√(2σ²)) exp(-√2 |x| / σ)

for variance σ². So typically one assigns quantization steps for a quantizer with nonuniform steps by assuming signal differences d_n are drawn from such a distribution, and then choosing steps to minimize

min Σ_{n=i}^{i+N-1} (d_n - Q[d_n])² l(d_n).

This is a least-squares problem, and can be solved iteratively using the Lloyd-Max quantizer.

Schematic diagram for DPCM:

Fig.: Schematic diagram for DPCM encoder and decoder.

Notice that the quantization noise, f_n - f̃_n, is equal to the quantization effect on the error term, e_n - ẽ_n.

Example: Suppose we adopt the particular predictor below:

f̂_n = trunc[(f̃_{n-1} + f̃_{n-2}) / 2]

so that e_n = f_n - f̂_n is an integer. As well, use the quantization scheme:

ẽ_n = Q[e_n] = 16 * trunc[(255 + e_n) / 16] - 256 + 8
f̃_n = f̂_n + ẽ_n

First, we note that the error is in the range -255..255, i.e., there are 511 possible levels for the error term. The quantizer simply divides the error range into 32 patches of about 16 levels each. It also makes the representative reconstructed value for each patch equal to the midway point for each group of 16 levels.

Table 6.7 gives output values for any of the input codes: 4-bit codes are mapped to 32 reconstruction levels in a staircase fashion.

Table 6.7: DPCM quantizer reconstruction levels.

e_n in range        quantized to value
-255 .. -240        -248
-239 .. -224        -232
...                 ...
-31 .. -16          -24
-15 .. 0            -8
1 .. 16             8
17 .. 32            24
...                 ...
225 .. 240          232
241 .. 255          248

As an example stream of signal values, consider the set of values:

f_1 f_2 f_3 f_4 f_5 = 130, 150, 140, 200, 230

Prepend an extra value f_0 = 130 to replicate the first value, f_1, and initialize with quantized error ẽ_1 = 0, so that the first reconstructed value is exact: f̃_1 = 130. Then the rest of the values calculated are as follows (the first entry in each row corresponds to the prepended value):

f̂ = 130, 130, 142, 144, 167
e  = 0, 20, -2, 56, 63
ẽ  = 0, 24, -8, 56, 56
f̃  = 130, 154, 134, 200, 223

On the decoder side, we again assume an extra value f̃_0 equal to the correct value f_1, so that the first reconstructed value f̃_1 is correct. What is received is ẽ_n, and the reconstructed f̃_n is identical to that on the encoder side, provided we use exactly the same prediction rule.

DM

DM (Delta Modulation): a simplified version of DPCM. Often used as a quick A/D converter.

1. Uniform-Delta DM: use only a single quantized error value, either positive or negative. (a) It is a 1-bit coder, and produces coded output that follows the original signal in a staircase fashion. The set of equations is:

f̂_n = f̃_{n-1}
e_n = f_n - f̂_n = f_n - f̃_{n-1}
ẽ_n = +k if e_n > 0, -k otherwise (where k is a constant)
f̃_n = f̂_n + ẽ_n

Note that the prediction simply involves a delay.

(b) Consider actual numbers: suppose signal values are

f_1 f_2 f_3 f_4 = 10, 11, 13, 15

As well, define an exact reconstructed value f̃_1 = f_1 = 10. (c) E.g., use step value k = 4:

e_2 = 11 - 10 = 1, so ẽ_2 = 4, f̃_2 = 14;
e_3 = 13 - 14 = -1, so ẽ_3 = -4, f̃_3 = 10;
e_4 = 15 - 10 = 5, so ẽ_4 = 4, f̃_4 = 14.

The reconstructed set of values 10, 14, 10, 14 is close to the correct set 10, 11, 13, 15. (d) However, DM copes less well with rapidly changing signals. One approach to mitigating this problem is to simply increase the sampling rate, perhaps to many times the Nyquist rate.

2. Adaptive DM: If the slope of the actual signal curve is high, the staircase approximation cannot keep up. For a steep curve, we should change the step size k adaptively. One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is Lloyd-Max.

ADPCM

ADPCM (Adaptive DPCM) takes the idea of adapting the coder to suit the input much further. The two pieces that make up a DPCM coder are the quantizer and the predictor.

1. In Adaptive DM, we adapt the quantizer step size to suit the input. In DPCM, we can change the step size as well as the decision boundaries, using a non-uniform quantizer. We can carry this out in two ways: (a) Forward adaptive quantization: use the properties of the input signal. (b) Backward adaptive quantization: use the properties of the quantized output. If quantized errors become too large, we should change the non-uniform quantizer.

2. We can also adapt the predictor, again using forward or backward adaptation. Making the predictor coefficients adaptive is called Adaptive Predictive Coding (APC): (a) Recall that the predictor is usually taken to be a linear function of previous reconstructed quantized values, f̃_{n-i}. (b) The number of previous values used is called the order of the predictor. For example, if we use M previous values, we need M coefficients a_i, i = 1..M, in a predictor

f̂_n = Σ_{i=1}^{M} a_i f̃_{n-i}

However, we can get into a difficult situation if we try to change the prediction coefficients that multiply previous quantized values, because that makes a complicated set of equations to solve for these coefficients: (a) Suppose we decide to use a least-squares approach to solving a minimization, trying to find the best values of the a_i:

min Σ_{n=1}^{N} (f_n - f̂_n)²

(b) Here we would sum over a large number of samples f_n, for the current patch of speech, say. But because f̂_n depends on the quantization, we have a difficult problem to solve. As well, we should really be changing the fineness of the quantization at the same time, to suit the signal's changing nature; this makes things problematical.

(c) Instead, one usually resorts to solving the simpler problem that results from using not f̃_n in the prediction, but instead simply the signal f_n itself. Explicitly writing in terms of the coefficients a_i, we wish to solve:

min Σ_{n=1}^{N} (f_n - Σ_{i=1}^{M} a_i f_{n-i})²

Differentiating with respect to each of the a_i and setting to zero produces a linear system of M equations that is easy to solve. (The set of equations is called the Wiener-Hopf equations.)

Fig. 6.18 shows a schematic diagram for the ADPCM coder and decoder:

Fig. 6.18: Schematic diagram for ADPCM encoder and decoder.

ITU audio-coding standards

The ITU G.711 standard is designed for telephone-bandwidth speech signals sampled at 8 kHz, and provides the lowest delay possible (1 sample) at the lowest complexity. It does a direct sample-by-sample nonuniform quantization of the PCM input signal, using the A-law or μ-law companding curve. The G.711 scheme has lower processor utilization due to lower complexity, but a higher IP bandwidth requirement. It was primarily used for video telephony over ISDN and supports a bit rate of 64 kbps.

Multimedia Systems

ITU G.722

The ITU G.722 standard was designed to transmit 7 kHz voice or music and is often used in videoconferencing systems where higher audio quality (compared with G.711) is required. The voice quality is good, but the music quality is not perfectly transparent due to the limited 7 kHz audio bandwidth (the signal is sampled at 16 kHz). Frame delays are tolerated, and it supports bit rates from 48 to 64 kbps. G.722 operates by dividing the signal into two sub-bands, high and low, which are then encoded with different modalities. The G.722 standard is preferred over the G.711 PCM standard because of its increased audio bandwidth for teleconferencing-type applications.

ITU G.721, ITU G.726, ITU G.727

The G.721 standard was established in the 1980s and was a precursor to G.726 and G.727. All of these are also used for telephone-bandwidth speech and differ from the previous ITU standards by using adaptive differential pulse code modulation techniques. These standards show how a 64 kbps A-law or μ-law PCM signal (G.711) can be converted to a 40, 32, 24, or even 16 kbps signal using ADPCM. These correspond to 5, 4, 3, and even 2 bits per sample.

Ex 13

In PCM, what is the delay, assuming 8 kHz sampling? Generally, delay is the time penalty associated with any algorithm due to sampling, processing, and analysis. Answer: Since there is no processing associated with PCM, the delay is simply the time interval between two samples, and at 8 kHz, this is 0.125 msec.

Ex 14

Suppose we use the predictor

f̂_n = trunc[(f̃_{n-1} + f̃_{n-2}) / 2],  e_n = f_n - f̂_n

Also, suppose we adopt the quantizer

ẽ_n = Q[e_n] = 16 * trunc[(255 + e_n) / 16] - 256 + 8

If the input signal has values as follows:

20 38 56 74 92 110 128 146 164 182 200 218 236 254

show that the output from a DPCM coder (without entropy coding) is as follows:

20 44 56 74 89 105 121 153 161 181 195 212 243 251

Figure 6.19(a) shows how the quantized reconstructed signal tracks the input signal. As a programming project, write a small piece of code to verify your results.

(b) Suppose by mistake on the coder side we inadvertently use the lossless-coding predictor

f̂_n = trunc[(f_{n-1} + f_{n-2}) / 2],  e_n = f_n - f̂_n

using original values f_n instead of quantized ones, f̃_n. Show that on the decoder side we end up with reconstructed signal values as follows:

20 44 56 74 89 105 121 137 153 169 185 201 217 233

so that the error gets progressively worse. Figure 6.19(b) shows how this appears: the reconstructed signal steers farther and farther from the input. Modify your code from above to verify this statement.

Fig. 6.19: (a) DPCM reconstructed signal (dotted line) tracks the input signal (solid line). (b) DPCM reconstructed signal (dashed line) steers farther and farther from the input signal (solid line).

The ADPCM compression algorithm

ADPCM (Adaptive Differential Pulse Code Modulation) is a lossy compression algorithm for 16-bit (or 8-bit, or higher-resolution) sound waveform data. It stores each 16-bit sample from the sound stream in 4 bits, a 4:1 compression ratio. Moreover, the compression and decompression algorithms are very simple, making ADPCM a low-memory, high-quality, high-efficiency method of processing sound. Sound data files with the .AUD extension mostly use ADPCM compression.

The ADPCM compression process

int index = 0, prev_sample = 0;          // assume the signal starts from zero
while (there is data to process) {
    cur_sample = getnextsample();        // fetch the current 16-bit sample
    delta = cur_sample - prev_sample;    // increment relative to the previous sample
    if (delta < 0) { delta = -delta; sb = 8; }  // take the absolute value
    else sb = 0;                         // sb holds the sign bit
    code = 4 * delta / step_table[index];  // a 0-7 value derived from step_table[];
    if (code > 7) code = 7;              // it describes the change in signal strength
    index += index_adjust[code];         // adjust the step_table index by the signal
    if (index < 0) index = 0;            // strength, so the next change is described
    else if (index > 88) index = 88;     // more precisely
    prev_sample = cur_sample;
    output_code(code | sb);              // store together with the sign bit
}

The ADPCM decompression process

int index = 0, cur_sample = 0;
while (there is data to process) {
    code = getnextcode();                // fetch the next 4-bit code
    if ((code & 8) != 0) sb = 1; else sb = 0;
    code &= 7;                           // split code into magnitude and sign
    delta = (step_table[index] * code) / 4 + step_table[index] / 8;
                                         // the added term reduces truncation error
    if (sb == 1) delta = -delta;
    cur_sample += delta;                 // compute the current waveform value
    if (cur_sample > 32767) output_sample(32767);        // clamp to 16 bits
    else if (cur_sample < -32768) output_sample(-32768);
    else output_sample(cur_sample);
    index += index_adjust[code];
    if (index < 0) index = 0;
    if (index > 88) index = 88;
}

int index_adjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

int step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894,
    6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289,
    16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

While performing audio encoding and decoding, the complexity of the encoder is not the same as that of the decoder. Which one is more complex, and why?