EE 435/535: Error Correcting Codes
Project 1, Fall 2009: Extended Hamming Code

Project #1 is due on Tuesday, October 6, 2009, in class. You may turn the project report in early. Late projects are accepted for one week past the due date at a penalty of 20% off. I encourage you to complete the programming portion of this project within two weeks, which allows you an additional week to write up your report.

1 Introduction

Project #1 is to be done individually, although student interaction is encouraged. The project report must be done individually, and may be handed in as hard copy or emailed in electronic form to the instructor.

Undergraduates should complete sections 2 through 5. Graduate students should also complete sections 2 through 5, plus section 6. Undergraduates may also do section 6 for extra credit. The bonus items in section 5 may be completed for extra credit by either undergraduate or graduate students. They are not required for either undergraduate or graduate students.

You may use either Matlab or C/C++ to program your simulations of a coded system.

1.1 Project Reports

Show all your work! Simply giving the numeric answer to a question is not sufficient to receive full credit. If you use Matlab to solve any of the project questions, please include your Matlab code as well as any requested data plots in your report. Similarly, if you use C/C++, please include your C/C++ code. Also include all input used and output obtained from your code in answer to any project questions. If you are unable to solve a project problem, or your code does not work, please include whatever work you have done in your project report so that I can give you partial credit for the work you did do.

2 Extended Hamming Code: Encoding

The extended Hamming code adds an additional parity-check bit to the regular Hamming code. This extra parity-check bit is the modulo-2 sum of all bits of the regular Hamming code, i.e. the previous 7 bits.
Use the following representation for the parity-check matrix H:
         [                 0 ]
H(8,4) = [     H(7,4)      0 ]
         [                 0 ]
         [ 1 1 1 1 1 1 1 1   ]

where the parity-check matrix for the (7,4) Hamming code is given by

         [ 1 0 0 1 0 1 1 ]
H(7,4) = [ 0 1 0 1 1 1 0 ]
         [ 0 0 1 0 1 1 1 ]

1. Convert the (8,4) extended Hamming code's parity-check matrix H given above to a systematic form. You should be able to do this by adding appropriate rows of H together; it is not necessary to swap any columns of H.

2. Find the generator matrix G for the (8,4) extended Hamming code from the systematic parity-check matrix H found in the previous question.

3. Write a program in either Matlab or C/C++ which generates all 2^4 possible 4-bit information sequences and encodes them into all 2^4 possible (8,4) extended Hamming codewords.

4. Your program should also print out each 4-bit information sequence u and its corresponding (8,4) extended Hamming codeword v.

5. Given the 2^4 codewords you found, what is the minimum distance d_min of the (8,4) extended Hamming code? Explain why.

6. What is the rate R_c of the (8,4) extended Hamming code?

3 Extended Hamming Code: Syndrome Decoding on the BSC

Syndrome decoding is minimum-distance decoding on a reduced set of codewords, i.e. those that are a distance t or less from the received noisy codeword, where t is the number of errors the code is capable of correcting.

1. How many bit errors can the (8,4) extended Hamming syndrome decoder correct, and why? How many bit errors can it detect, and why?

2. Write a syndrome decoding program in either Matlab or C/C++. Your program should take an 8-bit input (a noisy (8,4) extended Hamming codeword r = v + e), find the syndrome S = r H^T, determine the error vector e from the syndrome S, and estimate what the most likely original codeword v was. The output of your program should be the estimated codeword v̂.
4 Binary Symmetric Channel (BSC)

The binary symmetric channel (BSC), or bit-flipping channel, takes an input bit and generates an error (or flips a bit) with crossover probability ρ. Thus the BSC can be viewed as generating an error vector e of length N, where the probability that e_i = 1 is p(e_i = 1) = ρ, i = 1, ..., N.

To simulate decoder performance, we need to create a program that emulates the channel of interest, in this case, the BSC. The e = rand(1,N) command in Matlab generates a uniformly random vector e of length 1 × N whose elements e_i range in value from 0 to 1. In C, each call to rand() generates a uniformly random integer between 0 and RAND_MAX (the value of RAND_MAX is defined in <stdlib.h>). You must #include <stdlib.h> to call rand(). There are other uniform random number generators available in C; lrand48() generates a long int ranging from 0 to 2^31 − 1. To apply this uniformly random integer (in C) or real value (in Matlab) to the BSC, you must convert the number to either a 0 or a 1, where the chance that the bit is 1 equals ρ. That is your task in this section.

1. Write a function in either Matlab or C/C++ that generates a random vector of length N = 8, whose values are either 0 or 1. The probability that a 1 appears in any given bit position must be ρ, where ρ will be an input parameter. The vector length should also be an input parameter. The output will be your random binary vector of length N.

2. Run your function with ρ = 0.1, for 2500 sequences e of length N = 8, or one sequence e of length 20000. Calculate p(e_i = 1). Show that your experimental p(e_i = 1) ≈ ρ = 0.1.

3. Note that you can use this same function to generate a uniformly random binary information sequence u of length k, simply by setting ρ = 0.5 and the vector length to k. Generate 2500 sequences u of length k = 4 (or one length-10000 sequence u) and calculate p(u_i = 1). Show that your experimental p(u_i = 1) ≈ 1/2.
5 Performance Results on the BSC

We wish to evaluate the performance of the (8,4) extended Hamming code over a binary symmetric channel (BSC) with crossover probability ρ. Use the range of values ρ = [0.01, 0.05, 0.1, 0.5]. To simulate this system, 3 main functions are required: the (8,4) extended Hamming encoder, the BSC emulator, and the syndrome decoder. Additionally, at the beginning, a uniformly random binary information sequence u should be generated to serve as input to the encoder; alternately, you may generate the all-zeros sequence. Even if you generate the all-zeros sequence, I still want you to include your encoder in the system, to show that it works.

1. Combine your encoder, BSC emulator and syndrome decoder into one program. Also, either generate a random sequence u or use the all-zeros sequence as input to your encoder. If you
use the all-zeros sequence, explain why this will give you accurate error-rate results, i.e. why you do not have to use a random selection of codewords to obtain good results. Run your simulations for at least 10,000 information bits of u for each value of ρ, the BSC error probability. Using 100,000 bits would be even better, giving you more accurate results at lower channel error probabilities.

2. Add a function to your program which calculates both the codeword error-rate (WER) and the bit error-rate (BER) of the coded bits.

3. Plot both the codeword error-rate (WER) and the bit error-rate (BER) of all coded bits for the (8,4) extended Hamming code on the y-axis, vs. the channel crossover probability ρ on the x-axis. Plot both axes on a log scale. Label both axes, title your plot, and include a legend to distinguish the different curves. Plot ρ with decreasing ρ to the right.

4. To your above plot, add the uncoded word error-rate (WER), which is the error rate when transmitting the information bits or data u without encoding and decoding. This will show the advantage of coding. Assume a word size of 8 bits, the same length as the extended (8,4) Hamming codeword. Add this curve to your plot legend as well. Compare the uncoded WER to the (8,4) extended-Hamming-coded WER. What do you observe?

5. Bonus 1: Calculate the information BER (that is, the BER of only the information bits) as well as the coded BER. Plot and compare both coded and information BER. Is there a difference?

6. Bonus 2: Calculate the information WER (that is, the WER counting only errors in the information bits of the codeword) as well as the coded WER. Plot and compare both coded and information WER. Is there a difference?
6 Graduate: Extended Hamming Code: Nearest-Neighbor (Minimum-Distance) ML Decoding on the AWGN Channel

Maximum-likelihood (ML) decoding is also known as optimal decoding, because it minimizes the probability of codeword error, i.e. it maximizes the probability of choosing the correct codeword. For this section, we assume that each codeword bit v_i, i = 1, ..., N is converted to a BPSK (binary phase-shift keying) or 2-PAM (pulse amplitude modulation) symbol x_i, such that v_i = 0 → x_i = +1 and v_i = 1 → x_i = −1. The BPSK symbol x_i is then transmitted across an AWGN (additive white Gaussian noise) channel to the receiver, which contains the decoder. The AWGN channel is a Gaussian-distributed probabilistic channel that adds noise n to the transmitted BPSK symbols x. The Gaussian noise n has zero mean and variance σ_n^2. Each noise sample n_i is real-valued and assumed i.i.d. (independent and identically distributed).
The received noisy codeword has noise added to the transmitted symbols, and is represented as y = x + n. For each individual symbol, y_i = x_i + n_i, i = 1, ..., N.

ML decoding chooses the estimated codeword x̂ to be the codeword which gives the largest probability p(x | y). If all codewords x are a priori equally likely, so that p(x) is the same for all codewords x, then maximizing p(x | y) is equivalent to maximizing p(y | x). This can be shown via Bayes' theorem. Thus, for equally likely codewords x, ML decoding maximizes the likelihood p(y | x). The ML decision, for equally likely x, is written as

    x̂ = arg max_{X_j} p(y | x = X_j),    (1)

where X_j indicates the j-th codeword. We also make the assumption of independent and identically distributed (i.i.d.) noise samples n_i, as well as independent x_i. Thus we write p(y | x) as the product of the individual sample probabilities p(y_i | x_i), that is,

    p(y | x = X_j) = ∏_{i=1}^{N} p(y_i | x_i),    (2)

where x_i is the i-th BPSK symbol. The individual conditional probabilities p(y_i | x_i) are found from the Gaussian noise distribution, because p(y_i | x_i) = p(n_i = y_i − x_i):

    p(y_i | x_i) = exp( −(y_i − x_i)^2 / (2σ_n^2) ) / sqrt(2π σ_n^2).    (3)

We then choose the most likely codeword v as the codeword v(x) corresponding to the modulated sequence x that maximizes equation 2:

    v̂(x) = arg max_x ∏_{i=1}^{N} p(y_i | x_i),    (4)

using equation 3 to find each p(y_i | x_i). An alternate method to equation 4 is to use log-likelihood ratios, or LLRs, instead of probabilities. The channel LLR λ_i^ch for sample i is defined as

    λ_i^ch = log( p(y_i | x_i = +1) / p(y_i | x_i = −1) ).    (5)

There are several advantages to using LLRs, including: 1) they are a more compact form of representing the probabilities for each sample; 2) a positive LLR means a +1 is more likely, while a negative LLR means a −1 is more likely, so the sign can be used to make a bit-by-bit decision; 3) because LLRs are logs, the product in equation 4 can be converted to a sum.
This is shown below in equation 6.
    v̂(x) = arg max_x Σ_{i=1}^{N} log p(y_i | x_i)
         = arg max_x Σ_{i : x_i = +1} λ_i^ch,    (6)

where the second line follows because terms common to all candidate codewords drop out of the maximization, leaving only the LLRs of the positions where x_i = +1.

Error-Rate Plots for the AWGN Channel: Typically, error-rate plots for codes over the AWGN channel show the BER or WER on the y-axis, and SNR = 10 log10(E_b/N_0) on the x-axis. E_b is the energy per bit, which is found as E_b = E_s/R, where R is the rate in bits/symbol. For uncoded BPSK modulation, R = 1. For coded BPSK modulation, R = R_c, where R_c is the coding rate R_c = k/N. The energy per symbol, E_s, is 1 when we use BPSK with symbols ±1. Thus the energy per bit is found as E_b = 1/R_c for BPSK modulation. The noise power spectral density N_0 is found as N_0 = 2σ_n^2. Sometimes 10 log10(E_s/N_0) is used for the x-axis instead, but E_b/N_0 is most commonly used in ECC and digital communications applications. This project uses SNR = 10 log10(E_b/N_0).

1. Write a function that generates a BPSK-modulated symbol sequence x from your binary codeword sequence v, such that v_i = 0 → x_i = +1 and v_i = 1 → x_i = −1.

2. Write an additive white Gaussian noise (AWGN) simulator. Use randn() if you are using Matlab. If you are using C, you must create your own Gaussian random noise from uniformly random variables generated with rand() or drand48(). Talk with me if you have any questions about how to do this. Your AWGN simulator should take as input the desired σ_n^2 and the required length of your noise sequence.

3. Add an AWGN sequence n to each BPSK-modulated codeword x to generate your received noisy codeword y = x + n.

4. Write a minimum-distance decoder, which chooses v̂ according to equation 4 or 6.

5. Combine your (8,4) extended Hamming encoder, AWGN channel, and minimum-distance decoder to create an AWGN simulation of the (8,4) EHC.

6. Calculate the BER and WER of your decoder for σ_n^2 = [1, 0.7, 0.5, 0.33, 0.25].

7. Plot the BER and WER on the y-axis. Plot 10 log10(E_b/N_0) on the x-axis.