ICE1495 Independent Study for Undergraduate Project (IUP)
A Lie Detector
Prof.: Hyunchul Park
Student: 20020703 Jonghun Park
Due date: 06/04/04
Contents
Abstract
1. Introduction
   1.1 Basics of ECC
   Brief History of Error Correcting Codes
2. Problem Analysis and Related Issues
   2.1 Key Concepts
       Shannon's Theorem
       Tradeoffs
   2.2 Theoretical Review Before Analyzing
   2.3 Analysis of the Problem
   2.4 Solving the Problems
   2.5 Decoding Algorithms
3. Implementation
   3.1 Lie Detector with the (7, 4, 3) Hamming Code in Java
   3.2 Simulation and Performance Analysis
5. Conclusion
6. References

[Senior Project] Jong-Hun Park
Abstract

This report reviews the history and basic concepts of Error Correcting Codes (ECC) and solves the given problem, a question-and-answer system in a "lie-noise" environment. It then considers decoding rules, presents a lie detector built as a simple example, and simulates a one-error correcting code to plot its BER curve.

Keywords: ECC, Hamming code, Hamming bound, decoding rule

1. Introduction

When I chose this subject, I thought the project would consist only of solving the problems given in the text box below. As time went by, however, I learned that the project involved much more; it also let me confirm the advantages of a coding method over an uncoded one. Since the time available for the project was short, we made a plan and started immediately. The plan, in brief, was:
- First, since I knew nothing about the subject, study ECC with the T.A.'s assistance.
- Second, try to solve the problem based on what I had studied.
- Third, code the lie detector, with attention to the user interface.
- Fourth, simulate the code and analyze its performance.
Consequently, the purpose of this project was to carry out my own project following the schedule above. Along the way I picked up the basic ideas of ECC and became more familiar with Matlab and Java. With this plan I started the project. The given problem description is as follows.

< Problem Description >
Suppose that Bob asks Alice questions. Alice has one integer between zero and nine in her mind and does not tell it to Bob. Through his questions, Bob wants to determine the number in her mind. She answers only "yes" or "no." Example questions:
(1) Is your number odd?
(2) Is your number a member of {1, 3, 6, 7}?
For this situation, consider the following questions.
Q1. If Alice only tells the truth, how many questions are needed to determine her number?
Q2. If she can lie at most once, how many questions are needed, and what are their contents?
Q3. If she can lie at most t times (t > 0), how many questions are needed, and what are their contents?
Q4. Describe the relationship between the three questions above and Error Correcting Codes in communication systems.

Now, let us look at the overall contents of ECC.

1.1 Basics of ECC

The theoretical core of this project is the Error Correcting Code (ECC). Nowadays, ECC is used in many communication systems and storage devices, especially CDs and DVDs. These systems and devices represent information as binary sequences. When binary information is passed from one point to
another, there is always some chance that a mistake will be made: a 1 interpreted as a 0, or a 0 taken to be a 1. This can be caused by channel noise, media defects, electronic noise, component failures, poor connections, deterioration due to age, and other factors. When a bit is mistakenly interpreted, a bit error has occurred. Error correction is the process of detecting bit errors and correcting them, and it can be done in software or hardware. At high data rates, error correction must be done in special-purpose hardware, because software is too slow. From now on, we consider ECC in communication systems, as related to the given questions.

Figure 1 System diagram

Figure 1 is a simple block diagram of a communication system. The information source generates a message signal, and the transmitter converts it into a form that can be transmitted easily and accurately to another point. The transmitted signal then passes through the channel, which adds noise, so the signal arriving at the receiver is distorted compared with the transmitted one. The receiver therefore needs some mechanism for correcting the errors introduced by the channel. In other words, no digital or analog transmission is perfect: every system makes errors at a certain rate, and as data rates and densities increase, the raw error rate increases as well. Error correcting codes are needed to reduce this error rate. Our problem can be regarded as a communication system between Alice and Bob: ordinary acoustic noise is already corrected by their ears, so Alice's lies play the role of channel noise. On this basis we can simulate the given situation.

Brief history of error correcting codes

Around 1947-1948, the subject of information theory was created by Claude Shannon.
The main result of Shannon's "Mathematical Theory of Communication" is that the only way to get the most storage capacity in a storage device or the fastest transmission through a communications channel is through the use of very powerful error correcting systems. During the same time period, Richard Hamming discovered and implemented a single-bit error correcting code.
In 1960, researchers including Irving Reed and Gustave Solomon discovered how to construct error correcting codes that could correct an arbitrary number of bits, or an arbitrary number of "bytes," where a "byte" means a group of w bits. Even though these codes had been discovered, there was still no known way to decode them. The first textbook on error correcting codes was written in 1961 by W. Wesley Peterson. In 1968, Elwyn Berlekamp and James Massey discovered the algorithms needed to build decoders for multiple-error correcting codes; these came to be known as the Berlekamp-Massey algorithm for solving the key decoding equation. In the last 30 years, researchers have found that the Berlekamp-Massey algorithm is a variation of an ancient algorithm described by Euclid around 300 BC, known today as Euclid's extended algorithm, applied to finding the greatest common divisor of two polynomials. Today, numerous variations of the Berlekamp-Massey and Euclid algorithms exist for solving the key decoding equation. More detailed ECC concepts will be discussed in the next section along with the analysis of the given problem.
2 Problem Analysis and Related Issues

2.1 Key Concepts

The error detecting and correcting capability of a particular coding scheme is correlated with its code rate and complexity. The code rate is the ratio of data bits to total bits transmitted in the code words. A high code rate means information content is high and coding overhead is low; however, the fewer bits used for coding redundancy, the less error protection is provided. A tradeoff must therefore be made between bandwidth availability and the amount of error protection required for the communication.

Shannon's Theorem

Error coding techniques are based on information coding theory, an area developed from work by Claude Shannon. In 1948, Shannon presented a theorem stating that, for any code rate R less than the channel capacity C, there exists a code of some block length n with rate R that can be transmitted over the channel with an arbitrarily small probability of error. Since practical codes still fall short of this bound, there remains much work to be done improving error coding techniques. Cryptography, the method of encrypting data for security rather than reliability, is also a descendant of Shannon's work.

Tradeoffs

When choosing a coding scheme for error protection, the types of errors that tend to occur on the communication channel must be considered. Two types of errors can occur: random bit errors and burst errors. A channel with random bit errors tends to produce isolated bit flips during data transmission, with the errors independent of each other. A channel with burst errors tends to produce clumps of bit errors within a single transmission. Error codes have been developed to protect specifically against each type.

2.2 Theoretical Review Before Analyzing

There are, briefly, three types of coding techniques: linear block codes, CRC codes, and convolutional codes.
For this project, before analyzing the problem, we need some technical concepts of ECC, especially of linear block codes, on which we will focus.

Figure 2 Venn diagram of the code space

The basic concept of a linear block code is that the transmitter sends a codeword c, which consists of the original message bits plus redundancy bits used for correcting errors.
Figure 3 Message generation

As Figure 3 shows, to produce the transmitted signal c we use an algebraic form, the generator matrix G, which is k by n (n is the length of the codeword and k is the length of the message). The generated codeword c is then modulated according to the given channel conditions, and the received signal includes channel noise. Mathematically, the received signal r(t) is

r(t) = c(t) + n(t),

where n(t) is generally AWGN (Additive White Gaussian Noise). Because of this noise the received codeword may be changed, so the decoder must recover the original signal from the detected signal. Many linear block codes have been introduced, for example Hamming codes, Golay codes, BCH codes, and Reed-Solomon codes, the last of which are widely used in satellite communication. The essential difference between these codes is how the G and H matrices are generated. Let us now look at the specific algorithm of the (7, 4) Hamming code, which I will use to solve this project.

In a systematic Hamming code the generator matrix is defined as

G = [I, A],

where I is the identity matrix of the given dimension and A is a k by (n - k) matrix that produces the redundancy bits. The codeword is generated as the product of the message and the generator matrix, c = m·G; different choices of G yield different codes. To repair the detected signal we use the parity check matrix

H = [A^T, I],

so that, over GF(2), G·H^T = A + A = 0. The product s = H·r^T, called the syndrome of r, is the clue for correction; it is an (n - k) by 1 vector. If there is no error, s is the zero vector. If there is a single error, the syndrome is nonzero and, remarkably, equals one of the column vectors of H, and the position of that column in H is the position of the error in the codeword. The (7, 4) Hamming code has 3 additional
bits (the parity bits); it can detect up to two bit errors or correct any single-bit error.

2.3 Analysis of the Problem

Figure 4 Problem diagram

As mentioned, the situation of this problem can be represented as a communication system. The main signals in this problem are:
- m(t) = Alice's true number; in binary, it can be represented as a 4-bit code
- n(t) = her lies
- r(t) = her answers to my questions
Then how can we find her real number, even though we do not know whether she is lying or not?

2.4 Solving the Problems

The first problem is how many questions are needed to determine Alice's number if she does not lie. Intuitively, the solution is easy: since there are only 10 integers, we can identify the number in 4 questions (4 = ⌈log2(10)⌉ = ⌈3.329⌉, where ⌈⌉ denotes the ceiling). In other words, each question can eliminate half of the remaining numbers. It is very simple. However, if she can lie once, as in the second problem, how many questions are needed? This cannot be solved by the intuitive method above; the key to the solution lies in the (7, 4) Hamming code introduced in Section 2. Suppose we build the encoder and decoder of the communication system above with the (7, 4) Hamming code. The situation is then summarized as follows:
- We do not know the original message signal m(t).
- To find m(t), we must obtain r(t), which is c(t) plus the noise bits, by asking Alice questions.
- Once we know r(t), since we already know the G and H matrices, we can compute the syndrome vector, recover c(t), and hence recognize the original message.
So the main point is finding r(t) through well-constructed questions, where "well constructed" means minimizing the number of questions while introducing no ambiguity. Constructing such questions is the
hardest work of this project. However, the method is simple. First, we should know how many questions are needed to handle one lie. The answer is related to the basic concepts of the Hamming code. This approach is a little tricky, but since it was my original approach, I describe it here. The minimum Hamming distance of the (7, 4) Hamming code is 3, so by the properties of this code it can correct ⌊(3-1)/2⌋ = 1 error. And since this code has 3 redundancy bits, we can infer that correcting one error requires 3 extra bits. Then how many well-constructed questions are needed? The reasoning is as follows (assuming a binary system):
- Each bit can represent only two symbols.
- A bit error has only two cases: a one decoded as a zero, or a zero decoded as a one.
- Each question determines exactly one bit, which is either one or zero.
Therefore we need only 7 questions when Alice may tell one lie. The way to construct the questions is also simple: the i-th question determines whether the i-th bit of the codeword c is one or zero, and her (possibly false) answer is exactly the received bit. If you want to know C_i, take the set {A, B, C, ...} of all numbers whose codewords have C_i = 1 and ask "Is your number one of {A, B, C, ...}?" Her answer (yes = 1 / no = 0) is then C_i. In the simulation section below, I build a lie detector in Java Swing that can recover from one lie; the specific mechanisms and results are presented there. The third problem is an extension of the second: the case where she may lie t times. This problem cannot be solved by the tricky approach above; however, it can be solved with another concept from the Hamming code, the Hamming bound.

Figure 5 Vector representation of codes
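The question-construction rule above can be sketched in code. This is a minimal illustration, assuming one standard choice of the systematic (7, 4) generator matrix G = [I | A]; the report's own detector may use a different A, and the class and method names here are mine:

```java
import java.util.ArrayList;
import java.util.List;

public class QuestionBuilder {
    // Systematic (7,4) Hamming encoding, c = [m1 m2 m3 m4 | p1 p2 p3].
    // Parity rules for the assumed A: p1 = m1+m2+m4, p2 = m1+m3+m4,
    // p3 = m2+m3+m4 (all mod 2).
    static int[] encode(int digit) {
        int m1 = (digit >> 3) & 1, m2 = (digit >> 2) & 1,
            m3 = (digit >> 1) & 1, m4 = digit & 1;
        return new int[] { m1, m2, m3, m4,
                           (m1 + m2 + m4) % 2,
                           (m1 + m3 + m4) % 2,
                           (m2 + m3 + m4) % 2 };
    }

    // Question i: "Is your number one of <set>?" where the set holds
    // every digit 0..9 whose codeword has a 1 in position i.
    static List<List<Integer>> buildQuestions() {
        List<List<Integer>> questions = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            List<Integer> set = new ArrayList<>();
            for (int d = 0; d <= 9; d++)
                if (encode(d)[i] == 1) set.add(d);
            questions.add(set);
        }
        return questions;
    }

    public static void main(String[] args) {
        List<List<Integer>> qs = buildQuestions();
        for (int i = 0; i < qs.size(); i++)
            System.out.println("Q" + (i + 1) + ": Is your number one of "
                               + qs.get(i) + "?");
    }
}
```

Running main() prints the seven question sets; Alice's seven yes/no answers, read as bits, form exactly the received vector r.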
The diagram above represents an (n, k) linear block code. The composition of this space is as follows:
- There exist 2^n possible words in the space.
- The number of codewords is 2^k, the same as the number of possible symbols.
- e is the number of correctable errors; it defines a sphere of radius e centered at each codeword.
The relationship among n, k, and e is obtained by the following steps:
1. The total number of words is 2^n.
2. The number of words that decode to a given codeword (those within distance e of it) is Σ_{j=0}^{e} C(n, j).
3. The total number of correctable words is the product of the number of codewords (2^k) and the sphere size from step 2.
4. The result of step 3 cannot be larger than 2^n.
That is,

2^k · Σ_{j=0}^{e} C(n, j) ≤ 2^n, equivalently Σ_{j=0}^{e} C(n, j) ≤ 2^(n-k) = 2^r,

where r = n - k is the number of redundancy bits. As we found when interpreting r in the second problem, a code that can correct e errors needs at least as many redundant bits as the right-hand side of the inequality above requires; in ECC this is called the Hamming bound. For the (7, 4) Hamming code (n = 7):

e | Σ_{j=0}^{e} C(7, j) | log2(Σ) | min r
0 |   1 |      0 | 0
1 |   8 |      3 | 3
2 |  29 | 4.8580 | 5
3 |  64 |      6 | 6
4 |  99 | 6.6294 | 7
5 | 120 | 6.9069 | 7

Therefore, in the third problem, the Hamming bound requires at least r parity bits, where

r = n - k ≥ ⌈log2(Σ_{j=0}^{e} C(n, j))⌉.

Furthermore, the answer to the fourth question has already been described: the game is exactly a communication system in which Alice's lies are the channel noise. Up to this point, based on general ECC concepts, we have solved the given problems.
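The Hamming-bound table above can be reproduced with a few lines of code. The sketch below fixes n = 7 as the table does; the class and method names are mine:

```java
public class HammingBound {
    // C(n, k) computed iteratively; intermediate products stay exact
    // because each prefix product is divisible by i
    static long binomial(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++)
            result = result * (n - i + 1) / i;
        return result;
    }

    // sphere size: number of words within Hamming distance e of a codeword
    static long sphereSize(int n, int e) {
        long sum = 0;
        for (int j = 0; j <= e; j++) sum += binomial(n, j);
        return sum;
    }

    // smallest r with 2^r >= sphereSize(n, e), i.e. ceil(log2(sphere))
    static int minParityBits(int n, int e) {
        long sphere = sphereSize(n, e);
        int r = 0;
        while ((1L << r) < sphere) r++;
        return r;
    }

    public static void main(String[] args) {
        System.out.println("e  sum C(7,j)  min r");
        for (int e = 0; e <= 5; e++)
            System.out.printf("%d  %10d  %5d%n",
                              e, sphereSize(7, e), minParityBits(7, e));
    }
}
```

The printed rows match the table: correcting one error needs r = 3 parity bits, two errors need 5, and so on.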
2.5 Decoding Algorithms

As mentioned in the introduction, solving the problems is not the core of this project. In my opinion, considering why modern communication systems adopted coding, the main process that decides efficiency is the decoding system: if the decoding algorithm does not make transmission more efficient than an uncoded system, there is no reason to use coding theory.

First, as a contrast to coding, consider uncoded transmission through a noisy channel. The decoding algorithm for uncoded transmission is simply a hard decision: there exist one or more decision boundaries, each halfway between two adjacent nominal signal levels (because of noise, the received values are not exactly those levels).

Second, consider the decoding algorithm of the Hamming code, which is very simple. Another name for Hamming decoding is syndrome decoding, so called because it uses the syndrome vector s generated with the H matrix we saw earlier. The Hamming decoding algorithm, which corrects up to one bit error, is as follows:
1. Compute the syndrome s = y·H^T for the received vector y. If s = 0, there are no errors; return the received vector and exit.
2. Otherwise, determine the position j of the column of H that equals the transpose of the syndrome.
3. Flip the j-th bit of the received word and output the result.
As long as there is at most one bit error in the received vector, the result is the codeword that was sent.

Lastly, consider maximum likelihood (ML) decoding. The goal of sequence estimation is to find the code sequence c_opt that maximizes the a posteriori probability p(c|r):

c_opt = argmax_c p(c|r).

Using Bayes' law, p(r)·p(c|r) = p(c)·p(r|c), and neglecting the constant factor p(r), we obtain

c_opt = argmax_c p(c)·p(r|c),

where p(c) is the a priori probability of the sequence.
Assume that the message sequence m = [m0, m1, m2, ...] is encoded into the code sequence c = [c0, c1, c2, ...], and that the received sequence is r = [r0, r1, r2, ...]. The decoder produces an estimate ĉ = [ĉ0, ĉ1, ...] of c based on the observation of r. The maximum likelihood (ML) decoder chooses c iff

p(r|c) > p(r|c'), equivalently log p(r|c) > log p(r|c'), for every c' ≠ c.

In general, a path metric M(r, c) built from bit metrics M(ri, ci) need not be the ML metric; the rule is: choose c iff

Σ_i M(ri, ci) > Σ_i M(ri, c'i) for every c' ≠ c,

and for an ML receiver the bit metric is M(ri, ci) = log p(ri|ci). For the BSC, the ML decision rule becomes: choose c iff

d_H(r, c) < d_H(r, c') for every c' ≠ c.

Consequently, the ML receiver chooses the codeword with the smaller Hamming distance to the received signal. These three basic decision rules are widely used in decoding systems. There is no single best one; in other words, each method has systems for which it performs with the best efficiency. Next, let us implement these different decoding ideas.
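As a concrete sketch of the two coded decision rules, the code below implements both syndrome decoding and brute-force minimum-Hamming-distance (ML on a BSC) decoding for a (7, 4) code. The G = [I | A] and H = [A^T | I] used here are one standard choice and may differ from the matrices in the report's implementation:

```java
public class Decoders {
    // systematic (7,4) encoding, c = [m1 m2 m3 m4 | p1 p2 p3]
    static int[] encode(int msg) {
        int m1 = (msg >> 3) & 1, m2 = (msg >> 2) & 1,
            m3 = (msg >> 1) & 1, m4 = msg & 1;
        return new int[] { m1, m2, m3, m4,
                           (m1 + m2 + m4) % 2,
                           (m1 + m3 + m4) % 2,
                           (m2 + m3 + m4) % 2 };
    }

    // H = [A^T | I]; its columns are the syndromes of single-bit errors
    static final int[][] H = {
        { 1, 1, 0, 1, 1, 0, 0 },
        { 1, 0, 1, 1, 0, 1, 0 },
        { 0, 1, 1, 1, 0, 0, 1 }
    };

    // syndrome decoding: compute s = H r^T over GF(2), then flip the bit
    // whose column of H matches s
    static int[] syndromeDecode(int[] r) {
        int[] s = new int[3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 7; j++) s[i] ^= H[i][j] & r[j];
        int[] c = r.clone();
        if (s[0] == 0 && s[1] == 0 && s[2] == 0) return c; // no error detected
        for (int j = 0; j < 7; j++)
            if (H[0][j] == s[0] && H[1][j] == s[1] && H[2][j] == s[2]) {
                c[j] ^= 1; // correct the single-bit error at position j
                break;
            }
        return c;
    }

    // ML decoding on a BSC: pick the codeword at minimum Hamming distance
    static int[] mlDecode(int[] r) {
        int[] best = null;
        int bestDist = Integer.MAX_VALUE;
        for (int msg = 0; msg < 16; msg++) {
            int[] c = encode(msg);
            int d = 0;
            for (int i = 0; i < 7; i++) if (c[i] != r[i]) d++;
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sent = encode(7);
        int[] r = sent.clone();
        r[2] ^= 1; // one bit error: one "lie"
        System.out.println("syndrome: " + java.util.Arrays.toString(syndromeDecode(r)));
        System.out.println("ML:       " + java.util.Arrays.toString(mlDecode(r)));
    }
}
```

For any single-bit error both decoders return the transmitted codeword, which foreshadows the observation in Section 3.2 that their measured performance is almost the same for this short code.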
3 Implementation

So far we have seen the background and theoretical analysis of ECC, especially linear block codes. Now let us build a real lie detector and evaluate its efficiency and accuracy by plotting the BER over many samples.

3.1 Lie Detector with the (7, 4, 3) Hamming Code in Java

This lie detector can correct one lie, i.e., it has single-error correcting ability. Its inputs are the answers from Alice (the user); its outputs are the questions to be answered and a report identifying the original number in her mind.

Figure 6 Start screen of the detector

As the picture above shows, there is one text field and several buttons, including the answer buttons. Pushing the Run button starts the program with some comments. Before explaining the core source, let us play one game: I think of the number 7 and will lie on the second question. First I press the Run button; the screen changes and presents questions, which I answer only with yes/no.

Figure 7 Lie detector Run_1
Figure 8 Lie detector Run_2

The program correctly catches when I lied and what number I was thinking of. Now let me explain the specific operation.

Figure 9 Main frame of the lie detector

As shown, the program consists of two large classes. Class Detector is the main visual frame, based on Java Swing. First, a Detector object is visualized as the detector frame, which contains a menu bar, panels, buttons, and a text field. The object then calls the method oper(), defined in the Module class. Class Module has two global variables: an input array holding Alice's answers, and numofq, which counts how many questions the program has asked. The Module class also has further methods: search() finds the error and returns the result report, and hxi() returns the syndrome vector to search().
There are also methods for mod calculation and for converting binary to decimal. The information flow is as follows:
1. The main function in class Detector makes one instance of Detector, named a.
2. The visual lie detector is then shown.
3. When the buttons are pushed, each ActionListener performs its own work.
4. When numofq equals 6 (the seventh question), the program shows the output result.
The decoding method is the syndrome method already described in Section 2.5, Decoding Algorithms. There was nothing hard about implementing this in Java. However, when I finished the detector, I wondered how efficient one-bit error correction really is, since the algorithm cannot correct more than a single bit error and performs poorly when there are two or more errors. Watching the decoding process in the following simulation, the decoder can even turn a two-bit error pattern into a three-bit error by its correction decision, which confused me at first. Let us examine the details with the simulation results below.

3.2 Simulation and Performance Analysis

As mentioned, I wondered about the performance of the (7, 4) Hamming code, in effect a one-bit corrector, compared with uncoded transmission without redundancy bits.

Figure 10 BER per SNR

As you can see, the diagram above shows the BER at different SNR levels. The red curve is the performance without coding (generated by hard decision). The second,
blue curve shows the BER using (7, 4) Hamming coding with syndrome decoding, and the last one is the theoretical curve for (7, 4) Hamming code correction. When I first saw this performance curve it was hard to understand, but then I realized what had confused me: this system sends only 7 bits, carrying a 4-bit message m, so a channel that corrupts 3 of those bits cannot be used for communication anyway. Hence simple one-bit error correction can already afford high correcting performance. From the paper by Heidi Steendam and Marc Moeneclaey, "ML Performance of Low-Density Parity-Check Codes," I obtained the upper bound for the theoretical curve:

BER ≤ (1/K) · Σ_j d_H(b_j, 0) · Q(√(2 · d_H(c_j, 0) · SNR)),

where the sum runs over the nonzero codewords c_j with corresponding messages b_j. At a BER of about 10^-4, the SNR difference between the theoretical curve and syndrome decoding is about 1.5 dB, while the difference between syndrome decoding and no coding is more than 2 dB; in fact the gap between the uncoded curve and the theoretical curve is about 4 dB of SNR. We can therefore conclude that the simulated (7, 4) Hamming code performs well: using the coding method decreases the BER compared with not using it.

Figure 11 BER comparison between ML decoding and syndrome decoding

Figure 11 is an optional comparison of the performance of ML decoding and syndrome decoding. In this experiment the performances are almost the same. The reason, I think, is that this code carries only a 4-bit message; from my reading, ML decoding is generally more powerful than syndrome decoding.
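The Matlab simulation itself is not reproduced here, but its structure can be sketched as follows: BPSK over AWGN with hard decisions, uncoded versus (7, 4) Hamming with minimum-distance decoding (equivalent to syndrome decoding for single errors). This is a Java stand-in under my own assumptions; the generator matrix, trial counts, and the normalization of noise by the code rate 4/7 (so both systems spend the same energy per information bit) are choices of mine, not details from the report:

```java
import java.util.Random;

public class BerSim {
    static int[] encode(int msg) { // systematic (7,4), c = [m | parity]
        int m1 = (msg >> 3) & 1, m2 = (msg >> 2) & 1,
            m3 = (msg >> 1) & 1, m4 = msg & 1;
        return new int[] { m1, m2, m3, m4,
                           (m1 + m2 + m4) % 2,
                           (m1 + m3 + m4) % 2,
                           (m2 + m3 + m4) % 2 };
    }

    // minimum-distance decoding; returns the decoded 4-bit message
    static int decode(int[] r) {
        int best = 0, bestDist = Integer.MAX_VALUE;
        for (int msg = 0; msg < 16; msg++) {
            int[] c = encode(msg);
            int d = 0;
            for (int i = 0; i < 7; i++) if (c[i] != r[i]) d++;
            if (d < bestDist) { bestDist = d; best = msg; }
        }
        return best;
    }

    // BPSK over AWGN with hard decisions; ebn0Db is Eb/N0 in dB, and the
    // noise is scaled by the code rate 4/7 for a fair comparison
    static double codedBer(double ebn0Db, int trials, Random rng) {
        double sigma = Math.sqrt(1.0 / (2.0 * (4.0 / 7.0) * Math.pow(10, ebn0Db / 10)));
        long errors = 0;
        for (int t = 0; t < trials; t++) {
            int msg = rng.nextInt(16);
            int[] c = encode(msg);
            int[] r = new int[7];
            for (int i = 0; i < 7; i++)
                r[i] = ((c[i] == 1 ? 1.0 : -1.0) + sigma * rng.nextGaussian()) > 0 ? 1 : 0;
            errors += Integer.bitCount(decode(r) ^ msg); // message-bit errors
        }
        return errors / (4.0 * trials);
    }

    static double uncodedBer(double ebn0Db, int trials, Random rng) {
        double sigma = Math.sqrt(1.0 / (2.0 * Math.pow(10, ebn0Db / 10)));
        long errors = 0;
        for (int t = 0; t < trials; t++) {
            int bit = rng.nextInt(2);
            double x = (bit == 1 ? 1.0 : -1.0) + sigma * rng.nextGaussian();
            if ((x > 0 ? 1 : 0) != bit) errors++;
        }
        return (double) errors / trials;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        for (double db = 0; db <= 8; db += 2)
            System.out.printf("Eb/N0 %4.1f dB  uncoded %.5f  coded %.5f%n",
                    db, uncodedBer(db, 100000, rng), codedBer(db, 100000, rng));
    }
}
```

With enough trials, the printed table shows the coded BER falling below the uncoded BER as Eb/N0 grows, the same qualitative behavior as the curves in Figure 10.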
5. Conclusion

Through this report, we reviewed the history and basic concepts of Error Correcting Codes, solved the given problem (a communication system in a lie-noise environment), built the lie detector as a simple example, and simulated a one-error correcting code to plot its BER curve. From this curve we confirmed that the coding method decreases the BER efficiently. The Hamming code is an almost primitive error correcting code; nowadays hardly anyone uses it as a main encoding/decoding method. Even so, we confirmed that even the Hamming code performs far better than an uncoded transmission system. So, although using ECC reduces spectral efficiency, with a more powerful code such as a Golay, BCH, or Reed-Solomon code the BER curve moves even closer to Shannon's bound. Such techniques will be used widely in storage and communication technology, such as DVDs and special communication systems that require error-free operation at high bit or symbol rates. I close hoping for the appearance of ever more powerful codes approaching the Shannon bound.

6. References

1. Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory
2. Shu Lin and Daniel Costello, Jr., Error Control Coding, Prentice Hall, 1983
3. W. Wesley Peterson and E. J. Weldon, Jr., Error-Correcting Codes, 2nd ed., MIT Press, 1972
4. Ling-Pei Kung, Introduction to Error Correcting Codes
5. Yoonmi Lee and Inkyung Hwang, University of Incheon, Design of a [7, 4] Hamming Code Encoder and Decoder
6. Heidi Steendam and Marc Moeneclaey, ML Performance of Low-Density Parity-Check Codes