Implementation and Performance Testing of the SQUASH RFID Authentication Protocol

Philip Koshy, Justin Valentin and Xiaowen Zhang*
Department of Computer Science, College of Staten Island, Staten Island, New York, 10314
E-mail: Xiaowen.Zhang@csi.cuny.edu

Abstract

We implement and test the performance of an RFID hash algorithm recently proposed by Adi Shamir [1] using a C++ simulation. The algorithm, called SQUASH (short for SQUare-hASH), is simple enough to be implemented on low-cost RFID tags. Shamir has proved mathematically that SQUASH is at least as secure as the Rabin cryptosystem [2], which has been extensively tested and scrutinized for nearly 30 years. SQUASH is designed to minimize processing time and cost without sacrificing security. It was developed as a lightweight version of the hash function from the Rabin cryptosystem, $c = m^2 \bmod n$. SQUASH is theoretically faster because, rather than storing, computing, and transmitting a large ciphertext, a low-cost RFID tag computes the bits of the ciphertext in real time, bit by bit, and transmits them as they are calculated. Although SQUASH is provably as secure as the Rabin cryptosystem, its performance has not been carefully scrutinized. Shamir writes that the performance of SQUASH should scale linearly as the size of a tag's register increases; we test this specific claim using a software simulation.

Keywords: RFID; Hash Algorithm; Rabin Cryptosystem

I. INTRODUCTION

Radio Frequency Identification (RFID) tags allow large organizations in the public and private sectors to catalog, index and process a large volume of data wirelessly. Because of a growing need for low-cost RFID tags, individual tags are built with only a small amount of working memory, and security is often implemented as an afterthought.
Some consumer rights organizations, like CASPIAN (Consumers Against Supermarket Privacy Invasion and Numbering) and EPIC (Electronic Privacy Information Center), oppose the use of RFID because of possible privacy violations. As the security industry searches for a cost-effective, lightweight RFID hash algorithm to address these concerns, Adi Shamir has created the SQUASH algorithm (short for SQUare-hASH) as one possible solution. Unlike other hash functions that have been proposed without a strong mathematical basis, SQUASH is provably as secure as the hash function from the Rabin cryptosystem, $c = m^2 \bmod n$, which has been studied and scrutinized for nearly 30 years. SQUASH has the benefit of producing bits of the ciphertext on the fly, bit by bit, thus allowing for high data throughput. We focus on one specific claim made by Shamir: that the performance of the SQUASH algorithm will scale linearly as the tag register size is increased exponentially. We have created a simulation of the algorithm in C++ to verify this claim.

*X. Zhang is the corresponding author; his work is supported in part by a PSC-CUNY Award.
978-1-4244-5550-8/10/$26.00 ©2010 IEEE

II. AUTHENTICATION VIA HASH ALGORITHMS

A. Authenticate using the result of a mathematical operation

If security were not a concern, an RFID reader could authenticate an RFID tag by simply requesting a secret value from the tag. The reader could then match and verify this secret value against its own internal database. While this approach is possible, the tag would send sensitive data that could be captured by an eavesdropping attack and then exploited by a man-in-the-middle attack. To avoid this problem, a tag sends something called a ciphertext instead. Rather than share a secret value over a potentially insecure transmission medium, we can perform a mathematical operation on the secret value and send the result.
The operation is called a hash function and the result is called the ciphertext.

B. The Authentication Process

When beginning transmission, the RFID reader sends a pseudorandom number $R$ to the RFID tag. At this point, the tag and the reader both know the following three pieces of information: (1) the hash function $H$, (2) a secret piece of information $S$ that uniquely identifies the tag, and (3) the pseudorandom number $R$. After receiving $R$ from the reader, the tag performs an exclusive-or operation on $R$ and $S$ and sends the result through the hash function $H$. This produces the ciphertext $C_1$. In other words, the tag calculates $H(R \oplus S) = C_1$. Simultaneously, the reader uses the same $H$, $S$ and $R$ to calculate its own ciphertext, $H(R \oplus S) = C_2$. The tag sends its calculated ciphertext $C_1$ back to the reader, and the reader compares it with its own $C_2$. If $C_2 = C_1$, the RFID reader has successfully authenticated the RFID tag, since both have computed the same value for the ciphertext.
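The exchange described above can be sketched in C++ as follows. This is a minimal illustration and not the simulation used in this paper: `toy_hash` is a hypothetical stand-in for $H$ (squaring modulo $2^{31} - 1$, chosen only so that the arithmetic fits in 64 bits; it offers no real security), and the function names and 32-bit widths are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hash function H: m^2 mod n with n = 2^31 - 1.
// Illustrative only; a real tag would use SQUASH with a much larger modulus.
std::uint64_t toy_hash(std::uint32_t m) {
    const std::uint64_t n = (1ULL << 31) - 1;
    const std::uint64_t x = m % n;   // x < 2^31, so x * x fits in 64 bits
    return (x * x) % n;
}

// Tag side: combine the secret S with the challenge R and hash: C1 = H(R xor S).
std::uint64_t tag_response(std::uint32_t secret, std::uint32_t challenge) {
    return toy_hash(secret ^ challenge);
}

// Reader side: recompute C2 = H(R xor S) from its stored copy of S and compare.
bool reader_accepts(std::uint32_t stored_secret, std::uint32_t challenge,
                    std::uint64_t response) {
    return toy_hash(stored_secret ^ challenge) == response;
}

// Small self-check: a tag holding the right secret is accepted.
void authentication_self_check() {
    const std::uint32_t secret = 0x1234u;
    const std::uint32_t challenge = 0xBEEFu;   // would be pseudorandom in practice
    assert(reader_accepts(secret, challenge, tag_response(secret, challenge)));
}
```

Because the challenge $R$ changes on every run, a recorded response is only useful to an eavesdropper if the reader happens to repeat the same challenge.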
III. SQUASH AND THE RABIN CRYPTOSYSTEM

In the one-way function $c = m^2 \bmod n$ from the Rabin cryptosystem, $c$ represents the ciphertext, $m$ represents the message to be hashed and $n$ is a Mersenne number ($n = 2^k - 1$, where $k$ is the message length) that has not yet been factored. If we attempted to square, reduce and store a large number directly, the implementation would be too cumbersome and too slow to work on a low-cost RFID tag. The following steps show the derivation of the SQUASH algorithm from the Rabin cryptosystem. The focus is on high speed while maintaining a small design footprint.

A. Modular Reduction

We look at an example to demonstrate how Shamir simplifies the modular square operation ($m^2 \bmod n$) used in the Rabin cryptosystem. Say we have a 4-bit message with a value of 1010 binary. (Note that since the message is 4 bits, $k = 4$ in this example.) This means:

    m   = 1010 binary       (which is 10 decimal)
    m^2 = 0110 0100 binary  (which is 100 decimal)

We can break up $m^2$ into a top half $g_1$ and a bottom half $g_0$ as follows:

    m^2 = 0110 0100
          g_1  g_0
    g_1 = 0110 binary = 6 decimal
    g_0 = 0100 binary = 4 decimal

Mathematically, the square of the message is equivalent to

    $m^2 = g_1 \cdot 2^k + g_0$

This is because multiplying $g_1$ by $2^k$ is the same as shifting $g_1$ left by $k$ places. Visually, the operation looks like this for our example, where $g_1 = 0110$ and $k = 4$:

    g_1 = 0110        Initial value of g_1 from above
    g_1 = 0110 0      Shift left (same as multiplying by 2^1)
    g_1 = 0110 00     Shift left (same as multiplying by 2^2)
    g_1 = 0110 000    Shift left (same as multiplying by 2^3)
    g_1 = 0110 0000   Shift left (same as multiplying by 2^4)

If we add $g_1$ (after the four shifts) to $g_0$, we see that we get $m^2$:

      g_1 = 0110 0000
    + g_0 = 0000 0100
      m^2 = 0110 0100

From our original formula ($c = m^2 \bmod n$), we know that $n$ is of the form $2^k - 1$. Performing some algebra, we can simplify the cipher calculation:

    $2^k - 1 = n$                Original formula
    $2^k = n + 1$                After adding 1 to both sides
    $2^k \equiv 1 \pmod{n}$      After taking mod n on both sides

This means our cipher can be simplified as follows:

    $c = m^2 \bmod n = (g_1 \cdot 2^k + g_0) \bmod n = (g_1 + g_0) \bmod n$

Therefore the cipher is simply equal to $g_1 + g_0$, reduced mod $n$ if the sum reaches $n$. In our example, $g_1 + g_0 = 6 + 4 = 10$, which agrees with $100 \bmod 15 = 10$.

B. Mathematical Convolution

In order to find $g_1$ and $g_0$, it would seem that we need to calculate and store $m^2$ in its entirety. Luckily, because of a process called mathematical convolution, we can generate the bits of $m^2$ on the fly without storing the square. To better understand convolution, we begin by examining the operation of squaring in a generalized sense. The following example demonstrates how we can square a 4-bit binary number, where the individual bits of the number are represented by $m_3$ to $m_0$.

The final product of the squaring operation can be found by taking the sum of the bits in each column of partial products. The least significant bit of each column sum is the output bit, and the remaining bits are carried into the next column. For example, the bits of $g_0$ can be calculated as follows:

    Bit 0 of the solution = $m_0 m_0$
    Bit 1 of the solution = $m_1 m_0 + m_0 m_1$
    Bit 2 of the solution = $m_2 m_0 + m_1 m_1 + m_0 m_2$ + carry
    Bit 3 of the solution = $m_3 m_0 + m_2 m_1 + m_1 m_2 + m_0 m_3$ + carry

Thus, in order to generate a bit in the lower half $g_0$ (we can call it bit $j$), we use this formula:

    $\text{bit}_j = \sum_{v=0}^{j} m_{j-v}\, m_v + \text{carry}$    (1)

Similarly, the bits of $g_1$ can be calculated as follows:

    Bit 4 of the solution = $m_3 m_1 + m_2 m_2 + m_1 m_3$ + carry
    Bit 5 of the solution = $m_3 m_2 + m_2 m_3$ + carry
    Bit 6 of the solution = $m_3 m_3$ + carry
    Bit 7 of the solution = carry

To generate a bit in the upper half $g_1$ (call it bit $j+k$), we use this formula:

    $\text{bit}_{j+k} = \sum_{v=j+1}^{k-1} m_{j+k-v}\, m_v + \text{carry}$    (2)

By combining equations (1) and (2), and recalling that $2^k \equiv 1 \pmod{n}$ allows column $j+k$ to be added into column $j$, we can come up with the final SQUASH algorithm:

    $c_j = \sum_{v=0}^{k-1} m_v\, m_{(j-v) \bmod k} + \text{carry}$
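The column-by-column procedure of this section can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the authors' simulation code: the function name `square_mod_mersenne` and the limit of 16 bits are choices made for the example, and the final step that folds the leftover carry back in is our addition so that the full-width result can be checked directly against `(m * m) % n`.

```cpp
#include <cassert>
#include <cstdint>

// Returns bit i of x (0 or 1).
static unsigned bit_of(std::uint32_t x, unsigned i) { return (x >> i) & 1u; }

// Computes m^2 mod (2^k - 1) one output bit at a time.
// For each column j, the k products m_v * m_((j - v) mod k) are added to the
// running carry; the low bit of the carry is output and the rest is kept.
std::uint32_t square_mod_mersenne(std::uint32_t m, unsigned k) {
    assert(k >= 2 && k <= 16 && m < (1u << k));
    std::uint32_t carry = 0;
    std::uint32_t low = 0;                       // collected output bits c_0..c_(k-1)
    for (unsigned j = 0; j < k; ++j) {
        for (unsigned v = 0; v < k; ++v) {
            carry += bit_of(m, v) & bit_of(m, (j + k - v) % k);
        }
        low |= (carry & 1u) << j;                // output bit c_j
        carry >>= 1;                             // carry into the next column
    }
    // Assumption for checking purposes: the carry left after the last column
    // has weight 2^k, which is congruent to 1 mod n, so it is added back in.
    const std::uint32_t n = (1u << k) - 1u;
    return (low + carry) % n;
}
```

For the running example, `square_mod_mersenne(10, 4)` yields 10, matching $g_1 + g_0 = 6 + 4$. Note that the square itself is never stored; only the message bits and a small carry are needed.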
IV. IMPLEMENTATION

Once we were able to mathematically generate bits of the product on the fly, we looked for an approach that would work in our C++ implementation. Shamir suggested generating these bits using non-linear feedback shift registers (NLFSRs). A paper by Gosset, Standaert and Quisquater [3], who implemented the SQUASH algorithm on a Field-Programmable Gate Array (FPGA) using NLFSRs, gave us ideas. As described in their paper, the final SQUASH algorithm can be stated simply as follows:

    Set $c = 0$, $m = \mathrm{NLFSR}(R \oplus S)$
    For $j = 600$ to $j = 647$:
        $c = c + \sum_{v=0}^{k-1} m_v \cdot m_{(j-v) \bmod k}$
        $c_j = c \bmod 2$
        $c = \lfloor c / 2 \rfloor$
    Output the 32 bits $c_{616}, \ldots, c_{647}$

To understand the implementation, we must first understand linear feedback shift registers.

A. Linear Feedback Shift Register

Imagine we were tasked with enumerating all possible combinations (i.e., states) of a 4-bit binary string.

TABLE I.

    State   Binary   Decimal
    0       0000     0
    1       0001     1
    2       0010     2
    3       0011     3
    4       0100     4
    5       0101     5
    6       0110     6
    7       0111     7
    8       1000     8
    9       1001     9
    10      1010     10
    11      1011     11
    12      1100     12
    13      1101     13
    14      1110     14
    15      1111     15

Intuitively, we would count from 0000 to 1111 (in binary) as above. Notice that we can represent 16 possible states with 4 binary bits ($2^4 = 16$). Although counting linearly upward from state to state is an intuitive approach, it turns out to be complex to implement in physical hardware. An alternative design is the linear feedback shift register (LFSR). An LFSR allows us to generate nearly all possible binary combinations for a binary string of length $n$, albeit with states that are not in any particular order. (We say "nearly all" combinations because the state 0000 is invalid, for reasons we will soon see.) A diagram of a simple LFSR follows.

Figure 1. State 0 contains the value 1010

Notice that bits 3 and 4 (counting from left to right) are fed into an exclusive-or (XOR) gate. These bits are said to be tapped. The number and order of the tapped bits are referred to as a tap sequence. The tap sequence for this register would typically be listed as (4,3).

Figure 2. State 1 (after shifting right once from state 0)

If we were to generate all possible states of this particular shift register (by shifting right and using the XOR to feed in new values), the output states of the LFSR would be as follows:

TABLE II.

    State   Binary   Decimal
    0       1010     10
    1       1101     13
    2       1110     14
    3       1111     15
    4       0111     7
    5       0011     3
    6       0001     1
    7       1000     8
    8       0100     4
    9       0010     2
    10      1001     9
    11      1100     12
    12      0110     6
    13      1011     11
    14      0101     5
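The 4-bit register with tap sequence (4,3) can be simulated in a few lines of C++. The sketch below is illustrative only; the names `lfsr4_step` and `lfsr4_cycle` are hypothetical, and the register is stored as an integer whose most significant of four bits corresponds to bit 1 (the leftmost bit) in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One right shift of the 4-bit LFSR with tap sequence (4,3).
// Bits are numbered 1..4 from left to right, so bit 4 is the least
// significant bit of the integer and bit 3 is the one next to it.
std::uint8_t lfsr4_step(std::uint8_t state) {
    assert(state != 0 && state < 16);            // 0000 would lock up the register
    const unsigned bit4 = state & 1u;
    const unsigned bit3 = (state >> 1) & 1u;
    const unsigned feedback = bit3 ^ bit4;       // XOR of the tapped bits
    // Shift right and feed the XOR result into the leftmost position.
    return static_cast<std::uint8_t>((state >> 1) | (feedback << 3));
}

// Collects every state visited, starting at seed, until the register
// returns to the seed value.
std::vector<std::uint8_t> lfsr4_cycle(std::uint8_t seed) {
    std::vector<std::uint8_t> states;
    std::uint8_t s = seed;
    do {
        states.push_back(s);
        s = lfsr4_step(s);
    } while (s != seed);
    return states;
}
```

Starting from 1010, `lfsr4_cycle` visits 15 states in the order 10, 13, 14, 15, 7, 3, 1, 8, 4, 2, 9, 12, 6, 11, 5 before returning to 1010.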
There are two important differences between Table I and Table II: (1) The states in Table II are completely out of order, which is why LFSRs are often used in cryptography. (2) If we look carefully, this shift register goes through all possible states except the state 0000. A state of 0000 fed into the XOR gate (regardless of the tap sequence) will always produce a 0, and consequently we would be stuck in the state 0000 indefinitely.

B. Tap Sequences

Keep in mind that in the previous LFSR example, we tapped bits 3 and 4. If we had chosen a different tap sequence, our register might not have gone through all 15 states. In general, a good tap sequence is one that gives us maximal length. This simply means that an LFSR should give us as many different states as possible before it loops back around. Mathematically, a maximal length tap sequence will always yield $2^n - 1$ possible states, where $n$ represents the number of binary bits. Remember that we subtract 1 because of our inability to use the all-zero state. To find a maximal length tap sequence, we can look it up in a table. For our purposes, we were able to find maximal length tap sequences on Wikipedia for 4, 8, 16, 32, 64 and 128 bit registers [4]. The maximal length tap sequences we used are shown here:

TABLE III.

    Register Size (in bits)   Maximal Length Tap Sequence
    4                         4,3
    8                         8,6,5,4
    16                        16,15,13,4
    32                        32,22,2,1
    64                        64,63,61,60
    128                       128,126,101,99

Note how the first value of the tap sequence is the same as the register size. Essentially, this means the rightmost bit is always tapped for maximal length tap sequences.

C. Non-linear Reverse Feedback Shift Register

While we've seen that a normal LFSR can generate bits in the forward direction, we will also need to generate bits in the reverse direction to perform the mathematical convolution we've described previously. An example follows:

TABLE IV.

            Forward Shifter      Reverse Shifter
    State   Binary   Decimal     Binary   Decimal
    0       1010     10          0101     5
    1       1101     13          1011     11
    2       1110     14          0110     6
    3       1111     15          1100     12
    4       0111     7           1001     9
    5       0011     3           0010     2
    6       0001     1           0100     4
    7       1000     8           1000     8
    8       0100     4           0001     1
    9       0010     2           0011     3
    10      1001     9           0111     7
    11      1100     12          1111     15
    12      0110     6           1110     14
    13      1011     11          1101     13
    14      0101     5           1010     10

Notice how the forward and reverse shifters are mirror images of each other: the reverse column is the forward column read from bottom to top. To create a reverse shifter, we must change the shift direction from right to left as well as find a new reverse tap sequence. A simple formula can be used to convert the forward tap sequence into a reverse tap sequence: if $(n, a, b, c)$ are the tapped bits in the forward direction, then $(n, n-c, n-b, n-a)$ is the tap sequence in the reverse direction. For example, if the forward tap sequence for a 4-bit register is (4,3), the reverse tap sequence is (4,1), as Fig. 3 illustrates.

Figure 3. Reverse LFSR

V. PERFORMANCE TESTING

To test the performance of our SQUASH implementation in C++, we designed a program that simulates 100 timed executions of the SQUASH algorithm for registers of various sizes. After measuring the average time (in CPU cycles), we determined the frequency of the current CPU and used it to calculate an approximate tags-per-second metric. This was done by dividing the frequency of our CPU by the mean cycles per tag. Based on our results using this metric, our software implementation did indeed perform almost linearly, as Shamir had described.
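The tags-per-second metric described above amounts to a single division, sketched below in C++. The function names and the use of a vector of per-run cycle counts are assumptions made for illustration; how the cycle counts themselves were measured is not specified here and is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Mean number of CPU cycles over a set of timed runs (one run = one tag).
double mean_cycles(const std::vector<std::uint64_t>& cycles_per_run) {
    assert(!cycles_per_run.empty());
    const double total =
        std::accumulate(cycles_per_run.begin(), cycles_per_run.end(), 0.0);
    return total / static_cast<double>(cycles_per_run.size());
}

// Approximate throughput: CPU frequency (in Hz) divided by mean cycles per tag.
double tags_per_second(double cpu_hz,
                       const std::vector<std::uint64_t>& cycles_per_run) {
    return cpu_hz / mean_cycles(cycles_per_run);
}
```

As a sanity check on the units, a 1.83 GHz processor that needs a mean of $1.83 \times 10^9$ cycles per tag would process exactly one tag per second.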
[Figure: "Tags Authenticated By Reader As A Function Of Register Size". x-axis: register size in bits (4, 8, 16, 32, 64, 128); y-axis: tags per second (0 to 5).]

Figure 4. Results of performance testing

Fig. 4 is slightly skewed because the x-axis grows exponentially. Table VI shows the linear relationship between register size and processed tags per second.

TABLE VI.

    Register Size (In Bits)   Tags Per Second
    4                         4.3457
    8                         2.60074
    16                        1.68399
    32                        0.981616
    64                        0.590769
    128                       0.309733

VI. CONCLUSION

While our tests showed that performance did scale linearly, we noted that we should generally be able to process significantly more tags per second. Gosset's FPGA implementation of the SQUASH algorithm was able to process roughly 3,500 tags per second using an FPGA operating at 222 MHz [3]. Meanwhile, our C++ implementation, running on a dual-core 1.83 GHz Intel processor, was unable to process 5 tags per second. As we can see, specialized hardware makes an enormous difference. Production-quality RFID readers should not be affected by the performance issues we faced when testing on a general-purpose PC.

REFERENCES

[1] A. Shamir, "SQUASH - A New MAC With Provable Security Properties for Highly Constrained Devices Such as RFID Tags," Proc. Fast Software Encryption (FSE 2008), Lausanne, Switzerland, February 2008.
[2] M. Rabin, "Digitalized Signatures and Public-Key Functions as Intractable as Factorization," Massachusetts Institute of Technology, Laboratory for Computer Science, TR-212, January 1979.
[3] F. Gosset, F.-X. Standaert, J.-J. Quisquater, "FPGA Implementation of SQUASH," 29th Symposium on Information Theory in the Benelux, Leuven, Belgium, May 2008, pp. 231-238.
[4] "Linear Feedback Shift Registers," Wikipedia, http://en.wikipedia.org/wiki/linear_feedback_shift_register, Accessed 2010.