VOL. 11, NO. 5, MARCH 2016 ISSN 1819-6608
©2006-2016 Asian Research Publishing Network (ARPN). All rights reserved.

HIGH THROUGHPUT EVALUATION OF SHA-1 IMPLEMENTATION USING UNFOLDING TRANSFORMATION

Shamsiah Binti Suhaili1 and Takahiro Watanabe2
1Faculty of Engineering, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
2Graduate School of Information, Production and Systems, Waseda University, Hibikino, Wakamatsu-ku, Kitakyushu-shi, Fukuoka, Japan
E-Mail: sushamsiah@feng.unimas.my

ABSTRACT
Hash functions are widely used in protocol schemes. In this paper, the design of the SHA-1 hash function in Verilog HDL on an FPGA is studied in order to optimise both hardware resources and performance. The design was successfully synthesised and implemented using Altera Quartus II on an Arria II GX device (EP2AGX45DF29C4). Two designs are proposed, namely SHA-1 and SHA-1 unfolding. The maximum frequency of the SHA-1 design is 274.2 MHz, which is higher than that of the SHA-1 unfolding design at 174.73 MHz. However, because the unfolded design processes a block in half as many cycles, it achieves the higher throughput of 2236.54 Mbps. Both designs also occupy a small area on Arria II, requiring only 423 and 548 combinational ALUTs and 693 and 97 total registers, respectively.

Keywords: maximum frequency, FPGA, HDL, SHA-1.

INTRODUCTION
Implementing a hash function on reconfigurable hardware is one of the practical solutions for embedded systems, and the results differ depending on the structure of the reconfigurable logic of the FPGA. In other words, an FPGA can improve the performance of a design in terms of power, speed and area. FPGAs offer several benefits for cryptographic hash algorithms: they are small, incur low development cost, have high speed and fine memory, and are highly flexible, which allows frequent modification of the hardware, a short time to market, and easy experimental testing and verification. An FPGA tends to be an excellent choice for implementing algorithms, but it has the disadvantage of high power consumption.
Therefore, in order to apply high-speed cryptographic solutions on reconfigurable hardware, further research on high-speed and small-area implementation is needed.

A hash function is a transformation that takes a variable-length input message M and returns a fixed-length output called the hash value [1,2,3]. There are many types of hash functions, such as MD5, SHA-224, SHA-256, SHA-384 and SHA-512. The purpose of this paper is to analyse the structure of the SHA-1 hash function on reconfigurable hardware and to obtain a small-area implementation together with a high maximum frequency. In short, the maximum frequency and the area of the design need to be balanced. High performance of the hash function design is important for improving throughput, since current systems demand fast implementations.

The motivation of this research is to study the structure of the SHA-1 hash function, as it is important for applications such as the Message Authentication Code (MAC) [1]. The SHA-1 hash algorithm has been studied with careful design at every stage of its inner structure using Verilog. There has been much research on FPGA-based SHA-1 implementation [4-12]. However, some of these designs leave room for further improvement. In this paper, the Altera Quartus II Arria II GX device EP2AGX45DF29C4 is chosen as the target device for both the SHA-1 and the SHA-1 unfolding implementations because it has the potential to provide high performance for the design.

The paper is organised as follows: Section II describes the SHA-1 algorithm; Section III briefly explains the unfolding algorithm; Section IV contains the performance evaluation; and Section V concludes the paper.

SHA-1 ALGORITHM
The input to the Secure Hash Algorithm (SHA-1) must be shorter than $2^{64}$ bits. The message is processed sequentially in 512-bit blocks and produces a 160-bit message digest. The SHA-1 algorithm is divided into two parts: pre-processing and hash computation. The non-linear function of SHA-1 operates on three 32-bit words B, C and D, with a logical sequence from $f_0$ to $f_{79}$.
PRE-PROCESSING
Pre-processing consists of three steps: padding the message, parsing the padded message into message blocks, and setting the initial hash value [2]. Padding makes the length of the message a multiple of 512 bits or 1024 bits, depending on the algorithm; for SHA-1 it is 512 bits. Let the length of the input message M be $l$ bits, where $l$ must be in the range $0 \le l < 2^{64}$. A single "1" bit is appended to the end of the message, followed by "0" bits until the length is congruent to 448 modulo 512, and finally the 64-bit representation of the message length $l$. The padded message is then parsed into N 512-bit blocks, $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$. The buffer initialisation of SHA-1 consists of five 32-bit initial values, which must be set before the hash computation, as shown in Table-1.
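The padding and parsing steps can be illustrated with a short software sketch. This is not the authors' Verilog; it is a minimal Python illustration that assumes a byte-aligned message (so the "1" bit plus the first seven "0" bits form the byte 0x80), and the function names `sha1_pad` and `parse_blocks` are hypothetical.

```python
import struct


def sha1_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_len = 8 * len(message)
    if bit_len >= 2 ** 64:
        raise ValueError("SHA-1 input must be shorter than 2**64 bits")
    # After the 0x80 byte, add zero bytes until the length is 56 mod 64
    # (448 mod 512 in bits), leaving 8 bytes for the length field.
    zero_bytes = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zero_bytes + struct.pack(">Q", bit_len)


def parse_blocks(padded: bytes) -> list:
    """Split the padded message into 512-bit (64-byte) blocks M(1)..M(N)."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = parse_blocks(sha1_pad(b"abc"))
print(len(blocks), blocks[0].hex())
```

A 56-byte message no longer leaves room for the 64-bit length field in its first block, so it pads out to two blocks.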
Table-1. Buffer initialisation of SHA-1.

After the five working variables A, B, C, D and E have been initialised with the buffer values $H_0, H_1, H_2, H_3, H_4$ in pre-processing, the hash computation uses the constant $K_t$ and the round function for $0 \le t \le 79$, as shown in Table-2 and Table-3, to process the message. The symbols $\wedge$, $\neg$ and $\oplus$ in the non-linear functions of the four SHA-1 rounds in Table-3 represent the logical AND, NOT and XOR operations respectively. After the four rounds, which together consist of 80 steps, the final step adds the initial value to the last output hash.

Table-2. Constant $K_t$.

HASH COMPUTATION
The SHA-1 hash computation processes the padded message with a message schedule of 80 32-bit words, $W_0, W_1, \ldots, W_{79}$. Equation (1) gives the compression function of SHA-1 for the inputs A, B, C, D and E. The symbol $\ll$ denotes a circular left shift of the register by the given number of positions. T includes $W_t$ and $K_t$, where $W_t$ is the expanded message word of round $t$ and $K_t$ is the round constant of round $t$.

$$T = (A \ll 5) + F_t(B, C, D) + W_t + K_t + E, \quad A = T, \quad B = A, \quad C = B \ll 30, \quad D = C, \quad E = D \tag{1}$$

Figure-1. SHA-1 compression function.

For $0 \le t \le 15$, the 32-bit message schedule word $W_t$ is taken directly from the message input. The remaining values of $W_t$, where $16 \le t \le 79$, are derived using Equation (2).

$$W_t = \mathrm{ROTL}^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}) \tag{2}$$

Table-3. Round function.

UNFOLDING ALGORITHM
Unfolding is a technique that can be applied to a DSP program to obtain a new program that performs more than one iteration of the original program. The unfolding factor J is the number of iterations of the original program performed by the new one. The rules of the unfolding algorithm are as follows [4]:

1. For each node U in the original data flow graph (DFG), draw the J nodes $U_0, U_1, \ldots, U_{J-1}$.
2. For each edge $U \rightarrow V$ with $w$ delays in the original DFG, draw the J edges $U_i \rightarrow V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays for $i = 0, 1, \ldots, J-1$.

To explain the structure of the unfolding algorithm, an example DSP program, given by Equation (3), is shown in Figure-2.

$$y(n) = a\,y(n-9) + x(n) \tag{3}$$
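The index arithmetic in the two unfolding rules can be made concrete with a small sketch. The helper names `unfold_nodes` and `unfold_edge` are hypothetical, and the snippet only applies the rules to a single edge; it is an illustration rather than a full DFG tool.

```python
def unfold_nodes(nodes, J):
    """Rule 1: each node U becomes the J nodes U0, U1, ..., U(J-1)."""
    return [f"{node}{i}" for i in range(J) for node in nodes]


def unfold_edge(u, v, w, J):
    """Rule 2: an edge U -> V with w delays becomes J edges
    U_i -> V_((i + w) % J), each with floor((i + w) / J) delays."""
    return [(f"{u}{i}", f"{v}{(i + w) % J}", (i + w) // J) for i in range(J)]


# An edge C -> D with 9 delays, unfolded by a factor of J = 2.
for src, dst, delays in unfold_edge("C", "D", 9, 2):
    print(f"{src} -> {dst} with {delays} delays")
```

The delays on the unfolded edges always sum to the delay on the original edge, which gives a quick sanity check on any unfolded graph.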
Figure-2. The original DSP program [4].

A DFG can be constructed from the original DSP program in Figure-2 by replacing the input and output ports with nodes A and B, while the addition and multiplication are represented by node C and node D respectively, as shown in Figure-3.

Figure-3. The 2-unfolded DFG [4].

By the first rule of the unfolding algorithm, the 2-unfolded DFG has 8 nodes, namely $A_0, B_0, C_0, D_0, A_1, B_1, C_1$ and $D_1$. The second step is to connect each edge $U \rightarrow V$ of the DSP program. An edge $U \rightarrow V$ with no delay is divided into two edges, $U_0 \rightarrow V_0$ and $U_1 \rightarrow V_1$. The edge $C \rightarrow D$ with 9 delays becomes $C_0 \rightarrow D_{(0+9)\%2}$ with $\lfloor (0+9)/2 \rfloor$ delays and $C_1 \rightarrow D_{(1+9)\%2}$ with $\lfloor (1+9)/2 \rfloor$ delays. The 2-unfolded DFG therefore contains $C_0 \rightarrow D_1$ with 4 delays and $C_1 \rightarrow D_0$ with 5 delays.

SHA-1 UNFOLDING ALGORITHM
The proposed SHA-1 unfolding algorithm with factor 2 is shown in Figure-4. It consists of two non-linear functions with three different inputs each, two circular left shifts by 30 (of A and of B), and two circular left shifts by 5 (of A and of Temp). In this figure there are 8 addition operations, which are performed in parallel during execution, so the critical path of the design contains only four additions. In other words, two hash operations are executed per cycle. This reduces the number of cycles needed to obtain the final output hash from 80 to 40. Hence, the unfolding transformation can increase the throughput of the SHA-1 hash function.

Figure-4. SHA-1 unfolding compression function.

The outputs of the SHA-1 unfolding algorithm are given in Equation (4), where $\mathrm{ROTL}^{a}(b)$ denotes a circular left shift (left rotation) of $b$ by $a$ positions, and $\mathrm{func}_t(p, q, r)$ denotes the non-linear function at time $t$ for the three inputs $p$, $q$ and $r$.
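As a software cross-check of the two-steps-per-cycle idea, the sketch below compares 80 single SHA-1 steps against 40 merged double steps on one block. It is not the authors' hardware design: the merged update is simply the ordinary step applied twice and simplified, which is an assumption about the intended datapath. The constants and non-linear functions are those of the SHA-1 standard, and all function names are hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl(x, n):
    """Circular left shift of a 32-bit word x by n positions."""
    return ((x << n) | (x >> (32 - n))) & MASK


def func(t, p, q, r):
    """Non-linear function of step t for inputs p, q, r."""
    if t < 20:
        return (p & q) | (~p & r)
    if 40 <= t < 60:
        return (p & q) | (p & r) | (q & r)
    return p ^ q ^ r


def schedule(block):
    """Expand a 64-byte block into the 80-word message schedule."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def one_step(state, t, w):
    a, b, c, d, e = state
    temp = (rotl(a, 5) + func(t, b, c, d) + e + w[t] + K[t // 20]) & MASK
    return (temp, a, rotl(b, 30), c, d)


def two_steps(state, t, w):
    """Steps t and t+1 merged into one update (unfolding factor 2)."""
    a, b, c, d, e = state
    temp = (rotl(a, 5) + func(t, b, c, d) + e + w[t] + K[t // 20]) & MASK
    a2 = (rotl(temp, 5) + func(t + 1, a, rotl(b, 30), c) + d
          + w[t + 1] + K[(t + 1) // 20]) & MASK
    return (a2, temp, rotl(a, 30), rotl(b, 30), c)


def compress(block, unfolded):
    """Process one block from the initial hash value; return a hex digest."""
    w = schedule(block)
    state = IV
    if unfolded:
        for t in range(0, 80, 2):      # 40 iterations
            state = two_steps(state, t, w)
    else:
        for t in range(80):            # 80 iterations
            state = one_step(state, t, w)
    return "".join(f"{(h + s) & MASK:08x}" for h, s in zip(IV, state))


# Single padded block for the 3-byte message "abc" (padding written by hand).
msg = b"abc"
block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
print(compress(block, unfolded=False))
print(compress(block, unfolded=True))
print(hashlib.sha1(msg).hexdigest())
```

All three printed digests should agree. Merging the steps halves the iteration count without changing the result, while lengthening the combinational path inside each iteration.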
$$
\begin{aligned}
Temp &= \mathrm{ROTL}^{5}(A_t) + \mathrm{func}_t(B_t, C_t, D_t) + E_t + W_t + K_t \\
A_{t+2} &= \mathrm{ROTL}^{5}(Temp) + \mathrm{func}_{t+1}(A_t, \mathrm{ROTL}^{30}(B_t), C_t) + D_t + W_{t+1} + K_{t+1} \\
B_{t+2} &= Temp \\
C_{t+2} &= \mathrm{ROTL}^{30}(A_t) \\
D_{t+2} &= \mathrm{ROTL}^{30}(B_t) \\
E_{t+2} &= C_t
\end{aligned}
\tag{4}
$$

PERFORMANCE EVALUATIONS
The SHA-1 designs were written in Verilog HDL and successfully synthesised and implemented with Altera Quartus II for the Arria II device EP2AGX45DF29C4. Both designs are iterative: one block step function is reused for 80 rounds in the SHA-1 design and for 40 rounds in the SHA-1 unfolding design. The designs were verified by functional and timing simulation using ModelSim-Altera.e, and evaluated in terms of area, maximum frequency and power. The SHA-1 and SHA-1 unfolding designs are compared with previously published FPGA-based implementations to evaluate their performance [5-13]. All the results are presented in Table-4.

The proposed SHA-1 design and SHA-1 unfolding design use only 423 and 548 combinational ALUTs respectively, and the total registers increase from 693 to 97 in the SHA-1 unfolding design. The comparison of area and speed depends on the FPGA device family, so the designer needs to choose an appropriate device in order to reduce logic utilisation as well as increase performance. The total estimated power dissipation of the SHA-1 unfolding design decreases from 625.86 to 456.2 mW. Table-4 shows that the throughput of the SHA-1 unfolding design increases significantly even though its maximum frequency is 174.73 MHz: its throughput is about 2236.54 Mbps, which is higher than that of the SHA-1 design at only 1754.88 Mbps. The throughput of the design is calculated using Equation (5), where the block size is 512 bits.

$$\text{Throughput} = \frac{\text{Frequency} \times \text{block size}}{\text{Latency}} \tag{5}$$

Table-4. FPGA-based SHA-1 implementation.

RESULTS ANALYSIS
There are several other published FPGA-based implementations of SHA-1. In this paper, designs implemented with the CAD tools of two FPGA vendors, Xilinx and Altera, are listed in order to compare area in terms of combinational ALUTs, logic elements, slices and total registers. Since the SHA-1 implementations do not use the same device, the comparison of area and speed has to be read in terms of the target devices. In other words, the designer can choose a device that will provide a high-performance implementation. Table-5 shows the SHA-1 area implementation on different FPGA device families.

Table-5. Area implementation of SHA-1 and SHA-1 unfolding.

For previous papers whose authors did not state the latency, the latency of the normal SHA-1 operation is assumed in this table.
A small-area implementation is good for any application that needs a compact design, which can also reduce power consumption. The proposed design uses Arria II, which can balance the area and the performance of the design. Both proposed designs on Arria II use a small number of combinational ALUTs and total registers compared with previous designs. The area can thus be kept small by choosing an appropriate device family.

Table-6 compares the maximum frequency of SHA-1 designs on different device families. In this table, the proposed SHA-1 design provides the highest maximum frequency, 274.15 MHz, with a throughput of 1754.56 Mbps.

Table-6. Maximum frequency of SHA-1.

Table-7 shows the maximum frequency (fmax) of SHA-1 unfolding designs. The proposed SHA-1 unfolding design obtains the highest maximum frequency, 174.73 MHz, compared with the other SHA-1 unfolding designs, which leads to a high throughput. The designs of L. Jiang and J. Kim provide high throughput because they are pipelined. However, these designs use a large area. As can be seen from Table-5, the same authors use a large number of combinational ALUTs, about 33764 and 649, compared with the proposed SHA-1 unfolding design, which uses only 548 combinational ALUTs.

Table-7. Maximum frequency of SHA-1 unfolding.

CONCLUSIONS
The SHA-1 unfolding architecture was successfully synthesised and implemented on the Altera Arria II device EP2AGX45DF29C4 using Verilog HDL. The maximum frequency of the design is 174.73 MHz, while the area utilisation in terms of combinational ALUTs and total registers is 548 and 97 respectively. The maximum frequency of the SHA-1 implementation reflects the critical path of the design. To obtain a high-performance design, not only speed but also area needs to be taken into account. Other methodologies or techniques can be applied to increase the maximum frequency as well as the throughput of the design. A high-performance, efficient design combines a small area, a high maximum frequency and a low estimated power consumption; this in turn leads to a high throughput.

ACKNOWLEDGEMENTS
This work is supported by Universiti Malaysia Sarawak (UNIMAS).

REFERENCES

[1] Beale Q. Dang. 2. Draft NIST Special Publication 800-107. Recommendation for Applications Using Approved Hash Algorithms, Computer Security Division, Information Technology Laboratory.
[2] Federal Information Processing Standards. 2008. Secure Hash Standard (SHS), FIPS PUB 180-3. Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg.

[3] F. R. Henriquez, N. A. Saqib, A. D. Perez, C. K. Koc. 2006. Cryptographic Algorithms on Reconfigurable Hardware, Springer Series on Signal and Communication.

[4] K. K. Parhi. 1999. VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, Inc. 9-4.

[5] K. Jarvinen. 2004. Design and Implementation of a SHA-1 Hash Module on FPGAs. Helsinki University of Technology, Signal Processing Laboratory.

[6] Y. K. Kang, D. W. Kim, T. W. Kwon, J. R. Choi. 2002. An Efficient Implementation of Hash Function Processor for IPSEC. Proceedings 2002 IEEE Asia-Pacific Conference on ASIC. pp. 93-96.

[7] L. Miao, X. Jinfu, Y. Xiaohui, Y. Zhifeng. 2009. Design and Implementation of Reconfigurable Security Hash Algorithms Based on FPGA. Information Engineering, ICIE '09 WASE International Conference, Taiyuan, Chanxi. pp. 38-384.

[8] J. M. Diez, S. Bojanic, Lj. Stanimirovic, C. Carreras, O. Nieto-Taladriz. 2002. Hash Algorithms for Cryptographic Protocols: FPGA Implementations. Proceedings of the 10th Telecommunications Forum TELFOR 2002, Belgrade, Yugoslavia.

[9] D. Zibin, Z. Ning. 2003. FPGA Implementation of SHA-1 Algorithm, ASIC 2003. Proceedings 5th International Conference. 2: 1321-1324.

[10] L. Jiang, Y. Wang, Q. Zhao, Y. Shao, X. Zhao. 2009. Ultra High Throughput Architectures for SHA-1 Hash Algorithm on FPGA, Computational Intelligence and Software Engineering, CiSE 2009, International Conference, Wuhan. pp. 1-4.

[11] N. Sklavos, E. Alexopoulos and O. Koufopavlou. 2003. Networking Data Integrity: High Speed Architectures and Hardware Implementations. The International Arab Journal of Information Technology. 1(0).

[12] Y. K. Lee, H. Chan, I. Verbauwhede. 2006. Throughput Optimized SHA-1 Architecture Using Unfolding Transformation. Application-specific Systems, Architectures and Processors (ASAP '06). pp. 354-359.

[13] J. Hoon Lee, S. Choon Kim, Y. Jun Song. 2011. High-Speed FPGA Implementation of the SHA-1 Hash Function. IEICE Trans. Fundamentals, E94-A(9).

[14] J. Kim, H. Lee, Y. Won. 2012. Design for High Throughput SHA-1 Hash Function on FPGA. Fourth International Conference on Ubiquitous and Future Networks (ICUFN). pp. 43 44.