HIGH THROUGHPUT EVALUATION OF SHA-1 IMPLEMENTATION USING UNFOLDING TRANSFORMATION


VOL. 11, NO. 5, MARCH 2016, ISSN 1819-6608. ©2006-2016 Asian Research Publishing Network (ARPN). All rights reserved.

Shamsiah Binti Suhaili¹ and Takahiro Watanabe²
¹Faculty of Engineering, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
²Graduate School of Information, Production and Systems, Waseda University, Hibikino, Wakamatsu-ku, Kitakyushu-shi, Fukuoka, Japan
E-Mail: sushamsiah@feng.unimas.my

ABSTRACT

Hash functions are widely used in protocol schemes. In this paper, the design of the SHA-1 hash function in Verilog HDL on an FPGA is studied to optimise both hardware resources and performance. It was successfully synthesised and implemented using Altera Quartus II on an Arria II GX: EP2AGX45DF29C4. Two types of design are proposed, namely SHA-1 and SHA-1 unfolding. The maximum frequency of the SHA-1 design is 274.2 MHz, which is higher than that of SHA-1 unfolding, whose maximum frequency is only 174.73 MHz. However, the SHA-1 unfolding design reaches a higher throughput of 2236.54 Mbps. Besides, both designs provide a small area implementation on Arria II that requires only 423 and 548 combinational ALUTs and 693 and 97 total registers, respectively.

Keywords: maximum frequency, FPGA, HDL, SHA-1.

INTRODUCTION

Implementing a hash function on reconfigurable hardware is one of the practical solutions for embedded systems, and the results differ depending on the structure of the reconfigurable logic of the FPGA. In other words, an FPGA has the capability to improve performance in terms of power, speed and area. FPGAs offer several benefits for cryptographic hash functions: they are small, incur low development cost, and have high speed and fine memory; they are highly flexible, allowing frequent modification of the hardware, short time to market, and easy experimental testing and verification. An FPGA tends to be an excellent choice when dealing with algorithms, but it has the disadvantage of high power consumption.
Therefore, in order to apply a high-speed cryptographic solution on reconfigurable hardware, further research on high-speed and small-area implementation needs to be taken into account. A hash function is a transformation that takes a variable-length input message M and returns a fixed-length output called the hash value [1,2,3]. There are many types of hash functions, such as MD5, SHA-224, SHA-256, SHA-384 and SHA-512. The purpose of this paper is to analyse the structure of the SHA-1 hash function on reconfigurable hardware and to obtain a small area implementation as well as a high maximum frequency. In short, the balance between the maximum frequency and the area of the design needs to be considered. High performance of the hash function design is important for improving throughput, since nowadays all systems need fast implementations. The motivation of this research is to study the structure of the SHA-1 hash function, as it is important for applications such as the Message Authentication Code (MAC) [1]. The SHA-1 hash algorithm has been studied with careful design at every stage of its inner structure using Verilog. There is much research pertaining to FPGA-based SHA-1 implementation [4-12]. However, some of these designs need further improvement. In this paper, the Altera Quartus II Arria II GX: EP2AGX45DF29C4 is chosen as the target device for both the SHA-1 and the SHA-1 unfolding implementation because it has the potential to provide high performance for the design.

The paper is organised as follows: Section II presents the description of the SHA-1 algorithm; Section III briefly explains the unfolding algorithm; Section IV contains the performance evaluation; and Section V ends the paper with a conclusion on the SHA-1 implementation.

SHA-1 ALGORITHM

The input to the Secure Hash Algorithm (SHA-1) must be shorter than $2^{64}$ bits, and the message is processed sequentially in 512-bit blocks to produce a 160-bit message digest. The SHA-1 algorithm is divided into two parts: pre-processing and hash computation. The non-linear function of SHA-1 operates on three 32-bit words B, C, and D with a logical sequence from $f_0$ until $f_{79}$.
PRE-PROCESSING

Pre-processing consists of three steps: padding the message, parsing the padded message into message blocks and setting the initial hash value [2]. Padding makes the message length a multiple of 512 bits or 1024 bits, depending on the algorithm; SHA-1 uses 512 bits. Let the width of an input message M be $l$ bits. The value of $l$ must be in the range $0 \le l < 2^{64}$. A single 1-bit is appended to the end of the message, followed by 0-bits until the length is congruent to 448 modulo 512, and finally by the 64-bit length of the message. The padded message M is parsed into N 512-bit blocks, $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$. Table-1 shows the buffer initialisation of SHA-1. It consists of five 32-bit initial values that must be set before the hash computation.
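The three pre-processing steps can be sketched in Python as follows. This is a minimal illustration rather than the Verilog design evaluated in the paper: the names `pad_message`, `parse_blocks` and `INITIAL_HASH` are hypothetical, the message is assumed to be a whole number of bytes, and the five initial values are the ones specified for SHA-1 in FIPS 180.

```python
import struct

# Initial hash values H0..H4 for SHA-1, as specified in FIPS 180.
INITIAL_HASH = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def pad_message(message: bytes) -> bytes:
    """Append a 1-bit, then 0-bits up to 448 mod 512, then the 64-bit length."""
    bit_length = len(message) * 8
    # 0x80 is the single 1-bit followed by seven 0-bits.
    # The zero bytes bring the length to 56 mod 64 bytes (448 mod 512 bits).
    zero_bytes = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zero_bytes + struct.pack(">Q", bit_length)


def parse_blocks(padded: bytes) -> list:
    """Split the padded message into N blocks of 512 bits (64 bytes)."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


if __name__ == "__main__":
    blocks = parse_blocks(pad_message(b"abc"))
    print(len(blocks), blocks[0].hex())
```

A message of 55 bytes or fewer fits in one block; a 56-byte message already needs two, because the 1-bit and the 64-bit length field no longer fit in the first block.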

Table-1. Buffer initialisation of SHA-1.

After the five working variables A, B, C, D, and E are initialised with the buffer values $H_0$, $H_1$, $H_2$, $H_3$, $H_4$ in the pre-processing, the hash computation uses the constant $K_t$ and the round function $f_t$ for $0 \le t \le 79$, as shown in Table-2 and Table-3, to process the message. The symbols $\wedge$, $\neg$ and $\oplus$ in the non-linear function of the four-round SHA-1 algorithm in Table-3 represent the logical AND, NOT and XOR operations respectively. After the four rounds, which consist of 80 steps in total, the final step adds the initial value to the last output hash.

Table-2. Constant K.

Table-3. Round function.

HASH COMPUTATION

The SHA-1 hash computation processes the padded message with a message schedule of 80 32-bit words, $W_0, W_1, \ldots, W_{79}$. Equation (1) gives the compression function of SHA-1 for the inputs A, B, C, D, and E. The symbol $\ll$ denotes a circular left shift of the register input by the given value. T includes $W_t$ and $K_t$, where $W_t$ is the expanded message word of round $t$ and $K_t$ is the round constant of round $t$.

Figure-1. SHA-1 compression function.

$$T = (A_t \ll 5) + f_t(B_t, C_t, D_t) + E_t + W_t + K_t, \quad A_{t+1} = T, \quad B_{t+1} = A_t, \quad C_{t+1} = B_t \ll 30, \quad D_{t+1} = C_t, \quad E_{t+1} = D_t \qquad (1)$$

The first 16 words of the 32-bit message schedule $W_t$ are taken directly from the message input, for $0 \le t \le 15$. The remaining values of $W_t$, where $16 \le t \le 79$, are derived using Equation (2).

$$W_t = ROTL^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}) \qquad (2)$$

UNFOLDING ALGORITHM

Unfolding is a technique used in DSP applications to obtain a new program that performs more than one iteration of the original program. The unfolding factor J is the number of iterations of the original program that the new program performs. The rules of the unfolding algorithm are as follows [4]:

1. For each node U in the original data flow graph (DFG), draw the J nodes $U_0, U_1, \ldots, U_{J-1}$.
2. For each edge $U \to V$ with $w$ delays in the original DFG, draw the J edges $U_i \to V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \ldots, J-1$.

To explain the structure of the unfolding algorithm, one example of a DSP program is shown in Figure-2.

$$y(n) = a\,y(n-9) + x(n) \qquad (3)$$
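The two unfolding rules can be sketched as a short Python function. This is an illustrative sketch only: the name `unfold` and the representation of an edge as a `(source, destination, delays)` tuple are assumptions made for the example, not part of the paper or of [4].

```python
def unfold(edges, J):
    """Apply the unfolding rules with factor J to a list of DFG edges.

    Each input edge is (U, V, w): an edge U -> V carrying w delays.
    Each output edge is (U_i, V_j, d), where j = (i + w) % J and
    d = floor((i + w) / J), for i = 0 .. J-1 (rule 2).
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append((f"{u}{i}", f"{v}{(i + w) % J}", (i + w) // J))
    return unfolded


if __name__ == "__main__":
    # The edge with 9 delays from the example program, unfolded with J = 2.
    for edge in unfold([("C", "D", 9)], 2):
        print(edge)
```

For the edge C → D with 9 delays and J = 2, this gives C0 → D1 with 4 delays and C1 → D0 with 5 delays. The delays of the unfolded edges always sum to the original delay count, which is a quick sanity check on any unfolded graph.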

Figure-2. The original DSP program [4].

A DFG can be constructed from the original DSP program in Figure-2 by replacing the input and output ports with nodes A and B, while the addition and multiplication processes are represented by node C and node D respectively, as shown in Figure-3.

Figure-3. The 2-unfolded DFG [4].

Based on the first rule of the unfolding algorithm, there are 8 nodes, namely $A_0$, $B_0$, $C_0$, $D_0$, $A_1$, $B_1$, $C_1$ and $D_1$. The second step is to connect each edge $U \to V$ of the DSP program. An edge $U \to V$ with no delay is divided into two parts, $U_0 \to V_0$ and $U_1 \to V_1$. The edge $C \to D$ with 9 delays becomes $C_0 \to D_{(0+9)\%2}$ with $\lfloor (0+9)/2 \rfloor$ delays and $C_1 \to D_{(1+9)\%2}$ with $\lfloor (1+9)/2 \rfloor$ delays. Finally, the 2-unfolded DFG is created with $C_0 \to D_1$ with 4 delays and $C_1 \to D_0$ with 5 delays respectively.

SHA-1 UNFOLDING ALGORITHM

The proposed SHA-1 unfolding algorithm with factor 2 is shown in Figure-4. It consists of two non-linear functions with three different inputs each, two circular left shifts by 30 (of A and of B), and two circular left shifts by 5 (of A and of Temp). In this figure, there are 8 addition operations, which are performed in parallel during execution. Thus, the critical path of the design contains only four additions. In other words, two hash operations are executed per cycle. This reduces the number of cycles needed to obtain the final output hash from 80 to 40. Hence, the unfolding transformation can increase the throughput of the SHA-1 hash function.

Figure-4. SHA-1 unfolding compression function.

The outputs of the SHA-1 unfolding algorithm are shown in the following equation. $ROTL^{a}(b)$ represents a circular left shift (left rotation) of $b$ by $a$ positions, and $func_t(p, q, r)$ is the non-linear function at time $t$ for three different inputs $p$, $q$ and $r$.
$$\begin{aligned}
Temp &= ROTL^{5}(A_t) + func_t(B_t, C_t, D_t) + E_t + W_t + K_t \\
A_{t+2} &= ROTL^{5}\{Temp\} + func_{t+1}(A_t, ROTL^{30}(B_t), C_t) + D_t + W_{t+1} + K_{t+1} \\
B_{t+2} &= Temp \\
C_{t+2} &= ROTL^{30}(A_t) \\
D_{t+2} &= ROTL^{30}(B_t) \\
E_{t+2} &= C_t
\end{aligned} \qquad (4)$$

PERFORMANCE EVALUATIONS

The SHA-1 algorithm was synthesised and implemented in Verilog HDL on the Altera Quartus II Arria II: EP2AGX45DF29C4. Both designs are iterative: one block step function is reused for the 80-round process and for the 40-round process respectively. The designs are verified with ModelSim-Altera.e using functional and timing simulation, and are evaluated in terms of area, maximum frequency and power. The SHA-1 and SHA-1 unfolding designs are compared with previously published FPGA-based implementations to evaluate their performance [5-13]. All the results are presented in Table-4. The proposed SHA-1 design and SHA-1 unfolding design use only 423 and 548 combinational ALUTs respectively. Besides, the total register count of the design increases from 693 to 97 in the SHA-1 unfolding design. The comparison of area and speed depends on the FPGA device family. The designer needs to choose an appropriate device in order to reduce logic utilisation as well as increase the performance of the design. The total estimated power dissipation of the SHA-1 unfolding design decreases from 625.86 to 456.2 mW. The table shows that the throughput of the SHA-1 unfolding design increases significantly at a maximum frequency of 174.73 MHz. Its throughput is about 2236.54 Mbps, which is higher than that of the SHA-1 design, with only 1754.88 Mbps. The throughput of the design is calculated using the following formula, where the block size is 512 bits.

$$\text{Throughput} = \frac{\text{Frequency} \times \text{block size}}{\text{Latency}} \qquad (5)$$

Table-4. FPGA-based SHA-1 implementation.

Table-5. Area implementation of SHA-1 and SHA-1 unfolding.

RESULTS ANALYSIS

There are several other published FPGA-based implementations of SHA-1. In this paper, two FPGA vendors, Xilinx and Altera, are listed as CAD tools for design implementation in order to compare area in terms of combinational ALUTs, logic elements, slices and total registers. Since the devices used for the SHA-1 implementations are not the same, the comparison of the designs in terms of area and speed can be evaluated from the target devices. In other words, the designer can choose a device that will provide a high-performance implementation. Table-5 shows the SHA-1 area implementation on different FPGA device families. In this table, where the authors of previous papers did not state the latency, we assume the latency of the normal SHA-1 operation.
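The latency values that enter Equation (5) come from the iteration counts of the two designs: 80 steps per block for the baseline and 40 for the unfolded design. The Python sketch below is a software check under stated assumptions, not the paper's Verilog: it applies the two-step update of Equation (4) once per loop iteration, compares the digest with Python's `hashlib`, and then evaluates Equation (5). The names `sha1_two_step`, `func_k` and `throughput_mbps` are hypothetical, the round functions and constants are taken from FIPS 180, and the frequencies 274.2 MHz and 174.73 MHz with latencies of 80 and 40 cycles are assumed inputs chosen because they reproduce the reported throughputs.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # keep every word at 32 bits


def rotl(a, b):
    """ROTL^a(b): rotate the 32-bit word b left by a positions."""
    return ((b << a) | (b >> (32 - a))) & MASK


def func_k(t, p, q, r):
    """Return func_t(p, q, r) + K_t modulo 2^32 (definitions from FIPS 180)."""
    if t < 20:
        return (((p & q) ^ (~p & r)) + 0x5A827999) & MASK
    if t < 40:
        return ((p ^ q ^ r) + 0x6ED9EBA1) & MASK
    if t < 60:
        return (((p & q) ^ (p & r) ^ (q & r)) + 0x8F1BBCDC) & MASK
    return ((p ^ q ^ r) + 0xCA62C1D6) & MASK


def sha1_two_step(message):
    """SHA-1 with two steps per loop iteration; returns (hex digest, iterations)."""
    zero_bytes = (55 - len(message)) % 64
    data = message + b"\x80" + b"\x00" * zero_bytes + struct.pack(">Q", len(message) * 8)
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    iterations = 0
    for off in range(0, len(data), 64):
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for t in range(16, 80):  # message schedule, Equation (2)
            w.append(rotl(1, w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]))
        a, b, c, d, e = h
        for t in range(0, 80, 2):  # 40 iterations per block, Equation (4)
            temp = (rotl(5, a) + func_k(t, b, c, d) + e + w[t]) & MASK
            a_next = (rotl(5, temp) + func_k(t + 1, a, rotl(30, b), c) + d + w[t + 1]) & MASK
            a, b, c, d, e = a_next, temp, rotl(30, a), rotl(30, b), c
            iterations += 1
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h).hex(), iterations


def throughput_mbps(freq_mhz, latency_cycles, block_bits=512):
    """Equation (5): throughput = frequency * block size / latency."""
    return freq_mhz * block_bits / latency_cycles


if __name__ == "__main__":
    digest, iterations = sha1_two_step(b"abc")
    print(digest == hashlib.sha1(b"abc").hexdigest(), iterations)
    print(throughput_mbps(274.2, 80), throughput_mbps(174.73, 40))
```

The sketch shows why unfolding can pay off even at a lower clock rate: halving the latency doubles the bits processed per cycle, so the unfolded design comes out ahead as long as its maximum frequency stays above half that of the baseline.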
A small area implementation suits any application that needs a compact design, which can also reduce power consumption. The proposed design uses Arria II, which can balance the area and the performance of the design. On Arria II, both proposed designs use a small number of combinational ALUTs and total registers compared with previous designs. The performance in terms of area can be evaluated by choosing an appropriate device family in order to obtain a small area implementation. Table-6 shows the comparison of the maximum frequency of the SHA-1 design across different device families. In this table, the proposed SHA-1 design provides the highest maximum frequency, which is 274.15 MHz, with a throughput of 1754.56 Mbps.

Table-6. Maximum frequency of SHA-1.

Table-7 shows the maximum frequency (fmax) of SHA-1 unfolding. In this table, the proposed SHA-1 unfolding design obtains the highest maximum frequency, which is 174.73 MHz, compared with other SHA-1 unfolding designs. This leads to the high throughput of the SHA-1 unfolding design. The designs of L. Jiang and J. Kim provide high throughput because they are pipelined. However, they use a large area. As Table-5 shows, the same authors use a large number of combinational ALUTs, about 33764 and 649, compared with this SHA-1 unfolding design, which uses only 548 combinational ALUTs.

Table-7. Maximum frequency of SHA-1 unfolding.

CONCLUSIONS

The architecture of SHA-1 unfolding was successfully synthesised and implemented on the Altera Arria II: EP2AGX45DF29C4 using Verilog HDL. The maximum frequency of the design is 174.73 MHz, while the area utilisation in terms of combinational ALUTs and total registers is 548 and 97 respectively. The maximum frequency of the SHA-1 design implementation reflects the critical path of the design. In order to obtain a high-performance design, not only speed needs to be considered; the area should also be taken into account. Other methodologies or techniques can be applied to increase the maximum frequency as well as the throughput of the design. A high-performance, efficient design combines a small area, a high maximum frequency and a small estimated power consumption; this in turn leads to a high throughput.

ACKNOWLEDGEMENTS

This work is supported by Universiti Malaysia Sarawak (UNIMAS).

REFERENCES

[1] Beale Q. Dang. 2. Draft NIST Special Publication 800-107. Recommendation for Applications Using Approved Hash Algorithms, Computer Security Division, Information Technology Laboratory.
[2] Federal Information Processing Standards. Secure Hash Standard (SHS), FIPS PUB 180-3. 2008. Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg.
[3] F. R. Henriquez, N. A. Saqib, A. D. Perez, C. K. Koc. 2006. Cryptographic Algorithms on Reconfigurable Hardware, Springer series on Signal and Communication.
[4] K. K. Parhi. 1999. VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, Inc. 9-4.
[5] K. Jarvinen. 2004. Design and Implementation of a SHA-1 Hash Module on FPGAs. Helsinki University of Technology, Signal Processing Laboratory.
[6] Y. K. Kang, D. W. Kim, T. W. Kwon, J. R. Choi. 2002. An Efficient Implementation of Hash Function Processor for IPSEC. Proceedings 2002 IEEE Asia-Pacific Conference on ASIC. pp. 93-96.
[7] L. Miao, X. Jinfu, Y. Xiaohui, Y. Zhifeng. 2009. Design and Implementation of Reconfigurable Security Hash Algorithms Based on FPGA. Information Engineering, ICIE '09 WASE International Conference, Taiyuan, Shanxi. pp. 38-384.

[8] Diez, J. M., Bojanic S., Stanimirovic Lj., Carreras C., Nieto-Taladriz O. 2002. Hash Algorithms for Cryptographic Protocols: FPGA Implementations. Proceedings of the 10th Telecommunications Forum TELFOR 2002, Belgrade, Yugoslavia.
[9] D. Zibin, Z. Ning. 2003. FPGA Implementation of SHA-1 Algorithm, ASIC 2003. Proceedings 5th International Conference. 2: 32 324.
[10] L. Jiang, Y. Wang, Q. Zhao, Y. Shao, X. Zhao. 2009. Ultra High Throughput Architectures for SHA-1 Hash Algorithm on FPGA, Computational Intelligence and Software Engineering, CiSE 2009, International Conference, Wuhan. pp. -4.
[11] N. Sklavos, E. Alexopoulos and O. Koufopavlou. 2003. Networking Data Integrity: High Speed Architectures and Hardware Implementations. The International Arab Journal of Information Technology. ().
[12] Y. K. Lee, H. Chan, I. Verbauwhede. 2006. Throughput Optimized SHA-1 Architecture Using Unfolding Transformation. Application-specific Systems, Architectures and Processors (ASAP '06). pp. 354-359.
[13] J. Hoon Lee, S. Choon Kim, Y. Jun Song. 2011. High-Speed FPGA Implementation of the SHA-1 Hash Function. IEICE Trans. Fundamentals, E94-A(9).
[14] J. Kim, H. Lee, Y. Won. 2012. Design for High Throughput SHA-1 Hash Function on FPGA. Fourth International Conference on Ubiquitous and Future Networks (ICUFN). pp. 43 44.