Device specific key generation technique for anti-counterfeiting methods using FPGA based physically unclonable functions and artificial intelligence


The University of Toledo
The University of Toledo Digital Repository
Theses and Dissertations, 2012

Device specific key generation technique for anti-counterfeiting methods using FPGA based physically unclonable functions and artificial intelligence
Swetha Pappala, The University of Toledo

Recommended Citation
Pappala, Swetha, "Device specific key generation technique for anti-counterfeiting methods using FPGA based physically unclonable functions and artificial intelligence" (2012). Theses and Dissertations. 397. http://utdr.utoledo.edu/theses-dissertations/397

This Thesis is brought to you for free and open access by The University of Toledo Digital Repository. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of The University of Toledo Digital Repository.

A Thesis entitled

Device Specific Key Generation Technique for Anti-Counterfeiting Methods Using FPGA Based Physically Unclonable Functions and Artificial Intelligence

by Swetha Pappala

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering

Dr. Mohammed Y. Niamat, Committee Chair
Dr. Weiqing Sun, Committee Co-Chair
Dr. Mansoor Alam, Committee Member
Dr. Patricia R. Komuniecki, Dean, College of Graduate Studies

The University of Toledo
August 2012

Copyright 2012, Swetha Pappala This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Device Specific Key Generation Technique for Anti-Counterfeiting Methods Using FPGA Based Physically Unclonable Functions and Artificial Intelligence

by Swetha Pappala

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering

The University of Toledo
August 2012

Anti-counterfeiting techniques have entered a new era with the implementation of critical designs and confidential information transfer protocols. The complexity of developing security mechanisms and routing protocols for embedded systems continues to increase, while the allowable cost and size have decreased. Trustworthy authentication of a device is of extreme importance for secure protocols. Methodologies for preventing IC piracy have been developed that require a unique signature key for every fabricated chip. Physically Unclonable Functions (PUFs) can be used for such signature generation.

This research implements a key generation process in three phases: a novel Ring Oscillator PUF (ROPUF) design, an error correction process, and a hashing algorithm. The ROPUF design takes advantage of the unique characteristic properties of FPGAs. In this work, the ROPUFs are implemented using LUTs, multiplexers, and flip-flops, which are the basic components of the FPGA architecture. The PUF design is followed by an error correction process that rectifies noisy bits in the response caused by drastic environmental changes such as temperature and voltage fluctuations. Artificial Neural Networks are used for the error correction process. The latter part of the research deals with a hashing function that has been implemented to enhance the security of the key generation process. The hashing function redresses the response bits of the PUF unit to mask the challenge-response pairs.

The proposed PUF circuit is implemented on 5 Xilinx Spartan 2 XC2S100 FPGAs, and an Agilent 16801A Logic Analyzer is used to obtain the PUF responses. The intra-chip and inter-chip responses are analyzed and plotted using Hamming distances. The overall uniqueness of the responses is found to be 49.0625%, which is higher than that of previous implementations of the conventional ROPUF circuit (43.40%) and of the earlier chain implementation (48.51%). The inter-chip and intra-chip uniqueness factors for the proposed design are 47.929% and 41.91% respectively. Artificial Neural Networks are tested using PUF responses of various lengths. The failure rates of the proposed method are below 1 ppm, which is lower than the failure rate of BCH codes, typically 4.8 ppm. The SHA-256 algorithm is optimized using parallel processing techniques to give better throughput results. The delay is reduced to 45 clock cycles.

This thesis is dedicated to my parents, Mr. and Mrs. Surya Prakash Rao, my sister, Padmavati and my family.

Acknowledgements

I would like to thank Dr. Mohammed Niamat and Dr. Weiqing Sun for giving me an opportunity to work under their leadership and for guiding me with their valuable advice. I would like to thank Dr. Mansoor Alam, Mr. Allen Rioux, and other faculty members of the Engineering Technology Department for financially supporting me during the completion of my Master's. I would also like to thank Mr. Alan L. Kossow for providing me the resources to complete my Master's Thesis. I would also like to extend my regards to Dr. Mansoor Alam for serving as a committee member. I would like to thank my parents and my family members, who encouraged me to fulfill my dreams of pursuing my Master's degree and have always been a motivational factor in my life. I would also like to thank my sister, who has supported me in my stressful situations.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
  1.1 Motivation
  1.2 Research Objectives
2. Physically Unclonable Functions
  2.1 Introduction
    2.1.1 Challenge-Response Based Authentication
    2.1.2 Key Generation for Cryptography
  2.2 Classification of Physically Unclonable Functions Based on Security
    2.2.1 Strong PUFs
    2.2.2 Controlled PUFs
    2.2.3 Weak PUFs
  2.3 Classification of Physically Unclonable Functions Based on Fabrication Techniques
    2.3.1 Non-Silicon PUFs
    2.3.2 Silicon PUFs
  2.4 Types of PUFs
    2.4.1 Optical PUFs
    2.4.2 Coating PUFs
    2.4.3 Delay Based PUFs
      2.4.3.1 Ring Oscillator PUFs
      2.4.3.2 Switch Based PUFs
      2.4.3.3 Tristate Buffer PUFs
    2.4.4 Memory Based PUFs
      2.4.4.1 SRAM Based PUFs
      2.4.4.2 Butterfly PUFs
    2.4.5 Other Implementations
  2.5 Previous Works
  2.6 Architecture of the Proposed Ring Oscillator PUF
  2.7 Working of the Proposed ROPUF
  2.8 Conclusion
3. Error Correcting Codes
  3.1 Introduction
  3.2 Neural Networks
  3.3 Bidirectional Associative Memory
  3.4 Training Algorithm
  3.5 Conclusion
4. Secure Hashing Algorithm
  4.1 Introduction
  4.2 Types of Cryptographic Algorithms
    4.2.1 Secret Key Cryptography
      4.2.1.1 Stream Cipher
      4.2.1.2 Block Cipher
    4.2.2 Public Key Cryptography
    4.2.3 Hash Functions
  4.3 Secure Hashing Algorithms (SHA-2)
  4.4 SHA-256
    4.4.1 Preprocessing
    4.4.2 Hash Computation
  4.5 Optimization of the SHA-256 Hash Function
  4.6 Architecture of the SHA-256 Hash Function
    4.6.1 Padder and Wt Unit
    4.6.2 Constants Unit-I
    4.6.3 Constants Unit-II
    4.6.4 Hash Computation Unit
    4.6.5 Final Addition
  4.7 Conclusion
5. Conclusion
  5.1 Summary
  5.2 Contributions
  5.3 Conclusions
  5.4 Future Work
References
Appendix A
Appendix B

List of Tables

Table 2.1. Signal Transitions of the Proposed ROPUF.
Table 4.1. Summary of the Existing Hashing Algorithms.
Table 4.2. Specifications of Secure Hash Functions.
Table 4.3. Constants used in SHA-256 Hash Function.
Table 4.4. Comparison Between the Previous Works and the Proposed Design.

List of Figures

Figure 1.1. Revenue Comparisons of a Product.
Figure 1.2. Phases of Key Generation Process.
Figure 2.1. Theoretical Shape versus Actual Shape in Photolithography.
Figure 2.2. Uncertainty/Imperfections in the Doping Process.
Figure 2.3. Challenge-Response Based PUF Protocol.
Figure 2.4. Asymmetric Key Cryptographic Protocol.
Figure 2.5. Working of Optical PUFs.
Figure 2.6. Structure of a Coating PUF.
Figure 2.7. Architecture of a Ring Oscillator PUF.
Figure 2.8. Architecture of Switch Based PUFs.
Figure 2.9. Architecture of Arbiter PUFs.
Figure 2.10. Architecture of Tristate Buffer PUFs.
Figure 2.11. Cross Coupled Inverter Structure.
Figure 2.12. 6T SRAM Cell.
Figure 2.13. Architecture of a Butterfly PUF.
Figure 2.14. Oscillator of the Proposed ROPUF Circuit.
Figure 2.15. Signal Transitions of the Proposed ROPUF.
Figure 2.16. Generation of Multi-Bit PUF Signature.
Figure 2.17. Distribution of Hamming Distances.
Figure 2.18. Distribution of Hamming Distances for Intra-Chip Response Bits.
Figure 2.19. Distribution of Hamming Distances for Inter-Chip Response Bits.
Figure 2.20. Comparison between the Results of the Proposed Method and Previous Implementations.
Figure 3.1. Biological Neural Network.
Figure 3.2. Graphical Representation of McCulloch and Pitts Neuron Model.
Figure 3.3. Phases of Error Correction Process.
Figure 3.4. (a) Generalized BAM Architecture. (b) Simplified BAM Architecture.
Figure 3.5. Data File.
Figure 3.6. Input Matrix formed after Hexa-Decimal Conversion.
Figure 3.7. Weight Matrix.
Figure 3.8. Retrieval of the Test Vector from the Input Matrix.
Figure 3.9. Retrieval of a Noisy Vector.
Figure 4.1. Taxonomy of the Cryptographic Primitives.
Figure 4.2. Two-Way Communication using Secret Key Cryptography.
Figure 4.3. Two-Way Communication using Public Key Cryptography.
Figure 4.4. Man-in-Middle Attack on a Two-Way Communication.
Figure 4.5. General Flowchart for the Operation of a Hashing Function.
Figure 4.6. Flowchart of a SHA-256 Hash Function.
Figure 4.7. System Architecture of the SHA-256 Hash Function.
Figure 4.8. Architecture of the Wt Unit.
Figure 4.9. Architecture of the T1 function.
Figure 4.10. Architecture of the T2 function.
Figure 4.11. Simulation of the SHA-256 Hash Function.
Figure 4.12. Comparison between Results of the Proposed Method and Previous Works.
Figure A.1. Chien Search and Error Correction for Binary Code.
Figure B.1. Floor Plan 1.
Figure B.2. Floor Plan 2.
Figure B.3. Floor Plan 3.
Figure B.4. Floor Plan 4.

Chapter 1

Introduction

1.1 Motivation

In today's global market, competition is fierce, which makes first-to-market revenue gains the key to funding R&D of next-generation products, an effort that takes months, if not years, and millions of dollars of investment. Manufacturing, assembly, and testing are now moving globally, outside the company's own facilities, to contract manufacturers, making the security of critical designs and information a top priority. Traditional reverse engineering used to be the most common security breach. This refers to the practice where systems are torn apart and design techniques are extracted to reduce the perpetrator's R&D cost and time-to-market. Today, designers face new threats to the security of products, including cloning and overbuilding [1].

A hacker might simply replicate the design, IP, or software with no feature improvement and without even spending time to determine the design technique (cloning). With minimal investment, this gives the hacker a shorter time to market and often a direct replacement for the product in the established customer base of the original product. This allows the hacker to reap greater profits and offer a lower-cost product. It also results in a great reduction in the original company's market potential, leaving it with reduced product revenue. Revenue losses due to such hackers are permanent and unrecoverable for the original company. According to the Anti-Counterfeiting Bureau, USA, counterfeiting and piracy cost the economy more than $250 billion in revenue and 750,000 jobs every year [2]. Figure 1.1 shows the magnitude of the revenue losses for such products [3].

Figure 1.1. Revenue Comparisons of a Product.

Such losses may also occur when a contract manufacturer or assembly house builds extra products and releases them into the market without the knowledge or authorization of the designing company (overbuilding) [4]. Overbuilding offers an even greater time-to-market advantage than cloning. In fact, overbuilt products have even been known to hit the market before the original products. With no R&D investment or development costs, overbuilders receive the best profits and can offer the lowest product prices.

Overbuilding results in many more losses than just revenue and market share. Firstly, the original company cannot determine the number of authentic products in the field. This makes the support burden difficult to manage and potentially much higher than the company can handle. Secondly, if a company cannot authenticate a product in the field, not only do support costs get out of hand, but other problems also arise, such as maintaining the product's price in the market. Additionally, there is no way to guarantee the same level of quality, which might significantly impact the return material authorizations (RMAs) that need to be validated and processed. Another area that becomes a burden is product reliability. With cloning and overbuilding, there is an enormous liability and responsibility on the company to weed out the counterfeit units. Hence, companies are obliged to understand the security issues of their products in order to improve them and to keep their critical designs and information secure.

The study of these threats helps in designing corresponding defensive techniques to make Field Programmable Gate Array (FPGA) and Integrated Circuit (IC) devices secure. These threats can be broadly divided into two categories: Design Threats and Data Threats. Design Threats deal with attackers that target the design of the FPGA. Advanced techniques like Logic State Analysis using Lithium Niobate and the Scanning Electron Microscope (SEM) prove to be major threats to FPGA devices. Data Threats, on the other hand, deal with stolen or modified data or bitstreams sent to or from an FPGA. In such cases, there might be damage to the system or a threat to the confidentiality of the user's data. Malicious tampering may destroy the FPGA device. Cloning the bitstream (configuration file) of an FPGA makes it easier for the hacker to replicate the functionality of the FPGA [5].

Most prior work relating to FPGA security focuses on preventing IP theft and securely uploading bitstreams in the field. However, establishing a root of trust on a field device is a challenging task because FPGAs can be remotely reconfigured in the field. The continuous increase in the density and capability of FPGAs is motivating designers to implement valuable designs on FPGAs. As FPGAs are used in more and more applications, these designs become increasingly valuable, and significant security features must now be implemented in them. Attackers search for vulnerabilities and developers for defenses.

1.2 Research Objectives

Owing to the security breaches mentioned above, trustworthy authentication of an object is of extreme importance for secure protocols. Traditional methods of storing the identity of an object in non-volatile memory are insecure. Novel chip identifiers called Physically Unclonable Functions (PUFs) extract random process characteristics of an FPGA/IC to establish its identity. In our research, we propose techniques to improve the Ring Oscillator PUF (ROPUF) architecture holistically, with a focus on its stability. This work also involves the application of an error correcting code using neural networks and a secure hashing algorithm (SHA-2). This research is divided into three phases:

Phase 1: An ROPUF is designed using LUTs and multiplexers, which are the basic components of the FPGA architecture. These PUFs are implemented on Spartan 2 XC2S100 FPGAs, and a logic analyzer is used to display the PUF responses. Hamming distances are calculated for the responses and analyzed to verify the uniqueness of the PUFs.

Phase 2: Though PUFs are difficult to clone and provide secure authorization, they suffer from instability due to variations in environmental conditions and noise. Because the PUF signature bits depend slightly on the operating temperature, a bit generated from a pair of ring oscillators might flip when the operating temperature changes considerably. To stabilize the circuit, Artificial Neural Networks (ANNs) are used for the error correction process. The neural networks are trained on the response bits of the PUFs, and the trained networks are tested by sending them noisy response bits for correction. The trained networks successfully corrected the error bits, and this error correction gave better results than conventional methods.

Phase 3: The SHA-256 algorithm has been implemented and optimized to give better throughput results. This algorithm is used to redress the output responses of the ROPUFs: the output responses of the PUFs are transformed by a hashing function to achieve a uniform distribution of keys.

Figure 1.2 shows the three phases of the key generation process.

Figure 1.2. Phases of Key Generation Process.
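The three phases can be sketched end to end in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the response bit strings are made-up values, the names `correct_response` and `derive_key` are hypothetical, and Phase 2 is replaced by a simple nearest-match lookup (minimum Hamming distance against enrolled responses) standing in for the neural-network error correction. Hashing the ASCII form of the bit string is likewise an assumption made for brevity.

```python
import hashlib


def hamming_distance(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def correct_response(noisy: str, enrolled: list[str]) -> str:
    """Stand-in for Phase 2: return the enrolled response closest to the noisy one.

    The thesis uses a neural network here; nearest-match lookup is a
    swapped-in simplification for illustration only.
    """
    return min(enrolled, key=lambda ref: hamming_distance(noisy, ref))


def derive_key(response_bits: str) -> str:
    """Phase 3: hash the corrected response bits with SHA-256."""
    return hashlib.sha256(response_bits.encode("ascii")).hexdigest()


# Phase 1 stand-in: hypothetical 16-bit ROPUF responses recorded at enrollment.
enrolled = ["1011001011010001", "0110011000111010"]

# The first response re-read with one bit flipped (e.g. by a temperature change).
noisy = "1011001011010101"

corrected = correct_response(noisy, enrolled)
key = derive_key(corrected)
print(corrected == enrolled[0])  # the flipped bit has been repaired
print(len(key))                  # 64 hex characters = 256 bits
```

The sketch also shows why error correction has to come before hashing: a single flipped bit in the input of SHA-256 yields an unrelated digest, so the response must be stabilized first.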

The chapters are organized as follows.

Chapter 1: This chapter discusses the motivational factors and gives an overview of the research.

Chapter 2: This chapter discusses the concept of physically unclonable functions and the classification of PUFs based on security and fabrication. It also discusses the different types of PUF implementations and the proposed ring oscillator PUF design. The latter part of the chapter presents graphs depicting the uniqueness of the PUF responses for intra-chip and inter-chip implementations.

Chapter 3: This chapter explains the concepts of neural networks and bidirectional associative memories (BAM). The latter part of the chapter explains the BAM based training algorithm for the error correction process of the PUF responses.

Chapter 4: This chapter illustrates the different types of cryptographic algorithms and the advantages of hashing algorithms over the rest, followed by the SHA-256 algorithm. It also discusses the implementation and optimization parameters of the SHA-256 algorithm used to redress the response bits of the PUF outputs.

Chapter 5: This chapter summarizes the work done in this thesis and its intellectual contributions. It also outlines further extensions of this research.

Chapter 2

Physically Unclonable Functions

2.1 Introduction

Physically Unclonable Functions (PUFs) were invented by Naccache and Frémanteau in 1992. By definition, Physically Unclonable Functions are functions that are embodied in physical structures which are easy to analyze but difficult to reconstruct with their exact characteristics [6]. In other words, a PUF is a function which produces a secret output response based on the underlying properties of a physical device, while adhering to various properties that depend on the deployment and the level of security intended from the device. The input to the PUF is called the challenge and the output is called the response. Thus, challenge-response pairs are used to generate unique signatures from a PUF.

PUFs exploit the manufacturing process variations of integrated circuits to generate non-volatile, chip-unique signatures. They derive confidential information from uncontrollable random components rather than storing it in memory. The lack of manufacturing control over sub-micron process variation makes a PUF unclonable. This enables chip authentication and cryptographic key generation.

Advanced sub-wavelength semiconductor fabrication techniques have resulted in nanometer feature sizes with a substantial amount of process variation. These process variations are mainly due to the inability to precisely control the diffusion of dopants [7,8] and the inability to robustly fabricate geometric features [7,9]. Process variations result in variations in the key electrical parameters of circuit devices and interconnects, which increase the uncertainty in the outcome of the design process and consequently jeopardize the parametric yield of the fabrication process [7,10]. Process variations are typically divided into two components:

- Inter-die process variations: variations that arise between chips on the same wafer or on different wafers.
- Intra-die process variations: variations that arise between different devices and interconnects that reside within the same chip.

To cope with process variations, the underlying sources of variation are characterized and statistical techniques are applied [11,12,13]. While process variations are random in nature, intra-die variations typically exhibit spatial correlation, i.e., devices that are spatially close to each other are more strongly correlated than devices that are spatially far from each other [14,15,16,17].

Figure 2.1 and Figure 2.2 illustrate two of the leading sources of process variation. These figures have been reproduced from [18]. Figure 2.1 shows the intended ideal device shapes versus the actual shapes produced by photolithography. Note that the actual shape of a device also depends on the location of the device on the silicon wafer. Such variability causes randomness in the electrical properties of the device [19,20]. Figure 2.2 shows the uncertainty in the doping process. Due to the drastic decrease in the size of the devices, there may be only dozens of dopants in one device, and therefore randomness in the actual number of injected dopants can have a significant impact on the electrical properties of the device, in particular through statistical fluctuations in the threshold voltage. PUFs exploit these uncertainties in electrical properties to generate unique, device-specific responses that help in device identification.

Figure 2.1. Theoretical Shape versus Actual Shape in Photolithography.
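As a rough illustration of why a count of only dozens of dopants matters, suppose (an assumption made here for illustration, not a model taken from the source) that the number of dopant atoms N in a device follows a Poisson distribution with mean \bar{N}. The relative spread of the dopant count then shrinks only as the inverse square root of the mean:

```latex
\sigma_N = \sqrt{\bar{N}}, \qquad
\frac{\sigma_N}{\bar{N}} = \frac{1}{\sqrt{\bar{N}}}
% Example: \bar{N} = 50   gives 1/\sqrt{50}   \approx 0.14 (about 14%)
%          \bar{N} = 10^4 gives 1/\sqrt{10^4} = 0.01     (1%)
```

Under this assumption, a device with about 50 dopants sees device-to-device fluctuations of roughly 14% in dopant count, whereas a device with ten thousand dopants sees about 1%, which is consistent with the observation that scaling makes the randomness significant.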

Figure 2.2. Uncertainty/Imperfections in the Doping Process.

FPGA based products need protection not just from reverse engineering but also from hacking and cloning. The solution is preferably an embedded circuit that stays with the product throughout its lifecycle rather than an application note. PUFs prove to be a good choice for such security applications. Randomness in the PUF responses, which produces a unique key for every FPGA, is essential to prevent hacking and cloning of FPGA products.

Uniqueness is a measure of how distinctly a PUF can identify an FPGA among a group of FPGAs. The Hamming distance between the n-bit responses R1 and R2, generated by a PUF on a pair of FPGAs F1 and F2 respectively, is a good estimate of the uniqueness of the PUFs. The likelihood of a collision, in which FPGAs F1 and F2 have the same or nearly the same response to a challenge, is also an important factor.
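The Hamming-distance estimate of uniqueness described above can be sketched as follows. This is a minimal example, assuming responses are represented as equal-length bit strings; the three 8-bit responses are made-up values rather than measured data, and the function names are hypothetical. Uniqueness is computed here as the average pairwise Hamming distance expressed as a percentage of the response length, a commonly used definition for which 50% is the ideal value.

```python
from itertools import combinations


def hamming_distance(r1: str, r2: str) -> int:
    """Number of bit positions in which two equal-length responses differ."""
    if len(r1) != len(r2):
        raise ValueError("responses must have the same length")
    return sum(a != b for a, b in zip(r1, r2))


def uniqueness_percent(responses: list[str]) -> float:
    """Average pairwise Hamming distance as a percentage of the response length."""
    pairs = list(combinations(responses, 2))
    n_bits = len(responses[0])
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


# Hypothetical 8-bit responses of three FPGAs to the same challenge.
responses = ["10110010", "01100110", "11010001"]

print(hamming_distance(responses[0], responses[1]))  # 4
print(round(uniqueness_percent(responses), 2))       # 58.33
```

A Hamming distance of 0 between the responses of two different FPGAs would be a collision in the sense used above, so values close to 0% indicate that the PUF cannot tell devices apart.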

The concept of having an FPGA generate its own unique digital signature has broad applications in areas such as embedded systems security and IC/IP counter-piracy. PUFs can be deployed in two distinct protocols:

- Challenge-Response Based Authentication
- Key Generation for Cryptography

2.1.1 Challenge-Response Based Authentication

Figure 2.3 illustrates the verification process used to authenticate an FPGA device with PUFs. After the fabrication process, each PUF is subjected to an evaluation phase in which its responses to a set of challenges are recorded by a trusted source and provided to the verifier. The recorded information is represented as a table in Figure 2.3. Each PUF device has its own unique challenge-response database due to the process variations in the FPGAs. The challenge-response tables of all the PUF devices are stored in the verifier's systems. To authenticate a PUF, the verifier sends the PUF a random challenge from the table and compares the returned response with the pre-recorded response in the table. To prevent certain cloning attacks, each challenge-response pair is used only once. To avoid aliasing between PUFs, each unique challenge-response pair can be assigned to only one PUF. Therefore, the total number of unique challenge-response pairs determines the number of PUF circuits that can be fielded.
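The verifier's side of this protocol can be sketched as below. This is a minimal sketch under stated assumptions: the `Verifier` class, the table contents, and the dictionary lookup that stands in for the physical PUF are all hypothetical, and responses are compared for exact equality, whereas a fielded system would have to tolerate or correct a few noisy bits.

```python
import secrets


class Verifier:
    """Holds the enrolled challenge-response table of one PUF device."""

    def __init__(self, crp_table: dict[str, str]):
        # Copy the table so that retiring pairs does not alter the enrolled data.
        self._unused = dict(crp_table)

    def pick_challenge(self) -> str:
        """Choose a random challenge that has not been used yet."""
        if not self._unused:
            raise RuntimeError("no unused challenge-response pairs left")
        return secrets.choice(list(self._unused))

    def verify(self, challenge: str, response: str) -> bool:
        """Compare with the recorded response and retire the pair (one-time use)."""
        expected = self._unused.pop(challenge)
        return response == expected


# Hypothetical table recorded by the trusted source during the evaluation phase.
enrolled = {"0001": "1011", "0010": "0110", "0100": "1100"}

# Stand-in for the physical device: the genuine PUF reproduces enrolled responses.
genuine_puf = enrolled.get

verifier = Verifier(enrolled)
challenge = verifier.pick_challenge()
print(verifier.verify(challenge, genuine_puf(challenge)))  # True
```

Because each pair is removed after use, a response captured by an eavesdropper cannot be replayed against the same challenge later.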

Figure 2.3. Challenge-Response Based PUF Protocol.

2.1.2 Key Generation for Cryptography

PUFs are also used for key generation in certain key based cryptographic protocols. In such protocols, the secret message is encrypted before transmission using a cryptographic function. The encrypted message can be decrypted only by the device that has the corresponding decryption key. Therefore, attacking devices that do not have the decryption key cannot access the secret message. There are different types of key based cryptographic protocols, which are discussed in detail in Chapter 4. Figure 2.4 shows the asymmetric key cryptographic protocol. The figure indicates that the message from A is encrypted using a public key and can be decrypted only with B's private key. Other, malicious devices have no access to the private key and hence are not able to decrypt the message.

Figure 2.4. Asymmetric Key Cryptographic Protocol.

Conventionally, the private key is stored in the digital memory of the device; however, various invasive attacks could reveal the key, allowing malicious devices to gain access to it. To avoid such attacks, PUFs are used as key generators that produce a unique private key for each device. Due to the randomness in the responses generated by a PUF device, extraction of these keys is known to be difficult. Thus, PUFs can be used as key-generating devices for secure data sharing protocols. The corresponding public keys are computed mathematically and distributed to the required devices. Such applications typically need small PUF hardware with very stable output responses. The main hardware cost in such devices is due to the encryption and decryption processes rather than the PUF that generates the secret key.

PUFs are also used in a variety of other secure protocols:

- Hardware based Public Key Cryptographic Protocol [21]
- Light Weight PUF based RFID Authentication [22,23]
- PUF based Authentication Protocols [24]

2.2 Classification of Physically Unclonable Functions Based on Security

Though PUFs are classified based on different aspects, ease of evaluation is common to all of them: the necessary secret response must be easy to extract from the PUF with high efficiency. Physically Unclonable Functions are classified into three major types according to their applications and the security features they provide [25]:

- Strong PUFs [26,27]
- Controlled PUFs [28]
- Weak PUFs [29]

2.2.1 Strong PUFs

A Strong PUF produces a secret response to a given challenge, determined by the underlying properties of a physical device, and has the following properties in addition to the generic characteristics of all PUFs:

1. Hard to Predict: An adversary who has obtained a polynomial number of physical measurements from the PUF device, but has no further access to the device, must be able to extract only an insignificant amount of information about the response of the PUF to a randomly applied challenge.

2. Difficult to Clone: It must be almost impossible to fabricate a second strong PUF system that exhibits exactly the same challenge-response behavior as the original PUF. This must hold true even for the manufacturer of the PUF.

3. Hard to Characterize: An attacker should not be able to extract the responses to all possible challenges of the PUF. This implies that the PUF should have a large number of challenge-response pairs. Also, generating consecutive responses from

the PUF should take a finite amount of time, so that the adversary cannot read out the entire information in the PUF in a constrained environment. Conventionally, Strong PUFs are used for key generation, authentication and identification. These PUF-based protocols are cheaper in terms of computational resources than traditional crypto-based protocols [30,27,31,32,26]. From a security point of view, such PUFs typically have no mechanism that prevents an adversary from applying a challenge and observing the response. Therefore, it is very important to maintain a large number of challenge-response pairs for the PUF or to mask them.

2.2.2 Controlled PUFs

Controlled PUFs employ wrapper logic built around a Strong PUF to prevent challenges from being applied directly to the PUF and to prevent direct access to its responses. Typically, such PUFs are employed to overcome modeling attacks [25]. However, the security provided by a controlled PUF is broken if the outputs of the underlying strong PUF are probed on their way to the control logic. Such PUFs are generally used as a more secure alternative to strong PUFs.

2.2.3 Weak PUFs

Weak PUFs are a special case of strong PUFs that have very few challenge-response pairs. In the extreme case, they have only one challenge. The secret response from a weak PUF is used as a standard secret key for other key-based cryptographic functions. The outputs of such PUFs are never sent to systems external to the security core. Weak PUFs are often used as a replacement for non-volatile memory based key storage in cryptographic protocols. Such PUFs are tolerant to invasive attacks unlike

memory based systems. Coating PUFs [33], Butterfly PUFs [34] and SRAM PUFs [35] are implementations of weak PUFs.

2.3 Classification of Physically Unclonable Functions Based on Fabrication Techniques

PUFs can be implemented in various ways to extract the physical properties of a physical system. Since most PUFs are used for security-critical applications involving integrated circuits, it is very important that PUFs be easily integrated with FPGAs. Therefore, it is advantageous to build PUFs along with FPGAs using the existing fabrication methods of FPGAs and ASICs. PUFs can be classified into two major categories based on their fabrication processes:

- Non-Silicon PUFs: These PUFs use explicitly introduced randomness.
- Silicon PUFs: These PUFs use randomness in the intrinsic properties.

2.3.1 Non-Silicon PUFs

Non-Silicon PUFs derive secret keys from random variations occurring in physical systems other than integrated circuits. The first PUF was proposed by Pappu et al. [27] and was based on variations occurring in optical systems. Those PUFs derive secret keys from the speckle pattern observed when a laser is focused on an optical medium. Coating PUFs are another class of non-silicon PUFs. In coating PUFs, a network of metal wires is laid out in the shape of a comb on top of the IC. The space above and between the comb-like structures is filled with an opaque material randomly doped with dielectric particles, which is reflected as changes in capacitance at different areas of the IC. Sensor arrays placed in different regions of the IC detect the variations in capacitance to produce a unique response. Though such PUFs are fabricated on silicon

systems, we do not classify them as silicon PUFs, as they need fabrication techniques that are not part of generic CMOS fabrication technology. Some more non-silicon implementations are presented in [36,37].

2.3.2 Silicon PUFs

Silicon PUFs are fabricated using the existing FPGA/ASIC fabrication processes and can therefore be easily interfaced with other FPGAs/ICs or built on the same die as the rest of the components. They exploit the uncontrollable process variations that occur during fabrication of FPGAs/ICs. These variations make it impossible for attackers to manufacture two devices with identical characteristics, thus preventing cloning of the devices. The device variations are captured using carefully designed configurations of identical circuits, in which they lead to slight differences in circuit characteristics such as propagation delay, leakage current and voltage drop. These differences are reflected as a unique response to a given set of challenges. Since the variations are uncontrollable and random in nature, each PUF device produces a unique response. The sources of variation in silicon PUFs and various implementations of such PUFs are discussed in Section 2.4.

2.4 Types of PUFs

2.4.1 Optical PUFs

Optical PUFs consist of a transparent material with distributed scattering particles. Their secret key derives from the uniqueness and unpredictability of the speckle patterns that result from multiple scattering of laser light in a disordered optical medium [27,38,39]. Figure 2.5 shows the circular polarizer blocking the light rays that reflect

directly from the top of the PUF. The challenge for such PUFs might include the angle of incidence, focal distance or wavelength of the laser beam, a mask pattern blocking part of the laser light, or any other change in the display pattern. The output is a speckle pattern that consists of many randomly distributed bright and dark patches. A high-entropy bit string can be derived from the speckle pattern using image analysis. Physical cloning of optical PUFs is difficult for the following reasons:

- The light diffusion obscures the location of the scattering particles. The best physical techniques can probe diffusive materials only up to a depth of approximately 10 scattering lengths [40].
- Even if the attacker has all the information about the locations of the scattering particles, precise positioning of such a large number of scattering particles is extremely difficult and expensive. It requires a production process different from the original manufacturing process.

Figure 2.5. Working of Optical PUFs.

On the other hand, modeling is difficult due to the inherent complexity of multiple coherent scattering [41]. Given the details of all the scattering particles, the fastest known method for computing a speckle pattern is the transfer-matrix method, which requires the illuminated area, the wavelength and the PUF thickness [42]. The computation involved is known to be difficult even for conservative values of the illuminated area, wavelength and thickness of the PUF.

2.4.2 Coating PUFs

Coating PUFs are integrated with an IC but are still categorized as non-silicon PUFs. More precisely, the IC is covered with a coating consisting of aluminophosphate, which is doped with random dielectric particles such as TiO2, SrTiO3 and BaTiO3 [43]. The dielectric particles are of random size and shape, with a relative dielectric constant differing from that of the coating material. The PUF consists of the combination of the coating and the dielectric material. It also contains an array of metal sensors between the passivation layer and the coating. Sufficient randomness is only obtained if the dielectric particles are smaller than the distance between the sensor parts. Figure 2.6 shows the structure of a Coating PUF. The challenge corresponds to a voltage of a certain frequency and amplitude applied to the sensors at a certain point of the sensor array. Due to the presence of the coating material and dielectric particles, the sensor plates act like capacitors with a random capacitance value. The capacitance values are then converted into secret keys. Coating PUFs have the advantage of a high degree of integration. The matrix containing the random particles is a part of the opaque coating. Thus, the coating

which protects the secret key present in the electronics itself serves as a carrier of inherently tamper-resistant secrets. Coating PUFs have the additional advantage that they can easily be turned into a controlled PUF, since the control electronics can be placed underneath the coating. Probing the PUF externally gives insufficient information to the attacker. The capacitance measured from inside is very sensitive to the precise locations of the dielectric particles. Because of this complexity, physical reproduction of the coating requires a prohibitive amount of effort, even with knowledge of the precise locations of the random particles.

Figure 2.6. Structure of a Coating PUF.

2.4.3 Delay Based PUFs

Delay based PUFs utilize the variations in propagation delays of identical circuits to derive a secret response from an FPGA/IC. Some delay based PUF architectures include:

- Ring Oscillator PUFs
- Switch based PUFs

- Tristate Buffer based PUFs

2.4.3.1 Ring Oscillator PUFs

Ring Oscillator PUFs (ROPUFs) use variations in the frequencies of identical oscillators to produce a secret PUF response. Figure 2.7 shows the architecture of a typical ring oscillator PUF [26]. It comprises N identical K-stage ring oscillators. Each oscillator oscillates at its characteristic frequency, determined by the device characteristics of the underlying transistors. Theoretically, all the ring oscillators should oscillate at the same frequency, but inherent inter-chip and intra-chip process variations as well as environmental conditions affect the oscillation frequency. This causes every implementation of the oscillator to output a slightly different frequency. To generate a digital key from such oscillators, the frequencies of a selected pair of oscillators are compared. The output bit is 1 or 0 depending on which of the two oscillators is faster. The selection of a pair of oscillators for comparison is controlled by a multiplexer, based on the input challenge to the circuit. To generate an M-bit output, M different comparisons between the oscillators are made. A PUF with N ring oscillators offers N*(N-1)/2 distinct pairs [26,44,45]. However, because the outcomes of certain comparisons are correlated, the maximum possible value of M for an N-oscillator circuit is theoretically log2(N!). ROPUFs work on the concept that different ROPUFs have different output responses for the same challenge. This property is used to identify a given FPGA based on the key generated from the challenge-response pair.
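The pair-wise comparison and the two counting results quoted above can be illustrated with a short Python sketch. The frequency values are made-up numbers for illustration only, and the convention that a bit is 1 when the first oscillator of the pair is faster is an assumption of this example.

```python
import math
from itertools import combinations


def ro_puf_response(frequencies, challenge_pairs):
    """One response bit per selected pair: 1 if the first oscillator is faster."""
    return [1 if frequencies[a] > frequencies[b] else 0 for a, b in challenge_pairs]


def pair_count(n):
    """Number of distinct oscillator pairs, N*(N-1)/2."""
    return n * (n - 1) // 2


def max_independent_bits(n):
    """Upper bound log2(N!): N oscillators can be ordered by speed in N! ways."""
    return math.log2(math.factorial(n))


# Hypothetical measured frequencies (MHz) of four nominally identical oscillators.
freqs = [101.2, 99.8, 100.5, 100.1]
all_pairs = list(combinations(range(len(freqs)), 2))
bits = ro_puf_response(freqs, all_pairs)
```

For four oscillators there are 6 possible comparisons, but the ordering of the four frequencies carries only log2(24), or about 4.58, bits. The remaining comparisons can be inferred from the others, which is the correlation mentioned in the text.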

Figure 2.7. Architecture of a Ring Oscillator PUF.

2.4.3.2 Switch Based PUFs

Switch based PUFs exploit the variation in propagation delays of identical delay lines using arbiters [46]. Figure 2.8 illustrates the architecture of a Switch based PUF. These PUFs consist of K stages of switching elements. The output of each element is connected to the input of the next stage. Each switching element has a 2-bit input, a 2-bit output and a single challenge (control) bit. When the challenge bit is 0, the inputs are passed straight through to the outputs. When the challenge bit is 1, the two input paths are switched. A D flip-flop is connected to the output of the last switch: the two outputs of the last switching stage are connected to the D input and the clock input of the flip-flop. This arrangement is called an arbiter. The structure of the switching element has given rise to a number of different designs for switch-based PUFs. The most popular implementation of the Switch based PUF is the Arbiter PUF [32,47] shown in Figure 2.9.
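The race between the two delay lines can be modeled with a simple additive-delay sketch in Python. This is an illustrative model rather than the circuit of [46]: each stage is assumed to have four path delays (two straight, two crossed), the delay values are drawn from a hypothetical Gaussian distribution, and the arbiter is assumed to output 1 when the signal on the top line (taken here to feed the D input) arrives first.

```python
import random


def make_switch_puf(stages, seed):
    """Create per-stage delays; the spread stands in for process variation (assumed values)."""
    rng = random.Random(seed)
    # Order per stage: straight top, straight bottom, top-to-bottom cross, bottom-to-top cross.
    return [[rng.gauss(1.0, 0.01) for _ in range(4)] for _ in range(stages)]


def evaluate(puf, challenge):
    """Propagate two racing edges through the stages and let the arbiter decide."""
    if len(challenge) != len(puf):
        raise ValueError("a K-stage PUF needs a K-bit challenge")
    top, bottom = 0.0, 0.0  # accumulated arrival times on the two lines
    for (straight_top, straight_bottom, cross_to_bottom, cross_to_top), bit in zip(puf, challenge):
        if bit == 0:
            # Challenge bit 0: both signals stay on their own line.
            top, bottom = top + straight_top, bottom + straight_bottom
        else:
            # Challenge bit 1: the two signals swap lines.
            top, bottom = bottom + cross_to_top, top + cross_to_bottom
    return 1 if top < bottom else 0
```

In this model the response depends only on the sign of the accumulated delay difference, so two challenges that produce nearly equal path delays are the ones most likely to give unstable responses on real hardware.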

Figure 2.8. Architecture of Switch Based PUFs.

Figure 2.9. Architecture of Arbiter PUFs.

Arbiter PUFs use two multiplexers to implement each switching stage. The challenge is applied to the control inputs of the switching elements, so a K-stage switch based PUF has a K-bit challenge. By controlling the challenge bits of the switching elements, it is possible to obtain different combinations of delay lines. Due to process variations, one of the two delay lines has a shorter propagation delay than the

other. Hence, one of the signals reaches the D input or the clock input sooner than the other. The output of the flip-flop is 1 if the signal at the D input arrives first, and 0 otherwise. In Switch based PUFs, both paths are designed to be perfectly symmetrical, so one delay line propagates signal changes faster than the other purely because of the delay induced by process variations. For a K-stage PUF, there are 2^K unique configurations. Though Arbiter PUFs have a larger challenge-response space, some challenges lead to violations of the setup or hold time of the arbiter, which causes metastability. This happens when the propagation delays of the two paths are so close that the transitions at the clock and D inputs of the flip-flop are separated by less than the setup or hold time. Under such conditions the output of the PUF is random, which reduces the stability of the PUF. Conventionally, challenges causing metastability are identified during the evaluation phase of the PUF and are avoided. However, such evaluations are expensive. It has also been observed that Switch based PUFs are not well suited to FPGA platforms due to the difficulty of building symmetric switching stages on such platforms. Therefore, ROPUFs are traditionally preferred over Switch based PUFs.

2.4.3.3 Tristate Buffer PUFs

Tristate Buffer PUFs are similar to Arbiter PUFs in that they capture the variations in identical delay lines as PUF responses [48]. However, they are made more hardware efficient by using tristate buffers to select the delay paths. A tristate buffer takes two inputs (data and enable) and produces one output. It has three output states: logic 0, logic 1 and high impedance. When the enable pin of the buffer is set, the input is passed to the output pin; otherwise the output goes to the high-impedance state.

Figure 2.10 shows the architecture of a K-stage Tristate buffer PUF. The outputs of every stage are cascaded to the inputs of the next stage. The output of the last stage is connected to a D flip-flop, as in the Switch based PUFs. Each stage consists of two delay units, each consisting of two tristate buffers. The inputs and the outputs of the two tristate buffers of every stage are connected together. The enable ports of the two tristate buffers are connected to each other, with one of the buffers having an inverted enable input. This ensures that only one buffer is active at a time. The challenge bits are applied as enable bits to the tristate buffers. Different challenges select different paths to the D input and the clock input through varying combinations of buffers. Unlike in Switch based PUFs, the two delay lines (bottom line and top line) are independent. Though all the buffers have identical designs, each has a slightly different propagation delay due to uncontrollable variations in the underlying transistors. Hence, different challenges yield different propagation delays between the two delay paths, which in turn yield different response bits at the output. The transitions in the output bits depend on which signal reaches the flip-flop first. These transitions are used to generate the secret key for a given challenge. Though it has been experimentally observed that tristate buffer PUFs are 18% more power efficient and 23% more area efficient than Arbiter PUFs, they suffer from the same metastability issues as Arbiter PUFs [48].

Figure 2.10. Architecture of Tristate Buffer PUFs.

2.4.4 Memory Based PUFs

Memory based PUFs exploit the unpredictable startup states of feedback based CMOS memory structures to produce unique response bits [49]. Most CMOS memory structures, such as flip-flops, latches and SRAMs, use cross coupled structures with positive feedback to store the required logic value. Figure 2.11 shows the architecture of a cross coupled inverter structure. It is composed of two inverters, with the output of each inverter connected to the input of the other. Such structures are similar to tristate buffers. They have two stable states (logic 0 and logic 1) and an unstable state. Data can be stored by applying appropriate input values to the structure. When the system is powered up with no specific input, the output settles to one of the stable states depending on the difference in characteristics of the inverters and on external noise. Theoretically, if both inverters were identical and there were no noise, the output would remain in the metastable condition; in practice, the inverters have slightly different characteristics due to process variations, so different instances of the structure settle to different logic states. The outputs of such

structures help in forming unique response bits. An array of such devices is assembled to derive a secret key. The main disadvantage of such PUFs is their dependence on external noise; they therefore need better error correcting hardware and redundant PUF structures to obtain a reliable output key. There have been several implementations of such Memory based PUFs. Two major implementations are:

- SRAM based PUFs
- Butterfly PUFs

Figure 2.11. Cross Coupled Inverter Structure.

2.4.4.1 SRAM Based PUFs

The conventional 6T SRAM memory cell has two cross coupled inverters (load transistors M1, M2, M3 and M4) and two access transistors (M5, M6). Figure 2.12 shows a 6T SRAM cell [50]. In these cells, a write operation is performed by loading the bit lines with appropriate values and turning the access transistors on. A read operation is performed by forcing the two bit lines to logic 1 for a limited time and turning the access transistors on. Due to the dynamic nature of the charge, the bit lines are forced to the value stored in the cross coupled inverter structure. The sizes of the transistors are set

meticulously to achieve proper read and write operations of SRAMs. The voltage needed to flip the state of the SRAM cells is also set as high as possible. In SRAM PUFs, the SRAM cells are used to produce a random response based on the process characteristics of the two load inverters. The transistors are sized at minimal width to increase the sensitivity of the cross coupled inverter structures to device variations. The secret key is generated on the principle that when the device is powered up, one of the inverters gets a slightly higher input voltage than the other due to manufacturing variations. This difference is amplified to a logic 0 or logic 1 by the feedback structure. The output is used as the response of the triggered PUF SRAM cells [29].

Figure 2.12. 6T SRAM Cell.

2.4.4.2 Butterfly PUFs

Butterfly PUFs are an alternative implementation of Memory based PUFs. They target reconfigurable devices like FPGAs, where controlling the device sizes and

routing is difficult because the transistors are prefabricated. In Butterfly PUFs, latches are used as the memory elements. Figure 2.13 shows the architecture of a Butterfly PUF [34]. It consists of two latches, with the output of each connected to the input of the other. The PRESET signal sets the output to logic 1 when high, and the CLR signal sets the output to logic 0 when high. The PRESET signal of latch 1 and the CLR signal of latch 2 are tied low. The input signal is connected to the CLR signal of latch 1 and the PRESET signal of latch 2. To obtain the response bits, the input signal is set high, which makes both latches unstable due to the opposite polarity of their inputs and outputs. After a finite amount of time, the input signal is set low, and the cross coupled latch structure settles to a valid logic state determined by the small manufacturing variations in the latches. This output bit is used as the response of such PUFs.

Figure 2.13. Architecture of a Butterfly PUF.
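The settling behavior of cross coupled memory cells described in Section 2.4.4, and their sensitivity to noise, can be illustrated with a very simple behavioral model. The model below is an assumption made for illustration: each cell is reduced to a fixed mismatch value (positive favors logic 1), noise is added at every power-up, and majority voting over repeated readouts stands in for the error correction hardware mentioned in the text. It is not a circuit-level model of an SRAM or Butterfly PUF.

```python
import random


def power_up(mismatch, noise_sigma, rng):
    """One power-up of an array of cells: each cell resolves to the sign of mismatch plus noise."""
    return [1 if m + rng.gauss(0.0, noise_sigma) > 0 else 0 for m in mismatch]


def majority_vote(readouts):
    """Bitwise majority over several power-up readouts of the same array."""
    n = len(readouts)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*readouts)]


# Hypothetical per-cell mismatch values fixed at manufacturing time.
cells = [0.3, -0.2, 0.05, -0.4]
rng = random.Random(0)
readouts = [power_up(cells, 0.1, rng) for _ in range(9)]
stable_bits = majority_vote(readouts)
```

Cells with a mismatch that is small compared to the noise (such as the third cell above) are the ones that flip between power-ups, which is why redundancy or error correction is needed to obtain a reliable key.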

2.4.5 Other Implementations

There have been several other Silicon PUF implementations. Some of them use analog circuit based approaches [51]. Such PUFs rely on slight variations in the on-current of identical PMOS and NMOS transistors to produce the secret responses. Variations in the characteristics of identical amplifier circuits have also been used to derive unique responses [52]. There have also been approaches that utilize variations in power rails, interconnects and even environmental conditions to produce PUF responses. The approach in [53] uses the undesirable environmental variations as an additional layer of security for the PUFs. Variations in the resistance of identical power grids on an IC have also been used to generate unique responses [54].

In this thesis we focus only on Delay based PUFs, more precisely Ring Oscillator PUFs. ROPUFs have certain advantages over the other PUF designs when implemented on an FPGA platform. Firstly, it has been experimentally observed that under a wide range of temperature variations, ROPUFs are more stable and reliable than the other implementations [55]. Secondly, the implementation of ring oscillators is simple and can be optimized during routing. Considering these advantages, ROPUFs are chosen for the key generation process, with an error correction process followed by a hashing algorithm using artificial intelligence techniques.

2.5 Previous Works

The conventional ring oscillator circuit using NOT gates and multiplexers is discussed in [26]. The implementation of PUFs on FPGAs is discussed in detail in [56,57]. In continuation of this work, a strategy for improving the quality of RO PUF designs by

placing and comparing ROs in a chain-like structure has been proposed and implemented in [58]. The results indicate that the proposed design strategy can significantly improve the quality of RO PUF implementations. Another technique, the longest increasing subsequence-based grouping algorithm (LISA), is proposed in [59] to enhance the hardware utilization of RO PUFs. The performance of LISA has been analyzed by introducing a hybrid architecture and by formulating its cost delay metrics. To validate the effectiveness of an RO PUF, it needs to be characterized over an adequately large population of chips. Accurate on-chip experimental results have been obtained from a significantly large population of 125 FPGAs [60]. The experimental data have been analyzed using a ring oscillator loop delay model to quantify quality factors of RO PUFs such as uniqueness and reliability. Inter-chip and intra-chip results have been analyzed at normal operating conditions. Further, experiments have been conducted to analyze the dependency of RO PUFs on temperature. These experimental evaluations show that the responses to robust challenges have an average error rate of less than 2% under temperature variations from -10 °C to 75 °C [61,62]. This indicates the reliability of RO PUFs with respect to temperature variations. Apart from temperature variations, a PUF also needs to be robust against reversible as well as irreversible temporal changes in circuits. The impact of aging on FPGA based PUFs has been studied in [63], where accelerated aging testing of an FPGA-based RO-PUF is proposed and its effect on the functionality of the PUF is analyzed. The results of that work indicate that the randomness of PUF responses remains unaffected by aging. Owing to these experimental results, which indicate the reliability and uniqueness of RO PUFs under environmental variations such as temperature fluctuations and aging, RO PUFs

have been used for various applications like Random Number Generation [64,65], Pattern Matching [66], and Trojan Detection [67].

2.6 Architecture of the Proposed Ring Oscillator PUF

Traditionally, ROPUFs exploit the fact that uncontrollable wire delays and voltage transfer characteristics due to fabrication process variations of FPGA devices cause random but static variations in the frequency of identically laid out oscillators. The response vector in such PUFs is generated by pair-wise comparison of the ring oscillator frequencies. These comparisons are represented as challenge-response pairs, where the chosen ring oscillator pair is the challenge and the output of the comparator is the response. A conventional ROPUF comprises N identical K-stage oscillators, which allows N*(N-1)/2 comparisons [45,68]. The oscillators are designed to have identical layouts to ensure that the frequency differences between any two are strictly due to process variations. On an FPGA platform, implementing conventional ROPUFs is challenging due to the inability to exploit layout design techniques and the lack of knowledge about the gate level structure of the FPGA. With this difficulty in mind, we propose a novel ROPUF design using LUTs and multiplexers, which are the basic components of FPGA architecture. The proposed design overcomes the instabilities of conventional ROPUFs on FPGAs and improves the architecture of ROPUFs holistically, with a focus on reliability. Figure 2.14 shows an oscillator of the proposed ROPUF design. In this work, these oscillators are designed using LUTs and multiplexers that are

the basic components of FPGA architectures. Multiple instances are instantiated to generate a multi-bit PUF signature. Two LUTs (LUT X and LUT Y) within a Xilinx Spartan 2 XC2S100 FPGA are used in shift register mode. The shift register contents are initialized to the complementary hexadecimal values 0x5555 and 0xAAAA, respectively:

Value in LUT X: 0101 0101 0101 0101
Value in LUT Y: 1010 1010 1010 1010

Figure 2.14. Oscillator of the Proposed ROPUF Circuit.

The outputs of the LUTs drive the select inputs of a chain of two multiplexers (M_X and M_Y), as shown in Figure 2.14. Both multiplexers have their I0 data input tied to logic 0. The I1 data input of the bottom multiplexer is tied to logic 1, whereas the I1 data input of the top multiplexer is driven by the output of the multiplexer below. The output of the last multiplexer is connected to a D flip-flop as

shown in Figure 2.14. The D flip-flop is used to increase the width of the delay pulse to produce an appropriate output response. The flip-flop is initialized to logic 0.

2.7 Working of the Proposed ROPUF

Due to the complementary initialization values in LUT X and LUT Y, the shift register implemented in LUT X produces a sequence complementary to that of LUT Y. Since the initial values in the LUTs are sequences of alternating 0s and 1s, the select lines of the multiplexers connected to the LUTs alternate between 0 and 1. Initially, the output of LUT X is logic 1 and the output of LUT Y is logic 0, i.e., signal L1 is logic 1 and signal L2 is logic 0. Thus the signals N1 and N2 are at logic 1 and logic 0, respectively. Since N2 is held constant the whole time, the output of the flip-flop is logic 0. At the triggering clock edge, the output of LUT X changes from logic 1 to logic 0 and the output of LUT Y changes from logic 0 to logic 1. Although LUT X and the multiplexer it drives (M_X) are identical to LUT Y and its associated multiplexer (M_Y), the two circuits experience different delays due to random process variations. Two cases are worth highlighting:

Case 1: LUT X and the multiplexer M_X are faster than LUT Y and the multiplexer M_Y. In this case, LUT X transitions from logic 1 to logic 0 before the slower LUT Y changes from logic 0 to logic 1, i.e., L1 transitions from logic 1 to logic 0 before L2 transitions from logic 0 to logic 1. While the signal L2 transitions from logic 0 to logic 1, the signal N2 is held at logic 0. So, the PUF signature bit is held constant throughout the process.

Case 2: LUT Y and the multiplexer M_Y are faster than LUT X and the multiplexer M_X. In this case, LUT Y transitions from logic 0 to logic 1 before the slower LUT X changes from logic 1 to logic 0, i.e., L2 transitions from logic 0 to logic 1 before L1 transitions from logic 1 to logic 0. During the interval between the two transitions, when L2 has transitioned to logic 1 but L1 is still at logic 1, the select lines of both multiplexers are at logic 1, so a short positive pulse (glitch) appears on the signal N2.

The presence or absence of a glitch on N2 and the width of the pulse are determined by process variations that affect the relative delays of LUT X and LUT Y. The presence or absence of a positive glitch on N2 is used to determine a PUF signature bit. The output of M_Y (signal N2) is connected to the asynchronous SET input of a D flip-flop as shown in Figure 2.14. When the glitch appears on N2, the flip-flop output, and hence the PUF signature bit, becomes logic 1. In all other cases, the PUF signature bit remains at logic 0. Table 2.1 summarizes the operation of the PUF and Figure 2.15 illustrates the operation of the PUF in the initialization phase and the two cases.

Table 2.1. Signal Transitions of the Proposed ROPUF.

                                 | L1  | L2  | N1 | N2         | Output
Initially                        | 1   | 0   | 1  | 0          | 0
Case 1 (LUT X faster than LUT Y) | 1→0 | 0   | 0  | 0          | 0
Case 2 (LUT Y faster than LUT X) | 1   | 0→1 | 1  | 1 (glitch) | 1
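The two cases can be reproduced with a small behavioral model in Python. The model rests on assumptions that go beyond the text: M_X is taken to be the bottom multiplexer (so N1 follows L1) and M_Y the top one (so N2 is N1 gated by L2), all multiplexer delay is lumped into the two hypothetical parameters `delay_x` and `delay_y`, and the flip-flop is treated as an ideal asynchronous SET latch. It is a sketch of the logic, not a timing-accurate simulation of the FPGA circuit.

```python
def signature_bit(delay_x, delay_y):
    """Return (output, trace) for one clock edge.

    delay_x: time after the clock edge at which L1 falls from 1 to 0 (LUT X path).
    delay_y: time after the clock edge at which L2 rises from 0 to 1 (LUT Y path).
    """
    output = 0  # flip-flop is initialized to logic 0
    trace = []
    # Evaluate the signals just after each event time.
    for t in sorted({0.0, delay_x, delay_y}):
        l1 = 1 if t < delay_x else 0   # L1 stays 1 until LUT X switches
        l2 = 1 if t >= delay_y else 0  # L2 stays 0 until LUT Y switches
        n1 = 1 if l1 else 0            # M_X: I0 = 0, I1 = 1, select = L1
        n2 = n1 if l2 else 0           # M_Y: I0 = 0, I1 = N1, select = L2
        if n2:
            output = 1                 # a glitch on N2 sets the flip-flop
        trace.append((t, l1, l2, n1, n2, output))
    return output, trace


case1_bit, case1_trace = signature_bit(delay_x=1.0, delay_y=1.2)  # LUT X faster
case2_bit, case2_trace = signature_bit(delay_x=1.2, delay_y=1.0)  # LUT Y faster
```

In this model N2 equals L1 AND L2, so a pulse can appear only while both select lines are high, which happens exactly when the LUT Y path is the faster one. The glitch width is the difference between the two delays.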

Figure 2.15. Signal Transitions of the Proposed ROPUF.

Multiple PUF signature bits are generated by integrating multiple instances of the PUF design with a decoder. A challenge/response framework is used to select the PUFs for comparison. Figure 2.16 shows the integration of the PUFs and the decoder to generate a multi-bit response. The challenge, given to the decoder, selects two instances to be compared in every clock cycle. The rest of the process is similar to the operation discussed previously.

Figure 2.16. Generation of Multi-Bit PUF Signature.

The design is evaluated using Xilinx Spartan XC2S100 FPGAs. The boards are connected to the PC over a serial RS-232 link, which is used to download the PUF implementations onto the FPGAs. An Agilent 16801A logic analyzer is used to display the PUF signatures. The PUFs are clocked at 50 MHz using a DS1075 programmable oscillator on the board.

Hamming distances and the uniqueness factor are used to evaluate the PUF responses. The Hamming distance between a pair of signature responses is the number of bit positions in which the two responses differ. Uniqueness characterizes the device-to-device variation of the PUF responses R. Equation 2.1 shows the mathematical representation of the uniqueness factor. Its calculation can be done with the same x responses of each of the m PUF devices as described for reliability. This estimate is defined by the average of the percentage Hamming distance $d(R_{i,u}, R_{j,v})/n \times 100\%$ between all n-bit responses $R_{i,u}$ and $R_{j,v}$ of two different devices i and j out of all m devices. The indices u and v run from 1 to x to address every recorded response of each device.

The Xilinx Spartan-2 has 2 slices per CLB and two 4-input LUTs per slice. Therefore, there can be four implementations on a single CLB. Responses of 128-bit length are evaluated on 5 Xilinx Spartan XC2S100 FPGAs at 50 MHz. Every FPGA has eight implementations, so there are 40 implementations on the 5 FPGAs, giving (40*39)/2 = 780 data points for analysis. The Hamming distances are calculated and analyzed for the 40 implementations. Figure 2.17 shows the Hamming distance distribution for these data points, which indicates the uniqueness of the 128-bit responses. The distribution clusters around the range [61:65].

Figure 2.17. Distribution of Hamming Distances.
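As a minimal sketch of the evaluation described above, the following computes pairwise Hamming distances and the average percentage Hamming distance. It assumes one recorded response per implementation (x = 1) and represents each response as a Python integer; the function names are illustrative and not taken from the thesis.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


def uniqueness_percent(responses, n_bits: int) -> float:
    """Average pairwise Hamming distance as a percentage of the response length."""
    pairs = list(combinations(responses, 2))
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


# 40 implementations give 40*39/2 unordered pairs, i.e. 780 data points.
num_pairs = len(list(combinations(range(40), 2)))

# Toy 4-bit responses (made-up values, for illustration only).
toy = [0b0000, 0b1111, 0b0011]
toy_uniqueness = uniqueness_percent(toy, n_bits=4)
```

A value close to 50% means that, on average, about half of the bits differ between any two implementations.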

In addition to computing PUF signatures across different FPGA chips, the PUF is implemented multiple times in different regions of a single chip. The Hamming distances for intra-chip and inter-chip responses are also analyzed and plotted. Figure 2.18 shows the intra-chip variations in the Hamming distances of the responses. For the intra-chip responses, 8 implementations on an FPGA give (8*7)/2 = 28 data points, and 5 such FPGAs give 28*5 = 140 data points in total. The data cluster in the interval [56:60]. For the inter-chip responses, 2 implementations on each chip give (10*9)/2 - 5 = 40 data points for evaluation, since the 5 same-chip pairs are excluded from the 45 possible pairs. Figure 2.19 shows the inter-chip variations in the Hamming distances. The data cluster in the interval [61:65]. From the graphs, it is observed that the uniqueness of the inter-chip responses is higher than that of the intra-chip responses. The floor plans for the implementations are shown in Appendix B.

Figure 2.18. Distribution of Hamming Distances for Intra-Chip Response Bits.
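The data-point counts quoted above can be checked by enumerating the pairs directly. The sketch below uses a hypothetical helper, `split_pairs`, that labels each implementation by chip and counts same-chip and cross-chip pairs.

```python
from itertools import combinations


def split_pairs(chips: int, per_chip: int):
    """Return (intra_chip_pairs, inter_chip_pairs) for the given layout."""
    implementations = [(c, k) for c in range(chips) for k in range(per_chip)]
    intra = inter = 0
    for a, b in combinations(implementations, 2):
        if a[0] == b[0]:
            intra += 1  # both implementations sit on the same chip
        else:
            inter += 1  # implementations sit on different chips
    return intra, inter


intra_8, _ = split_pairs(chips=5, per_chip=8)  # 5 * (8*7)/2 intra-chip pairs
_, inter_2 = split_pairs(chips=5, per_chip=2)  # (10*9)/2 - 5 inter-chip pairs
```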

Figure 2.19. Distribution of Hamming Distances for Inter-Chip Response Bits.

The key advantages of this design are its smaller size compared to [26] and its ease of implementation using the synthesis and place-and-route tools in Xilinx ISE. The PUF design is implemented in VHDL and can be routed automatically. It does not need any external routing, and the hard macros used in [26] and [58] can be avoided, which makes the design easier to implement.

The uniqueness of the proposed design, an important factor for a PUF implementation, is higher than that of the conventional ROPUF circuit. [58] discusses a chain method to increase the percentage uniqueness of the responses of the conventional ROPUF design. The optimal value of the uniqueness is considered to be 50%. Uniqueness values below 50% indicate correlation between PUF responses and hence lower PUF quality. The uniqueness of the conventional ROPUF has been experimentally calculated as 43.40% [26], and the uniqueness of the improved design using the chain implementation has been calculated as 48.51% [58]. The percentage uniqueness of the proposed PUF responses is calculated as 49.0625%. This percentage indicates that the responses from the proposed PUF design exhibit higher uniqueness and lower correlation than those of the conventional ROPUF design. This also indicates a higher PUF reliability of the design.

The individual uniqueness factors of the proposed design for both the inter-chip and intra-chip implementations are better than those of previous works. The inter-chip uniqueness factor of the proposed design is calculated as 47.929% and the intra-chip uniqueness factor as 41.91%, compared with 47.31% and 40.86% respectively for the conventional circuit in [60]. Figure 2.20 compares the results of the proposed method with the previous implementations.

Figure 2.20. Comparison between the Results of the Proposed Method and Previous Implementations.

However, the quality factors of this PUF, which include uniqueness, reliability and attack resiliency, are negatively affected by voltage fluctuations and environmental temperature. These factors degrade the stability of the PUF signatures.

Due to the slight dependency of the PUF signature bits on the operating temperature, a bit generated from a pair of ring oscillators might flip when the operating temperature changes considerably. To stabilize the circuit, Artificial Neural Networks (ANNs) are used for the error correction process. ANNs are a recent signal processing technology. They have a remarkable ability to learn and derive information from complicated and imprecise data, which can be used to extract patterns. This property gives ANNs an advantage over conventional error correcting codes such as BCH codes. Error correcting codes (ECC) and ANNs are discussed in detail in Chapter 3. A hashing algorithm is used to further increase the security of the key generation process. Chapter 4 discusses the cryptographic applications in detail.

2.8 Conclusion

Silicon PUFs are novel chip identifiers, which produce a unique response per FPGA based on the slight variations in process characteristics of otherwise identical devices. A ring oscillator based PUF is a promising solution for an FPGA platform. ROPUFs are able to generate highly reliable unclonable outputs by amplifying, through the ring oscillator substructure, the delay difference caused by fabrication variations. The proposed ROPUF design has given better results than the conventional ROPUF designs. Responses of 128-bit length are evaluated on 5 Xilinx Spartan XC2S100 FPGAs, with 8 implementations on every FPGA. The Hamming distances for the 40 implementations on the 5 Spartan-2 FPGAs are calculated and plotted. The distribution clusters around the range [61:65]. The Hamming distances for intra-chip and inter-chip responses are also analyzed; the data cluster in the intervals [56:60] and [61:65] respectively. From the graphs, it is observed that the uniqueness of the inter-chip responses is higher than that of the intra-chip responses. The percentage uniqueness of the PUF responses is calculated as 49.0625%, which is higher than the uniqueness of the responses of a conventional ring oscillator (43.40%) [26] and of the chain implementation (48.51%) discussed in [58].

Chapter 3 Error Correcting Codes

3.1 Introduction

A physically unclonable function provides a secure module for generating a unique device-specific key. However, because the PUF signature bits depend slightly on the operating temperature and on voltage fluctuations, an error correcting code is necessary to satisfy the constraints of cryptographic methods, which demand exact keys that satisfy specific mathematical properties. Due to changes in operating temperature, the outputs obtained for the same challenge on the same CLB slice may differ slightly. To redress the response bits according to the cryptographic needs, we use an error correcting code. Conventionally, BCH codes have been used to correct the flipped bits in PUF responses [26,69,70]. BCH codes are a generalization of Hamming codes for multiple error correction. Binary BCH codes were first discovered by A. Hocquenghem in 1959 and independently by R.C. Bose and D.K. Ray-Chaudhuri in 1960 [71]. BCH codes form a large class of multiple error correcting codes. The original applications of BCH codes were restricted to binary codes of length $2^m - 1$ for some integer m. They were later extended by Gorenstein and Zierler in 1961 [71]. A major disadvantage of BCH codes is their computational complexity. The computation needed to generate the error correcting syndromes increases drastically with the number of error bits.

We propose an error correction process using Artificial Neural Networks (ANNs). ANNs are a recent signal processing technology. They have a remarkable ability to learn and derive information from complicated and imprecise data, which can be used to extract patterns. This property gives ANNs an advantage over conventional error correcting codes such as BCH codes [72]. Over the years, new technologies have grown immensely, driven by computational demands and the associated security protocols, and the field of Artificial Neural Networks is no exception. ANNs have been applied in a wide and diverse range of problem domains, including engineering, medicine, finance, physics and geology. Neural networks are used in areas demanding classification, control or prediction. In particular, the nonlinear nature of neural networks and their ability to learn from their environment in supervised as well as unsupervised ways make them suitable for solving complex problems. The design and implementation of intelligent systems has proved to be a crucial factor in innovation and in addressing problems that linear systems have not resolved.

3.2 Neural Networks

Neural networks were introduced by McCulloch and Pitts in 1943. They are inspired by early models of sensory processing in the human brain. An ANN can be created by simulating a network of model neurons in a computer. By applying algorithms that mimic the processes of real neurons, the networks learn to solve many types of problems. The model neurons abstract biological neurons into conceptual components for circuits that can perform computational tasks. The basic model of the artificial neuron is based upon the functionality of the biological neuron. By definition, neurons are the basic signaling units of the nervous system of a living organism, in which each neuron is a discrete cell. Figure 3.1 shows the biological neural network [73].

Figure 3.1. Biological Neural Network.

Signals are propagated from one neuron to another by complex electrochemical reactions. Chemical substances released from the synapses cause a change in the electrical potential of the cell body. When the potential reaches its threshold, an electric pulse is sent down through the axon. The pulse spreads out and eventually reaches synapses, causing them to increase or decrease their potential. It has been observed that neurons can form new connections with other neurons, and an entire collection of neurons may sometimes migrate from one place to another. These mechanisms form the basis for learning in the brain.

Similarly, neurons in ANNs are connected by links, and each link has a numerical weight associated with it. Weights are the basic means of long-term memory in ANNs. They express the strength of each neuron input. ANNs learn through repeated adjustments of these weights. A model neuron is referred to as a threshold unit, and its function is illustrated in Figure 3.2 [74]. It receives input from a number of other units or external sources, weights each input and sums them. If the total input is above the threshold, the output of the unit is 1; otherwise it is 0. Hence, the output changes from 0 to 1 when the total weighted sum of inputs is equal to the threshold. The points in input space satisfying this condition form a hyperplane. In two-dimensional space a hyperplane is a line, whereas in three-dimensional space it is a plane. Points on one side of the hyperplane are classified as 0 and those on the other side are classified as 1. Problems that can be solved this way are called linearly separable. Nonetheless, the error back-propagation method, which can make fairly complex networks of simple neurons learn from examples, shows that such networks can also solve problems that are not linearly separable.

Figure 3.2. Graphical Representation of McCulloch and Pitts Neuron Model.
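The threshold unit can be sketched in a few lines. The weights and thresholds below are illustrative choices rather than values from the thesis, and the unit is assumed to switch to 1 exactly when the weighted sum reaches the threshold.

```python
def threshold_unit(inputs, weights, threshold):
    """McCulloch and Pitts style neuron: weighted sum followed by a hard threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Two linearly separable problems, each solved by a single unit.
def and_gate(a, b):
    return threshold_unit([a, b], weights=[1, 1], threshold=2)


def or_gate(a, b):
    return threshold_unit([a, b], weights=[1, 1], threshold=1)


and_table = [and_gate(a, b) for a in (0, 1) for b in (0, 1)]
or_table = [or_gate(a, b) for a in (0, 1) for b in (0, 1)]
```

No single choice of weights and threshold reproduces XOR, which is the standard example of a problem that is not linearly separable.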

The weights and the threshold of the network should be set such that the threshold unit correctly solves the classification problem. They are learnt by iteratively presenting examples with known classifications. This process is called learning or training, owing to its resemblance to learning in the brain. Simulating learning on a computer involves making small changes to the weights and the threshold each time a new example is presented. In general, machine learning involves adaptive mechanisms that enable a network to learn from experience, example or analogy. The learning capabilities improve the performance of an intelligent system over time. Machine learning mechanisms form the basis for adaptive systems [75,76,77,78].

Unlike expert systems, which can provide the user with a definitive answer only if the reviewed characteristics exactly match the ones coded in the rule base, a neural network analyzes the information and provides a probability estimate that the data matches the characteristics it has been trained to recognize. While the probability of a match determined by a neural network can be 100%, the accuracy of its decisions relies entirely on the experience gained by the system while analyzing examples of the problem. The neural network initially gains this experience by being trained to correctly identify preselected examples of the classification. The responses of the neural network are reviewed and the configuration of the system is refined until the analysis of the training data reaches a satisfactory level. In addition to the initial training period, the neural network also gains experience over time as it analyzes data related to the problem.

There are many neural network algorithms for pattern recognition, and they differ in their learning mechanisms. Learning can be either supervised or unsupervised. In supervised learning, the training set contains both the inputs and the required responses. After training, the responses of the network should match the trained responses. Unsupervised learning, on the other hand, is based on clustering of the input data. There is no prior information about which inputs belong to a particular class. The characteristics of the patterns and the history of training are used to assist the network in defining classes. This unsupervised classification is called clustering.

Neural networks are trained using various algorithms, such as back propagation and genetic algorithms. After considering various algorithms, the back propagation method using Bidirectional Associative Memories (BAM) was chosen for the error correction process of the PUF responses. The error correction process ensures that the PUF produces the same output despite environmental changes. It has two phases: the Initialization Phase and the Regeneration Phase. In the initialization phase, the output is generated from the PUF circuit and an error correcting syndrome is computed. In the regeneration phase, the noisy/corrupt signal is sent to the trained neural network for the errors to be rectified. Figure 3.3 shows the two phases of the error correcting process.

Figure 3.3. Phases of Error Correction Process.

3.3 Bidirectional Associative Memory

Adaptive Bidirectional Associative Memory (BAM) is a type of recurrent neural network. BAM was proposed by Bart Kosko as an extension of the Hopfield network, incorporating an additional layer to perform recurrent auto-associations as well as hetero-associations on the memories [79,80]. A Hopfield network can be interpreted as a special case of BAM in which the weight matrix is a symmetric square matrix. Owing to its capability to generalize and its immunity to noise, BAM is commonly used for pattern recognition. In BAM, bidirectionality, i.e., forward and backward information flow, is implemented in the neural network to produce two-way associative searches for stored challenge-response associations. The network evolves to a two-pattern stable state when the BAM neurons are activated. Hetero-associations are encoded in a BAM by summing correlation matrices. Information is transferred forward from one neuron to the other by passing through the connection matrix, and passes backwards through the transpose of the same matrix. The forward and backward directionality of BAM correlation encoding naturally extends to the encoding of patterns [80]. This property of BAM has been used to design an error correcting code for PUF responses.

The generalized BAM architecture is shown in Figure 3.4 (a), and Figure 3.4 (b) shows a simplified diagram of BAM. A BAM consists of neurons arranged in two layers. It has been observed that BAM is more efficient when bipolar binary neurons are used. The neurons in one layer are fully interconnected with the neurons in the second layer, as shown in Figure 3.4 (b). There are no interconnections among neurons in the same layer. The weights from the first layer to the second layer are the same as the weights from the second layer to the first layer. Layers A and B operate alternately. The output signals of the neurons are transferred forward (towards the right) using the weight matrix $W$, and then transferred backwards (towards the left) using its transpose $W^T$, as shown in Figure 3.4 (b) [81]. The error correcting code designed for the PUF differs from the conventional BAM in its update mode.

Figure 3.4. (a) Generalized BAM Architecture. (b) Simplified BAM Architecture.

The BAM correlation encoding scheme is extended to a general Hebbian learning law. The connection weight matrix used to associate the pattern pairs is built using the traditional Hebb's Law. Hebb's Law can be stated as two rules [73]:

If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.

If two neurons on either side of the connection are activated asynchronously, then the weight of that connection is decreased.

According to Hebb's Law, the adjustment applied to the weight matrix is built from correlation matrices. In a BAM, the correlation matrix for each pattern pair is the matrix product of the transpose of the input vector, $X^T$, and the output vector $Y$. The networks are trained using the responses of the PUFs as both the input vectors and the output vectors. Every matrix is bidirectionally stable for bivalent and continuous neurons. The size of the vectors is left to the user's discretion. The associative weight matrix is calculated as the sum of all correlation matrices, which can be represented mathematically as:

$$W = \sum_{m=1}^{M} X_m^T Y_m$$

where M is the number of pattern pairs to be stored in the BAM, X is the input vector and Y is the output vector. As in a Hopfield network, McCulloch and Pitts neurons [80] with the sign activation function are used for error correction. In the error correcting code, the BAM architecture is used to learn continuous mappings and to rapidly extract bivalent associations from several noisy samples.

3.4 Training Algorithm

Step 1: Converting Hexadecimal Values to Bipolar Binary Values

To test and analyze the error correcting code, the output responses from the PUF implementations are stored in a .mat file in hexadecimal form. Each value in the .mat file is read and converted into the bipolar binary format. Each line of the file is converted into one vector, and all the vectors are stored in a matrix (Input). This matrix is used as both the input matrix and the output matrix when generating the weight matrix during training. An example is discussed to illustrate the process. The .mat file is shown in Figure 3.5. Figure 3.6 shows the matrix formed after converting the hexadecimal values in the .mat file to bipolar binary values.

Figure 3.5. Data File.
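Step 1 can be sketched as follows. The thesis performs this step in MATLAB; the Python function below is only an illustration, and it assumes the common convention that bit 1 maps to +1 and bit 0 maps to -1, which the text does not state explicitly.

```python
def hex_to_bipolar(hex_word: str):
    """Convert a hexadecimal string into a bipolar (+1/-1) vector.

    Assumed mapping: binary 1 -> +1, binary 0 -> -1.
    """
    n_bits = 4 * len(hex_word)               # four bits per hex digit
    value = int(hex_word, 16)
    bits = format(value, "0{}b".format(n_bits))  # keep leading zeros
    return [1 if b == "1" else -1 for b in bits]


# One hypothetical response word; 0xA5 is 10100101 in binary.
vector = hex_to_bipolar("A5")

# Each line of the data file becomes one row of the Input matrix.
input_matrix = [hex_to_bipolar(word) for word in ["A5", "0F"]]
```

An 8-digit hexadecimal word would give the 32-element vectors used in the example that follows.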

Figure 3.6. Input Matrix formed after Hexadecimal Conversion.

Step 2: Storage

The values from the Input matrix are used to form the neural network. In this example, the input layer of the BAM has 32 neurons and the output layer also has 32 neurons. Since there are 20 input vectors and 20 corresponding output vectors, the value of M is 20. The weight matrix is determined as:

$$W = \sum_{m=1}^{20} X_m^T Y_m$$

The weight matrix obtained is a 32x32 matrix. Figure 3.7 shows the matrix obtained after computation. This is the initialization phase of the error correction process.

Figure 3.7. Weight Matrix.
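The storage step amounts to summing the outer products of the pattern pairs. Below is a minimal sketch using plain Python lists and two made-up 4-element patterns instead of the 20 PUF responses of 32 bits each; `train_bam` is a hypothetical name.

```python
def train_bam(pairs):
    """Sum of correlation matrices X^T Y over all (X, Y) pattern pairs."""
    rows, cols = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * cols for _ in range(rows)]
    for x, y in pairs:
        for i in range(rows):
            for j in range(cols):
                w[i][j] += x[i] * y[j]  # outer-product contribution
    return w


# Illustrative bipolar patterns, used as both input and output vectors.
p1 = [1, -1, 1, -1]
p2 = [1, 1, -1, -1]
weights = train_bam([(p1, p1), (p2, p2)])
```

Because each pattern is associated with itself, the resulting matrix is square and symmetric, as in the 32x32 example above.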

Step 3: Testing

In the regeneration phase, a noisy/corrupt signal is sent to the trained neural network for the errors to be rectified. This phase has two tasks. First, to ensure that there are no false negatives, the BAM is tested on its ability to retrieve any vector from the input matrix or the output matrix. To verify that the network can recall any vector from the input matrix, we present it with a vector from the input matrix. It should be able to retrieve the corresponding vector in the output matrix. Since the input and output matrices are the same, the BAM should return the same values that were presented to it. The vector is retrieved using Equation 3.4. For instance,

Test Vector = [ 1, -1, 1, -1, 1, -1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1 ];

Using Equation 3.4, the output for this test vector is evaluated. Figure 3.8 shows the MATLAB snapshot of the retrieval of the above vector from the input matrix. The network was able to retrieve the values in both the forward and backward directions. Random vectors from both the input and the output matrices are used for testing. Vectors from the output matrix are retrieved using Equation 3.5.
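A single forward and backward recall pass can be sketched as below. The forms used, Y = sign(X W) for the forward pass and X = sign(Y W^T) for the backward pass, are the standard BAM recall rules and are assumed here to correspond to Equations 3.4 and 3.5; the patterns are made-up 8-element vectors, and a zero net input is mapped to +1 as a further assumption.

```python
def build_weights(pairs):
    """Sum of outer products X^T Y over all pattern pairs."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(x[i] * y[j] for x, y in pairs) for j in range(n_out)]
            for i in range(n_in)]


def sign(value):
    return 1 if value >= 0 else -1  # assumption: zero net input maps to +1


def recall_forward(x, w):
    """Y = sign(X W): input layer to output layer."""
    return [sign(sum(x[i] * w[i][j] for i in range(len(x))))
            for j in range(len(w[0]))]


def recall_backward(y, w):
    """X = sign(Y W^T): output layer back to input layer."""
    return [sign(sum(y[j] * w[i][j] for j in range(len(y))))
            for i in range(len(w))]


stored = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
w = build_weights([(p, p) for p in stored])
recalled = [recall_forward(p, w) for p in stored]
```

Each stored vector is returned unchanged in both directions, which is the no-false-negative check described in this step.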

Figure 3.8. Retrieval of the Test Vector from the Input Matrix.

Step 4: Retrieval

In the second part of the regeneration phase, the BAM is tested on its ability to retrieve the exact values from the output matrix even when a noisy/corrupt signal is presented at the input layer. A noisy test vector is taken and the error correction process is started. The output is calculated using Equation 3.4. The input is then updated using Equation 3.5, and the process is repeated iteratively until the input and output vectors reach equilibrium. Equilibrium is attained when the input and output vectors remain unchanged with further iterations. The above test vector is taken and a number of bits are flipped at random positions. The trained BAM network successfully corrected the noisy bits. Figure 3.9 shows the retrieval of the given test vector.

Test Vector = [ -1, -1, 1, -1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, 1 ];

Original Vector = [ 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 1, 1, 1, 1, -1, 1, 1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1 ];

Figure 3.9. Retrieval of a Noisy Vector.

32-bit, 64-bit, 128-bit and 256-bit vectors have been used for testing. It has been observed that learning tends to improve with sample size. The major advantages of this method are its simple computation compared to BCH codes and its ability to process vectors as a group. The error correcting syndromes are calculated for a batch of PUF outputs rather than for a single output vector. All the vectors used to train the network can use the same error correcting syndrome to correct their noisy bits. The proposed error correcting code using bidirectional associative memories has two other advantages. First, the technique generates a secure syndrome that does not pose any threat to the information used to generate it (the PUF response bits). The second major advantage is the robust nature of the algorithm. The failure rate of the error correcting code can be driven below 1 ppm.

3.5 Conclusion

The error correcting code is a critical part of the key generation process. Cryptographic keys must exactly match the previously generated or stored keys despite environmental changes. To fulfill this demand, an error correcting code has become an integral part of the key generation process. An important aspect of a neural network is its ability to learn from its environment and to improve its performance through learning. It has been experimentally observed that BAM is more stable than the other learning algorithms. This property arises from the transpose relationship between the weight matrices applied to the input and output vectors in the forward and backward directions. The network has been trained for 32-bit, 64-bit, 128-bit and 256-bit vectors and has successfully rectified errors in noisy test vectors. The stability and performance of the algorithm have been compared with those of Hopfield networks and Hebbian learning algorithms. It has been observed that BAM is more stable and gives more accurate results when presented with noisy inputs. Our results show that a BAM network can be successfully used as an error correcting code for the PUF output, even if it is not necessary for all the PUF responses of the key generation process.

The security of this design has been further enhanced using a hashing algorithm to redress the response bits. This hinders the attacker's ability to determine the challenge-response pairs. The SHA-2 algorithm has been used for this purpose. The SHA-2 hashing functions are a standard for cryptographic applications introduced by the National Institute of Standards and Technology (NIST). The response bits from the PUFs are corrected using the error correction process and sent as an input vector to the hashing function, which transforms them to achieve a uniform distribution. Chapter 4 discusses different cryptographic algorithms and the SHA hashing function in detail.
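The regeneration flow summarized in this chapter, iterating a noisy response to equilibrium and then hashing the corrected bits, can be sketched end to end. Everything below is illustrative: the 8-element patterns are made up, the update rule is the standard synchronous BAM recall rather than the modified update mode of the thesis, SHA-256 is chosen as one member of the SHA-2 family, and the bit packing in `to_bytes` is an assumption.

```python
import hashlib


def train(patterns):
    """Auto-associative weight matrix: sum of outer products p^T p."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]


def step(v, w, transpose=False):
    """One pass through W (forward) or through its transpose (backward)."""
    n_out = len(w) if transpose else len(w[0])
    out = []
    for j in range(n_out):
        net = sum(v[i] * (w[j][i] if transpose else w[i][j])
                  for i in range(len(v)))
        out.append(1 if net >= 0 else -1)
    return out


def correct(noisy, w, max_iters=20):
    """Alternate forward and backward passes until nothing changes."""
    x = list(noisy)
    y = step(x, w)
    for _ in range(max_iters):
        new_x = step(y, w, transpose=True)
        new_y = step(new_x, w)
        if new_x == x and new_y == y:
            break  # equilibrium reached
        x, y = new_x, new_y
    return y


def to_bytes(bipolar):
    """Pack a bipolar vector into bytes (+1 -> bit 1, -1 -> bit 0)."""
    bits = "".join("1" if b == 1 else "0" for b in bipolar)
    return int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")


def derive_key(bipolar):
    return hashlib.sha256(to_bytes(bipolar)).hexdigest()


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([p1, p2])
noisy = [-1] + p1[1:]              # p1 with its first bit flipped
key = derive_key(correct(noisy, w))
```

Because the hash changes completely if even one input bit differs, the key matches the enrolled one only when the error correction restores every bit.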

Chapter 4 Secure Hashing Algorithm

4.1 Introduction

In an era where the Internet is increasingly used as a tool for commerce and provides essential communication among millions of people, security becomes tremendously important. Security involves a range of applications, from secure commerce and payments to private communications and password protection. One essential aspect of secure communications is cryptography. The growth of both wired and wireless communications has triggered the development of new cryptographic algorithms.

Cryptography is the study of mathematical techniques for all aspects of information security. The primary goal of cryptography is to mask the messages sent between two people so that an attacker cannot understand the message being sent. Cryptography is used not only to protect data from theft and alteration, but also for user authentication. There are four major goals for the implementation of cryptographic applications:

1. Confidentiality: The property that makes the content of information available only to authorized personnel. Secrecy is a term synonymous with confidentiality and privacy. There are numerous approaches to providing confidentiality, ranging from physical protection to mathematical algorithms which render data unintelligible without secret keys.

2. Data Integrity: The property that addresses unauthorized alteration of data. The ability to detect data manipulation by unauthorized personnel assures data integrity. Data manipulations include insertion, deletion and substitution.

3. Authentication: This property relates to identification. It applies to both entities and information. For example, two parties entering into a communication should authenticate each other, and any information communicated over a channel should be authenticated with respect to its origin, date of origin, data content, time sent, etc. Hence, this aspect of cryptography is subdivided into two major classes: entity authentication and data origin authentication. Data origin authentication is a necessary aspect of FPGA reconfiguration.

4. Non-Repudiation: The property which prevents an entity from denying previous commitments or actions. When a dispute arises because an entity denies certain actions, this property provides a means to resolve it. For example, one entity may authorize the purchase of property by another entity and later deny such authorization. This property involves a third party to resolve such disputes.

The fundamental goal of cryptography is to adequately address these four areas in both theory and practice. Cryptography pertains to the prevention and detection of deception and other malicious activities. These goals can be achieved using different types of cryptographic schemes. There are several ways of classifying cryptographic algorithms. Here they are classified by the number of keys employed for encryption and decryption, and further by their application. The three types of algorithms are:

Secret Key (Symmetric) Cryptography: Uses a single key for both encryption and decryption.

Public Key (Asymmetric) Cryptography: Uses one key for encryption and another key for decryption.

Hash Functions: Use a mathematical transformation to irreversibly encrypt information.

In each case, the initial unencrypted data is referred to as plaintext and the encrypted data is called ciphertext. Figure 4.1 shows the taxonomy of the cryptographic primitives [82].

Figure 4.1. Taxonomy of the Cryptographic Primitives.

4.2 Types of Cryptographic Algorithms

4.2.1 Secret Key Cryptography

In this method, a single key is used for both encryption and decryption. The sender encrypts the plaintext using the key and a set of rules and sends the ciphertext to the receiver. The receiver applies the same key to decrypt the ciphertext and recover the plaintext message. Figure 4.2 shows the communication technique for secret key cryptography. Because a single key is used for both functions, secret key cryptography is also known as symmetric encryption. In this form of cryptography, the key is the secret and should be known only to the sender and the receiver. The biggest drawback of this approach is the distribution of the key.

Figure 4.2. Two-Way Communication using Secret Key Cryptography.
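The defining property of secret key cryptography, that one shared key both encrypts and decrypts, can be shown with a deliberately simple toy. The repeating-key XOR below is not a secure cipher and is not used in the thesis; it is chosen only because the same function performs both directions.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed.

    Toy construction for illustration only; it offers no real security.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


shared_key = b"k3y"                  # known only to sender and receiver
plaintext = b"device key request"

ciphertext = xor_bytes(plaintext, shared_key)   # sender side
recovered = xor_bytes(ciphertext, shared_key)   # receiver side, same key
```

Both parties must already hold `shared_key`, which is exactly the key distribution problem noted above.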

Secret Key cryptographic schemes can be categorized as: stream ciphers and block ciphers [83]. Stream ciphers are bit-wise operations that implement a form of feedback mechanism to constantly change the key. On the other hand, block ciphers encrypt one block of data at a time using the same key for the whole block. A plaintext block of data always encrypts to the same cipher text when the same key is used but the same plaintext might not encrypt to the same cipher text in a stream cipher. 4.2.1.1 Stream Cipher Stream ciphers form an important class of symmetric-key encryption schemes. They are, in a way, block ciphers having block length equal to one. The main advantage of this scheme is the change in encryption transformation for every symbol of plaintext being encrypted. Stream ciphers prove to be advantageous where transmission errors are highly probable due to the absence of a propagation error. They are also used in applications where data must be processed one symbol at a time due to no memory or limited buffering of data. Stream ciphers can be future divided onto two categories: Self Synchronizing Stream Ciphers and Synchronous Stream Ciphers. Self-synchronous stream ciphers calculate each bit in the key as a function of the previous n bits in the key stream. It is called self synchronous because the encryption and the decryption processes synchronize with each other by merely knowing the number of bit of the n-bit key stream that have been processed. The main disadvantage with this process is its error propagation. One error bit in the transmission will result in n-error bits at the receiver side. Synchronous stream ciphers generate the key independent of the message stream. They are generated using a key generator function. This generator function is used by 65

both the sender and the receiver. While these stream ciphers do not propagate transmission errors, they are, by nature, periodic, so the key stream will eventually repeat.

4.2.1.2 Block Cipher

A block cipher is an encryption scheme that breaks up the plaintext message to be transmitted into strings of a fixed length and encrypts one block at a time. Block ciphers are the best-known symmetric-key encryption techniques. Two important classes of block ciphers are substitution ciphers and transposition ciphers. Transformation ciphers combine both substitution and transposition ciphers. Simple substitution ciphers do not provide adequate security when the block size is small, even if the key space is extremely large. For the English alphabet, the key space is 26!, yet the key can be determined quite easily by examining a modest amount of cipher text, because the distribution of letter frequencies is preserved in the cipher text. For example, the letter E occurs more frequently than any other letter in ordinary English text. Hence, the letter occurring most frequently in the cipher text most likely corresponds to the letter E in the plaintext. By observing a modest number of cipher text blocks, a cryptanalyst can determine the key. Simple substitution and transposition ciphers individually do not provide very high security; however, combining both transformations yields stronger ciphers.

In spite of the above-mentioned drawbacks, secret key cryptographic applications prove cost-effective in some real-world scenarios. Two secret key encryption algorithms used in the real world are the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES). They are discussed in the next part of this section.
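The frequency-preservation weakness of simple substitution ciphers described above can be demonstrated in a few lines. This is a minimal sketch under stated assumptions: the sample sentence, the seed and the helper names (`make_key`, `encrypt`, `letter_counts`) are illustrative and do not come from this work.

```python
import random
import string
from collections import Counter


def make_key(seed: int) -> dict:
    """Build a random one-to-one letter substitution (one of the 26! keys)."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))


def encrypt(text: str, key: dict) -> str:
    """Substitute every letter; leave spaces and punctuation untouched."""
    return "".join(key.get(ch, ch) for ch in text.upper())


def letter_counts(text: str) -> Counter:
    """Count only the alphabetic symbols of the text."""
    return Counter(ch for ch in text if ch in string.ascii_uppercase)


plaintext = "MEET ME HERE WHEN THE SECRET KEY EXCHANGE ENDS"
key = make_key(seed=2024)          # hypothetical key
ciphertext = encrypt(plaintext, key)

plain_top = letter_counts(plaintext).most_common(1)[0][0]
cipher_top = letter_counts(ciphertext).most_common(1)[0][0]

# The most frequent cipher text letter is simply the image of the most
# frequent plaintext letter, so the attacker learns one key entry for free.
print(plain_top, "->", cipher_top, key[plain_top] == cipher_top)
```

The substitution renames the letters but leaves the histogram of counts unchanged, which is exactly what a cryptanalyst exploits.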

Data Encryption Standard (DES): DES is the most common secret key encryption scheme used today. DES was designed by IBM in the 1970s and adopted by the National Bureau of Standards (NBS) in 1977 for commercial and unclassified government applications. DES is a block cipher employing a 56-bit key that operates on 64-bit blocks. DES has a set of complex rules and transformations that were designed specifically to yield fast hardware implementations. IBM also proposed a 112-bit key for DES, which was rejected by the government. The concept of a 112-bit key for DES was considered again in the 1990s but has never been fully implemented. DES is defined in American National Standards and Federal Information Processing Standards (FIPS).

Advanced Encryption Standard (AES): In 1997, the National Institute of Standards and Technology (NIST) initiated a process to develop a new secure cryptosystem for US government applications [84]. As a result, AES became the official successor to DES in 2001. AES uses a secret key cryptographic scheme called Rijndael, a block cipher designed by Belgian cryptographers Joan Daemen and Vincent Rijmen. The algorithm is designed for variable block length and key length. The improved version of the algorithm allows any combination of key lengths (128, 192, or 256 bits) and block lengths (128, 192, or 256 bits). Federal Information Processing Standards Publication 197 (FIPS PUB 197) describes a 128-bit block cipher employing a 128-, 192-, or 256-bit key.

4.2.2 Public Key Cryptography

Public key cryptography is considered to be the most significant new development in cryptography in the last 300-400 years. Modern public key cryptography

was first presented by Stanford University professor Martin Hellman and graduate student Whitfield Diffie in 1976 [85]. Their paper described a two-key cryptosystem that helps two parties communicate securely over a non-secure communication channel without sharing a secret key. Public key ciphers can be categorized as block ciphers with the unusual property that one key is used to encipher the plaintext and a different key is used to decipher the cipher text. One of the keys is kept private and the other is made public. The public key is used to encipher the plaintext, and the private key is then used to decipher the cipher text. In public key cryptography, the key used to encipher the message cannot be used to decipher it, i.e. the person who enciphers the message cannot decipher it without the corresponding private key.

Generic public key cryptography is simple but has far-reaching consequences. It employs two keys that are mathematically related, although knowledge of one key does not help in deriving the other. The important point is that both keys are required for the process to work; the order in which they are applied does not matter. For this reason, this approach is called asymmetric cryptography. A prototypical public key cipher uses huge numerical values that may contain 1000 bits or more, in which every single bit is significant. However, the key space is much smaller because of severe constraints on the keys, so a 1000-bit public key may have strength similar to that of a 128-bit secret key cipher. Figure 4.3 shows the communication technique for public key cryptography.

Figure 4.3. Two-Way Communication using Public Key Cryptography.

Public key ciphers eliminate the problem of distributing a secret key, but they also introduce the possibility of a man-in-middle attack. A man-in-middle attack is a type of attack in which the attacker intrudes into the communication between the endpoints on a network to inject false information and intercept the data transferred between them. This type of impersonation is an example of protocol failure. In this case the adversary impersonates the receiver by sending the sender the adversary's own public key. The sender incorrectly assumes it to be the receiver's public key and encrypts the message with it. The adversary then decrypts the message using the matching private key and re-encrypts it using the receiver's public key. Figure 4.4 shows the man-in-middle attack on a two-way communication protocol in the case of public key cryptography. To avoid such man-in-middle attacks, it is necessary to authenticate the public key to achieve data origin

authentication. The second drawback of this approach is the enormous amount of data to be processed. Because of the huge values involved, public key ciphers are very slow and are used only to encipher random message keys. The message key is then used by a conventional secret key cipher, which actually enciphers the data.

Figure 4.4. Man-in-Middle Attack on a Two-Way Communication.

Public key encryption algorithms used in the real world for key exchange or digital signatures include RSA. It is the first, and still the most common, public key cryptographic implementation and is named after the three MIT mathematicians (Ronald Rivest, Adi Shamir, and Leonard Adleman) who developed it. RSA is used in a number of software products for key exchange, digital signatures, or encryption of small blocks of data. RSA uses a variable-size encryption block and a variable-size key. The public-private key pair

is derived from a very large number n, which is the product of two prime numbers chosen according to certain rules. The public key information includes the number n and a derivative of one of the factors of n. If a large number is created from two prime factors of roughly the same size, there is no known factorization algorithm that will solve the problem in a reasonable amount of time. This makes RSA secure, apart from the fact that the man-in-middle attack still hampers its security.

4.2.3 Hash Functions

Cryptographic hash functions, informally called one-way hash functions, are one of the fundamental primitives of modern cryptography. Hash functions are defined as computationally efficient functions mapping binary strings of arbitrary length to binary strings of some fixed length, called hash values [86]. Hash functions are cryptographic algorithms that use no key. Instead, a fixed-length hash value is computed from the plaintext in such a way that neither the contents nor the length of the plaintext can be recovered. Hash algorithms are used to generate a digital fingerprint of the contents of a file, which can be used to ensure that the file has not been altered by an intruder or a virus. They are also used by operating systems to protect passwords or to verify the integrity of a file.

Hash functions are publicly known and involve no secret keys. The security is based solely on the one-way operation of the hash function, with no keys (public or private) involved. Hash functions are typically chosen such that it is computationally infeasible to find two distinct inputs that hash to the same value. In the case of digital signatures, a message is usually hashed and the hash value is signed. After the

message has been sent, the receiver hashes the received message and compares the result with the previously generated hash value. This saves both time and space compared to signing the message directly, which would involve splitting the message into appropriately sized blocks and signing each block individually. Figure 4.5 shows a general flowchart for a hashing function.

The infeasibility of finding two inputs with the same hash value is the primary security property of this process; otherwise, the signature on one message's hash value would also be valid for a second message and could authenticate a malicious user. Another application of hash functions is their use in protocols that involve prior commitments, including some identification protocols using digital signature schemes. Hash functions are also used in the implementation of digital signature algorithms [87,88], keyed hash message authentication codes [89], and random number generator architectures [90].

Figure 4.5. General Flowchart for the Operation of a Hashing Function.
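The hash-and-compare step described above can be sketched with Python's standard `hashlib` module. This is an illustration only: the message contents and the helper names `digest` and `is_intact` are hypothetical, and the sketch assumes the reference hash reaches the receiver over a trusted path (for example inside a signature), which the code does not model.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """Return the SHA-256 hash value of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def is_intact(received: bytes, expected_hash: str) -> bool:
    """Re-hash the received message and compare with the reference value."""
    return hmac.compare_digest(digest(received), expected_hash)


sent = b"firmware image, revision 1"
reference_hash = digest(sent)          # computed by the sender

tampered = b"firmware image, revision 2"
print(is_intact(sent, reference_hash))       # unmodified message
print(is_intact(tampered, reference_hash))   # modified message
```

Only the short, fixed-length hash value has to be signed or stored, regardless of how long the message is.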

Some hashing functions used in the real world are the Message Digest algorithms (MD2, MD4, MD5), the Secure Hashing Algorithms (SHA-1, SHA-256, SHA-384, SHA-512), Whirlpool, and Tiger-2 (Tiger-192, Tiger-160, Tiger-128). Some of them are discussed in the next part of this section.

Message Digest (MD) Algorithms: The MD family of hashing algorithms was designed by Ron Rivest in the late 1980s and the early 1990s [91,92]. It comprises a series of byte-oriented algorithms that produce a 128-bit hash value from a message of arbitrary length.

o MD-2: Specially designed for systems with limited memory, such as smart cards.
o MD-4: Similar to MD-2, with improvements designed specifically for fast processing in software.
o MD-5: This version was designed after potential weaknesses were reported in MD-4. It applies more manipulations to the original data than MD-4, which makes it slower than MD-4. It has been used in several products in spite of a number of weaknesses brought to light by the German cryptographer Hans Dobbertin in 1996.

Secure Hashing Algorithms: The SHA algorithms are a standard for cryptographic applications introduced by NIST in 1995 [93]. SHA-0 was the first version of the secure hashing algorithms but was withdrawn soon after its release due to a weakness in the design. SHA-1 and SHA-2 were released after considerable improvements in the design.

o SHA-1: A popular hashing algorithm developed by NIST, released in 1995 and originally published as FIPS 180-1. SHA-1 is similar to MD-4 and MD-5 but is more secure than both. Hence, it is also referred to as the successor of MD-5. SHA-1 produces a 160-bit hash value and features in a large number of security protocols and applications. It has recently been discovered that its security strength can be lowered dramatically, to a point where it is theoretically possible to produce a collision [94].
o SHA-2: SHA-2 was released in 2000 after improvements were made to the SHA-1 algorithm. SHA-2 operates similarly to the SHA-1 algorithm. The primary differences lie in the number of initial blocks of messages, the smaller number of iterations, the use of right shifts as well as left shifts, and the use of a constant for each iteration instead of a range of values. SHA-2 comprises SHA-224, SHA-256, SHA-384 and SHA-512.

In this work, SHA-2 has been used to redress the response bits. The SHA-2(256) algorithm has been chosen mainly because of its advantage over other algorithms, such as Tiger, Whirlpool, MD5 and SHA-1, with respect to collision attacks and preimage attacks. The term collision attack refers to finding two input messages that produce the same hash value, and the term preimage attack refers to finding a message from its hash value (backtracking). Table 4.1 shows a summary of all the algorithms, including the known collision attacks and preimage attacks on each of them.

Table 4.1. Summary of the Existing Hashing Algorithms.

Hashing Algorithm   Author                          Date Launched   Number of Bits   Collision Attacks   Preimage Attacks
MD-2                Ronald Rivest                   1989            128              Yes                 Yes
MD-5                Ronald Rivest                   1991            128              Yes (2^20.96)       Yes (2^123.4)
Whirlpool           Paulo Barreto, Vincent Rijmen   2001            512              Yes (1)             No
SHA-1               NIST                            1995            160              Yes (2^51)          No
SHA-2               NIST                            2000            -                No                  No
SHA-256             NIST                            2000            256              No                  No
SHA-384             NIST                            2000            384              No                  No
SHA-512             NIST                            2000            512              No                  No

4.3 Secure Hashing Algorithms (SHA-2)

The architecture of hash functions is public and commonly known. In the hash computation process, there is no secrecy and no keys, public or private, are involved. The security is based on the 2^n hash computations that are needed to find a message from its n-bit hash value. SHA-2 supersedes the existing SHA-1 (FIPS 180-1) with three new hash functions, SHA-2(256), SHA-2(384), and SHA-2(512), for computing a securely

condensed representation of the data. They are more resistant to possible attacks and can be used with larger blocks of data, up to 2^128 bits. The SHA-2 algorithm is the same for all three hashing functions, differing only in the size of the operands, the initialization vectors, and the size of the final digest. The produced hash values range from 256 to 512 bits depending on the algorithm used. These hash functions help in determining the integrity of the message. Any change in the message will, with very high probability, result in a completely different hash value. The three new hash functions are considered to be secure because, for each of them, it is computationally infeasible to find a message that corresponds to a given hash value or to find two different messages that produce the same hash value. Table 4.2 shows further specifications of the three algorithms and SHA-1.

SHA-2(256) has been chosen for this particular application owing to the number of bits in the hash value. In this work, the hash values are used to redress the response bits for a uniform distribution. The operations of the hash function can be divided into two stages:

Stage 1: Preprocessing involves padding the message, parsing the padded message and initializing the hash values.
Stage 2: Hash computation involves data manipulations using arithmetic operations, predefined functions and constants.

The following section describes the SHA-2 algorithm applied to the SHA-256 hash function.
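The statement that any change in the message yields a completely different hash value can be checked experimentally. The sketch below uses Python's standard `hashlib`; the two messages and the helper `bit_difference` are illustrative assumptions, not data from this work.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# The ASCII codes of '0' (0x30) and '1' (0x31) differ in exactly one bit.
m1 = b"PUF response 0"
m2 = b"PUF response 1"

d1 = hashlib.sha256(m1).digest()
d2 = hashlib.sha256(m2).digest()

print("input bits changed: ", bit_difference(m1, m2))
print("digest bits changed:", bit_difference(d1, d2), "of", 8 * len(d1))
```

For a well-behaved hash function, roughly half of the 256 digest bits are expected to change, which is what makes the hash value useful for spreading PUF response bits uniformly.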

Table 4.2. Specifications of Secure Hash Functions.

                              SHA-1    SHA-2(256)   SHA-2(384)   SHA-2(512)
Input Message Size (Bits)     <2^64    <2^64        <2^128       <2^128
Padded Data Block (Bits)      512      512          1024         1024
Word Size (Bits)              32       32           64           64
Transformation Rounds         80       64           80           80
Hash Value (bits)             80       128          192          256

4.4 SHA-256

The SHA-256 hash function produces a final hash value of 256 bits from an input message composed of multiple blocks of 512 bits each. Each input block is further divided into 32-bit words (represented as Wt) and fed to sixty-four cycles of the SHA-256 function. In each cycle, data processing is performed by additions and logical operations (such as bitwise logical operations and bitwise rotations) based on the output data of the previous cycle. The computational structure of each round of the algorithm is described in the pseudo code given below.

SHA-256 uses a series of sixty-four 32-bit constants. These constants represent the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers. Table 4.3 shows the values of the constants. The 32-bit values in the variables a to h are updated in every round. Each 512-bit data block is processed for 64 rounds, and then the variables are added to the previous message digest to obtain the

intermediate hash values. The pseudo code for the two stages of the SHA-256 hash function is described below.

Table 4.3. Constants used in SHA-256 Hash Function.

428A2F98 71374491 B5C0FBCF E9B5DBA5 3956C25B 59F111F1 923F82A4 AB1C5ED5
D807AA98 12835B01 243185BE 550C7DC3 72BE5D74 80DEB1FE 9BDC06A7 C19BF174
E49B69C1 EFBE4786 0FC19DC6 240CA1CC 2DE92C6F 4A7484AA 5CB0A9DC 76F988DA
983E5152 A831C66D B00327C8 BF597FC7 C6E00BF3 D5A79147 06CA6351 14292967
27B70A85 2E1B2138 4D2C6DFC 53380D13 650A7354 766A0ABB 81C2C92E 92722C85
A2BFE8A1 A81A664B C24B8B70 C76C51A3 D192E819 D6990624 F40E3585 106AA070
19A4C116 1E376C08 2748774C 34B0BCB5 391C0CB3 4ED8AA4A 5B9CCA4F 682E6FF3
748F82EE 78A5636F 84C87814 8CC70208 90BEFFFA A4506CEB BEF9A3F7 C67178F2

4.4.1 Preprocessing

Preprocessing involves padding the input message, parsing the padded message into message blocks and initializing the hash values. The hash computation uses the

padded data along with constants, functions, and algebraic operations to iteratively generate a series of hash values.

The input message is padded to ensure that its length is a multiple of 512 bits. For an input message of length len bits, a single 1 bit is appended at the end of the input message, followed by k zero bits, where k is the smallest non-negative solution to the equation $len + 1 + k \equiv 448 \pmod{512}$, and then by the 64-bit binary representation of len. The message is then parsed into N 512-bit blocks, which are further divided into 32-bit words. The hash values are initialized as follows:

H_0 = 6A09E667
H_1 = BB67AE65
H_2 = 3C6EF372
H_3 = A54FF53A
H_4 = 510E527F
H_5 = 9B05688C
H_6 = 1F83D9AB
H_7 = 5BE0CD19

4.4.2 Hash Computation

Step 1: SHA-256 uses six logical functions. Each function operates on 32-bit words (represented as x, y, z), and the result of each function is also a 32-bit word. The functions are listed below.

$\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$ (4.1)
$\mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$ (4.2)
$\Sigma_0(x) = \mathrm{ROTR}^{2}(x) \oplus \mathrm{ROTR}^{13}(x) \oplus \mathrm{ROTR}^{22}(x)$ (4.3)
$\Sigma_1(x) = \mathrm{ROTR}^{6}(x) \oplus \mathrm{ROTR}^{11}(x) \oplus \mathrm{ROTR}^{25}(x)$ (4.4)
$\sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x)$ (4.5)
$\sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$ (4.6)

$\mathrm{ROTR}^{n}(x)$: Rotate right (circular right shift) operation, where x is a w-bit word and n is an integer with $0 \le n < w$.
$\mathrm{ROTR}^{n}(x) = (x \gg n) \vee (x \ll (w - n))$

$\mathrm{SHR}^{n}(x)$: Right shift operation, where x is a w-bit word and n is an integer with $0 \le n < w$.
$\mathrm{SHR}^{n}(x) = x \gg n$

The above functions are used to compute the Wt unit as shown in Equation 4.7 and Equation 4.8.

$W(t) = M(t)$ when $0 \le t \le 15$ (4.7)
$W(t) = \sigma_1(W(t-2)) + W(t-7) + \sigma_0(W(t-15)) + W(t-16)$ when $16 \le t \le 63$ (4.8)

where M(t) is the t-th 32-bit word of the padded message block.

Step 2: The next step is to initialize the eight working variables, a, b, c, d, e, f, g, and h, with the (i-1)th hash value.

$a = H_0^{(i-1)}$
$b = H_1^{(i-1)}$
$c = H_2^{(i-1)}$
$d = H_3^{(i-1)}$
$e = H_4^{(i-1)}$
$f = H_5^{(i-1)}$
$g = H_6^{(i-1)}$
$h = H_7^{(i-1)}$

Step 3: For t = 0 to 63

$T_1 = h + \Sigma_1(e) + \mathrm{Ch}(e, f, g) + K_t^{(256)} + W(t)$
$T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)$
$h = g$
$g = f$
$f = e$
$e = d + T_1$
$d = c$

$c = b$
$b = a$
$a = T_1 + T_2$

Step 4: The intermediate hash values are computed as:

$H_0^{(i)} = a + H_0^{(i-1)}$
$H_1^{(i)} = b + H_1^{(i-1)}$
$H_2^{(i)} = c + H_2^{(i-1)}$
$H_3^{(i)} = d + H_3^{(i-1)}$
$H_4^{(i)} = e + H_4^{(i-1)}$
$H_5^{(i)} = f + H_5^{(i-1)}$
$H_6^{(i)} = g + H_6^{(i-1)}$
$H_7^{(i)} = h + H_7^{(i-1)}$

All additions are performed modulo 2^32. The above steps are repeated N times, once per 512-bit block, to get the final 256-bit hash value of the input message M. Figure 4.6 shows the round architecture of an SHA-256 hash function.

Figure 4.6. Flowchart of a SHA-256 Hash Function.

4.5 Optimization of the SHA-256 Hash Function

The SHA-256 hash function is an essential part of the process and requires data storage and manipulation. Since this algorithm has been implemented on an FPGA, the improvements have been designed to take advantage of the basic architecture of the FPGA. To achieve the required processing capability, the following optimization techniques are proposed to improve the implementation of the SHA-256 algorithm. Techniques are incorporated to optimize the data dependency, which increases the throughput. Parallel Counters and Balanced Carry Save Adders are used to improve the partial additions; the addition units are improved because they are the most critical operations in this algorithm. Pipelining techniques are used to achieve higher frequencies. Look-Up Tables (LUTs) are used to store the constant values (Kt).

4.6 Architecture of the SHA-256 Hash Function

The architecture of the SHA-256 hash function is illustrated in Figure 4.7. The process starts with the Initialization Unit. This unit initializes the corresponding constants in the Constants Unit and defines the operation word length. The Padder then pads the input message to a multiple of 512 bits. In every transformation round, based on the padded data, a new data block, Wt(i), is produced in the Wt Unit. The specified constant set, K t (i), is stored in LUTs to support the Hash Computation Unit.

The Hash Computation Unit is the main data path of the system architecture. The specified number of data transformation rounds for the SHA-256 hash function is

performed in this component with the help of a rolling feedback loop. The transformed data is finally modified in the Final Addition Unit, which operates along with the Constants Unit. The output of the Final Addition Unit gives the final hash value.

Figure 4.7. System Architecture of the SHA-256 Hash Function.

4.6.1 Padder and Wt Unit

The input message is padded in the Padder before the beginning of the hash computation. The Padder ensures that the input message length is a multiple of 512 bits. The output of the Padder is the input of the Wt Unit. In the Wt Unit, the padded data is divided into equal blocks of 32 bits each, which are processed in order. The output is computed using Equation 4.7 and Equation 4.8.

The padded data is divided into 16 blocks of 32 bits each, which are stored in 16 registers. In every clock cycle, the data stored in one register is forwarded to the next register. During shifting, in every clock cycle, the functions σ 0 and σ 1 (mentioned in

equation) are implemented to perform data transformations. Figure 4.8 shows the architecture of the Wt Unit. The R(n) component is a rotate right (circular right shift) operation; it rotates the input data by n bits. The S(n) component is a right shift operation; it shifts the input data to the right by n bits. The R(n) and S(n) components are combined according to Equation 4.9 and Equation 4.10.

$\sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x)$ (4.9)
$\sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$ (4.10)

The modulo adder components perform addition modulo 2^32. The output of the Wt Unit for each round is stored in the Wt(i) register and then forwarded to the Hash Computation Unit.

Figure 4.8. Architecture of the Wt Unit.
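The behaviour of the Wt Unit can be modelled in software as a 16-word shift register that emits one word per clock cycle. The sketch below is a behavioural model only, not the RTL of this work; the register indexing and the names `wt_unit`, `sigma0` and `sigma1` are assumptions made for illustration.

```python
from collections import deque

MASK = 0xFFFFFFFF  # modulo 2**32 arithmetic


def rotr(x: int, n: int) -> int:
    """R(n): rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x: int) -> int:
    """Equation 4.9; the plain right shift plays the role of S(3)."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x: int) -> int:
    """Equation 4.10; the plain right shift plays the role of S(10)."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def wt_unit(block_words):
    """Yield W(0)..W(63) for one 512-bit block, one word per 'clock cycle'.

    At cycle t the registers hold W(t)..W(t+15); regs[0] is the output.
    """
    if len(block_words) != 16:
        raise ValueError("a 512-bit block consists of sixteen 32-bit words")
    regs = deque(block_words, maxlen=16)
    for _ in range(64):
        yield regs[0]
        # W(t+16) = sigma1(W(t+14)) + W(t+9) + sigma0(W(t+1)) + W(t)
        new_word = (sigma1(regs[14]) + regs[9] + sigma0(regs[1]) + regs[0]) & MASK
        regs.append(new_word)  # shifts every register by one position


schedule = list(wt_unit(list(range(1, 17))))  # hypothetical input block
print(len(schedule), hex(schedule[16]))
```

Because only the sixteen most recent words are needed, the hardware requires sixteen 32-bit registers rather than storage for all 64 schedule words.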

4.6.2 Constants Unit-I

SHA-256 uses eight 32-bit constants. These constants are initialized by the Initialization Unit and used in the initialization of the Hash Computation Unit and in the Final Addition. The Constants Unit of the architecture shown above is implemented using Look-Up Tables (LUTs).

The SHA-256 hash function operates on 512-bit blocks of data. If a message has more than one data block, the constants have to be refreshed. When a new block of data, not related to the previously processed block, has to be transformed, the constants are reinitialized to the predefined values.

4.6.3 Constants Unit-II

This block contains the specified set of constants. SHA-256 uses a sequence of sixty-four 32-bit constants. These constants represent the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers. To implement the specified K t (i) constants, one ROM array of 64 x 32 bits has been used.

4.6.4 Hash Computation Unit

This unit accepts 8 data inputs (A in, B in, ..., h in) of 32 bits each and produces 8 outputs of 32 bits each. The constant input values (A con, B con, ..., h con) are loaded into the Hash Computation Unit in the initialization phase. This unit also takes Wt(i) and K t (i) as inputs. The Wt(i) component comes from the Wt Unit and K t (i) comes from the ROM blocks. The word length of all the inputs and outputs is 32 bits.

Data modifications in this unit are mainly performed by the T1 and T2 function modules and the Modulo Adders. The Modulo Adder implements addition modulo 2^32. Figure 4.9 and Figure 4.10 show the architectures of the T1 and T2 functions. The

R(n) components, shown in the figures, operate similarly to the one shown in Figure 4.8. The architectures of the functions are implemented using the functions shown in Equation 4.11 and Equation 4.12.

$T_1 = h + \Sigma_1(e) + \mathrm{Ch}(e, f, g) + K_t^{(256)} + W(t)$ (4.11)
$T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)$ (4.12)

Figure 4.9. Architecture of the T1 function.

Figure 4.10. Architecture of the T2 function.

4.6.5 Final Addition

This unit implements modulo additions between the 8 data inputs (A in, B in, ..., h in) and the 8 input constants (A con, B con, ..., h con). The word length of the inputs and outputs of the modulo addition component is 32 bits. The output of this component is the final hash value. Figure 4.11 shows the simulation result of the SHA-256 hash function. The simulation shows that the proposed optimized design takes 45 clock cycles to perform the SHA-256 algorithm.

Figure 4.11. Simulation of the SHA-256 Hash Function.

This algorithm has been optimized using the above techniques to give better throughput results. The proposed architecture takes 45 clock cycles to perform the SHA-256 hash function. The optimization has given better results compared to [86] and [95]. Table 4.4 shows the comparison between the proposed architecture and the implementations in [86] and [95], and Figure 4.12 shows a graphical representation of the comparison. The throughput has been improved by 88.52% and 30.72% compared to [86] and [95], respectively.

Table 4.4. Comparison between the Previous Works and the Proposed Design.

Design                  Number of LUTs   Number of FFs   Number of CLBs   Number of Clock Cycles   Percentage Increase in Throughput
[86]                    2120             1651            1060             81                       88.52%
[95]                    1367             1304            684              65                       30.72%
Proposed Architecture   1298             1089            649              45                       -

Figure 4.12. Comparison between Results of the Proposed Method and Previous Works.

4.7 Conclusion

The inherent strength of the SHA-256 algorithm has been improved by the use of 64 round constants instead of the four constants used in SHA-1. This reduces the risk of collision considerably. No collision attacks or preimage attacks have been reported to date. The combination of a PUF and the SHA algorithm proves to be a strong combination for implementing security in an FPGA device. However, because the PUF signature bits depend slightly on the operating temperature, a bit generated from a pair of ring oscillators might flip when the operating temperature changes considerably. Due to changes in operating temperature and voltage fluctuations, the output obtained for the same challenge on the same CLB slice may not be bit-exact. Under such conditions, the output response bits have been observed to deviate further in Hamming distance from a reference response. Since cryptographic methods demand very accurate keys that satisfy mathematical properties, the response bits have been redressed according to the cryptographic needs using error correcting codes. The error correction process ensures that the PUF produces the same output despite environmental changes.
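The deviation of a noisy response from its reference can be quantified with the Hamming distance. The sketch below is a minimal illustration; the 16-bit values and flipped positions are hypothetical and are not measured PUF data, and the helper names are assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


def bit_error_rate(reference: int, response: int, n_bits: int) -> float:
    """Fraction of the n_bits response bits that differ from the reference."""
    return hamming_distance(reference, response) / n_bits


# Hypothetical 16-bit reference response and a re-measurement in which
# two bits flipped, for example after a temperature change.
reference = 0b1011_0010_1110_0001
flipped_positions = 0b0000_0100_0000_1000
noisy = reference ^ flipped_positions

print(hamming_distance(reference, noisy))
print(bit_error_rate(reference, noisy, 16))
```

An error correcting stage has to bring this distance back to zero before the response is hashed, since even a single flipped bit would change the resulting hash value completely.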

Chapter 5

Conclusion

5.1 Summary

Silicon PUFs, and delay-based PUFs in particular, are novel chip identifiers that produce a unique response for every device based on the slight variations introduced by the manufacturing process in identically designed devices. This thesis first discusses the concepts of PUFs and the uncertainty in the doping and manufacturing processes that causes the process variations that serve as the basis for PUF implementations. This is followed by a classification of PUFs based on security issues and fabrication techniques, and by the types of PUF implementations. The chapter further discusses the proposed PUF circuit, which uses LUTs and multiplexers, the basic components of the FPGA architecture. The PUFs have been implemented on Spartan 2 XC2S100 FPGAs, and a logic analyzer is used to analyze the outputs obtained. The intra-chip and inter-chip responses are analyzed using Hamming distances. The graphs for the uniqueness of the responses are also presented.

The second part of the research involved the implementation of an error correcting code using Artificial Neural Networks. This is necessary because PUFs are affected by varying environmental conditions such as temperature and supply voltage noise.

Thus the output responses of the PUFs might have certain flipped bits, while cryptographic applications demand bit-exact responses for authentication. So, to improve the stability of the PUF responses, an error correcting code has been implemented using neural networks, specifically bidirectional associative memories. This part of the thesis also explains the advantages of using neural networks over conventional error correction methods such as BCH codes.

Further, we explained that it is important to improve the key generation process. A hashing function has been used to transform the response bits to achieve a uniform distribution. The latter part of the thesis dealt with the implementation and optimization of the SHA-256 algorithm using pipelining techniques.

5.2 Contributions

Successfully designed, implemented and simulated a new circuit for Ring Oscillator PUFs. The design has been successfully implemented, simulated and verified using 5 Xilinx Spartan XC2S100 FPGAs and an Agilent 16801A Logic Analyzer. The design has been simulated at 50 MHz.

Inter-chip and intra-chip responses have been analyzed, and the Hamming distances for the 128-bit responses have been calculated and plotted to obtain statistical results for the reliability factor.

The uniqueness of the responses has been calculated and compared with the results of previous works, and a higher uniqueness factor has been achieved for both the inter-chip and intra-chip implementations.

A new error correction technique has been modeled and successfully implemented using Artificial Neural Networks. The networks (BAM) have been successfully trained and tested using PUF responses to yield better results than the conventional BCH codes. A lower failure rate has been achieved using the proposed method when compared to BCH codes.

The SHA-2(256) algorithm has been optimized using pipelining techniques. Parallel Counters and Balanced Carry Save Adders have been used to improve the partial additions, and the data dependencies have been optimized.

The three phases have successfully been integrated to generate a unique and reliable device-specific key for cryptographic applications and trustworthy authentication techniques to deal with the increasing counterfeiting problems.

5.3 Conclusions

The proposed PUF circuit is implemented on Xilinx Spartan 2 XC2S100 FPGAs, and an Agilent 16801A logic analyzer is used to obtain the PUF responses. The intra-chip and inter-chip responses are analyzed and plotted using Hamming distances. From the responses of the intra-chip implementations, it is observed that 23% of the Hamming distances lie in the range [56:65], and 28% of the Hamming distances for the responses of the inter-chip implementations are observed to lie in the range [61:65]. This signifies that the uniqueness of the signatures is higher for inter-chip responses than for intra-chip responses. Also, the overall uniqueness of the responses is found to be 49.0625%, which is higher than that of the previous implementations of the conventional ROPUF circuit (43.40%) discussed in [26] and the improvement of the conventional

circuit using chain-implementation presented (48.51%) in [58]. Accurate on-chip experimental results for the uniqueness factor of the inter-chip responses and the intrachip responses of the conventional circuit have separately been calculated as 47.31% and 40.86% respectively in [60]. The proposed design proves to yield better results than the previous works in both the inter-chip and intra-chip uniqueness factors. The inter-chip uniqueness factor for the proposed design is calculated as 47.929% and 41.91% for intrachip responses. The PUFs are improvised using Artificial Neural Networks. The networks are tested using the PUF responses of various lengths. The networks have successfully corrected the error bits. The failure rates of the proposed method are below 1ppm which is lower than the failure rate of BCH codes which is calculated as 4.8ppm [69]. The learning tendency of the neural networks proved to be an important factor in lowering the failure rates of the error correction process. It is also observed that learning tends to improve with sample size. The SHA-256 hashing function is optimized using parallel programming. The delay is reduced to 45 clock cycles and the throughput is increased by 88.52% and 30.72% compared to [86] and [95]. This improvisation has been achieved due to the implementation of pipelining technique. 5.4 Future Work This work involves the successful implementation of a key generation process using the proposed PUFs on Xilinx FPGAs and their error correction technique. The results have been simulated and analyzed, and have been compared to previous works. In the error correction process, the PUF responses are assumed to produce certain errors 94

with a considerable change in temperature. Future work includes characterizing the stability of PUF response bits across a wide range of environmental changes like temperatures and voltage fluctuations and improving the learning function of the error correcting code to decrease the number of iterations required to train the network and decrease the failure rate. 95
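The uniqueness figures quoted in the conclusions are percentages derived from pairwise Hamming distances between 128-bit responses. The sketch below shows one way such a figure could be computed. It assumes the commonly used definition (average pairwise Hamming distance divided by the response length); the function names `hamming_distance` and `uniqueness_percent` are illustrative and do not come from the thesis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESPONSE_BYTES 16   /* 128-bit PUF responses */

/* Number of bit positions in which two responses differ. */
unsigned hamming_distance(const uint8_t *a, const uint8_t *b)
{
    unsigned dist = 0;
    for (size_t i = 0; i < RESPONSE_BYTES; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);
        while (diff) {                      /* clear the lowest set bit */
            diff = (uint8_t)(diff & (diff - 1));
            dist++;
        }
    }
    return dist;
}

/* Assumed definition: mean pairwise Hamming distance over all
 * response pairs, expressed as a percentage of the response length. */
double uniqueness_percent(uint8_t resp[][RESPONSE_BYTES], size_t count)
{
    unsigned long total = 0;
    size_t pairs = 0;

    assert(count >= 2);
    for (size_t i = 0; i + 1 < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            total += hamming_distance(resp[i], resp[j]);
            pairs++;
        }
    }
    return 100.0 * (double)total / ((double)pairs * RESPONSE_BYTES * 8);
}
```

Calling `uniqueness_percent` on the responses of different chips to the same challenge would give an inter-chip figure; an ideal PUF would approach 50%.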

References

[1] T. Huffmire, B. Brotherton, T. Sherwood, R. Kastner, T. Levin, T. D. Nguyen, and C. Irvine, Managing Security in FPGA-Based Embedded Systems, IEEE Design and Test of Computers, vol. 25, issue 6, pp. 590-598.
[2] International Anti-Counterfeiting Bureau, Intellectual Property Theft, 2011.
[3] P. Dillien, IP Tag Technology Identifies Stolen FPGA Designs, Tracks and Security Leaks, 2010.
[4] Cisco Statement on Counterfeit Goods, Cisco Corp, San Jose, CA, 2008.
[5] In China, Knockoff Cellphones are a Hit, The New York Times, New York, NY, 2009.
[6] J. Wu and M. O'Neill, On Foundation and Construction of Physical Unclonable Functions, Report on Cryptology ePrint Archive, 2010.
[7] D. Boning and S. Nassif, Models of Process Variations in Device and Interconnect, in Design of High-Performance Microprocessor Circuits, IEEE Press, 2001, pp. 98-115.
[8] S. Roy and A. Asenov, Where Do the Dopants Go?, Science, vol. 309, pp. 388-390, 2005.
[9] M. Orshansky, L. Milor, and C. Hu, Characterization of Spatial Intrafield Gate CD Variability, Its Impact on Circuit Performance, and Spatial Mask-Level Correction, IEEE Transactions on Semiconductor Manufacturing, vol. 17(1), pp. 2-11, 2004.

[10] K. Okada and H. Onodera, Statistical Parameter Extraction for Intra and Inter-Chip Variabilities of Metal Oxide-Semiconductor Field-Effect Transistor Characteristics, Japanese Journal of Applied Physics, vol. 44(1A), pp. 131-134, 2005.
[11] A. Agarwal, D. Blaauw, and V. Zolotov, Statistical Timing Analysis for Intra-Die Process Variations For Spatial Correlations, in Proceedings of IEEE International Conference on Computer Aided Design, 2003, pp. 900-907.
[12] H. Chang and S. Sapatnekar, Statistical Timing Analysis Considering Spatial Correlation Using a Single PERT-Like Traversal, in Proceedings of IEEE International Conference on Computer Aided Design, 2003, pp. 621-625.
[13] M. Orshansky and A. Bandyopadhyay, Fast Statistical Timing Analysis Handling Arbitrary Delay Correlations, in Proceedings of ACM/IEEE Design Automation Conference, 2004.
[14] P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos, Modeling Within-Die Spatial Correlation Effects for Process Design Co-Optimization, in International Symposium on Quality Electronic Design Automation, 2005, pp. 516-521.
[15] S. Bhardwaj, S. Vrudhula, P. Ghanta, and Y. Cao, Modeling of Intra-Die Process Variation for Accurate Analysis and Optimization of Nano-Scale Circuits, in Proceedings of ACM/IEEE Design Automation Conference, 2006, pp. 791-796.

[16] J. Xiong, V. Zolotov, and L. He, Robust Extraction of Spatial Correlation, in Proceedings of ACM/IEEE International Symposium on Physical Design, 2006, pp. 2-9.
[17] F. Liu, A General Framework for Spatial Correlation Modeling in VLSI Design, in Proceedings of ACM/IEEE Design Automation Conference, 2007, pp. 817-822.
[18] Virtual Fab, IBM. http://researcher.ibm.com/view_project.php?id=925
[19] A. Bansal, R. N. Singh, R. N. Kanj, S. Mukhopadhyay, J. Lee, E. Acar, A. Singhee, K. Kim, C. Chuang, S. Nassif, F. Heng, and K. K. Das, Yield Estimation of SRAM Circuits using Virtual SRAM Fab, Proceedings of the 2009 International Conference on Computer-Aided Design, 2009, pp. 631-636.
[20] B. Hargreaves, H. Hult, and S. Reda, Within-die Process Variations: How Accurately Can They Be Statistically Modeled?, Design Automation Conference, ASPDAC 08, 2008, pp. 524-530.
[21] N. Beckmann and M. Potkonjak, Hardware-Based Public-Key Cryptography with Public Physically Unclonable Functions, in Information Hiding (S. Katzenbeisser and A.-R. Sadeghi, Eds.), vol. 5806 of Lecture Notes in Computer Science, pp. 206-220, Springer Berlin, Heidelberg, 2009.
[22] L. Kulseng, Z. Yu, Y. Wei, and Y. Guan, Lightweight Secure Search Protocols for Low-Cost RFID Systems, in Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems, ICDCS '09, 2009, pp. 40-48.

[23] M. Majzoobi, F. Koushanfar, and M. Potkonjak, Lightweight Secure PUFs, IEEE/ACM International Conference on Computer-Aided Design, ICCAD 08, 2008, pp. 670-673.
[24] H. Busch, S. Katzenbeisser, and P. Baecher, PUF-Based Authentication Protocols Revisited, in Information Security Applications (H. Youm and M. Yung, Eds.), vol. 5932, pp. 296-308, Springer Berlin, Heidelberg, 2009.
[25] U. Rührmair, F. Sehnke, J. Sölter, G. Dror, S. Devadas, and J. Schmidhuber, Modeling Attacks on Physical Unclonable Functions, Cryptology ePrint Archive, 2010.
[26] G. E. Suh and S. Devadas, Physical Unclonable Functions for Device Authentication and Secret Key Generation, in Proceedings of the 44th Annual ACM Conference on Design Automation, DAC '07, 2007, pp. 9-14.
[27] R. Pappu, Physical One-Way Functions, PhD thesis, Massachusetts Institute of Technology, 2001.
[28] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, Controlled Physical Random Functions, in Proceedings of the 18th Annual Computer Security Conference, 2002.
[29] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, FPGA Intrinsic PUFs and their Use for IP Protection, vol. 4727 of Lecture Notes in Computer Science, pp. 63-80, Springer Berlin, Heidelberg, 2007.

[30] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, Silicon Physical Random Functions, in Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS '02 (New York, NY, USA), 2002, pp. 148-160.
[31] B. Gassend, D. Lim, D. Clarke, M. van Dijk, and S. Devadas, Identification and Authentication of Integrated Circuits: Research articles, Concurr. Comput.: Pract. Exper., vol. 16, no. 11, pp. 1077-1098, 2004.
[32] S. Devadas, E. Suh, S. Paral, R. Sowell, T. Ziola, and V. Khandelwal, Design and Implementation of PUF-Based "Unclonable" RFID ICs for Anti-Counterfeiting and Security Applications, in IEEE International Conference on RFID, 2008, pp. 58-64.
[33] B. Skoric, G.-J. Schrijen, P. Tuyls, T. Ignatenko, and F. Willems, Secure Key Storage with PUFs, pp. 269-292, Springer London, 2008.
[34] S. Kumar, J. Guajardo, R. Maes, G.-J. Schrijen, and P. Tuyls, Extended abstract: The Butterfly PUF Protecting IP on Every FPGA, in IEEE International Workshop on Hardware-Oriented Security and Trust, HOST 08, 2008, pp. 67-70.
[35] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, FPGA Intrinsic PUFs and their Use for IP Protection, vol. 4727 of Lecture Notes in Computer Science, pp. 63-80, Springer Berlin, Heidelberg, 2007.
[36] P. Tuyls, B. Skoric, S. Stallinga, A. Akkermans, and W. Ophey, Information-Theoretic Security Analysis of Physical Unclonable Functions, in Financial Cryptography and Data Security (A. S. Patrick and M. Yung, Eds.), vol. 3570 of Lecture Notes in Computer Science, pp. 141-155, Springer Berlin, Heidelberg, 2005.
[37] B. Skoric, P. Tuyls, and W. Ophey, Robust Key Extraction from Physical Unclonable Functions, in Applied Cryptography and Network Security (J. Ioannidis, A. Keromytis, and M. Yung, Eds.), vol. 3531 of Lecture Notes in Computer Science, pp. 407-422, Springer Berlin, Heidelberg, 2005.
[38] R. Pappu, B. Recht, J. Taylor, and N. Gershenfeld, Physical One-Way Functions, Science, vol. 297, pp. 2026, Sept. 2002.
[39] P. Tuyls, B. Skoric, S. Stallinga, A. H. M. Akkermans, and W. Ophey, Information-Theoretic Security Analysis of Physical Unclonable Functions, Proceedings of Financial Cryptography and Data Security, 2005.
[40] M. Magnor, P. Dorn, and W. Rudolph, Simulation of Confocal Microscopy through Scattering Media with and without Time Gating, vol. 19, no. 11, pp. 1695-1700, 2001.
[41] J. F. de Boer, Optical Fluctuations on the Transmission and Reflection of Mesoscopic Systems, Ph.D. Thesis, 1995.
[42] H. Furstenberg, Non-commuting Random Matrices, Transactions of the American Mathematical Society, 1963, pp. 108.
[43] P. Tuyls and B. Skoric, Secret Key Generation from Classical Physics, in Proceedings of the Hardware Technology Drivers for Ambient Intelligence Symposium, Philips Research Book Series, Kluwer, 2005.

[44] X. Xin, J. Kaps, and K. Gaj, A Configurable Ring-Oscillator-Based PUF for Xilinx FPGAs, 14th Euromicro Conference on Digital System Design (DSD), 2011, pp. 651-657.
[45] J. H. Anderson, A PUF Design for Secure FPGA-Based Embedded Systems, in IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), 2010, pp. 1-6.
[46] D. Lim, Extracting Secret Keys from Integrated Circuits, Master of Science Thesis, Massachusetts Institute of Technology, 2004.
[47] M. Majzoobi, F. Koushanfar, and M. Potkonjak, Techniques for Design and Implementation of Secure Reconfigurable PUFs, ACM Transactions on Reconfigurable Technology and Systems, vol. 2, no. 1, pp. 1-33, 2009.
[48] E. Ozturk, G. Hammouri, and B. Sunar, Physical Unclonable Function with Tristate Buffers, in IEEE International Symposium on Circuits and Systems, ISCAS 08, pp. 3194-3197, 2008.
[49] V. van der Leest, G. J. Schrijen, H. Handschuh, and P. Tuyls, Hardware Intrinsic Security from D flip-flop, ACM Workshop on Scalable Trusted Computing (STC), 2010, pp. 53-62.
[50] L. Chang, D. Fried, J. Hergenrother, J. Sleight, R. Dennard, R. Montoye, L. Sekaric, S. McNab, A. Topol, C. Adams, K. Guarini, and W. Haensch, Stable SRAM Cell Design for the 32nm Node and Beyond, in Symposium on Digest of Technical Papers: VLSI Technology, 2005, pp. 128-129.

[51] D. Puntin, S. Stanzione, and G. Iannaccone, CMOS Unclonable System for Secure Authentication Based on Device Variability, in 34th European Solid-State Circuits Conference, ESSCIRC 08, 2008, pp. 130-133.
[52] M. Bhargava, C. Cakir, and K. Mai, Attack Resistant Sense Amplifier Based PUFs with Deterministic and Controllable Reliability of PUF Responses, in IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010, pp. 106-111.
[53] X. Wang and M. Tehranipoor, Novel Physical Unclonable Function with Process and Environmental Variations, in Design, Automation Test in Europe Conference Exhibition, DATE 10, 2010, pp. 1065-1070.
[54] R. Helinski, D. Acharyya, and J. Plusquellic, A Physical Unclonable Function Defined using Power Distribution System Equivalent Resistance Variations, in 46th ACM/IEEE Design Automation Conference, DAC '09, 2009, pp. 676-681.
[55] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas, Aegis: Architecture for Tamper-Evident and Tamper-Resistant Processing, in Proceedings of the 17th Annual International Conference on Supercomputing, ICS '03 (New York, NY), ACM, 2003, pp. 160-171.
[56] M. Soybali, B. Ors, and G. Saldamli, Implementation of a PUF Circuit on a FPGA, 4th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1-5, 2011.

[57] X. Xin, J. Kaps, and K. Gaj, A Configurable Ring-Oscillator-Based PUF for Xilinx FPGAs, 14th Euromicro Conference on Digital System Design (DSD), pp. 651-657, 2011.
[58] D. Merli, F. Stumpf, and C. Eckert, Improving the Quality of Ring Oscillator PUFs on FPGAs, Proceedings of the 5th Workshop on Embedded Systems Security, WESS '10, October 2010.
[59] C. Yin and G. Qu, LISA: Maximizing RO PUF's Secret Extraction, IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 100-105, 2010.
[60] A. Maiti, J. Casarona, L. McHale, and P. Schaumont, A Large Scale Characterization of RO-PUF, IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 94-99, 2010.
[61] M. Majzoobi, F. Koushanfar, and S. Devadas, FPGA PUF using Programmable Delay Lines, IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1-6, 2010.
[62] E. Yin and G. Qu, Temperature-Aware Cooperative Ring Oscillator PUF, HOST 09, IEEE International Workshop on Hardware-Oriented Security and Trust (HOST), 2009, pp. 36-42.
[63] A. Maiti, L. McDougall, and P. Schaumont, The Impact of Aging on an FPGA-Based Physical Unclonable Function, International Conference on Field Programmable Logic and Applications (FPL), pp. 151-156, 2011.

[64] M. Jessa and M. Jaworski, Randomness of a Combined TRNG Based on the Ring Oscillator Sampling Method, International Conference on Signals and Electronic Systems (ICSES), pp. 323-326, 2010.
[65] K. Wold and S. Petrovic, Behavioral Model of TRNG Based on Oscillator Rings Implemented in FPGA, IEEE 14th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), pp. 163-166, 2011.
[66] Z. Paral and S. Devadas, Reliable and Efficient PUF-Based Key Generation Using Pattern Matching, IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 128-133, 2011.
[67] Y. Jin and Y. Makris, Is Single-Scheme Trojan Prevention Sufficient?, IEEE 29th International Conference on Computer Design (ICCD), pp. 305-308, 2011.
[68] V. Vivekraja and L. Nazhandali, Circuit-Level Techniques for Reliable Physically Unclonable Functions, IEEE International Symposium on Hardware-Oriented Security and Trust, 2009.
[69] M. Yu and S. Devadas, Secure and Robust Error Correction for Physical Unclonable Functions, IEEE Design and Test of Computers, 2010, pp. 48-65.
[70] S. Yu, M. D. Raihi, R. Sowell, and S. Devadas, Lightweight and Secure PUF Key Storage Using Limits of Machine Learning, Workshop on Cryptographic Hardware and Embedded Systems, 2011.
[71] D. Rios, Neural Networks: A Requirement for Intelligent Systems. http://www.learnartificialneuralnetworks.com/
[72] H. Wallace, Error Detection and Correction Using BCH Codes, Atlantic Quality and Design, Inc., 2010.
[73] M. Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Second Edition, Pearson Education, 2005.
[74] A. Krogh, What are Artificial Neural Networks?, Nature Publishing Group, vol. 26, no. 2, Feb. 2008.
[75] K. L. Fox, R. R. Henning, and J. H. Reed, A Neural Network Approach towards Intrusion Detection, in Proceedings of the 13th National Computer Security Conference, 1991.
[76] J. Frank, Artificial Intelligence and Intrusion Detection: Current and Future Directions, in Proceedings of the 17th National Computer Security Conference, 1994.
[77] L. Fu, A Neural Network Model for Learning Rule-Based Systems, in Proceedings of International Joint Conference on Neural Networks, 1992, pp. 343-348.
[78] D. Hammerstrom, Neural Networks At Work, IEEE Spectrum, June 1993, pp. 26-53.
[79] E. Tae-Dok, C. Choi, and Ju-Jang Lee, Generalized Asymmetrical Bidirectional Associative Memory, Machine Intelligence & Robotic Control, vol. 1, no. 1, pp. 43-45, 1999.

[80] B. Kosko, Bidirectional Associative Memory, IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, no. 1, January/February 1988.
[81] Y. P. Singh, V. S. Yadav, A. Gupta, and A. Khare, Bidirectional Associative Memory Neural Network Method in the Character Recognition, Journal of Theoretical and Applied Information Technology (JATIT), 2009.
[82] A. Menezes, P. van Oorschot, and S. Vanstone, Overview of Cryptography, Handbook of Applied Cryptography, CRC Press, Inc., 1997.
[83] M. S. Anoop, Public Key Cryptography - Applications Algorithms and Mathematical Explanations, September 2005.
[84] J. Daemen and V. Rijmen, AES Proposal: Rijndael. http://www.esat.kuleuven.ac.be/rijmen/rijndael, 2002.
[85] I. Curry, An Introduction to Cryptography and Digital Signatures, Entrust Securing Digital Identities and Information, 2001.
[86] S. Dominikus, A Hardware Implementation of MD4-Family Hash Algorithms, IEEE Proceedings of International Conference on Electronics Circuits and Systems (ICECS), vol. 3, pp. 1143-1146, 2005.
[87] Digital Signature Standard, National Institute of Standards and Technology (NIST), FIPS PUB 186-2. http://csrc.nist.gov/publications/fips/fips186-2.htm, 2002.

[88] R. L. Rivest, A. Shamir, and L. Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems, Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.
[89] The Keyed-Hash Message Authentication Code (HMAC), HMAC Standard, National Institute of Standards and Technology, 2002. http://csrc.nist.gov/publications/fips.htm
[90] N. Sklavos, P. Kitsos, K. Papadomanolakis, and O. Koufopavlou, Random Number Generator Architecture and VLSI Implementation, Proceedings of IEEE International Symposium on Circuits & Systems (ISCAS), vol. 4, pp. 854-857, 2002.
[91] R. Lee, Law Is Not a Science: Admissibility of Computer Evidence and MD5 Hashes, SANS Computer Forensics Blog, 2009.
[92] W. Burr, Cryptographic Hash Standards: Where Do We Go from Here?, IEEE Security & Privacy, vol. 4, no. 2, pp. 88-91, 2006.
[93] Secure Hash Standard (SHS), Draft Federal Information Processing Standards Publication, DRAFT FIPS PUB 180-4, 2011.
[94] X. Wang, Y. L. Yin, and H. Yu, Collision Search Attacks on SHA1, 2005.
[95] N. Sklavos and O. Koufopavlou, Implementation of the SHA-2 Hash Family Standards using FPGAs, Journal of Supercomputing, 2008, pp. 227-248.

Appendix A BCH Codes

Source: http://cwww.ee.nctu.edu.tw/course/channel_coding/chap5.pdf

A.1 Introduction

BCH (Bose-Chaudhuri-Hocquenghem) codes form a large class of multiple random error-correcting codes. They were first discovered by A. Hocquenghem in 1959 and independently by R. C. Bose and D. K. Ray-Chaudhuri in 1960. BCH codes are cyclic codes. Only the codes, not the decoding algorithms, were discovered by these early writers. The original applications of BCH codes were restricted to binary codes of length $2^m - 1$ for some integer $m$. These were extended later by Gorenstein and Zierler (1961) to non-binary codes with symbols from the Galois field GF($q$). The first decoding algorithm for binary BCH codes was devised by Peterson in 1960. Since then, Peterson's algorithm has been refined by Berlekamp, Massey, Chien, Forney, and many others.

A.2 Primitive BCH Codes

For any integer $m \ge 3$ and $t < 2^{m-1}$ there exists a primitive BCH code with the following parameters:

$$n = 2^m - 1, \qquad n - k \le mt, \qquad d_{\min} \ge 2t + 1.$$

This code can correct $t$ or fewer random errors over a span of $2^m - 1$ bit positions. The code is a $t$-error-correcting BCH code. For example, for $m = 6$, $t = 3$:

$$n = 2^6 - 1 = 63, \qquad n - k = 6 \times 3 = 18, \qquad d_{\min} = 2 \times 3 + 1 = 7.$$

This is a triple-error-correcting (63, 45) BCH code.

A.3 Generator Polynomial of Binary BCH Codes

Let $\alpha$ be a primitive element in GF($2^m$). For $1 \le i \le t$, let $\phi_{2i-1}(x)$ be the minimum polynomial of the field element $\alpha^{2i-1}$. The degree of $\phi_{2i-1}(x)$ is $m$ or a factor of $m$. The generator polynomial $g(x)$ of a $t$-error-correcting primitive BCH code of length $2^m - 1$ is given by

$$g(x) = \mathrm{LCM}\{\phi_1(x), \phi_3(x), \ldots, \phi_{2t-1}(x)\}.$$

Note that the degree of $g(x)$ is $mt$ or less. Hence the number of parity-check bits, $n - k$, of the code is at most $mt$. The generator polynomial of the binary BCH code was originally defined as the least common multiple of the minimum polynomials $\phi_1(x), \phi_2(x), \ldots, \phi_{2t}(x)$, i.e.

$$g(x) = \mathrm{LCM}\{\phi_1(x), \phi_2(x), \phi_3(x), \ldots, \phi_{2t-1}(x), \phi_{2t}(x)\}.$$

However, every even power of $\alpha$ in GF($2^m$) has the same minimal polynomial as some preceding odd power of $\alpha$ in GF($2^m$). As a consequence, the generator polynomial of the $t$-error-correcting binary BCH code can be reduced to

$$g(x) = \mathrm{LCM}\{\phi_1(x), \phi_3(x), \ldots, \phi_{2t-1}(x)\}.$$

Example: $m = 4$, $t = 3$. Let $\alpha$ be a primitive element in GF($2^4$), which is constructed based on the primitive polynomial $p(x) = 1 + x + x^4$. Then

$$\phi_1(x) = 1 + x + x^4 \quad \text{(corresponding to } \alpha\text{)},$$
$$\phi_3(x) = 1 + x + x^2 + x^3 + x^4 \quad \text{(corresponding to } \alpha^3\text{)},$$
$$\phi_5(x) = 1 + x + x^2 \quad \text{(corresponding to } \alpha^5\text{)},$$

$$g(x) = \mathrm{LCM}\{\phi_1(x), \phi_3(x), \phi_5(x)\} = \phi_1(x)\,\phi_3(x)\,\phi_5(x) = 1 + x + x^2 + x^4 + x^5 + x^8 + x^{10}.$$

The code is a (15, 5) cyclic code.

A.4 Properties of Binary BCH Codes

Consider a $t$-error-correcting BCH code of length $n = 2^m - 1$ with generator polynomial $g(x)$. $g(x)$ has $\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2t}$ as roots, i.e. $g(\alpha^i) = 0$ for $1 \le i \le 2t$. Since a code polynomial $c(x)$ is a multiple of $g(x)$, $c(x)$ also has $\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2t}$ as roots, i.e. $c(\alpha^i) = 0$ for $1 \le i \le 2t$. A polynomial $c(x)$ of degree less than $2^m - 1$ is a code polynomial if and only if it has $\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2t}$ as roots.

A.5 Decoding of BCH Codes

Consider a BCH code with $n = 2^m - 1$ and generator polynomial $g(x)$. Suppose a code polynomial $c(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ is transmitted. Let $r(x) = r_0 + r_1 x + \cdots + r_{n-1} x^{n-1}$ be the received polynomial. Then $r(x) = c(x) + e(x)$, where $e(x)$ is the error polynomial. To check whether $r(x)$ is a code polynomial, we simply test whether $r(\alpha) = r(\alpha^2) = \cdots = r(\alpha^{2t}) = 0$. If so, $r(x)$ is a code polynomial; otherwise $r(x)$ is not a code polynomial and the presence of errors is detected.

Decoding procedure:
1. Syndrome computation.
2. Determination of the error pattern.
3. Error correction.

A.6 Syndrome Computation

The syndrome consists of $2t$ components in GF($2^m$),

$$S = (S_1, S_2, \ldots, S_{2t}), \qquad S_i = r(\alpha^i) \quad \text{for } 1 \le i \le 2t.$$

Computation: let $\phi_i(x)$ be the minimum polynomial of $\alpha^i$. Dividing $r(x)$ by $\phi_i(x)$, we obtain

$$r(x) = a_i(x)\,\phi_i(x) + b_i(x).$$

Then $S_i = b_i(\alpha^i)$, which can be obtained by a linear feedback shift register with connections based on $\phi_i(x)$.
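The syndrome definition $S_i = r(\alpha^i)$ can be checked numerically for the (15, 5) example. The following is a minimal sketch, assuming GF($2^4$) built on $p(x) = 1 + x + x^4$ and a received word stored as a bitmask (bit $j$ holds the coefficient of $x^j$). It evaluates $r(x)$ directly rather than modeling the shift-register division, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define GF_M      4        /* field GF(2^4) */
#define GF_N      15       /* code length n = 2^m - 1 */
#define PRIM_POLY 0x13u    /* p(x) = 1 + x + x^4 as a bitmask */

/* alpha^e in GF(2^4); bit j of the result is the coefficient of alpha^j. */
uint8_t gf16_alpha_pow(unsigned e)
{
    uint8_t v = 1;
    e %= GF_N;                               /* alpha^15 = 1 */
    while (e--) {
        v = (uint8_t)(v << 1);               /* multiply by alpha */
        if (v & (1u << GF_M))
            v = (uint8_t)(v ^ PRIM_POLY);    /* reduce modulo p(x) */
    }
    return v;
}

/* Fill S[0..2t-1] with S_i = r(alpha^i), i = 1..2t. */
void bch_syndromes(uint16_t r, unsigned t, uint8_t *S)
{
    assert(t >= 1);
    for (unsigned i = 1; i <= 2 * t; i++) {
        uint8_t s = 0;
        for (unsigned j = 0; j < GF_N; j++)
            if (r & (1u << j))
                s = (uint8_t)(s ^ gf16_alpha_pow(i * j));   /* add (alpha^i)^j */
        S[i - 1] = s;
    }
}
```

With this representation the product $\phi_1(x)\phi_3(x)\phi_5(x) = 1 + x + x^2 + x^4 + x^5 + x^8 + x^{10}$ is the bitmask `0x537`. Because it is itself a code polynomial, `bch_syndromes(0x537, 3, S)` yields all-zero syndromes, while flipping a single bit produces non-zero ones.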

A.7 Syndrome and Error Pattern

Since $r(x) = c(x) + e(x)$,

$$S_i = r(\alpha^i) = c(\alpha^i) + e(\alpha^i) = e(\alpha^i) \quad \text{for } 1 \le i \le 2t. \tag{A.1}$$

This gives a relationship between the syndrome and the error pattern. Suppose $e(x)$ has $\nu$ errors ($\nu \le t$) at the locations specified by $x^{j_1}, x^{j_2}, \ldots, x^{j_\nu}$, i.e.

$$e(x) = x^{j_1} + x^{j_2} + \cdots + x^{j_\nu}, \tag{A.2}$$

where $0 \le j_1 < j_2 < \cdots < j_\nu \le n - 1$. From Equations A.1 and A.2, we have the following relation between the syndrome components and the error locations:

$$\begin{aligned}
S_1 &= e(\alpha) = \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu} \\
S_2 &= e(\alpha^2) = (\alpha^{j_1})^2 + (\alpha^{j_2})^2 + \cdots + (\alpha^{j_\nu})^2 \\
&\;\;\vdots \\
S_{2t} &= e(\alpha^{2t}) = (\alpha^{j_1})^{2t} + (\alpha^{j_2})^{2t} + \cdots + (\alpha^{j_\nu})^{2t}
\end{aligned} \tag{A.3}$$

If we can solve these $2t$ equations, we can determine $\alpha^{j_1}, \alpha^{j_2}, \ldots, \alpha^{j_\nu}$. The unknown parameters $Z_u = \alpha^{j_u}$ for $u = 1, 2, \ldots, \nu$ are called the error-location numbers. When the $\alpha^{j_u}$, $1 \le u \le \nu$, are found, the powers $j_u$, $u = 1, 2, \ldots, \nu$, give us the error locations in $e(x)$. The $2t$ equations of Equation A.3 are known as power-sum symmetric functions.

A.8 Error-Location Polynomial (Error-Locator Polynomial)

Suppose that $\nu \le t$ errors actually occur. Define the error-locator polynomial $L(z)$ as

$$L(z) = (1 + Z_1 z)(1 + Z_2 z) \cdots (1 + Z_\nu z) = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \cdots + \sigma_\nu z^\nu, \tag{A.4}$$

where $\sigma_0 = 1$. $L(z)$ has $Z_1^{-1}, Z_2^{-1}, \ldots, Z_\nu^{-1}$ as roots. Note that $Z_u = \alpha^{j_u}$. If we can determine $L(z)$ from the syndrome $S = (S_1, S_2, \ldots, S_{2t})$, then the roots of $L(z)$ give us the error-location numbers.

A.9 Relationship between S and L(z)

From Equation A.4, we find the following relationship between the coefficients of $L(z)$ and the error-location numbers:

$$\begin{aligned}
\sigma_0 &= 1 \\
\sigma_1 &= Z_1 + Z_2 + \cdots + Z_\nu \\
\sigma_2 &= Z_1 Z_2 + Z_1 Z_3 + \cdots + Z_{\nu-1} Z_\nu \\
&\;\;\vdots \\
\sigma_\nu &= Z_1 Z_2 \cdots Z_\nu
\end{aligned} \tag{A.5}$$

The expressions in Equation A.5 are called the elementary symmetric functions. From Equations A.3 and A.5, we have the following relationship between the syndrome and the coefficients of $L(z)$:

$$\begin{aligned}
S_1 + \sigma_1 &= 0 \\
S_2 + \sigma_1 S_1 + 2\sigma_2 &= 0 \\
S_3 + \sigma_1 S_2 + \sigma_2 S_1 + 3\sigma_3 &= 0 \\
&\;\;\vdots \\
S_\nu + \sigma_1 S_{\nu-1} + \sigma_2 S_{\nu-2} + \cdots + \sigma_{\nu-1} S_1 + \nu\sigma_\nu &= 0
\end{aligned} \tag{A.6}$$

Here, for the binary case, $i\sigma_i = \sigma_i$ when $i$ is odd and $i\sigma_i = 0$ otherwise. Equations A.6 are called Newton's identities. If we can determine $\sigma_1, \sigma_2, \ldots, \sigma_\nu$ from Newton's identities, then we can determine the error-location numbers $Z_1, Z_2, \ldots, Z_\nu$ by finding the roots of $L(z)$.

A.10 Computation of Error-Location Numbers

Chien search (R. T. Chien, 1964). A Chien-search circuit is shown in Figure A.1.

$$L(z) = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \cdots + \sigma_\nu z^\nu, \quad \text{where } \sigma_0 = 1.$$

The roots of $L(z)$ in GF($2^m$) can be determined by substituting the elements of GF($2^m$) into $L(z)$. If $L(\alpha^i) = 0$, then $\alpha^i$ is a root of $L(z)$ and $\alpha^{-i} = \alpha^{n-i}$ is an error-location number. To decode the first received digit $r_{n-1}$, we check whether $\alpha$ is a root of $L(z)$. If $L(\alpha) = 0$, then $r_{n-1}$ is erroneous and must be corrected. If $L(\alpha) \ne 0$, then $r_{n-1}$ is error-free. In general, to decode $r_{n-i}$, we test whether $L(\alpha^i) = 0$. If $L(\alpha^i) = 0$, $r_{n-i}$ is erroneous and must be corrected; otherwise $r_{n-i}$ is error-free.
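The Chien search described in A.10 can be sketched in software as an exhaustive evaluation of $L(\alpha^i)$ for $i = 1, \ldots, n$. The block below is only an illustration under stated assumptions: it uses GF($2^4$) with $p(x) = 1 + x + x^4$, stores field elements as 4-bit masks, and uses the illustrative names `gf16_mul` and `chien_search`. It is not a model of the circuit in Figure A.1.

```c
#include <assert.h>
#include <stdint.h>

#define GF_N      15       /* n = 2^4 - 1 */
#define PRIM_POLY 0x13u    /* p(x) = 1 + x + x^4 */

/* Shift-and-add multiplication in GF(2^4). */
uint8_t gf16_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1u)
            p = (uint8_t)(p ^ a);
        b >>= 1;
        a = (uint8_t)(a << 1);
        if (a & 0x10u)
            a = (uint8_t)(a ^ PRIM_POLY);    /* reduce modulo p(x) */
    }
    return p;
}

/* sigma[0..nu] are the coefficients of L(z), with sigma[0] = 1.
 * For every i in 1..n with L(alpha^i) = 0, the error position n - i is
 * stored in positions[] (capacity of at least nu entries is assumed).
 * Returns the number of roots found. */
unsigned chien_search(const uint8_t *sigma, unsigned nu, unsigned *positions)
{
    unsigned found = 0;
    uint8_t x = 1;

    assert(nu >= 1);
    for (unsigned i = 1; i <= GF_N; i++) {
        x = gf16_mul(x, 2);                  /* x = alpha^i (alpha = 0b0010) */
        uint8_t val = sigma[nu];             /* Horner evaluation of L(x) */
        for (unsigned d = nu; d-- > 0; )
            val = (uint8_t)(gf16_mul(val, x) ^ sigma[d]);
        if (val == 0 && found < nu)
            positions[found++] = GF_N - i;
    }
    return found;
}
```

For two errors at positions 3 and 10, $Z_1 = \alpha^3$ and $Z_2 = \alpha^{10}$, so $\sigma_1 = Z_1 + Z_2 = \alpha^{12}$ (mask 15) and $\sigma_2 = Z_1 Z_2 = \alpha^{13}$ (mask 13). The search then finds roots at $i = 5$ and $i = 12$, reporting positions 10 and 3.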

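Figure A.1. Chien Search and Error Correction for Binary Code.

A.11 Peterson-Gorenstein-Zierler Decoding Algorithm

The syndrome components and the coefficients of $L(z)$ also satisfy

$$\sigma_\nu S_{j-\nu} + \sigma_{\nu-1} S_{j-\nu+1} + \cdots + \sigma_1 S_{j-1} = -S_j \quad \text{for } j > \nu. \tag{A.7}$$

Assuming that $\nu = t$, then

$$\sigma_t S_{j-t} + \sigma_{t-1} S_{j-t+1} + \cdots + \sigma_1 S_{j-1} = -S_j \quad \text{for } j > t;$$

e.g. for $j = t + 1$, we have

$$\sigma_t S_1 + \sigma_{t-1} S_2 + \cdots + \sigma_1 S_t = -S_{t+1}.$$

Writing this relation for $j = t + 1, \ldots, 2t$, the following matrix equation is obtained for the symmetric functions $\sigma_j$:

$$S' \begin{bmatrix} \sigma_t \\ \sigma_{t-1} \\ \vdots \\ \sigma_1 \end{bmatrix}
= \begin{bmatrix} -S_{t+1} \\ -S_{t+2} \\ \vdots \\ -S_{2t} \end{bmatrix},
\qquad
S' = \begin{bmatrix}
S_1 & S_2 & \cdots & S_t \\
S_2 & S_3 & \cdots & S_{t+1} \\
\vdots & \vdots & & \vdots \\
S_t & S_{t+1} & \cdots & S_{2t-1}
\end{bmatrix}.$$

It can be shown that $S'$ is nonsingular if the received word contains exactly $t$ errors. It can also be shown that $S'$ is singular if fewer than $t$ errors occur. If $S'$ is singular, then the rightmost column and bottom row can be removed and the determinant of the resulting matrix computed. This process is repeated until one reaches a nonsingular matrix. The coefficients of the error-locator polynomial are then found by the use of standard algebraic techniques.

Once the $\nu$ error locations are known, we can use the relation between the syndrome components and the error locations to find the error values $e_{i_k}$. Thus,

$$\begin{aligned}
S_1 &= e_{i_1} Z_1 + e_{i_2} Z_2 + \cdots + e_{i_\nu} Z_\nu \\
S_2 &= e_{i_1} Z_1^2 + e_{i_2} Z_2^2 + \cdots + e_{i_\nu} Z_\nu^2 \\
&\;\;\vdots \\
S_{2t} &= e_{i_1} Z_1^{2t} + e_{i_2} Z_2^{2t} + \cdots + e_{i_\nu} Z_\nu^{2t}
\end{aligned}$$

The system of equations can be reduced to the following matrix form:

$$\begin{bmatrix}
Z_1 & Z_2 & \cdots & Z_\nu \\
Z_1^2 & Z_2^2 & \cdots & Z_\nu^2 \\
\vdots & \vdots & & \vdots \\
Z_1^\nu & Z_2^\nu & \cdots & Z_\nu^\nu
\end{bmatrix}
\begin{bmatrix} e_{i_1} \\ e_{i_2} \\ \vdots \\ e_{i_\nu} \end{bmatrix}
= \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_\nu \end{bmatrix}.$$

Decoding is completed by solving for the $\{e_{i_k}\}$. This is the general case of non-binary BCH codes.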
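As an illustration that is not part of the source notes, consider the binary double-error-correcting case $t = 2$ with exactly two errors. The short derivation below assumes only Newton's identities in binary arithmetic, where $2\sigma_2 = 0$, $3\sigma_3 = \sigma_3$, $\sigma_3 = 0$ for $\nu = 2$, and $S_2 = S_1^2$.

```latex
\begin{aligned}
S_1 + \sigma_1 &= 0
  &&\Rightarrow\quad \sigma_1 = S_1, \\
S_3 + \sigma_1 S_2 + \sigma_2 S_1 &= 0
  &&\Rightarrow\quad \sigma_2 = \frac{S_3 + S_1^{3}}{S_1}, \\
L(z) &= 1 + S_1 z + \frac{S_3 + S_1^{3}}{S_1}\, z^{2}.
\end{aligned}
```

For example, in GF($2^4$) with $p(x) = 1 + x + x^4$ and errors at positions 3 and 10, $S_1 = \alpha^3 + \alpha^{10} = \alpha^{12}$ and $S_3 = \alpha^9 + \alpha^{30} = \alpha^7$, so $\sigma_2 = (\alpha^7 + \alpha^6)/\alpha^{12} = \alpha^{10}/\alpha^{12} = \alpha^{13}$, which agrees with $Z_1 Z_2 = \alpha^{3}\alpha^{10}$.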

A.12 BCH Codes as Industry Standards

(511, 493) BCH code in ITU-T Rec. H.261, "video codec for audiovisual service at kbit/s", a video coding standard used for video conferencing and video phones:

n = 511, m = 9, k = 493, n - k = 18, t = 2

(40, 32) BCH code in ATM (Asynchronous Transfer Mode): this is a shortened cyclic code that can correct 1-bit errors and detect 2-bit errors.

A.13 Program for BCH codes

    #include <stdio.h>

    /* m, one, gen_basis and lpow() are defined elsewhere in the program (not shown). */

    /* The function for copying a template into a VHDL file. */
    /* fin - an input (template) file, fout - an output (VHDL) file */
    void CopyFile(FILE *fin, FILE *fout)
    {
        int c;                          /* int rather than char, so that EOF can be detected */

        c = getc(fin);                  /* read a character from the input file */
        while (c != '#' && c != EOF) {  /* while the read character is not # or End Of File */
            putc(c, fout);              /* copy the input file into the output file */
            c = getc(fin);
        }
    }

    /* Print an AND statement that = 1 if cout == alpha^i for a finite field counter */
    void PrintAndCout(FILE *fo, int i)
    {
        int f, mask, lx;

        lx = lpow(i);                   /* lx = alpha^i = a_0, a_1 alpha^1, ..., a_(m-1) alpha^(m-1) */
        mask = one;                     /* the mask equals alpha^0 */
        if ((mask & lx) == 0)
            fprintf(fo, " NOT");        /* if the a_0 bit of lx is zero print NOT */
        fprintf(fo, " cout(0)");        /* print to the file cout(0) */
        mask >>= 1;                     /* shift the mask to represent the next bit */
        for (f = 1; f < m; f++) {
            fprintf(fo, " AND");
            if ((mask & lx) == 0)
                fprintf(fo, " NOT");    /* if the a_f bit of lx is zero print NOT */
            fprintf(fo, " cout(%d)", f);/* print to the file cout(f) */
            mask >>= 1;                 /* shift the mask to represent the f+1 bit */
        }
    }

    /* Print a LFSR for a counter */
    void PrintCoutRing(FILE *fo)
    {
        int f, mask;

        mask = one;
        mask >>= 1;                     /* the mask = x^1 */
        for (f = 1; f < m; f++) {
            if (mask & gen_basis)       /* gen_basis - the irreducible polynomial in GF(2^m) (Appendix A) */
                fprintf(fo, "\tcout(%d)<= (cout(%d) XOR cout(m-1)) AND NOT reset;\n", f, f - 1);
            else
                fprintf(fo, "\tcout(%d)<= cout(%d) AND NOT reset;\n", f, f - 1);
            mask >>= 1;                 /* mask = x^(f+1) */
        }
    }

Fragment of C executable program

    ...
    CopyFile(fi, fo);          /* the beginning of ecount */
    PrintAndCout(fo, k - 1);   /* the AND statement for the vdinr signal */
    CopyFile(fi, fo);
    PrintAndCout(fo, n - 1);   /* the AND statement for the vdins signal */
    CopyFile(fi, fo);
    PrintCoutRing(fo);         /* for a LFSR in a counter */
    CopyFile(fi, fo);          /* finishing ecount and beginning enc */

Template

The fragment of the bch.vht template for a counter modulo 15 (using multiplication by α) for the (15, 5) BCH code encoder.

    -- COUNTER MODULO n FOR the (n,k) BCH ENCODER
    USE WORK.const.ALL;
    ENTITY ecount IS
      PORT (clk, reset: IN BIT; vdin: OUT BIT);
      -- vdin - valid data in - vdin=1 if 0 <= count < k
    END ecount;

    ARCHITECTURE ecounta OF ecount IS
      SIGNAL cout: BIT_VECTOR(0 TO m-1);  -- cout in GF(2^m); cout= L^count
      SIGNAL vdinr, vdins, vdin1: BIT;
    BEGIN
      vdinr<=#;  -- Run PrintAndCout(fo,k-1) C function
        -- synchronous reset of vdin1 register if cout==k-1
      vdins<= (#) OR reset;  -- Run PrintAndCout(fo,n-1) C function
        -- synchronous set vdins=1 if cout==n-1
      vdin<= vdin1 AND NOT reset;
        -- output signal vdin is related with vdin1 register

      PROCESS BEGIN  -- vdin1 register circuit
        WAIT UNTIL clk'event AND clk='1';
        IF vdinr='1' THEN vdin1<= '0';
        ELSIF vdins='1' THEN vdin1<= '1';
        END IF;
      END PROCESS;

      PROCESS BEGIN  -- increment or reset the cout in the LFSR, cout=l^count
        WAIT UNTIL clk'event AND clk='1';
        cout(0)<= cout(m-1) OR reset;
        #
      END PROCESS;  -- run PrintCoutRing(fo) C function
    END ecounta;

VHDL

The fragment of the enc.vhd file for the control system for an encoder.

    -- COUNTER MODULO n FOR the (n,k) BCH ENCODER
    USE WORK.const.ALL;
    ENTITY ecount IS
      PORT (clk, reset: IN BIT; vdin: OUT BIT);
      -- vdin - valid data in - vdin=1 if 0 <= count < k
    END ecount;

    ARCHITECTURE ecounta OF ecount IS
      SIGNAL cout: BIT_VECTOR(0 TO m-1);  -- cout in GF(2^m); cout= L^count
      SIGNAL vdinr, vdins, vdin1: BIT;
    BEGIN
      vdinr<= cout(0) AND cout(1) AND NOT cout(2) AND NOT cout(3);
        -- the synchronous reset of the vdin1 register if cout==k-1
      vdins<= ( cout(0) AND NOT cout(1) AND NOT cout(2) AND cout(3)) OR reset;
        -- the synchronous set of the vdins=1 if cout==n-1
      vdin<= vdin1 AND NOT reset;
        -- output signal vdin is related with vdin1 register

      PROCESS BEGIN  -- the vdin1 register circuit
        WAIT UNTIL clk'event AND clk='1';
        IF vdinr='1' THEN vdin1<= '0';
        ELSIF vdins='1' THEN vdin1<= '1';
        END IF;
      END PROCESS;

      PROCESS BEGIN  -- increment or reset the cout in the LFSR, cout=l^count
        WAIT UNTIL clk'event AND clk='1';
        cout(0)<= cout(m-1) OR reset;
        cout(1)<= (cout(0) XOR cout(m-1)) AND NOT reset;
        cout(2)<= cout(1) AND NOT reset;
        cout(3)<= cout(2) AND NOT reset;
      END PROCESS;
    END ecounta;
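The generated counter can be cross-checked in software. The sketch below is a hypothetical C model of the enc.vhd counter for the (15, 5) code, assuming m = 4 and the feedback taps of $p(x) = 1 + x + x^4$. The names `ecount_step`, `vdinr` and `vdins` mirror the VHDL signals but are otherwise illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define M 4   /* GF(2^4), p(x) = 1 + x + x^4 */

/* One rising clock edge of the LFSR counter: cout <- alpha * cout,
 * or cout <- alpha^0 = (1,0,0,0) when reset is asserted. */
void ecount_step(uint8_t cout[M], int reset)
{
    uint8_t fb = cout[M - 1];                /* feedback bit cout(m-1) */
    uint8_t next[M];

    next[0] = (uint8_t)(fb | (reset != 0));
    next[1] = (uint8_t)((cout[0] ^ fb) & !reset);
    next[2] = (uint8_t)(cout[1] & !reset);
    next[3] = (uint8_t)(cout[2] & !reset);
    for (int i = 0; i < M; i++)
        cout[i] = next[i];
}

/* Decode of cout == alpha^(k-1) = alpha^4 for k = 5. */
int vdinr(const uint8_t cout[M])
{
    return cout[0] && cout[1] && !cout[2] && !cout[3];
}

/* Decode of cout == alpha^(n-1) = alpha^14 for n = 15, ORed with reset. */
int vdins(const uint8_t cout[M], int reset)
{
    return (cout[0] && !cout[1] && !cout[2] && cout[3]) || reset;
}
```

Stepping the model after a reset reaches the `vdinr` pattern after 4 clock edges and the `vdins` pattern after 14, then wraps to the reset state on the 15th edge. This matches $\alpha^4 = 1 + \alpha$ and $\alpha^{14} = 1 + \alpha^3$ in this field.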

Appendix B Floor Plans

Figure B.1. Floor Plan 1.

Figure B.2. Floor Plan 2.

Figure B.3. Floor Plan 3.