Temperature variation effects on asynchronous PUF design using FPGAs


The University of Toledo
The University of Toledo Digital Repository
Theses and Dissertations, 2014

Temperature variation effects on asynchronous PUF design using FPGAs
Swetha Gujja, University of Toledo

Recommended Citation: Gujja, Swetha, "Temperature variation effects on asynchronous PUF design using FPGAs" (2014). Theses and Dissertations.

This Thesis is brought to you for free and open access by The University of Toledo Digital Repository. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of The University of Toledo Digital Repository.

A Thesis entitled
Temperature Variation Effects on Asynchronous PUF Design using FPGAs
by Swetha Gujja

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering

Dr. Mohammed Niamat, Committee Chair
Dr. Junghwan Kim, Committee Member
Dr. Weiqing Sun, Committee Member
Dr. Patricia R. Komuniecki, Dean, College of Graduate Studies

The University of Toledo
December 2014

An Abstract of
Temperature Variation Effects on Asynchronous PUF Design using FPGAs
by Swetha Gujja

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering
The University of Toledo
December 2014

A variety of logic circuits can be implemented on configurable platforms such as Field Programmable Gate Arrays (FPGAs). Over the last decade, FPGAs have attracted attention because they offer greater flexibility than full-custom ICs and ASICs. With their growing popularity, FPGAs have become an attractive target for piracy, so techniques are needed to secure them. Integrated circuits face security issues such as cloning, reverse engineering, overbuilding, and physical tampering. Physical Unclonable Functions (PUFs) were introduced to address these challenges. PUFs are challenge-response entities embedded in a physical device and are used as hardware primitives in hardware-oriented security. To authenticate an FPGA, PUFs extract secret keys from the unique physical characteristics of the chip, i.e., its manufacturing process variations. In this thesis, a Self-Timed Ring Oscillator (STRO) based Asynchronous Physical Unclonable Function is implemented on an FPGA. Unlike synchronous ring oscillators,

STROs use asynchronous logic, which eliminates the need for a global or centralized clock in the architecture. Synchronous circuits face challenges such as timing closure, increasing clock rates, clock skew, and performance overhead. The asynchronous logic employed in STROs overcomes these challenges. STROs are the building blocks of the proposed Asynchronous PUF design. In this design, identically placed self-timed ring oscillators exploit manufacturing process variations to produce differing frequencies. The output frequencies of the STROs are used to generate the response bits, which act as unique signatures for FPGA authentication, cryptographic key generation, and random number generation. The delay of interconnects and transistors on a chip depends on the junction temperature, so a significant change in environmental temperature could affect the response bits. To determine its robustness, the proposed Asynchronous PUF design is validated by testing at 4 different locations on 13 FPGAs at different temperatures: room temperature, 0 °C, 20 °C, 45 °C, and 70 °C. At these temperatures, the frequencies are recorded and compared in order to generate the response bits. The uniqueness and reliability of the obtained responses are observed to be 48.04% (close to the desired 50%) and 97.66%, respectively. The reliability of the asynchronous design is found to increase with increasing temperature.

This thesis is dedicated to my family, teachers and friends for making me who I am today.

Acknowledgements

This journey would have been difficult without the support of my family, my advisor, professors, and friends. Firstly, I would like to thank my advisor, Dr. Mohammed Niamat, for giving me the opportunity to conduct my Master's research under him and for his continued support and guidance. My sincere thanks go to Dr. Junghwan Kim and Dr. Weiqing Sun for being part of my thesis committee. Financial support from the EECS and ET departments is also gratefully acknowledged. I would like to thank my lab mates Fathi Amsad, Muslim Mohammed, Tamzidul Hoque, and Kavya Vittala for spending their precious time helping me solve the issues I faced. I would especially like to thank Roshan Silwal for guiding me through my thesis work and clearing my doubts. I would like to thank my parents, Gopal Rao Gujja and Aruna Gujja, and my entire family and friends for their constant love, support, understanding, encouragement, and motivation, which made this thesis possible.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols

1 Introduction and Research Overview
  Introduction
  Architecture of FPGA
  Configurable Logic Blocks (CLB)
  Input / Output Blocks (IOBs)
  Interconnect Network
  Architecture of Spartan II FPGA
  Architecture of Spartan-3E FPGA
  Security Issues Related to FPGAs
  Reverse Engineering
  Overbuilding
  1.3.3 Cloning
  Physical Tampering
  Goals of the Research
  Thesis Organization

2 Basic Terminology and Definitions
  Physical Unclonable Functions
  Hardware Oriented Security
  Manufacturing Process Variations
  Environmental Variations
  Noise
  Uniqueness and Reliability
  Uniqueness
  Hamming Weight (HW)
  Hamming Distance (HD)
  Reliability
  Intra-Chip Hamming Distance (Intra HD)

3 Physical Unclonable Functions
  PUF Classification
  Optical PUFs
  Acoustical PUFs
  Coating PUFs
  SRAM PUFs
  Butterfly PUFs
  3.1.6 Arbiter PUFs
  Ring Oscillator (RO) PUFs
  Anderson PUFs
  Latch PUFs
  Flip-Flop PUFs
  Applications of PUFs
  Device Authentication
  Cryptography
  Secret Key Storage (Memory less)
  Intellectual Property Protection
  Random Number Generation

4 Asynchronous Ring Oscillators
  Introduction to Asynchronous Circuits
  Principles of Asynchronous Logic
  Muller Gate
  Asynchronous Ring Oscillator
  Token and Bubble Propagation
  Jitter in Ring Oscillators (RO)

5 Implementation of Asynchronous Ring Oscillator on FPGA
  Introduction
  Look-Up-Table based Muller Gate Design
  Asynchronous Ring Oscillator Implementation on FPGA
  Experimental Results

6 Temperature Variation on FPGA Based Asynchronous PUF
  Introduction
  Asynchronous PUF Architecture
  Asynchronous PUF Implementation and Results
  Uniqueness of Asynchronous PUF Design
  Reliability of Asynchronous PUF at Varying Temperatures

7 Conclusion
  Contributions
  Future Work

References

A Source Codes
  A.1 VHDL Code for a Six-Stage Asynchronous Ring Oscillator
  A.1.1 UCF File for Mapping LUT Muller Gates in a Desired Region
  A.1.2 Test Bench Code for Post Place and Route Simulation of STRO
  A.2 VHDL Code for Asynchronous PUF Design
  A.2.1 UCF Code for Mapping Inputs and Outputs on FPGA
  A.2.2 Test Bench Code for Post Place and Route Simulation of PUF

List of Tables

5.1 LUT Mapping of a Set Muller Gate. INIT => x FFB
LUT Mapping of a Reset Muller Gate. INIT => x 00B
Frequencies Obtained from 6-Stage STRO Designs
Uniqueness of Asynchronous PUF on FPGA
Reliability of Asynchronous PUF on FPGA

List of Figures

1-1 Architecture of a Field Programmable Gate Array
Xilinx Spartan II FPGA Architecture
Single CLB Architecture in Spartan II FPGA
Xilinx Spartan-3E FPGA Architecture
Single CLB Architecture in Spartan-3E FPGA
Example of Hamming Weight
Example of Inter-Hamming Distance
Example of Intra-Hamming Distance
Optical PUF
Coating PUFs
SRAM PUF with 6 Transistors
Butterfly PUF with Cross Coupled D Flip-Flops
Arbiter PUF
Ring Oscillator
Anderson's PUF
Latch cell
Architecture of a Synchronous Circuit
Architecture of an Asynchronous Circuit
4-3 Handshaking Protocol in an Asynchronous Circuit
Transistor and Logic Level Implementations of Muller Gate
Standard Muller Gates and Their Respective Truth Tables
Six Stage Pipeline Converted to a Six-Stage Ring
Stage STRO using Muller Gates with Inverted Reverse Input
Token-Bubble Propagation
Burst Mode Propagation and Evenly Spaced Mode Propagation
A Stage in Asynchronous Ring Oscillator
Instantiation of Set Muller Gate Using VHDL in Xilinx
LUTs Occupied by STRO in One CLB on Spartan-3E FPGA
Placement Constraints Used for LUTs, Input and Output in Xilinx
A 6-Stage Asynchronous Ring Oscillator with LUTs
Xilinx RTL Schematic View of 6-Stage Asynchronous Ring Oscillator
Logic Circuits Built Inside the (a) Set Muller Gate and (b) Reset Muller Gate
STRO Occupying 3 Slices in 1 CLB (in FPGA Editor)
Hard Macro of 6-Stage STRO with Reference Component
Simulation Result of 6-stage STRO with TTBBBB Configuration in Xilinx
Practical Output of a 6-stage STRO on Spartan-3E FPGA
Practical Output of a 6-stage STRO on Spartan-II FPGA
Locations Chosen for Asynchronous PUF Design on Spartan-3E FPGAs
Proposed STRO-Based Asynchronous PUF Design for FPGA
Technological Schematic of Asynchronous PUF with 16-STROs
A Portion of Asynchronous PUF with Identical STRO Hard Macros
6-5 (a) Basys-2 Spartan 3E FPGAs (b) Agilent 18601A Logic Analyzer
Temperature Chamber
Post Place and Route Simulation Result of Asynchronous PUF Design
Placement and Routing of Asynchronous PUF at Location
Asynchronous PUF Outputs Observed in Logic Analyzer
Distribution of Frequencies Generated by Asynchronous Rings
Distribution of Average Frequencies of STROs at 4 Locations on 13 FPGAs
Frequency Distribution Curves of STROs at Different Temperatures

List of Abbreviations

ASIC: Application Specific Integrated Circuit
CB: Connection Block
CLB: Configurable Logic Block
CLK: Clock
CMOS: Complementary Metal Oxide Semiconductor
CRP: Challenge-Response Pair
EMI: Electro-Magnetic Interference
ERAI: Electronics Resellers Association International
FF: Flip-Flop
FPGA: Field Programmable Gate Array
HD: Hamming Distance
HRNG: Hardware Random Number Generator
HW: Hamming Weight
IC: Integrated Circuit
I/O: Input/Output
IOB: Input/Output Block
IP: Intellectual Property
IRO: Inverter Ring Oscillator
ITRS: International Technology Roadmap for Semiconductors
LUT: Look-Up Table
MOSFET: Metal Oxide Semiconductor Field Effect Transistor
MUX: Multiplexer
NRE: Non-Recurring Engineering
PUF: Physical Unclonable Function
RO: Ring Oscillator
RO-PUF: Ring Oscillator based Physical Unclonable Function
SB: Switch Block
SR: Set/Reset
SRAM: Static Random Access Memory
STR: Self-Timed Ring
STRO: Self-Timed Ring Oscillator
STRO-PUF: Self-Timed Ring Oscillator based Physical Unclonable Function
UCF: User Constraint File
VHDL: VHSIC Hardware Description Language
VLSI: Very Large Scale Integration

List of Symbols

ack: acknowledge signal
B: Bubble
C: Current output state of Muller Gate
C: Previous output state in STRO
C_i: output of stage (i) in asynchronous ring
C_i+1: output of stage (i+1) in asynchronous ring
F: Forward input of Muller gate
f: frequency
MHz: Mega-Hertz
m: number of responses collected
n: number of bits in a response
N: Total number of stages in a ring oscillator
N_B: Number of bubbles
ns: nano-seconds
N_T: Number of tokens
R: Reverse input of Muller gate
r_i: n-bit response obtained from chip i
R_m: n-bit response obtained from chip m
R_x: n-bit response obtained from chip x
R_y: n-bit response obtained from chip y
R_x: n-bit response obtained from chip x at varied temperature
SR: Set/Reset signal
T: Token
t: respective bit number in a response
Z: Number of chips

Chapter 1
Introduction and Research Overview

1.1 Introduction

A Field Programmable Gate Array (FPGA) is a semiconductor device made up of interconnected functional blocks that the end user can program to perform the required logic functions. Because FPGAs are re-programmable, partially re-configurable, and have a shorter time to market, they have become a key component in most electronic systems. FPGAs are among the largest categories of circuits implemented in high-end semiconductor fabrication [1]. Owing to their flexibility, FPGAs are often preferred in the market over Application-Specific Integrated Circuits (ASICs). FPGAs are widely used in aerospace, defense, wireless communications, automotive applications, high performance computing, and data storage. FPGA vendors are investing significant resources to expand FPGA capabilities and applications. FPGAs face many security issues, as adversaries try to make profits by replicating the original design without any

investment. FPGAs therefore need to be secured [2]. The major security issues are cloning, counterfeiting, reverse engineering, physical tampering, and insertion of malicious components. In the field of electronics, more than 1300 counterfeit incidents were reported to Electronic Resellers Association International (ERAI), double the number reported in 2008 and 2010 [3]. This means that counterfeit components are entering the market very easily. The problem causes potential losses to vendors and increases the number of low-quality products in the market. Therefore, highly secure and tamper-resistant solutions are needed.

A unique identifier can be embedded in an IC to give it a unique identity. This may help in identifying the IC, but cannot authenticate it. To provide authentication, a secret key must be embedded in the IC. Physical Unclonable Functions (PUFs) [4] can provide these secret keys and help in the identification and authentication of ICs. PUFs rely on the physical properties of the chip to generate the secret keys. These physical properties are not reproducible even by the manufacturer. Hence, the responses produced by a PUF are unique to every individual chip. These responses act as unique signatures to authenticate the FPGA. Generating unique binary signatures in this way supports cryptographic key generation, digital rights management, Intellectual Property (IP) protection, IC counterfeit prevention, and device authentication. PUFs are very promising for signature generation in the field of hardware security [5].

Various kinds of PUFs have been proposed in the literature, but many of them are synchronous PUFs. In this work, we generate the secret binary responses with the help of

an Asynchronous PUF design [6]. Our design does not require a clock circuit in its architecture. It uses Self-Timed Ring Oscillators (STROs) as its basic components. STROs are very different from regular Inverter Ring Oscillators (IROs) and help us avoid the disadvantages of IROs. In this research, an STRO-based asynchronous PUF design is employed and the responses are obtained at different temperatures. The recorded responses are tested for uniqueness and reliability in order to assess how effectively the design secures the FPGA. This chapter discusses the basic FPGA architecture, security issues related to FPGAs, the goals of this work, and the thesis organization.

1.2 Architecture of FPGA

As the name suggests, an FPGA is programmable in the field by the end user, unlike an Application Specific Integrated Circuit (ASIC), which performs a specific operation decided by the manufacturer [7]. FPGAs are made up of millions of logic cells. They enable the user to implement combinational or sequential logic functions using the configurable logic blocks, random access memories, input/output blocks, and other resources available on the FPGA chip. Some FPGAs are partially reprogrammable at run time, which makes it possible to implement reconfigurable hardware circuits on them. Along with re-programmability, FPGAs provide advantages such as lower Non-Recurring Engineering (NRE) costs, rapid prototyping, and a long product life cycle [2]. Because of these versatile features, FPGAs are in high demand in the market.

A basic FPGA architecture looks like a two-dimensional array. This array consists of Configurable Logic Blocks (CLBs) and programmable Input/Output Blocks (IOBs). The programmable interconnect network available on the chip connects the CLBs and IOBs with each other. The programmable interconnect provides routing resources such as Connection Blocks (CBs), Switch Blocks (SBs), and interconnect links. Interconnect links are made up of wire segments of different lengths. These routing resources are responsible for routing the signals in FPGAs. An FPGA architecture with all these elements is shown in Figure 1-1.

Figure 1-1: Architecture of a Field Programmable Gate Array (labeled elements: Configurable Logic Block (CLB), I/O Block, Switch Box (SB), Connection Box (CB), Vertical Routing Channel, Horizontal Routing Channel)

Depending on the requirement, a set of individual logic blocks is used to program the desired logic functions on the FPGA. Later, the routing is performed between

CLBs and IOBs using the routing resources. The basic functionality of the different components of the FPGA architecture is explained below.

1.2.1 Configurable Logic Blocks (CLB)

CLBs are the building blocks of an FPGA; they are used to program the desired logic onto the chip and are capable of storing the design. Look-Up Tables (LUTs), Flip-Flops (FFs), and multiplexers are the basic elements of CLBs. An LUT can implement any logic function of its inputs. It acts as a truth table in which each combination of the inputs selects a stored value that becomes the desired output. An n-input LUT uses 2^n memory locations [8]. When paired with an LUT, a Flip-Flop (FF) works as a storage unit. The multiplexers available inside the CLBs are used to connect different logic elements together.

1.2.2 Input / Output Blocks (IOBs)

Input/Output Blocks are used to interface the internal components of an FPGA with external components. IOBs comprise input pads and output pads. An input pad receives an input signal from a pin, and an output pad is connected to the output buffer to drive the output signal onto the IOB. Hence, any IOB can be programmed to behave as either an input block or an output block, depending on the requirement. IOBs provide characteristics such as low power or high speed connections, depending on the I/O standard chosen for their operation [9].
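As a small illustration of the look-up table described in the CLB discussion above, the following Python sketch models an n-input LUT as a list of 2^n stored bits that is addressed by the input values. The function names `make_lut` and `lut_read` and the choice of a 4-input AND function are assumptions made for illustration only; they are not taken from the thesis or from any Xilinx tool.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Fill a table of 2**n_inputs bits by evaluating func on every input combination."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(table, *inputs):
    """Use the input bits as an address (first input is the most significant bit)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


# Hypothetical example: a 4-input LUT configured as a 4-input AND gate.
and4 = make_lut(lambda a, b, c, d: a and b and c and d, 4)

print(len(and4))                    # 16 memory locations for n = 4
print(lut_read(and4, 1, 1, 1, 1))   # 1
print(lut_read(and4, 1, 0, 1, 1))   # 0
```

Changing the stored bits changes the implemented function without changing the structure, which is the sense in which an LUT can realize any logic function of its inputs.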

1.2.3 Interconnect Network

The interconnect network connects the multiple logic blocks in an FPGA and carries signal traffic between the different components on the chip [10]. Because the interconnect network is programmable to provide routing connections between the various logic blocks and I/O blocks, it is also known as a programmable routing network. The routing network comprises connection blocks, interconnect wires, and switch blocks. The elements in an FPGA are arranged in a two-dimensional structure, which needs complex routing among logic blocks to perform the required operation. All wires in the interconnect network are organized in horizontal and vertical routing channels throughout the chip. These wires have segments of different lengths. There are two types of routing: local and global. Local routing connects one CLB to adjacent CLBs. Global routing connects the CLBs to IOBs, to other CLBs that are not adjacent, and to any other components on the FPGA. The internal connections of the Switch Blocks (SBs) and the Connection Blocks (CBs) are made using programmable switches, each of which consists of a pass transistor. Both SBs and CBs are used when routing between two different CLBs or between CLBs and IOBs.

1.2.4 Architecture of Spartan II FPGA

The Spartan II FPGA family ranges from 15,000 to 200,000 system gates. Figure 1-2 shows the layout of a Xilinx Spartan II FPGA chip. This is an XC2S100 model, which has 100,000 gates in 600 CLBs. These CLBs are divided into 20 rows and 30 columns as

shown in Figure 1-2. This FPGA uses 150 nm technology [11] and has two identical slices in each CLB. Each slice comprises two LUTs, flip-flops, and multiplexers. Hence there are only four LUTs in each CLB. In Figure 1-2, the blocks at the center of the layout are the CLBs and those on the periphery are IOBs. The LUTs inside a Spartan II CLB are arranged as shown in Figure 1-3.

Figure 1-2: Xilinx Spartan II FPGA Architecture

Figure 1-3: Single CLB Architecture in Spartan II FPGA (a CLB containing Slice 0 and Slice 1, each with LUT1 and LUT2)

1.2.5 Architecture of Spartan-3E FPGA

Xilinx Spartan-3E XC3S100E boards are used in this research to implement the Asynchronous PUF design. The Spartan-3E FPGA family ranges from 100,000 to 1.6 million system gates. Figure 1-4 shows the layout of a Xilinx Spartan-3E XC3S100E chip. This FPGA has 100,000 gates in 240 CLBs. The CLBs are arranged in 22 rows and 16 columns, as shown in Figure 1-4. This FPGA uses 90 nm technology [12]. Each CLB in this FPGA consists of four identical slices. Each slice comprises two LUTs, flip-flops, and multiplexers. Hence there are eight 4-input LUTs in each CLB, along with multiplexers and arithmetic carry logic. The LUTs inside a Spartan-3E CLB are arranged as shown in Figure 1-5. To validate the proposed design, 13 boards of this type are used in this research.

Figure 1-4: Xilinx Spartan-3E FPGA Architecture
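The per-CLB LUT counts quoted for the two device families follow from simple multiplication, and the short Python sketch below works through that arithmetic. The CLB and slice counts are the ones stated in the text; the device-wide LUT totals it prints are derived here for illustration and are not figures quoted by the thesis. The dictionary layout and the helper name `luts_per_clb` are assumptions.

```python
def luts_per_clb(slices_per_clb, luts_per_slice=2):
    """Each slice holds two LUTs, so a CLB holds slices_per_clb * 2 LUTs."""
    return slices_per_clb * luts_per_slice


# CLB and slice counts as stated in the text for the two devices.
devices = {
    "Spartan II XC2S100": {"clbs": 600, "slices_per_clb": 2},
    "Spartan-3E XC3S100E": {"clbs": 240, "slices_per_clb": 4},
}

totals = {}
for name, info in devices.items():
    per_clb = luts_per_clb(info["slices_per_clb"])
    totals[name] = per_clb * info["clbs"]
    print(f"{name}: {per_clb} LUTs per CLB, {totals[name]} LUTs in total")
```

The calculation only counts the LUTs inside CLBs and ignores other on-chip resources.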

Figure 1-5: Single CLB Architecture in Spartan-3E FPGA (a CLB containing Slices 0 to 3, each with LUT1 and LUT2)

1.3 Security Issues Related to FPGAs

The scope of FPGA security is very broad: it ranges from technological and architectural issues to applications, from FPGA vulnerabilities to new types of security primitives and protocols, and from limitations of FPGA-based systems in terms of security to

their strategic and quantitative advantages [13]. Depending on their goal, attackers try to pirate the original design in many ways. Sometimes attackers simply copy the original design, or they may steal the Intellectual Property (IP) and mix it with their own IP in order to develop a new design [14]. Pirated copies are generally developed to make easy money. These attacks can cause financial losses to companies and may sometimes threaten personal or national security. The different types of threats related to hardware design are discussed in a Xilinx white paper [15] as follows.

1.3.1 Reverse Engineering

Reverse engineering is done by observing the exact logic functions of the original chip design. Third parties can obtain the outputs of existing devices by cycling the inputs and then analyze the characteristics of the obtained outputs. Since current FPGA models contain many gates, this method of reverse engineering can be difficult and time consuming. Attackers may instead analyze unprotected FPGA bitstreams and easily re-create a duplicate of the original design. They may also use this information to produce their own competing products. Anti-government groups can use this information to develop effective countermeasures. Careful design by the manufacturers can slow down the reverse engineering process.

1.3.2 Overbuilding

Overbuilding is a very dangerous security issue because it is an easy way to steal a design. Usually, overbuilding is done by a contractor who builds more chips than required in order to sell the excess at a lower price. The overbuilt

chips perform exactly the same operations as the original design, which makes it difficult to detect these products in the market. The only solution is for the manufacturing companies to monitor their contractors carefully and prevent them from building excess chips.

1.3.3 Cloning

Cloning means copying the original design and then preparing a replica of it. In the process of cloning, the attacker does not understand the design completely; the original design is treated as a black box. Hence, the replica does not operate like the original design, and a defective design is released into the market at a lower price. In applications such as aviation, using a cloned product may severely affect flight safety. Cloning enables a third party to develop the replica without any development costs.

1.3.4 Physical Tampering

Tampering is defined as an attempt by an attacker to gain unauthorized access to an electronic system. It can be done through reverse engineering. Spoofing is a method of tampering in which the attacker replaces all or part of an FPGA bitstream. Attackers may tamper with devices for malicious purposes. Through tampering, an attacker may try to extract the operating data of the original design and then destroy it.

Recent studies have found Physical Unclonable Functions to be an efficient solution to many security issues related to programmable devices.

1.4 Goals of the Research

The major goals of this research are:

- To design a Self-Timed Ring Oscillator (Asynchronous Ring) with Muller gates having an inverted reverse input.
- To implement a Look-Up-Table based Asynchronous Ring Oscillator on an FPGA and create a hard macro for it.
- To use the created hard macro to design an Asynchronous Physical Unclonable Function and implement the design at different locations on FPGAs.
- To test the robustness of the Asynchronous PUF design on various FPGAs at varying temperatures.
- To calculate the uniqueness and reliability of the Asynchronous PUF using the responses obtained from 13 Spartan-3E FPGAs at different temperatures.

1.5 Thesis Organization

Chapter 2 briefly introduces the terminology used in this thesis, such as PUFs, hardware oriented security, process variations, environmental variations, noise, uniqueness, and reliability. Chapter 3 presents the different types of PUFs introduced in the literature, which vary in their construction and operation. The real-world applications of PUFs are also discussed in that chapter.

Chapter 4 discusses asynchronous circuits, their advantages over synchronous circuits, and asynchronous logic, i.e., the handshaking protocol. It also presents the basic Muller gate, the asynchronous ring oscillator, and its operation in terms of token and bubble propagation. Jitter (thermal noise) in ring oscillators is also explained. Chapter 5 discusses the implementation of the asynchronous ring oscillator on FPGAs, which includes the design of an LUT-based Muller gate, the use of a Muller gate as a set or reset Muller gate depending on the INIT values, and the experimental results obtained by implementing a 6-stage asynchronous ring on an FPGA. Chapter 6 presents the architecture of the proposed Asynchronous PUF design, its implementation on FPGAs, and an evaluation of the design in terms of uniqueness and reliability under temperature variation. Finally, Chapter 7 presents a summary of the proposed research, the conclusions drawn, our major contributions, and future work.

Chapter 2
Basic Terminology and Definitions

This chapter introduces the basic terminology used in this thesis.

2.1 Physical Unclonable Functions

A Physical Unclonable Function (PUF) is defined as a function that maps challenges to responses, is embodied by a physical device, and is easy to evaluate but hard to characterize [4]. The operation of a PUF exploits the unavoidable fabrication process variations of a chip. Reproducing an exact replica of a PUF-based system is impossible even for the manufacturer, who has complete details of the system design. Physical Unclonable Functions use the challenge-response pair (CRP) mechanism: for a given input (challenge), a PUF gives a unique output (response) from the chip. The responses are unique to each chip because the process variations differ from one chip to another. A PUF is expected to reproduce the same response to a particular challenge, even under different environmental conditions. PUFs are useful for solving many security issues in the field of hardware oriented security and have attracted considerable attention in current research.
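The challenge-response mechanism described above can be sketched with a toy software model. In this Python sketch, a hidden per-chip seed stands in for the manufacturing process variations, and authentication compares a fresh response against an enrolled one. Everything here (the class `ToyPUF`, the functions `enroll` and `authenticate`, the 64-bit response length, and the exact-match comparison) is a hypothetical simplification for illustration; it is not the asynchronous PUF design of this thesis, and a real PUF response is noisy rather than perfectly repeatable.

```python
import random


class ToyPUF:
    """Software stand-in for a PUF; the hidden seed plays the role of process variation."""

    def __init__(self, chip_seed, n_bits=64):
        self._chip_seed = chip_seed
        self.n_bits = n_bits

    def respond(self, challenge):
        # The same chip and challenge always give the same bits in this toy model.
        rng = random.Random(f"{self._chip_seed}/{challenge}")
        return [rng.randint(0, 1) for _ in range(self.n_bits)]


def enroll(puf, challenges):
    """Record challenge-response pairs (CRPs) from a trusted chip."""
    return {challenge: puf.respond(challenge) for challenge in challenges}


def authenticate(puf, crp_table, challenge):
    """Accept the chip only if its response matches the enrolled response."""
    return puf.respond(challenge) == crp_table[challenge]


genuine = ToyPUF(chip_seed=1)
other = ToyPUF(chip_seed=2)
crp_table = enroll(genuine, challenges=[5, 9])

print(authenticate(genuine, crp_table, 5))
print(authenticate(other, crp_table, 5))
```

In practice the comparison would tolerate a small number of bit errors, which is why the reliability metric introduced later in this chapter matters.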

2.2 Hardware Oriented Security

Vulnerabilities in the product design and fabrication process have increased the possibility of a malicious circuit being introduced into the original design, such that it alters the functionality of the product or leaks confidential information to the adversary [16]. These malicious circuits are known as Trojans. Trojans are a security threat capable of stealing confidential information about the design. Because Trojans are introduced into the physical hardware, there is a critical need for hardware oriented security in the market. The field of hardware oriented security is gaining considerable attention as a way to overcome hardware security issues such as cloning and reverse engineering. Some Trojan detection techniques have been proposed recently to address such challenges [16, 17]. Software oriented security is another area of computing, which focuses on protecting software from cloning, illegal usage, and similar threats. This research concentrates on hardware oriented security using PUFs.

2.3 Manufacturing Process Variations

As transistor feature sizes are scaled down, there is growing concern over increasing process variations. During the design, fabrication, and manufacturing test of an IC, an inevitable randomness occurs. This randomness results in variation of the effective channel length, width, gate threshold voltage, and oxide thickness [18]. Even dies from the same lot or wafer exhibit inherent process variations. A die may exhibit delay variations due to mask variations, lithography, junction temperature, and similar factors; this is called the system component of delay variation [4]. There are also random variations from one die to another and from one wafer to another. These variations are caused

by process temperature and pressure variations during manufacturing. The manufacturing process variations are uncontrollable and result in different delay characteristics of the circuit elements in a chip. PUF designs exploit such uncontrollable variations to generate unique response bits from every chip.

2.4 Environmental Variations

The noise in a silicon PUF increases with increasing change in environmental conditions such as temperature and voltage [19]. Ambient temperature is the most significant environmental condition affecting chip operation. The delay of gates and wires varies with ambient temperature. A variation of up to ±25 degrees centigrade may result in considerable delay variations in the gates [4]. Relative measurement of delays using delay ratios can make the design robust against environmental conditions. Identical placement of the elements of a PUF design reduces the impact of changing junction temperatures. In this research, the PUF is designed so that the STROs are identically placed across the FPGA, and the delay variations are recorded at varying temperatures in terms of frequency in order to ensure the robustness of the design.

2.5 Noise

Every electronic device has some sources of noise, and the source of noise can vary from one device to another. An electronic device may be subject to different types of noise from its manufacture through its use. In [20], the types of noise produced by different sources are explained as follows:

Noise due to manufacturing process - The variation in silicon layers during the process of design and manufacturing may cause noise in an electronic device. This noise

is specific to each integrated circuit. After the device is manufactured, it carries the information of the noise due to these variations. A Physical Unclonable Function should be able to exploit the manufacturing noise and thereby help identify the chip.
Local Noise - Noise that appears during circuit operation is known as local noise. It is caused by the random thermal motion of charge carriers. Local noise may help in the generation of random numbers, but it is not suitable for PUF circuits. It has to be compensated in order to exploit the process variations with a PUF design and to lower the intra-chip variation.
Global Environmental Noise - This noise occurs due to environmental conditions such as variation in temperature and voltage during circuit operation. High environmental noise can disrupt the PUF responses and lead to high intra-chip variation, which makes circuit identification difficult. Hence, there is a need for a PUF design that is not sensitive to environmental noise. The asynchronous PUF design in this thesis showed good robustness to environmental conditions (discussed in Chapter 6).

2.6 Uniqueness and Reliability [21]
Uniqueness (variability, or inter & intra-chip comparison) and reliability (stability, or intra-chip comparison at varying conditions) are the quality metrics used to determine how robustly the PUF design generates responses with the required variation across chips.

Uniqueness
Uniqueness is an estimate of the ability of a PUF design to generate random responses and to differentiate each chip by using the generated responses. Uniqueness is measured by the inter-chip variation of the responses obtained. This

variation can be calculated with the help of the Hamming Weight (HW) and the inter-chip Hamming Distance (HD). If the PUF responses are truly random, with equal probability of 0s and 1s, the ideal uniqueness is expected to be 50%.

Hamming Weight (HW)
The responses from the chip are obtained as binary bits. The Hamming Weight of a binary string is the number of bits that differ from the zero bit, i.e., the number of 1s in the string. For example, a 10-bit string with seven 1s and three 0s has HW = 70%.
Consider a set of response bits as shown in Figure 2-1. When a challenge is given to the chip, assume that m responses R1, R2, R3, ..., Rm are obtained, each having n bits. The Hamming Weight of each response is calculated, and the average of all the HWs is determined in order to get the intra-chip uniqueness. This procedure is formulated as follows:

$$HW_i = \frac{1}{n} \sum_{t=1}^{n} r_{i,t} \times 100\%$$

where $r_{i,t}$ is the t-th bit of an n-bit response from chip i, with t = 1 to n. The average of the HW of all responses in a chip gives the intra-chip uniqueness.
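The Hamming Weight calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's own tooling: the function names and the sample bit string are assumptions chosen to match the "seven 1s and three 0s" example.

```python
from typing import Sequence


def hamming_weight_percent(response: str) -> float:
    """Return the percentage of 1s in a binary response string."""
    if not response or set(response) - {"0", "1"}:
        raise ValueError("response must be a non-empty string of 0s and 1s")
    return 100.0 * response.count("1") / len(response)


def average_hamming_weight(responses: Sequence[str]) -> float:
    """Average the Hamming Weight over all m responses of one chip."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(hamming_weight_percent(r) for r in responses) / len(responses)


# Illustrative 10-bit string (an assumption) with seven 1s and three 0s.
sample = "1110110101"
print(hamming_weight_percent(sample))
```

A value close to 50% for `average_hamming_weight` indicates that 0s and 1s are roughly equally likely in the responses.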

Figure 2-1: Example of Hamming Weight (a challenge yields m n-bit responses R1, R2, ..., Rm with Hamming Weights HW1, HW2, ..., HWm, which are averaged)

Hamming Distance (HD)
For binary strings, the Hamming Distance between two strings of equal length is the number of bit positions in which the two strings differ. For example, two 10-bit strings that differ in three bit positions have a Hamming Distance (HD) of 3 bits, i.e., 30%.
Consider a set of response bits as shown in Figure 2-2. When a challenge is given to Z chips, assume that m responses R1, R2, R3, ..., Rm are obtained, each having n bits. The Hamming Distance is calculated by comparing each response with every other response in this set, and the average of all the HDs is determined in order to get the inter-chip uniqueness.
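The pairwise comparison described above can be sketched as follows. This is a minimal illustration under the assumption that each chip contributes one n-bit response to the same challenge; the function names and the sample strings are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def inter_chip_hd_percent(responses: Sequence[str]) -> float:
    """Average pairwise Hamming Distance over all response pairs, in percent."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("at least two responses are required")
    n = len(responses[0])
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n)


# Two illustrative 10-bit strings (assumptions) that differ in three positions.
print(hamming_distance("1011010011", "1001110010"))
```

An average close to 50% means that, on average, half of the bits differ between any two chips.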

The inter-chip Hamming Distance calculation is formulated as follows:

$$\text{Inter-HD} = \frac{2}{z(z-1)} \sum_{x=1}^{z-1} \sum_{y=x+1}^{z} \frac{HD(R_x, R_y)}{n} \times 100\%$$

where $R_x$ and $R_y$ are the n-bit responses of chip x and chip y, respectively, and z is the total number of chips. This equation gives the average inter-chip uniqueness.

Figure 2-2: Example of Inter-Hamming Distance (each of the n-bit responses R1, R2, ..., Rm to a challenge is compared with every other response, giving HD12, HD13, ..., HD1m, HD23, ...)

Reliability
A design is said to be reliable if the responses generated on a chip are stable even under changing environmental conditions. Reliability is determined by the Intra-Chip Hamming Distance, i.e., by comparing the responses generated by the same chip at different environmental conditions. In this research, the reliability of the design is tested with varying temperatures on FPGAs.

Intra-Chip Hamming Distance (Intra HD)
The Intra-Chip Hamming Distance is the Hamming Distance calculated by comparing the responses generated at varying environmental conditions on the same chip. In our work, Intra HD is the Hamming Distance between the responses generated at different

temperatures and the room temperature. Assume that the chip is subjected to temperatures of 0 °C, 20 °C, 45 °C, and 70 °C and that the responses are obtained as shown in Figure 2-3. Each response is compared with the response obtained at room temperature, and the Hamming Distances are calculated. The average of all these HDs gives the intra-chip Hamming Distance. This is formulated as follows:

$$\text{Intra HD} = \frac{1}{k} \sum_{t=1}^{k} \frac{HD(R_x, R'_{x,t})}{n} \times 100\%$$

$$\text{Reliability} = 100\% - \text{Intra HD}$$

where $R_x$ is the n-bit response of chip x at room temperature, $R'_{x,t}$ is the response of the same chip at the t-th varied temperature, and k is the number of varied temperatures. The average intra HD gives the reliability of the design at varied temperature.

Figure 2-3: Example of Intra-Hamming Distance (the n-bit response R1 at room temperature is compared with the responses R2 to R5 obtained at 0, 20, 45, and 70 degrees, giving HD12, HD13, HD14, and HD15)
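The reliability metric defined above can be sketched in Python. This is a minimal illustration, assuming one reference response at room temperature and one response per test temperature; the function names and the 8-bit sample responses are hypothetical and are not measured data.

```python
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def intra_hd_percent(reference: str, varied: Sequence[str]) -> float:
    """Average HD between the reference and each varied-condition response."""
    if not varied:
        raise ValueError("at least one varied-condition response is required")
    total = sum(hamming_distance(reference, r) for r in varied)
    return 100.0 * total / (len(varied) * len(reference))


def reliability_percent(reference: str, varied: Sequence[str]) -> float:
    """Reliability = 100% - Intra HD."""
    return 100.0 - intra_hd_percent(reference, varied)


# Hypothetical responses: one at room temperature, four at other temperatures.
room = "10110100"
at_other_temperatures = ["10110100", "10110101", "10100100", "10110100"]
print(intra_hd_percent(room, at_other_temperatures))
print(reliability_percent(room, at_other_temperatures))
```

In this made-up example two of the 32 compared bits flip, so the intra HD is 6.25% and the reliability is 93.75%.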

Chapter 3
Physical Unclonable Functions

Physical Unclonable Functions (PUFs) were invented by Naccache and Fremanteau. The concept of the PUF was formally introduced by Pappu et al. in 2001 [22]. Since then, much interest has developed in this area of research. A basic Physical Unclonable Function exploits the complex physical characteristics (manufacturing process variations) of a system such that the system cannot be replicated even with full knowledge of the design. This gives a unique set of CRPs for every physical system. The PUF output is volatile (stored by the manufacturer but not on the chip) and exists in digital form only when the system is turned on. Hence, a PUF does not need to store its output, which helps prevent some security issues. Physical Unclonable Functions are usually implemented in a tamper-resistant hardware environment, which makes it difficult for attackers to obtain the design information of the system (ICs, ASICs, and FPGAs).
A PUF can be compared to physical objects that contain random components which make them unclonable. In other words, a PUF is unclonable because of the lack of manufacturing control over the sub-micron process variations. The process variations are

substantial in advanced semiconductor fabrication techniques due to their nanometer feature sizes. They occur due to the lack of a precise control mechanism over the diffusion of dopants [23, 24] and the lack of robustly fabricated geometric features [23, 25]. These process variations are exploited with the help of the challenge-response pair (CRP) mechanism in a PUF. An input given to a PUF is called a challenge, and the output obtained is called a response. The CRPs of a PUF produce unique signatures from a physical system. A PUF design can be employed for reliable identification, device authentication, key storage, and other security tasks. This chapter discusses the various types of PUFs and their classification.

3.1 PUF Classification
Based on the security aspect, PUFs can be classified into strong PUFs, controlled PUFs, and weak PUFs. A strong PUF [26, 22] utilizes the underlying properties of physical devices to output a response for a given challenge. A strong PUF design allows an adversary to extract only an insignificant quantity of information about the PUF responses obtained with random challenges. This makes the CRPs hard to characterize and the design difficult to clone. These PUFs can be used for key generation, device authentication, and identification. Controlled PUFs [27] are an alternative to strong PUFs; they provide more security by preventing the adversary from applying challenges directly to the PUF and by avoiding direct access to the PUF responses. A strong PUF with very few CRPs can be called a weak PUF [28]. Weak PUFs are an alternative to standard nonvolatile memory based key storage for cryptography, as these PUFs are tolerant to the invasive attacks that are possible on memory based systems.

Based on the fabrication process, PUFs can be classified into two types: non-silicon PUFs and silicon PUFs. Non-silicon PUFs use explicitly introduced randomness occurring in physical systems rather than ICs. The PUF proposed in [22] is a non-silicon PUF which uses variations in optical systems. Coating PUFs are a class of non-silicon PUFs which use a network of metal wires laid out as a comb on top of ICs. Although coating PUFs are fabricated on ICs, they do not count as silicon PUFs because they need fabrication techniques in addition to those used in generic CMOS fabrication technology. The work in [29, 30] presents more non-silicon implementations.
Silicon PUFs use the randomness due to intrinsic properties. They are fabricated using the existing ASIC/FPGA fabrication process and can easily interface with FPGAs/ICs. Silicon PUFs capture the device variations by using carefully designed configurations of identical circuits, in which process variations lead to slight differences in circuit characteristics such as propagation delay, voltage drop, and leakage current. These characteristics result in unique responses for a given challenge. Different types of PUFs are discussed in this section.

3.1.1 Optical PUFs
Optical PUFs are the earliest version of the unclonable functions and can be called weak non-intrinsic PUFs. Optical PUFs are non-electrical PUFs. They are unlikely to be used in real world applications, but they serve as a good demonstration of the idea of PUFs. The basic element of an optical PUF is a token that contains an optical micro-structure built by embedding microscopic refractive glass spheres in a transparent epoxy plate [31]. When the token is irradiated with

a laser beam, a random speckle pattern is observed. This pattern is recorded, quantized, and then encoded in order to obtain the PUF responses. The angle, focal distance, or wavelength of the laser beam can serve as the challenge in this PUF. It has been shown that even a small change in the angle of the beam produces a completely different speckle pattern and hash value [32, 29]. An implementation of a basic optical PUF is shown in Figure 3-1, where the CRP consists of the laser orientation and the resulting hash, which are saved in a public database for future use. In this PUF the intra and inter Hamming Distances are % and %. The high overhead of the laser and its mechanical positioning makes these PUFs very expensive and complex, so they are not suitable for large scale production. These are implemented on ICs in [33].

Figure 3-1: Optical PUF [31]

3.1.2 Acoustical PUFs
Acoustical PUFs [34] are a special type of non-electrical PUF. These PUFs show that the design of physical systems includes the measurement of noisy random signals. This type of PUF is based on acoustical delay lines, components that delay electrical signals by converting them into mechanical vibrations and then converting the mechanical signals back into electrical ones. Observing the characteristic frequency spectrum of the acoustic delay line helps in constructing the Acoustical PUF. Principal component analysis has to be performed to obtain the bit string.

3.1.3 Coating PUFs
Coating PUFs work by adding a protective coating, a covering applied on top of the device surface. These PUFs are weak non-electrical PUFs. They are called non-silicon PUFs because they do not depend completely on manufacturing variability but insert random elements into the device. Dielectric particles are doped into the opaque coating material and sprayed directly on the sensors at the top layer of the device. The particles have random size, shape, and location, which offers strong protection from physical attacks and makes the device tamper evident. A challenge with a particular frequency and amplitude is applied to a region of the sensor array, and the responses are taken as random capacitance measurements [32, 31]. Figure 3-2 shows the basic operation of a Coating PUF. The upper left picture in Figure 3-2 shows the schematic cross-section of a CMOS IC. These PUFs have high randomness (inter HD = 50%) and low noise (intra HD = 5%).


Figure 3-2: Coating PUFs [31]

3.1.4 SRAM PUFs
SRAM PUFs are memory based silicon PUFs that fall under the weak PUFs. A basic 6 transistor (6T) SRAM memory cell consists of two cross coupled inverters built from four load transistors (T1, T2, T3, T4), plus two access transistors (T5, T6), as shown in Figure 3-3 [35]. To write to a cell, the access transistors are turned on after the bit lines are loaded with the correct values. To read a cell, the bit lines are forced to logic 1 for a certain time and the access transistors are then turned on. The bit lines are forced to the value held in the cross coupled inverter structure because of the dynamic nature of the charge. SRAMs perform proper read and write operations only with the correct transistor sizing. In order to flip the state of an SRAM cell, the voltages are set as high as possible.

Figure 3-3: SRAM PUF with 6 Transistors

These PUFs use the process characteristics of the load inverters to produce a random response. In SRAM PUFs, the sensitivity of the cross coupled inverters to device variations is increased by using minimum width transistors. Due to the process variations, one inverter gets more voltage than the other when the device is turned on. This property generates the output response, i.e., the secret key, in an SRAM PUF [28]. Due to the feedback structure, the voltage is amplified to either logic 0 or logic 1.

3.1.5 Butterfly PUFs
A Butterfly PUF is a weak PUF which is regarded as a memory based silicon PUF. SRAM PUFs and Butterfly PUFs are conceptually similar: both have memory cells with unpredictable startup values. In FPGAs, all the SRAM cells are reset to a known state by the device reset. The Butterfly PUFs were introduced in [36], which

enabled research on memory type PUFs on FPGAs. A Butterfly PUF with cross coupled D flip-flops is shown in Figure 3-4.

Figure 3-4: Butterfly PUF with Cross Coupled D Flip-Flops

The excitation signal (Excite in Figure 3-4) acts as the input. This signal is held high for a few clock cycles. As the preset and clear pins of the two latches are asserted and the outputs are cross coupled, the circuit is driven to an indeterminate, i.e., unstable, state. When the excite signal is released, the output settles to either 0 or 1 according to the delay mismatch between the interconnects. With completely symmetrical routing, the delay mismatch is due to the process variations. These PUFs require very careful routing because of the routing properties of the FPGA, which is a disadvantage.

3.1.6 Arbiter PUFs
An Arbiter PUF [37] is a delay based silicon PUF which is also known as a switch based PUF. It consists of a D flip-flop where one pin is attached to clock pin and the

other is given to the data pin. An Arbiter PUF sets up a pair of closely matched race tracks with an arbiter at the end, which determines which signal reaches the end first. Figure 3-5 shows an Arbiter PUF with multiplexers as basic elements.

Figure 3-5: Arbiter PUF (a stimulus propagates through multiplexer stages selected by the challenge bits C[0], C[1], ..., C[C-1] to a D flip-flop that produces the response)

The adjustable delay portion can be implemented not only with multiplexers but in many other ways. In [38], the author uses LUTs to obtain precise programmable delay lines. A rigorous large scale analysis of Arbiter PUFs is given in [39]. This PUF has some security disadvantages: using machine learning techniques, an attacker who observes a sufficient number of CRPs can predict the outcome easily. This PUF also needs very careful routing, without which proper outputs cannot be obtained. Tristate buffer PUFs [40] similarly capture the variations of identical delay lines.

3.1.7 Ring Oscillator (RO) PUFs
Ring Oscillator PUFs were first proposed by Suh and Devadas [26]. These PUFs are delay based silicon PUFs which fall under the category of strong PUFs. These

PUFs exploit the process variations in order to generate the response bits. They compare the propagation delays of identical circuits to obtain unique binary bits. RO PUFs contain identically built ring oscillators. Each RO has its own characteristic frequency due to manufacturing process variations and environmental variations, resulting in a unique output from each RO.

Figure 3-6: Ring Oscillator (M oscillators feed two multiplexers selected by the input; two counters and a comparator produce an output of 0 or 1)

A Ring Oscillator PUF with three stage inverter ROs is shown in Figure 3-6. A ring oscillator consists of an odd number of inverter stages, with the output of the last inverter fed back to the input of the first inverter. These inverters are implemented using MOSFETs. In a MOSFET, the source-drain current flows only when the gate capacitance is charged, and this produces the output of the inverter. Each inverter contributes some delay, which is observed in the square wave at the RO output. The output of the inverter is changed after a finite amount of

time, when the input is changed. The time is controlled by the clock circuit in this type of oscillator. Each ring oscillator gives a different output. The outputs are fed to the multiplexers, and each counter counts the number of cycles of its selected oscillator in a given time. The two counter values are compared by the comparator to generate the binary bits of the secret response. As ROs are sensitive to process and environmental variations, they are widely used to build sensors that measure voltage and temperature effects on various platforms [41].
On an ASIC the RO PUF is embedded during manufacturing, but on an FPGA the PUF can be mapped by the user at any time. RO PUFs need careful, identical placement of the ring oscillators on an FPGA; placement constraints can be used for this purpose. To improve uniqueness, a proper selection of ROs is a must. These PUFs are not self-timed, as they use a clock circuit in their architecture.

3.1.8 Anderson PUFs
The Anderson PUF [5] is quite a unique PUF that can be implemented on FPGAs without any need for hard macros to control the symmetry. As this PUF detects the presence of a glitch in the circuit, it is also called a Glitch PUF. Anderson PUFs use the carry chain multiplexers available in some FPGA components. Figure 3-7 shows the simplified Anderson's PUF.
LUTs 1 and 2 (in Figure 3-7) are used as shift registers. The LUTs are initialized with bit strings that are the inverse of each other. When the clock is given, both LUTs give

square waves that are 180° out of phase. The inputs to both LUTs are used to control the sustainable output. The two LUTs produce different outputs due to the process variations and varying propagation delays. The outputs are out of phase and produce a rising glitch that is sufficient to be captured by a flip-flop. The presence of the glitch determines the output bit.

Figure 3-7: Anderson's PUF (LUT 1 initialized with INIT = 5555 and LUT 2 with INIT = AAAA)

3.1.9 Latch PUFs
These PUFs are very similar to the SRAM and Butterfly PUF designs. Latch PUFs do not use inverters or latches in their architecture; they use two cross coupled NOR gates. Figure 3-8 shows a simple latch cell in a PUF. The internal mismatch of the two symmetric components determines the stable state to which the latch converges after reset is applied.

Figure 3-8: Latch cell

3.1.10 Flip-Flop PUFs
Flip-Flop PUFs [42] rely on the power-up characteristic of uninitialized D flip-flops. When the IC is turned on, the output of each flip-flop in the PUF design will be in either the 0 state or the 1 state. The output state is uncontrollable because of the process variations. Flip-Flop PUFs have one advantage over SRAM PUFs: they can easily be spread around an IC, which makes it difficult for an adversary to locate them during reverse engineering [43].

3.2 Applications of PUFs

3.2.1 Device Authentication [26]
A group of PUF structures can generate a unique and unpredictable signature for each device with the help of process variations. These signatures are used for authenticating the device. Just as biometrics are used to identify people, these unique signatures are used to identify objects. When a device is manufactured, a set of challenge response pairs can be collected from the device by a trusted party and stored in a database for future use. These stored CRPs act as a lock and the PUF

design acts as a key. When the device has to be authenticated, the PUF design is used to generate the unique signatures, i.e., the responses for given challenges, and these are compared with the stored CRPs. If they match, the lock opens and the device is authenticated. This process is like encrypting the unique responses as secret bits on the hardware and decrypting them using PUF responses for future authentication. PUFs provide device authentication at very low cost.

3.2.2 Cryptography [26]
The secret responses generated by the PUF design can be used for cryptography. The cryptographic keys need not be stored on the hardware, as they are generated dynamically at device reset. To use the PUF outputs for cryptography, error correction must be applied to the secret responses because PUF outputs may vary slightly from one evaluation to another. Once the PUF outputs have been subjected to error correction, the error correction syndrome is saved for later. Each time the PUF outputs are taken, this syndrome is used to correct the bit flips in the generated responses. The corrected responses are converted to cryptographic keys.

3.2.3 Secret Key Storage (Memory Less) [26]
The secret keys used for cryptography are conventionally stored in nonvolatile memories. This method is in practice, but it is very expensive and difficult to protect from adversaries. PUFs generate volatile secret keys which do not require any memory storage on the chip and can be used for cryptographic applications. The digital secret keys generated by the PUF design during device operation provide high security to the device.
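The enrollment and comparison flow of Section 3.2.1 can be sketched as below. This is a minimal illustration rather than a protocol from [26]: the function names, the CRP table, and the simulated devices are all hypothetical. The default requires an exact match, as the text describes; the optional `max_mismatch_fraction` tolerance is an added assumption reflecting the remark that PUF outputs may vary slightly from one evaluation to another.

```python
from typing import Callable, Dict


def count_mismatches(a: str, b: str) -> int:
    """Number of differing bit positions between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def authenticate(puf: Callable[[str], str],
                 stored_crps: Dict[str, str],
                 max_mismatch_fraction: float = 0.0) -> bool:
    """Return True if every stored challenge yields a close-enough response."""
    if not stored_crps:
        raise ValueError("no enrolled challenge-response pairs")
    for challenge, expected in stored_crps.items():
        observed = puf(challenge)
        allowed = max_mismatch_fraction * len(expected)
        if count_mismatches(observed, expected) > allowed:
            return False
    return True


# Hypothetical CRPs collected by a trusted party at manufacturing time.
ENROLLED = {"0001": "10110100", "0010": "01100111"}


def genuine_device(challenge: str) -> str:
    """Simulated genuine chip: reproduces the enrolled responses."""
    return ENROLLED[challenge]


def counterfeit_device(challenge: str) -> str:
    """Simulated clone: cannot reproduce the process-dependent responses."""
    return "00000000"


print(authenticate(genuine_device, ENROLLED))
print(authenticate(counterfeit_device, ENROLLED))
```

In practice the tolerance would have to sit between the measured intra-chip and inter-chip Hamming Distances so that noisy genuine devices pass while other chips fail.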

3.2.4 Intellectual Property Protection
Intellectual Property (IP) protection [28] is another advantageous application of the PUF. A private key generated by the PUF design using process variations is encrypted on the FPGA chip during manufacturing, and the public key generated by the PUF is used to decrypt the private key. An adversary trying to reverse engineer the chip design will not be able to decrypt the private key, because the adversary cannot generate the exact PUF responses. Hence the adversary fails to bring to market a device that can be authenticated.

3.2.5 Random Number Generation
With some modifications, the PUF design can be used as a true random number generator which is cryptographically secure. A Hardware Random Number Generator (HRNG) uses the randomness of the physical properties of the device, i.e., the manufacturing process variations, to generate random numbers. The HRNG accepts a given challenge and produces an unpredictable output; this output undergoes post-processing that removes the bias, and the random number is obtained. This process is explained in detail in [44].
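The text does not say which post-processing step is used to remove the bias; the details are in [44]. As one well-known possibility, the sketch below uses the von Neumann corrector, which is an assumption here and not necessarily the method of [44]. It reads the raw bits in non-overlapping pairs, maps 01 to 0 and 10 to 1, and discards 00 and 11.

```python
def von_neumann_debias(raw_bits: str) -> str:
    """Remove bias from independent raw bits using the von Neumann corrector."""
    if set(raw_bits) - {"0", "1"}:
        raise ValueError("raw_bits must contain only 0s and 1s")
    out = []
    # Walk over non-overlapping pairs; a trailing unpaired bit is ignored.
    for i in range(0, len(raw_bits) - 1, 2):
        first, second = raw_bits[i], raw_bits[i + 1]
        if first != second:
            out.append(first)  # "01" -> "0", "10" -> "1"
    return "".join(out)


# Pairs: 01 -> 0, 10 -> 1, 11 -> dropped, 00 -> dropped, 10 -> 1
print(von_neumann_debias("0110110010"))
```

The corrector trades throughput for balance: at least half of the raw bits are always discarded, and it only removes bias when successive raw bits are independent.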

Chapter 4
Asynchronous Ring Oscillators

4.1 Introduction to Asynchronous Circuits
Silicon fabrication technology has been scaling down at a high speed, with technology nodes starting from 28 nm in the market. Due to this scaling, chip manufacturers face problems with process variations, power consumption, variation of environmental parameters, and Electro Magnetic Interference (EMI). A typical synchronous circuit depends on a global timing assumption, and this timing is determined by the clock. Although the clock has some advantages in a synchronous digital circuit, it introduces a number of significant problems in designs that use current technology. Circuit timing analysis in synchronous circuits is highly affected by process variations. The clock in a synchronous circuit consumes considerable power and requires much effort to manage. Clock distribution, timing closure, increasing clock rates, decreasing feature size, clock skew, and performance overhead are some of the challenges faced in clocked circuits [45].

To reduce the above problems with synchronous circuits, the global clock signal must be eliminated. This is the basic idea behind asynchronous design, in which globally distributed clocks are not used. Asynchronous circuits are found to be an efficient alternative that overcomes these problems with synchronous circuits [46]. In an asynchronous circuit, the logic blocks communicate using a handshaking protocol, which achieves synchronization and sequencing of operations. Such circuits do not need a centralized clock to build a digital design and rely on the dynamic timing analysis of the logic. Asynchronous circuits have local clocks that are not in phase, and their time period is determined by the actual circuit delays. The benefits of asynchronous system design are explained in [45, 47] as follows:
No clock skew problem - The simpler handshake interfaces and local timing in asynchronous designs eliminate difficulties such as clock distribution and clock skew, as there is no centralized clock.
Increased power efficiency - Signal transitions in asynchronous circuits stop automatically in the absence of data, so these circuits consume no dynamic power when idle. The absence of clock signals and clock switching results in low power consumption.
Higher performance - An asynchronous circuit allows dynamic changes in speed, and its performance depends on the average-case delay rather than the worst-case delay as in synchronous circuits. This results in high performance because the average-case delay is less than the worst-case delay.
Robust and adaptive to operating conditions - The timing of a global clock signal is strongly affected by operating conditions such as varying temperature and voltage.

As asynchronous circuits do not need a global clock signal and have loose timing requirements, they operate correctly under varying operating conditions.
Greater modularity - Automatic adaptation of speeds in asynchronous circuits allows two distinct components to operate properly when put together. These circuits can easily handle variable speed environments, so their modularity is almost perfect.
Less EMI noise - Asynchronous circuit components operate at very different speeds. The EMI noise generated in the circuit is therefore distributed over a wider frequency band, which reduces interference between different components. In a synchronous circuit, the EMI noise is confined to a narrow frequency band because all the components operate at the same clock frequency.
In spite of all these advantages, the main reason to choose synchronous circuits is that asynchronous circuits are difficult to design and need much attention to the dynamic state of the circuit. Handshaking can also cause overhead in circuit area and circuit speed, which is another disadvantage. Asynchronous design has been an active area of research since the 1950s but has still not achieved widespread usage. According to ITRS, asynchronous circuits may account for only 23% of chip area by 2014 [48].

Principles of Asynchronous Logic
A synchronous design consists of combinational function blocks and registers, and the circuit activity is under the control of a global clock that triggers all registers at the same time. Global timing is considered in the entire circuit, resulting in

synchronization and stable outputs. Figure 4-1 shows a synchronous design.
In an asynchronous circuit, synchronization is achieved by replacing the clock signal with a handshaking protocol [49]. Asynchronous circuits are made up of multiple stages that communicate with each other. The communication is controlled by the presence of data at both the inputs and the outputs [50]. In these circuits the activities are controlled locally, unlike synchronous circuits, where a global control is employed. The operation of an asynchronous circuit resembles a data flow model. The basic structure of an asynchronous circuit is shown in Figure 4-2.
Consider a single stage in Figure 4-2. Each stage consists of a functional block, where the computation of the logic takes place, and a storage block, which stores the output of the functional block. The storage block can be a flip-flop, a register, or a latch. The data received from the input port is processed and then stored in the storage block, which is connected to the output ports. Data communication is controlled by the control part present in each stage. A bidirectional exchange of information takes place between the sender and the receiver; this mechanism is called the handshaking protocol.

Figure 4-1: Architecture of a Synchronous Circuit (synchronous stages 1 to n, each with a functional block and storage, all driven by a common clock)

Figure 4-2: Architecture of an Asynchronous Circuit (asynchronous stages 1 to n, each with a functional block, storage, and a controller exchanging request and acknowledgement signals with its neighbors)

The handshaking protocol follows the sequencing rules of an asynchronous circuit. First, the computation in an asynchronous stage takes place only when all the data required for the computation is ready. Second, the input ports are released when the output has been stored in the storage block. Third, the stored result is sent through the output ports only when they are available, i.e., when the input ports of the next stage have been released. The handshaking protocol thus achieves local sequencing [51]. The asynchronous operators with the handshaking protocol are shown in Figure 4-3. Request and acknowledge (ACK) communication takes place between adjacent blocks. The request signal activates the next module connected to it, resulting in hazard-free circuit synthesis [52].

Figure 4-3: Handshaking Protocol in an Asynchronous Circuit (asynchronous operators 1 to n connected by Data/Request and Ack signals)

4.2 Muller Gate
The Muller gate was introduced by D.E. Muller [53]. It is also known as the Muller C-element or C-element. The Muller gate can be used as a basic building block in asynchronous circuit design. It helps to employ a hazard-free handshaking protocol in an

asynchronous circuit. Each C-element has two inputs: a forward input F and a reverse input R. The forward input is connected to the output of the previous stage and the reverse input is connected to the output of the following stage. If both inputs are equal, the output (C) is equal to the inputs. If the inputs are not equal, the output is held (C'), i.e., it remains the same as the previous output state [53]. In other words, the output becomes true (logic high) when all inputs are true and remains so until all the inputs are false [54]. In a Muller gate with a forward input and an inverted reverse input, the output is equal to the forward input when the two inputs are different, and the output holds its previous state when the two inputs are equal.

Different implementations of the Muller gate at transistor level and logic level are shown in Figure 4-4. A standard Muller gate and a Muller gate with inverted reverse input (used in the STRO) are shown in Figure 4-5 along with their truth tables. The speed of a Muller gate plays an important role in asynchronous circuits.

Figure 4-4: Transistor and Logic Level Implementations of Muller Gate (A: CMOS implementation with a weak inverter; B: NAND implementation; C: AND and OR implementation)
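The behavior of the two gate variants described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the transistor or LUT level implementation; the function names muller_c and muller_c_inv_r are hypothetical and chosen for this sketch.

```python
def muller_c(f: int, r: int, prev: int) -> int:
    """Standard Muller C-element: follow the inputs when they agree, else hold."""
    return f if f == r else prev


def muller_c_inv_r(f: int, r: int, prev: int) -> int:
    """C-element with inverted reverse input: follow F when F != R, else hold."""
    return muller_c(f, 1 - r, prev)


if __name__ == "__main__":
    # Print both truth tables; prev is the previously held output C'.
    for f in (0, 1):
        for r in (0, 1):
            for prev in (0, 1):
                print(f, r, prev, muller_c(f, r, prev), muller_c_inv_r(f, r, prev))
```

For instance, muller_c_inv_r(1, 0, 0) follows the forward input because F and R differ, whereas muller_c_inv_r(1, 1, 0) keeps the held value.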

Standard Muller gate:
F  R  C
0  0  0
0  1  C'
1  0  C'
1  1  1

Muller gate with inverted reverse input:
F  R  C
0  0  C'
0  1  0
1  0  1
1  1  C'

Figure 4-5: Muller Gate and its Truth Table (left). Muller Gate with Inverted Reverse Input and its Truth Table (right).

4.3 Asynchronous Ring Oscillator

A ring structure can be used to perform iterative operations. A pipeline in which each stage has a functional block (combinational circuit), a memory element (storage), and channels or links (consisting of request, acknowledgement, and data signals) can be turned into a ring by connecting the output of the last stage to the input of the first stage [55]. If each stage in this ring is self-timed by a request/acknowledgement handshake, an asynchronous ring is obtained. An example of a six-stage pipeline converted to a ring is shown in Figure 4-6.

An asynchronous ring oscillator is also known as a Self-Timed Ring Oscillator (STRO), and it is a good alternative to standard inverter-based ring oscillators. Each stage of an asynchronous ring structure consists of a Muller gate with an inverter connected to its reverse input R [53]. This ring is capable of producing oscillations with a certain time period when implemented on an IC. An asynchronous ring does not require a global

clock circuit as in an Inverter Ring Oscillator (IRO) and it employs a simple request/acknowledgement handshaking protocol. Therefore, this ring is called a self-timed ring oscillator. The terms asynchronous ring oscillator and self-timed ring oscillator are used interchangeably throughout this thesis. An example of an STRO with six stages using Muller gates with inverted reverse input is shown in Figure 4-7.

Figure 4-6: Six Stage Pipeline (top) Converted to a Six-Stage Ring (bottom)

Figure 4-7: 6-Stage STRO using Muller Gates with Inverted Reverse Input (stages 0 to 5, with forward inputs F0 to F5, reverse inputs R0 to R5, and outputs C0 to C5)

4.3.1 Token and Bubble Propagation

The operation of an asynchronous ring oscillator depends on token-bubble propagation to generate oscillations. In micropipeline structures, a token (T) represents the presence of data and a bubble (B) represents the absence of data. A bubble is always ready to accept data. Each stage of an STRO contains either a token or a bubble. Consider stage i in an STRO. Stage i holds a bubble if its output is the same as the output of stage i+1, and it holds a token if the outputs of stages i and i+1 are not equal [12]. To make it simple, let $C_i$ be the output of stage i and $C_{i+1}$ be the output of stage i+1. Then:

If $C_i = C_{i+1}$, stage i is a bubble; if $C_i \neq C_{i+1}$, stage i is a token.

Let $N_T$ and $N_B$ be the number of tokens and the number of bubbles respectively. For an STRO to oscillate, there must be an even number of tokens, at least one bubble, and $N \geq 3$, where $N = N_T + N_B$ is the total number of stages in the STRO. A token in one stage propagates to the next stage if and only if the next stage is a bubble. A bubble propagates to the previous stage if the previous stage is a token [53, 56]. An example of token and bubble propagation is shown in Figure 4-8.

In this work, a six-stage STRO with the TTBBBB configuration is considered. Its initial stage output could be either 101111 or 010000. For example, let us assume that the propagation starts at TTBBBB with its initial state at 101111. Then, the propagation in a six-stage STRO will be: TTBBBB(101111) → TBBBBT(000001) → BBBBTT(111101) → BBBTTB(111011) → BBTTBB(110111) → BTTBBB(101111) → TTBBBB(101111).

The token and bubble propagation takes place in two modes, i.e., evenly spaced mode and burst mode propagation. In evenly spaced mode propagation, tokens are distributed evenly around the ring with constant spacing. In burst mode propagation, all

the tokens come together to form a cluster that propagates around the ring [14]. Figure 4-9 shows an example of evenly spaced mode and burst mode propagation.

Figure 4-8: Token-Bubble Propagation

Figure 4-9: Burst Mode Propagation (top) and Evenly Spaced Mode Propagation (bottom)

Jitter in Ring Oscillators (RO)

Ring oscillators experience thermal noise due to the motion of charge carriers when the circuit is turned on. This noise is common to both inverter ring oscillators and self-timed ring oscillators [57]. The thermal noise is called jitter in the time domain and phase noise in the frequency domain. Jitter is a critical performance parameter because it causes random variations in the outputs [58]. The jitter accumulates in different ways in inverter ring oscillators and self-timed ring oscillators.

There are two types of jitter sources: local Gaussian jitter and global deterministic jitter [59]. Randomness in a circuit is caused by the local Gaussian jitter. A single stage in a ring oscillator uses one LUT in the FPGA implementation. Therefore, each stage acts as a source of local Gaussian jitter. In an inverter based ring oscillator, one

oscillation is equal to two loops of a token around the ring and the jitter accumulates over the number of stages crossed by a single token. In an STRO, several tokens propagate at the same time. One oscillation in an STRO is equal to the time elapsed between two successive tokens. At every stage, each token crossing the stage experiences a different delay due to the local Gaussian jitter. This helps in obtaining better robustness in an STRO design.

The non-random variations in delay characteristics that are caused by variations in environmental parameters result in global deterministic jitter. This type of jitter is linear throughout the ring in IROs. In an STRO, several tokens propagate at the same time and each token is affected in the same way rather than the complete ring. Thus, an asynchronous ring design still remains robust.
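The token and bubble definitions of Section 4.3.1 and the oscillation conditions stated there can be checked with a short Python sketch. The function names classify and can_oscillate are hypothetical, the ring state is written as a bit string with stage 0 first, and the requirement of a non-zero token count is an assumption added here because a ring without tokens has nothing to propagate.

```python
def classify(state: str) -> str:
    """Label each stage T (token) or B (bubble) from the ring's output bits."""
    n = len(state)
    # Stage i is a token when its output differs from that of stage i+1 (cyclic).
    return "".join(
        "T" if state[i] != state[(i + 1) % n] else "B" for i in range(n)
    )


def can_oscillate(state: str) -> bool:
    """Check N >= 3, an even (assumed non-zero) token count, and >= 1 bubble."""
    pattern = classify(state)
    n_t, n_b = pattern.count("T"), pattern.count("B")
    return len(state) >= 3 and n_t > 0 and n_t % 2 == 0 and n_b >= 1


if __name__ == "__main__":
    for s in ("101111", "010000", "111111", "101010"):
        print(s, classify(s), can_oscillate(s))
```

Both 101111 and 010000 classify as TTBBBB, while a ring with all outputs equal has no tokens and a strictly alternating ring has no bubbles, so neither satisfies the conditions.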

Chapter 5

Implementation of Asynchronous Ring Oscillator on FPGA

5.1 Introduction

In Chapter 4, the advantages of asynchronous designs over synchronous designs were explained in detail along with the concepts of the Muller gate and token-bubble propagation. In spite of these advantages, most of the earlier research has concentrated on synchronous rather than asynchronous designs. In the literature, synchronous designs are employed to generate oscillations with a certain clock period, and these are mostly based on Inverter Ring Oscillators (IROs). IROs exhibit some problems (explained in Chapter 4) due to the global CLK (clock) signal in their architecture. To overcome such problems, asynchronous logic based ring oscillators appear to be a better solution.

Asynchronous Ring Oscillators are Self-Timed Rings (STRs) that employ a handshaking protocol to eliminate the globally distributed clock. Self-Timed Ring Oscillators (STROs) are defined as oscillating structures designed with asynchronous design techniques to generate robust timing signals [56, 59]. STROs use

Muller gates as the basic element in their design and generate oscillations using token and bubble propagation. There are several ways to control the frequency of these oscillators: through the number of stages, the stage propagation delay, the number of tokens and bubbles set during initialization, and modification of the token propagation rules [60]. In this chapter, we discuss the design of a Muller gate using an LUT and the implementation of the Asynchronous Ring Oscillator on an FPGA using LUT based Muller gates.

5.2 Look-Up-Table Based Muller Gate Design

In a Self-Timed Ring Oscillator, each stage consists of a Muller gate with inverted reverse input. To employ token and bubble propagation in these stages, each stage must be loaded with either 0 or 1. The logical value loaded into a stage decides the initial configuration of that stage as either a token or a bubble. In order to load 0 or 1 in every stage, Muller gates (as shown on the left side of Figure 5-1) with a forward input (F), an inverted reverse input (R), and a Set/Reset (SR) input are used. The set/reset input is responsible for forcing the output of the Muller gate to either 0 or 1 during initialization.

For implementation of the STRO on an FPGA, the Muller gate in each stage is designed using a four-input LUT. A single stage with an LUT based Muller gate in an STRO is shown on the right side of Figure 5-1. The four input pins of the LUT are I0, I1, I2, and I3. The output C is obtained from pin O. The output of the LUT is connected back as feedback to pin I0 and is represented by C'. The reverse input is assigned to pin I1, the forward

input is given to pin I2, and the set/reset input is given to pin I3 of the LUT. I3 loads either 0 or 1 into each LUT based Muller gate during initialization.

Figure 5-1: A Stage in Asynchronous Ring Oscillator (Muller Gate with Set/Reset Input (left) and LUT as Muller Gate (right))

A Look-Up-Table implements logic only when an INIT number is assigned to it. The INIT number defines the Boolean logic equation that the LUT performs. For a four-input LUT, the INIT number is a 16-bit value, written as four hexadecimal digits, attached to the LUT [61]. In this design, the INIT number decides whether the LUT is a set Muller gate or a reset Muller gate. If no INIT number is assigned to an LUT, the resulting output will always be zero irrespective of the input.

The INIT number needed to perform the required logical function can be determined using the Muller gate truth tables. The truth table of a Muller gate with inverted reverse input is considered and the four inputs of the LUT are assigned the binary values 0000 to 1111. Assuming that the Muller gate is operating as a set gate, when pin I3 (set/reset pin) is logic 1, the output C is forced to logic 1, and when I3 is logic 0 the output is obtained using the truth table of the Muller gate with inverted reverse input. Similarly, assuming the Muller gate as a reset Muller gate, the output is

driven to logic 0 when I3 is logic 1. When I3 is logic 0, the output depends on the truth table. The INIT numbers obtained for the set and reset Muller gates are shown in Tables 5.1 and 5.2 respectively. The INIT attribute can be determined by reading the output states in groups of four from bottom to top. The INIT numbers for the set gate and the reset gate are FFB2 and 00B2 respectively.

Table 5.1: LUT Mapping of a Set Muller Gate. INIT => x"FFB2"

I3=SR  I2=F  I1=R  I0=C'  O=C   INIT
0      0     0     0      0
0      0     0     1      1
0      0     1     0      0
0      0     1     1      0     0010 = 2
0      1     0     0      1
0      1     0     1      1
0      1     1     0      0
0      1     1     1      1     1011 = B
1      0     0     0      1
1      0     0     1      1
1      0     1     0      1
1      0     1     1      1     1111 = F
1      1     0     0      1
1      1     0     1      1
1      1     1     0      1
1      1     1     1      1     1111 = F

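Table 5.2: LUT Mapping of a Reset Muller Gate. INIT => x"00B2"

I3=SR  I2=F  I1=R  I0=C'  O=C   INIT
0      0     0     0      0
0      0     0     1      1
0      0     1     0      0
0      0     1     1      0     0010 = 2
0      1     0     0      1
0      1     0     1      1
0      1     1     0      0
0      1     1     1      1     1011 = B
1      0     0     0      0
1      0     0     1      0
1      0     1     0      0
1      0     1     1      0     0000 = 0
1      1     0     0      0
1      1     0     1      0
1      1     1     0      0
1      1     1     1      0     0000 = 0

Using these INIT numbers, a four-input LUT based Muller gate can be instantiated in VHDL. The instantiation of a set Muller gate in VHDL is shown in Figure 5-2. Instantiating an LUT using INIT attributes is explained with an example in [62].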
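The derivation of the two INIT numbers can be reproduced with a small Python sketch. It assumes the pin assignment given in the text (I0 = C', I1 = R, I2 = F, I3 = SR) and that bit k of INIT holds the LUT output for the input combination whose binary value is k; the function name lut_init is hypothetical.

```python
def lut_init(set_gate: bool) -> int:
    """Build the 16-bit LUT4 INIT value for a set or reset Muller gate."""
    init = 0
    for idx in range(16):
        c_prev = idx & 1          # I0: fed-back output C'
        r = (idx >> 1) & 1        # I1: reverse input R (inverted inside the gate)
        f = (idx >> 2) & 1        # I2: forward input F
        sr = (idx >> 3) & 1       # I3: set/reset input
        if sr:
            out = 1 if set_gate else 0      # forced value during initialization
        else:
            out = f if f != r else c_prev   # follow F when F != R, else hold
        init |= out << idx
    return init


if __name__ == "__main__":
    print(f"set gate:   {lut_init(True):04X}")
    print(f"reset gate: {lut_init(False):04X}")
```

Under these assumptions the sketch yields FFB2 for the set gate and 00B2 for the reset gate, which agrees with the values stated above.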

Figure 5-2: Instantiation of Set Muller Gate Using VHDL in Xilinx

5.3 Asynchronous Ring Oscillator Implementation on FPGA

The PUF architecture in this work is designed using Asynchronous Ring Oscillators instead of the commonly used IROs. Each Self-Timed Ring Oscillator consists of six stages. Each stage has a Muller gate with inverted reverse input and therefore occupies one LUT. In every stage, the forward input F is connected to the output of the previous stage and the inverted reverse input is connected to the output of the next stage. In this way, an asynchronous ring structure is obtained using LUTs.

A six-stage STRO is designed using VHDL in the Xilinx ISE software. The initial configuration of this STRO is TTBBBB. A TTBBBB configuration could result in either 101111 or 010000 propagation. The positions of the set and reset gates used in the design decide the number of tokens and bubbles. To obtain the TTBBBB configuration, we used the set-reset-set-set-set-set order of set and reset Muller gates and initialized the six-stage STRO with the 101111

configuration. The respective INIT attributes are attached to each LUT in order to instantiate a set or reset Muller gate. The STRO is designed so that it satisfies all the token and bubble conditions explained in Section 4.3.1 that are required for the STRO to oscillate. A single STRO occupies six LUTs, as it has six stages.

The STRO design is implemented on a Spartan-3E XC3S100E FPGA. According to its architecture (discussed in Section 1.2.5), this FPGA has four slices in a CLB, and each slice contains two LUTs. The STRO design occupies six LUTs in a single CLB. The schematic of the occupied slices is shown in Figure 5-3. To arrange the stages adjacent to each other, we used the PlanAhead tool in the Xilinx ISE software. After the placement of each LUT is determined using the PlanAhead tool, the placement constraints are generated and can be observed in the User Constraint File (UCF), as shown in Figure 5-4.

The six-stage STRO design obtained in Xilinx is shown in Figure 5-5. A common input is given to the set/reset pin I3 of each LUT. This input is named init in the code and the output is named ring_out. The init input initializes each stage when it is logic 1, and oscillations are produced when it is logic 0. The RTL schematic view of the Asynchronous Ring Oscillator is shown in Figure 5-6. As the hexadecimal INIT attribute is given to each LUT in the code, the Xilinx software solves the K-map for this INIT number and generates the logic design of each LUT (a set Muller gate or a reset Muller gate). The resulting logic circuits for the set and reset Muller gates are shown in Figure 5-7.
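The role of the init input can be illustrated with a behavioral Python sketch of the six-stage ring. It is not the VHDL design: it assumes every stage has the same unit delay and that all stages update simultaneously, which real LUTs do not, and the names GATES, step and run are hypothetical.

```python
GATES = ["set", "reset", "set", "set", "set", "set"]  # order given in the text


def step(state, init):
    """Advance every stage by one assumed unit gate delay."""
    n = len(state)
    if init:
        # init = 1 forces each stage to its set (1) or reset (0) value.
        return [1 if g == "set" else 0 for g in GATES]
    nxt = []
    for i in range(n):
        f = state[(i - 1) % n]   # forward input: output of the previous stage
        r = state[(i + 1) % n]   # reverse input: output of the next stage
        # Muller gate with inverted reverse input: follow F when F != R.
        nxt.append(f if f != r else state[i])
    return nxt


def run(cycles):
    """Initialize with init = 1, then release init and record each state."""
    state = step([0] * len(GATES), init=1)
    trace = [state]
    for _ in range(cycles):
        state = step(state, init=0)
        trace.append(state)
    return trace


if __name__ == "__main__":
    for s in run(12):
        print("".join(map(str, s)))
```

In this idealized model the ring starts at 101111 and the two tokens keep circulating once init is released, so each stage output toggles repeatedly. The waveform timing is an artifact of the unit-delay assumption and should not be read as the measured frequency.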

Figure 5-3: LUTs Occupied by STRO in One CLB on Spartan-3E FPGA (six occupied LUTs in three slices of one Configurable Logic Block; the CLB has Slice 0 to Slice 3, each with LUT1 and LUT2)

Figure 5-4: Placement Constraints Used for LUTs, Input and Output in Xilinx (Generated by PlanAhead)

Figure 5-5: A 6-Stage Asynchronous Ring Oscillator with LUTs (LUT4 in Xilinx means a 4 Input LUT)

Figure 5-6: Xilinx RTL Schematic View of a 6-Stage Asynchronous Ring Oscillator (Configuration: TTBBBB and initial states: 101111)

Figure 5-7: Logic Circuits Built Inside the (a) Set Muller Gate and (b) Reset Muller Gate

The complete design of the Asynchronous Ring Oscillator is obtained with its placement constraints, and the programming bit file is generated to map the STRO on an FPGA. After the design is mapped, FPGA Editor is used to check the locations at which the LUTs are placed. The LUTs in the STRO design and their routing are shown in Figure 5-8. A hard macro of this design is created and saved in order to use the STRO in our PUF design. While creating the hard macro, the IOB ports are unplaced and the external input-output pins are allocated. These pins are used while creating the PUF design. Also, a reference component is assigned to the hard macro so that the LUT placement always

starts from the reference slice. Therefore, the routing remains the same whenever the hard macro is instantiated in the PUF design. The hard macro with its reference component (indicated with a bull's eye) is shown in Figure 5-9. Creating a correct hard macro by choosing the right pins as external pins is very important, because if the pins are wrongly assigned, the hard macros instantiated in the PUF design will cause compilation errors. A detailed explanation of creating a hard macro is given in [63].

Figure 5-8: STRO Occupying 3 Slices in 1 CLB (in FPGA Editor), with the wires connecting to the input block and to the output block marked

Figure 5-9: Hard Macro of 6-Stage STRO with Reference Component

5.4 Experimental Results

A test bench is written in order to simulate the design and test it for errors. The simulation completes without errors and the result is shown in Figure 5-10. From the simulation waveform, it is clear that initialization takes place when INIT is 1 and oscillations are observed when INIT is 0. Later the code is mapped on the Spartan-3E Basys 2 FPGA board and the output is obtained from the logic analyzer. A part of the practical output waveform is captured and shown in Figure 5-11.

A similar Asynchronous Ring Oscillator is designed in [64]. That design is implemented on an XSA board with a Xilinx XC2S100 FPGA (Spartan II) device. As explained in Section 1.2.4, a Spartan II device has only four LUTs in each CLB. Therefore, a single six-stage STRO occupies three slices from two different CLBs, which could increase the delay due to the routing distances. Figure 5-12 (reproduced from [64]) shows the practical output waveform for the TTBBBB configuration of the STRO. The frequency of our simulation waveform, the frequency of our practical waveform from the logic analyzer, and the frequency of the design implemented on Spartan II are tabulated in Table 5.3.

Figure 5-10: Simulation Result of 6-stage STRO with TTBBBB Configuration in Xilinx

Figure 5-11: Practical Output of a 6-stage STRO with TTBBBB Configuration on Spartan-3E (Obtained from Logic Analyzer)

Figure 5-12: Practical Output of a 6-stage STRO with TTBBBB Configuration on Spartan-II (Obtained from Logic Analyzer)

Table 5.3: Frequencies Obtained from 6-Stage STRO Designs

FPGA used               Type of output                               Frequency in MHz
Spartan-3E (XC3S100E)   Simulation in Xilinx
Spartan-3E (XC3S100E)   Practical result from Logic Analyzer         118.2
Spartan II (XC2S100)    Practical result from Logic Analyzer [64]

From Figure 5-12 [64], it can be seen that the delays between oscillations are high. Therefore, the frequency obtained from a single STRO [64] on the Spartan II FPGA is lower than the frequency obtained by the STRO design on Spartan-3E, as shown in Table 5.3. As process variations and routing delays are not considered during simulation, the frequency from the simulation result for the Spartan 3E board ( MHz) is less than the practical output frequency (118.2 MHz). The frequency of an Asynchronous Ring Oscillator depends on the number of tokens and bubbles used in the design, but not on their spatial arrangement [64].

The architecture of an LUT based Muller gate used to design the Asynchronous Ring Oscillator, the implementation of the Asynchronous Ring Oscillator on an FPGA, and its results are discussed in this chapter. The practical output reflects the process variations and routing delays, which simulation does not capture. The design of the STRO is saved as a hard macro and can be used to create the Asynchronous PUF design with STROs (discussed in the next chapter).

Chapter 6

Temperature Variation on FPGA Based Asynchronous PUF

6.1 Introduction

The architecture of the Asynchronous PUF design is explained in this chapter along with its implementation on FPGAs to extract unique responses. The Asynchronous PUF uses hard macros created from the Self-Timed Ring Oscillator design (explained in Chapter 5) in order to generate oscillations with different frequencies. The STROs are mapped identically on the FPGA so that the statistical delay variation across interconnects and transistors is exploited. The unique PUF responses generated at different temperatures are used to calculate the uniqueness and reliability of the design.

PUFs are gaining much interest in the field of hardware oriented security. Active research is under way in this area to improve the security of IC hardware designs. Many PUF designs have been described in the literature, but most of them are not self-timed. Although Asynchronous PUF designs have been introduced in the literature, they are not as popular as IRO PUFs because of the limited work done. The robustness and entropy of a PUF design

with Asynchronous Ring Oscillators is explained in [57]. However, that work is limited to electrical simulation. In [59], STROs are used as entropy sources for True Random Number Generation (TRNG). In [64, 6], the author used an Asynchronous PUF design to extract unique responses from three Spartan-II FPGAs and reported the uniqueness of the design but not its reliability under different environments. In this thesis, an Asynchronous PUF design is implemented on thirteen Spartan-3E FPGAs to obtain the response bits, and the effect of temperature variation on these responses is observed. The uniqueness and reliability of the responses are calculated and presented in this chapter.

6.2 Asynchronous PUF Architecture

In this work, Self-Timed Ring Oscillators are used to design the Asynchronous PUF instead of the conventional IROs. Initially, a Self-Timed Ring Oscillator is designed and implemented on an FPGA (Chapter 5). Then a hard macro is created, as shown in Figure 5-9. This hard macro is instantiated sixteen times in the Asynchronous PUF design. In order to ensure that the design works at any location on the FPGA, it is mapped at four different locations on each FPGA, as shown in Figure 6-1. The STROs in each PUF location are placed identically and adjacent to each other using chain-like neighbor coding. Placement constraints are used to ensure the proper placement and routing of the STROs. This avoids high routing delays and helps exploit the process variations as fully as possible. A common Set/Reset (SR) input is given to the sixteen identical STROs at each location. This input controls when the STROs oscillate.

The proposed PUF design utilizes the STROs of two locations at a time in order to generate a secret response. As each location has sixteen STROs, a total of thirty-two STROs are used to generate a single response on one FPGA. When SR = 1, every ring oscillator is initialized, and when SR = 0 the STROs oscillate. Due to process variations, each STRO oscillates at a different frequency.

Figure 6-1: 4-Locations Chosen for Asynchronous PUF Design on Spartan-3E FPGAs (Locations 1 to 4)

The frequencies generated by the STROs at each PUF location are fed to multiplexers (MUX). The frequencies from two different locations are recorded and compared. The challenge generator provides the challenge that selects two STROs, one from each group. The frequencies of the selected STROs are compared using a frequency comparator. The schematic of the modeled Asynchronous PUF is shown in Figure 6-2. If

the frequencies of the selected STROs are $f_1$ (STRO at location 1) and $f_2$ (STRO at location 3), then the response bit = 1 if $f_1$ $f_2$; otherwise the response bit = 0. Figure 6-2 shows the Asynchronous PUF design using the STROs at location 1 and location 3 to record and compare the frequencies in order to obtain the response bits. Similarly, the frequencies of location 2 and location 4 are compared by changing the placement constraints in the code as desired. Hence a set of response bits can be captured from the comparison of frequencies from two different locations. These responses can be used for FPGA authentication and various cryptographic applications. Figure 6-3 shows the technological schematic obtained in Xilinx for the Asynchronous PUF at one location with sixteen STROs. All STROs are connected to a common input.

Figure 6-2: Proposed STRO-Based Asynchronous PUF Design for FPGA (STROs 1 to 16 at Location 1 and at Location 3, each group with a common set/reset input, feed two multiplexers driven by the challenge generator; a frequency comparator produces the 16-bit response)
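The selection and comparison step can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the frequency values are invented, a challenge is modeled as a pair of STRO indices (one per location), the function name response_bits is hypothetical, and the bit is taken to be 1 when the STRO at the first location is strictly faster.

```python
def response_bits(freqs_a, freqs_b, challenges):
    """Compare one STRO from each location per challenge and emit one bit each."""
    bits = []
    for idx_a, idx_b in challenges:
        # Assumed convention: 1 when the first location's STRO is strictly faster.
        bits.append(1 if freqs_a[idx_a] > freqs_b[idx_b] else 0)
    return bits


if __name__ == "__main__":
    # Invented frequencies in MHz for four STROs at each of two locations.
    location_1 = [118.2, 117.9, 118.4, 118.0]
    location_3 = [118.0, 118.1, 118.3, 118.0]
    # Hypothetical challenges pairing STRO k at one location with STRO k at the other.
    challenges = [(k, k) for k in range(4)]
    print(response_bits(location_1, location_3, challenges))
```

A full 16-bit response would use sixteen STROs per location in the same way. How ties and the comparison direction are resolved in hardware depends on the frequency comparator and is not modeled here.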

Figure 6-3: Technological Schematic of Asynchronous PUF with 16-STROs (at one Location)

6.3 Asynchronous PUF Implementation and Results

Although FPGAs are considered prominent platforms for implementing cryptographic algorithms in hardware, they pose some significant challenges when implementing a PUF design. The designer does not have sufficient information about the gate level structure of an FPGA device to exploit a full layout level design. As most PUF designs need symmetric routing, one must be very careful to use the proper placement constraints.

A six-stage STRO with the TTBBBB configuration is used as the basic building block of the Asynchronous PUF design. The LUT-based STRO hard macro is instantiated sixteen times at each location in the Asynchronous PUF design. The locations of the hard macros are defined in the User Constraint File (UCF) to obtain the desired symmetric placement. The STRO hard macro can be duplicated in the PUF design depending on the number of bits required in each response. A small portion of the Asynchronous PUF design with the duplicated (identical) STRO hard macros is shown in Figure 6-4.

Initially, the PUF design is implemented on 13 Spartan 3E FPGAs at room temperature. The equipment used for this purpose consists of Basys-2 Spartan 3E FPGA boards, a logic analyzer, and a temperature chamber, shown in Figure 6-5 and Figure 6-6. The Xilinx VHDL code for the Asynchronous PUF design is first simulated and the oscillations are obtained. The post place and route simulation result of the Asynchronous PUF design is shown in Figure 6-7. The oscillations from all STROs exhibit the same frequency, as process variations and routing delays are not considered in simulation.

Figure 6-4: A Portion of Asynchronous PUF with Identical STRO Hard Macros

Figure 6-5: (a) Basys-2 Spartan 3E FPGAs and (b) Agilent 18601A Logic Analyzer

Figure 6-6: Temperature Chamber (showing the FPGA inside the chamber and the temperature control panel)

Figure 6-7: Post Place and Route Simulation Result of Asynchronous PUF Design

The bit stream files are generated for the design at four different locations. Using these files, the design is mapped on each FPGA. Figure 6-8 shows the placement and routing of the Asynchronous PUF design at location 1. Similarly, the design is mapped at all four locations on the 13 FPGAs. The varying frequencies of the 16 STROs at each location are recorded using the logic analyzer. The output oscillations observed for the 16 STROs of the PUF design at one location are shown in Figure 6-9.

Figure 6-8: Placement and Routing of Asynchronous PUF at Location 1

Figure 6-9: Asynchronous PUF Outputs Observed in Logic Analyzer
