A Practical DPA Countermeasure with BDD Architecture

Toru Akishita, Masanobu Katagi, Yoshikazu Miyato, Asami Mizuno, and Kyoji Shibutani

System Technologies Laboratories, Sony Corporation, 1-7-1 Konan, Minato-ku, Tokyo 108-0075, Japan
{Toru.Akishita,Masanobu.Katagi,Yoshikazu.Miyato}@jp.sony.com, {Asami.Mizuno,Kyoji.Shibutani}@jp.sony.com

Abstract. We propose a logic-level DPA countermeasure called Dual-rail Pre-charge circuit with Binary Decision Diagram architecture (DP-BDD). The proposed countermeasure has a dual-rail pre-charge logic style and can be implemented using CMOS standard cell libraries, a property it shares with Wave Dynamic Differential Logic (WDDL). By using novel approaches, we successfully reduce the early propagation effect, which is one of the main factors of DPA leakage in WDDL. DP-BDD is suited to the implementation of S-boxes. In our implementations of the AES S-box, DP-BDD reduces the maximum difference of transition timing at the outputs of the S-box to about 1/6.5 of that of WDDL without delay adjustment. Moreover, by applying a simple delay adjustment to the inputs of the S-box, we can reduce it to about 1/85 of that without the adjustment. We consider DP-BDD a practical and effective DPA countermeasure for the implementation of S-boxes.

Keywords: DPA, countermeasure, dual-rail pre-charge logic, Binary Decision Diagram

1 Introduction

Differential Power Analysis (DPA) is a serious threat to cryptographic devices such as smart cards [8]. Recently, various countermeasures have been proposed to protect implementations of cryptographic algorithms against DPA at the logic level. Since the logic-level countermeasures can be adapted to basic logical gates such as an AND gate, we can apply them to implementations of any cryptographic algorithm. These logic-level countermeasures are classified into the following three groups: masking logics, dual-rail pre-charge logics, and hybrid-type logics.

Masking logics try to randomize the activity at every node in a circuit using random values in order to remove the correlation between key-related intermediate values and the power consumption of the circuit. Masked-AND, a type of masking logic, was proposed by Trichina [20]. It has been pointed out, however, that Masked-AND is not completely secure due to the effect of glitches [9, 14]. Recently, Random Switching Logic (RSL) was proposed by Suzuki et al. [16]. RSL is theoretically secure under the leakage models described in [14], but it has two disadvantages: it cannot be implemented using CMOS standard cell libraries, and it requires careful timing adjustment of enable signals.

A dual-rail pre-charge logic was first proposed by Tiri et al. as Sense Amplifier Based Logic (SABL) [17], where a signal is represented by two complementary wires and one of these two wires is charged and discharged in every cycle. Considering that SABL needs a special CMOS library, Tiri et al. also proposed Wave Dynamic Differential Logic (WDDL) [18], which can be implemented using CMOS standard cell libraries. WDDL is a practical countermeasure, but it cannot suppress two factors of DPA leakage. The first is due to the unbalanced load capacitance of complementary logic gates. In order to balance it, WDDL requires custom layout for secure design [19, 7]. The other is due to the early propagation effect. This leakage is caused when the input signals of a WDDL gate have a difference of delay time [14]. The input signals generally pass a different number of logic gates, so a difference of delay time inevitably occurs. Careful delay adjustment can reduce the difference, but applying it to all WDDL gates in cryptographic circuits seems to be unrealistic.

Hybrid-type logics combine masking logics and dual-rail pre-charge logics. At CHES 2005, Popp and Mangard proposed MDPL, which combines dual-rail pre-charge circuits with random masking to improve the first disadvantage of WDDL [11]. They claimed that it can achieve a secure design using a CMOS standard cell library without special layout constraints, but in the next year it was pointed out that MDPL can still be insecure when there is a relatively large difference in delay time between the input signals of MDPL gates [4, 15]. In addition, the combination of masking and dual-rail was shown to be unable to provide a routing-insensitive logic style [6, 13]. At present, hybrid-type logics seem to have no advantage over dual-rail pre-charge logics.

In this paper, we propose a novel DPA countermeasure called Dual-rail Pre-charge circuit with Binary Decision Diagram architecture (DP-BDD). It is based on the Binary Decision Diagram (BDD), a directed acyclic graph used to represent a Boolean function. DP-BDD is composed of AND-OR gates, which are included in CMOS standard cell libraries. Due to the BDD-based architecture, the input signals of an AND-OR gate always pass the same number of AND-OR gates, so the early propagation effect, which is one of the main factors of DPA leakage in WDDL, is significantly reduced. This DPA countermeasure is suited to the implementation of S-boxes. In our implementations of the AES [10] S-box, DP-BDD reduces the maximum difference of transition timing at the outputs of the S-box to about 1/6.5 of that of WDDL without delay adjustment. Moreover, by applying a simple delay adjustment to the inputs of the S-box, we can reduce it to about 1/85 of that without the adjustment. Like WDDL, DP-BDD requires custom layout to prevent the leakage caused by unbalanced load capacitance of complementary logic gates, but we consider that DP-BDD is a practical and effective DPA countermeasure for the implementation of S-boxes.

The rest of the paper is organized as follows: Section 2 presents WDDL and its security problem. Section 3 gives a brief introduction to BDD, which is the basic architecture of our method. In Section 4 we present the proposed DPA countermeasure called DP-BDD. In Section 5, we apply WDDL and DP-BDD to implementations of the AES S-box and compare their effectiveness. In Section 6 we introduce a simple delay adjustment of DP-BDD to further reduce the difference of transition timing. Finally, we draw our conclusion and discuss further work in Section 7.

2 Wave Dynamic Differential Logic (WDDL)

Tiri et al. proposed Wave Dynamic Differential Logic (WDDL) as a logic-level countermeasure against DPA [18]. WDDL has the following features:

- WDDL gates have complementary inputs and outputs.
- WDDL has a pre-charge phase that transmits (0, 0) and an evaluation phase that transmits (0, 1) or (1, 0), and alternates between these phases.
- WDDL can construct combinational logic using only AND gates, OR gates, and NOT operations (signal swapping).

A value $a$ is represented as $(a, \bar{a})$ in WDDL, where $\bar{a}$ is the negation of $a$. Due to the above features, the activity factor within WDDL circuits is constant and independent of the input signals. Since power consumption at CMOS gates generally depends on the transition probability of the gates, WDDL is considered to be effective as a DPA countermeasure.

However, the power consumption at CMOS gates also depends on the load capacitance of the gates. If there is a difference of load capacitance between complementary logic gates of WDDL, a difference of power consumption occurs. The number of gates connected to complementary logic gates of WDDL is basically equal, so the difference of load capacitance is caused by differences in place-and-route. The leakage due to place-and-route is called incidental leakage [15]. It can be reduced by manual or semi-automatic place-and-route using special constraints such as Fat Wire [19] and Backend Duplication [7].

Another leakage is due to the early propagation effect [14, 15]. This leakage is caused when there is a difference of delay time between the input signals of a WDDL gate. In Fig. 1, we illustrate a WDDL AND gate and its signal transitions according to the inputs $(a, b)$. Here, we assume that the transition of $a$ or $\bar{a}$ reaches the gate earlier than the transition of $b$ or $\bar{b}$, both on the evaluation phase and on the pre-charge phase. The transition timing of the complementary output $q$ or $\bar{q}$ on the evaluation phase depends on the input $a$. On the other hand, the transition timing of $q$ or $\bar{q}$ on the pre-charge phase depends on the input $b$. Therefore, the difference of delay time between the inputs $a$ and $b$ may leak the values $a$ and $b$. Since basic cryptographic components, including the S-boxes of block ciphers, require complicated logic circuits, the input signals of a WDDL gate generally pass a different number of logic gates. Therefore, a difference of delay time between these signals inevitably occurs. This type of leakage is called inevitable leakage [15]. The leakage can be reduced by adjusting the delay time between the input signals, but very high effort and many constraints in the circuit design are required to adjust the delay time of all WDDL gates in complicated logic circuits including S-boxes.

Fig. 1. The early propagation effect of a WDDL AND gate (signal transitions in the evaluation and pre-charge phases for each of the four input combinations $(a, b)$)

3 Binary Decision Diagram

A Binary Decision Diagram (BDD) is a directed acyclic graph that is used to represent a Boolean function [1], and it is one of the most commonly used synthesis tools for logic optimization of digital systems [22]. We briefly explain BDD according to Fig. 2. The left figure is a truth table representing the function f(A, B, C), and the right figure shows a block diagram of the binary decision tree corresponding to the truth table. In the right figure, an isosceles trapezoid represents a 2-to-1 multiplexer. We call the signals A, B, C non-terminal nodes, the constant signals (0 or 1) at the lowest part terminal nodes, and a signal connecting two multiplexers an internal node. The outputs f in the truth table are placed in regular order from left to right at the terminal nodes.

Generally the term BDD refers to the Reduced Ordered Binary Decision Diagram (ROBDD) [2]. A binary decision tree is uniquely transformed into an ROBDD by merging any isomorphic subgraphs and eliminating any redundant nodes. In this paper, however, we use the term BDD for the block diagram in which we only merge isomorphic subgraphs of a binary decision tree. In this BDD architecture, since the same number of multiplexers must be passed from any terminal node to the output, the input-dependent difference in propagation delay is relatively small.

Fig. 2. A truth table and binary decision tree

Fig. 3. A 2-to-1 multiplexer and an AND-OR gate

4 Dual-Rail Pre-Charge Circuit with Binary Decision Diagram Architecture

In this section, we propose a novel DPA countermeasure to reduce the inevitable leakage at the logic level, called Dual-rail Pre-charge circuit with Binary Decision Diagram architecture (DP-BDD). It is based on BDD and constructed in the following steps.

Pre-charged AND-OR gates. We eliminate glitches in order to control the transition probability of all signals in the BDD circuit. To prevent glitches, we first replace the 2-to-1 multiplexers in the BDD with 2-way 2-and 4-input AND-OR (in short, AND-OR) gates. As shown in Fig. 3, an AND-OR gate is equivalent to a 2-to-1 multiplexer except that the negation of the select signal is also an input. Fig. 4(a) shows the modified BDD circuit. In the figure an isosceles trapezoid represents an AND-OR gate. The non-terminal nodes $(A, \bar{A})$, $(B, \bar{B})$, or $(C, \bar{C})$ are connected to each AND-OR gate as $(c, \bar{c})$ in Fig. 3. Next, we apply a so-called pre-charge mechanism to the terminal nodes (0, 1) and the non-terminal nodes $(A, \bar{A})$, $(B, \bar{B})$, $(C, \bar{C})$; these signals are set to 0 on the pre-charge phase and evaluate to the corresponding value on the evaluation phase. Consider the output of an AND-OR gate at the lowest stage. On the evaluation phase, each of the four inputs of the AND-OR gate performs either (0 → 0) or (0 → 1), so the output also performs either (0 → 0) or (0 → 1). On the pre-charge phase, each of the four inputs performs either (0 → 0) or (1 → 0), so the output also performs either (0 → 0) or (1 → 0). By applying these transitions to the inputs of the AND-OR gates at the next stage, we can confirm that all internal nodes and outputs of the BDD have at most one transition both on the evaluation phase and on the pre-charge phase. Therefore, we can prevent glitches in the BDD circuit.

Fig. 4. Constructing DP-BDD: (a) BDD circuit, (b) complementary BDD circuit, (c) DP-BDD

Appending complementary circuit. Preventing glitches does not by itself guarantee DPA resistance, because the distribution of the transition activity depends on the inputs A, B, C. In order to make it independent of the inputs, we construct the complementary BDD circuit of the original BDD circuit. It can be created simply by exchanging the 0 and 1 that are input to the terminal nodes, as shown in Fig. 4(b). By appending the complementary circuit to the original circuit and merging them as shown in Fig. 4(c), exactly one gate of each pair of complementary AND-OR gates performs a transition both on the evaluation phase and on the pre-charge phase. Therefore, the activity factor within the merged circuit is constant and independent of the input signals. We call such a merged circuit a Dual-rail Pre-charge circuit with Binary Decision Diagram architecture (DP-BDD).

We now consider the inevitable leakage, which is the leakage caused by the difference of delay time between the input signals of the complementary AND-OR gates shown in Fig. 5. We assume that all inputs of DP-BDD, both non-terminal nodes and terminal nodes, are directly connected to registers and have no propagation delay except their setup time.

Note that by inputting a random bit $m$ and its negation $\bar{m}$ to the terminal nodes instead of 0 and 1, all internal nodes and the output of DP-BDD are easily masked by $m$. The addition of random masking, however, does not achieve a secure design without special layout constraints, according to the observations in [6, 13].
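The two construction steps of this section (pre-charged AND-OR gates, then the merged complementary circuit) can be checked on a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: the gate convention q = (a AND c) OR (b AND NOT c), the names `and_or`, `build_dp_bdd`, and `simulate`, and the truth table `F` are all hypothetical. Each gate is identified by the sub-function it computes, so the original and complementary circuits merge automatically wherever they share a sub-function.

```python
from itertools import product

def and_or(a, b, c, c_n):
    """2-way 2-and 4-input AND-OR gate (assumed convention)."""
    return (a & c) | (b & c_n)

def build_dp_bdd(truth_table, n_vars):
    """Stage k holds every distinct sub-function of f and of NOT f over the
    last k variables, each stored as its truth-table tuple."""
    original = tuple(truth_table)
    complement = tuple(1 - v for v in original)
    stages = []
    for k in range(n_vars + 1):
        width = 1 << k
        nodes = set()
        for table in (original, complement):
            for i in range(0, len(table), width):
                nodes.add(table[i:i + width])
        stages.append(sorted(nodes))
    return stages

def simulate(stages, inputs, evaluate):
    """Return the settled output of every node for one phase.

    On the pre-charge phase (evaluate=False) all terminal and select rails
    are 0; on the evaluation phase they take their values and complements.
    """
    n = len(inputs)
    values = {t: (t[0] if evaluate else 0) for t in stages[0]}
    for k in range(1, n + 1):
        x = inputs[n - k]                       # stage 1 uses the last variable
        c, c_n = (x, 1 - x) if evaluate else (0, 0)
        half = 1 << (k - 1)
        for node in stages[k]:
            lo, hi = node[:half], node[half:]   # sub-functions for x = 0, x = 1
            values[node] = and_or(values[hi], values[lo], c, c_n)
    return values

F = (0, 1, 0, 1, 1, 0, 0, 1)                    # hypothetical f(A, B, C)
STAGES = build_dp_bdd(F, 3)
GATES = sum(len(stage) for stage in STAGES[1:])

def rising_outputs(inputs):
    """Number of AND-OR outputs that go 0 -> 1 on the evaluation phase."""
    values = simulate(STAGES, inputs, evaluate=True)
    return sum(values[node] for stage in STAGES[1:] for node in stage)

COUNTS = {bits: rising_outputs(bits) for bits in product((0, 1), repeat=3)}
```

In this model every gate output is 0 after pre-charge, and on evaluation exactly half of the gates rise regardless of (A, B, C), which is the constant activity factor the text argues for. The model is purely logical and says nothing about timing, which the rest of the section addresses.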

Fig. 5. Complementary AND-OR gates

The difference of delay time between the input signals of AND-OR gates may lead to a difference of transition timing at the output that depends on some secret information. Since the signals $c$ and $\bar{c}$ are directly connected to inputs of DP-BDD, the transition of $c$ and $\bar{c}$ occurs soon after the transition from the pre-charge phase to the evaluation phase, and likewise for the reverse transition. On the pre-charge phase, the transition of $q$ or $\bar{q}$ occurs at the time when the transition of $c$ or $\bar{c}$ occurs, whether $c$ is 0 or 1. On the evaluation phase, depending on the value of $c$, the transition of the output signal $q$ or $\bar{q}$ occurs either at the time when the transition of the input $a$ or $\bar{a}$ occurs, or at the time when the transition of the input $b$ or $\bar{b}$ occurs. Therefore, the difference of delay time between $a$ and $b$ (or $\bar{a}$ and $\bar{b}$) may leak the value $c$ on the evaluation phase. However, since the signals $a$ and $b$ (or $\bar{a}$ and $\bar{b}$) pass the same number of AND-OR gates, the difference of delay time between these signals is relatively small, and detecting the inevitable leakage by DPA is therefore more difficult.

5 Application to AES S-box

When protecting hardware implementations of the Advanced Encryption Standard (AES) [10], the S-box is the most critical operation because it is the only non-linear operation in AES. In this section, we apply both WDDL and DP-BDD to implementations of the AES S-box and compare their effectiveness.

5.1 AES S-box based on WDDL (WDDL S-box)

There are various ways to implement the AES S-box. The most compact implementations of the AES S-box use composite fields [12, 21, 3]. We apply WDDL to the AES S-box described in [12], whose overall gate count is 103 XORs + 57 ANDs, because of its relatively short critical path. Fig. 6 shows the schematic circuit of the AES S-box using composite fields. It contains several operations, including an isomorphic mapping and multiplications and additions over a Galois field. We focus on path 1 and path 2, both of which lead to the multiplication circuit over GF(2^4). Path 1 has a relatively short propagation delay because it passes only the isomorphic mapping circuit. On the other hand, path 2 has a long propagation delay because, in addition to the isomorphic mapping circuit, it also passes the squaring, constant multiplication, addition, and inversion circuits over GF(2^4). Thus, since the difference of delay time between path 1 and path 2 is large, we expect that the inevitable leakage caused by this difference can be detected by DPA.
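To make the concern about path 1 and path 2 concrete, the following toy model computes when the output pair of a single WDDL AND gate (q = a AND b, complementary output = NOT a OR NOT b) switches if one input pair arrives early and the other late. It is a sketch under simplifying assumptions: gate delay is ignored, the arrival times `T_A` and `T_B` are arbitrary illustrative units rather than measured values, and the function names are hypothetical.

```python
from itertools import product

def evaluation_time(a, b, t_a, t_b):
    """Time at which the output pair leaves the pre-charged state (0, 0).

    t_a and t_b are the arrival times of the rising rail of each input pair.
    """
    if a and b:
        # q = a AND b rises only after both true rails have risen.
        return max(t_a, t_b)
    # The OR gate on the false rails rises with the first false rail that rises.
    return min(t for value, t in ((a, t_a), (b, t_b)) if value == 0)

def precharge_time(a, b, t_a, t_b):
    """Time at which the output pair returns to (0, 0)."""
    if a and b:
        # q = a AND b falls as soon as the first true rail falls.
        return min(t_a, t_b)
    # The OR gate falls only after every high false rail has fallen.
    return max(t for value, t in ((a, t_a), (b, t_b)) if value == 0)

T_A, T_B = 1, 5   # input a arrives early (short path), input b late (long path)
EVAL = {(a, b): evaluation_time(a, b, T_A, T_B)
        for a, b in product((0, 1), repeat=2)}
PRE = {(a, b): precharge_time(a, b, T_A, T_B)
       for a, b in product((0, 1), repeat=2)}
```

With these assumed arrival times, the evaluation-phase switching time is determined by a alone and the pre-charge switching time by b alone, which is the data-dependent timing that a large delay gap between two paths would expose.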

Fig. 6. AES S-box using composite fields

Fig. 7. AES S-box based on DP-BDD (DP-BDD S-box)

5.2 AES S-box based on DP-BDD (DP-BDD S-box)

Since the AES S-box has an 8-bit input and an 8-bit output, we first arrange eight binary decision trees of eight stages according to the truth tables of the AES S-box. Then, the AES S-box based on DP-BDD (DP-BDD S-box) can be constructed in the way described in Section 4. Fig. 7 shows the constructed DP-BDD S-box, where in[i] denotes the i-th bit of the input of the S-box and out[i] denotes the i-th bit of the output. In CMOS, a positive gate is usually constructed out of a negative gate and an inverter, so the use of positive gates is a disadvantage in terms of gate size. In order to reduce the gate size of the DP-BDD S-box, we replace the AND-OR gates with AND-NOR gates at the odd stages and with OR-NAND gates at the even stages; the inputs of the OR-NAND gates are then pre-charged to 1 on the pre-charge phase. The overall gate count is 374 AND-NORs + 352 OR-NANDs. Since any path from the terminal nodes 0 and 1 to the two input signals of an AND-NOR/OR-NAND gate passes the same number of AND-NOR/OR-NAND gates, the difference of delay time between the input signals of the gate is relatively small.

Fig. 8. Propagation delay of an output bit of WDDL S-box and DP-BDD S-box (x-axis: input of S-box; y-axis: propagation delay [nsec]; curves: WDDL out[6] and DP-BDD out[3])

5.3 Experimental Results

We implemented both the WDDL S-box and the DP-BDD S-box, and performed netlist timing simulations to evaluate their effectiveness. The environment of our evaluation is as follows:

- Language: Verilog-HDL
- Design library: 0.18 µm CMOS standard cell library
- Simulator: VCS version 2006.06
- Logic synthesis: Design Compiler version 2006.06

One gate is equivalent to a 2-way NAND, and the speed is evaluated under worst-case conditions. In the library, an AND/OR gate, an AND-OR/OR-AND gate, and an AND-NOR/OR-NAND gate are equivalent to 5/4 gates, 9/4 gates, and 7/4 gates, respectively. These simulations are based on pre-routing delay and are therefore free from the incidental leakage caused by automated place-and-route.

We first evaluate the gate counts of the WDDL S-box and the DP-BDD S-box. An AND gate in the AES S-box is implemented in the WDDL S-box using an AND gate and an OR gate as shown in Fig. 1, while an XOR gate in the AES S-box can be implemented using an AND-OR gate and an OR-AND gate. Thus the gate count of the WDDL S-box is equivalent to 103 × 9/2 + 57 × 5/2 = 606, excluding buffers. On the other hand, the gate count of the DP-BDD S-box is equivalent to 374 × 7/4 + 352 × 7/4 ≈ 1270, excluding buffers.
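The gate-equivalent arithmetic above can be reproduced exactly with rational numbers. The snippet below is only a worked check of the figures stated in the text (one gate = one 2-way NAND); the dictionary `GATE_EQUIVALENT` and the variable names are illustrative.

```python
from fractions import Fraction

# Gate equivalents of the library cells, in units of a 2-way NAND.
GATE_EQUIVALENT = {
    "AND": Fraction(5, 4), "OR": Fraction(5, 4),
    "AND-OR": Fraction(9, 4), "OR-AND": Fraction(9, 4),
    "AND-NOR": Fraction(7, 4), "OR-NAND": Fraction(7, 4),
}

# WDDL S-box: each XOR becomes an AND-OR plus an OR-AND gate,
# and each AND becomes an AND plus an OR gate.
xor_cost = GATE_EQUIVALENT["AND-OR"] + GATE_EQUIVALENT["OR-AND"]   # 9/2
and_cost = GATE_EQUIVALENT["AND"] + GATE_EQUIVALENT["OR"]          # 5/2
wddl_gates = 103 * xor_cost + 57 * and_cost

# DP-BDD S-box: 374 AND-NOR gates and 352 OR-NAND gates.
dp_bdd_gates = (374 * GATE_EQUIVALENT["AND-NOR"]
                + 352 * GATE_EQUIVALENT["OR-NAND"])

area_ratio = dp_bdd_gates / wddl_gates
print(float(wddl_gates), float(dp_bdd_gates), round(float(area_ratio), 2))
```

By this count the DP-BDD S-box is roughly twice the size of the WDDL S-box, which is the area cost paid for the reduced timing difference reported in the remainder of this section.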

Next, we evaluate the difference of transition timing at the output of logic gates in both the WDDL S-box and the DP-BDD S-box. Since we expected the largest difference to occur at the output of the S-box, we searched for the output bit of the S-box that has the largest difference of transition timing over all 256 possible S-box inputs; out[6] (or its complement) and out[3] (or its complement) are the corresponding bits of the WDDL S-box and the DP-BDD S-box, respectively. Fig. 8 shows the propagation delay of these bits for all 256 inputs; the upper curve is that of the WDDL S-box and the lower curve is that of the DP-BDD S-box. We confirmed that the maximum difference of transition timing at the output of the DP-BDD S-box (1.526 ns) is about 1/6.5 of that of the WDDL S-box (9.855 ns).

6 Towards Less Difference of Transition Timing

DP-BDD reduces the difference of transition timing at the output of AND-OR gates. It is, however, desirable to reduce this difference further, since it could be detected by DPA. We consider that the difference arises from the accumulation of the following factors:

- the difference of propagation delay between the input ports of each AND-OR gate,
- the difference of load capacitance between the input ports of each AND-OR gate,
- the difference in the number of fan-outs between the output signals of AND-OR gates.

In order to reduce the influence of these factors, we apply delay adjustment to the inputs of DP-BDD as shown in Fig. 9. On the pre-charge phase, we do not require any delay adjustment cell because the difference of transition timing at the output of each AND-OR gate is equivalent to the difference of propagation delay between the input ports of the AND-OR gate. On the evaluation phase, we insert delay cells of delay(a), delay(b), and delay(c) to $(A, \bar{A})$, $(B, \bar{B})$, and $(C, \bar{C})$, respectively. By inserting the delay cell of delay(c) to $(C, \bar{C})$, the transition of the output of the AND-OR gates at stage 1 occurs at the time when the transition of $C$ or $\bar{C}$ reaches their input ports. Next, we set delay(b) so that delay(b) − delay(c) is larger than the propagation delay from any input port of the AND-OR gates at stage 1 to any input port of the AND-OR gates at stage 2. This means that the transition of the output of the AND-OR gates at stage 2 occurs at the time when the transition of $B$ or $\bar{B}$ reaches their input ports. Similarly, we set delay(a) so that delay(a) − delay(b) is larger than the propagation delay from any input port of the AND-OR gates at stage 2 to any input port of the AND-OR gates at stage 3. Therefore, also on the evaluation phase, we can reduce the difference of transition timing at the outputs of all AND-OR gates to the difference of propagation delay between the input ports of the AND-OR gate. It is very easy to satisfy these delay conditions because we only have to make the difference of delay between any two adjacent bits of the input sufficiently large. By switching between the input signals without delay and those with delay using AND gates, we can successfully reduce the difference of transition timing at all signals in DP-BDD in both the pre-charge phase and the evaluation phase. We confirmed that this delay adjustment reduced the maximum difference of transition timing in the DP-BDD S-box to 0.018 ns (about 1/85 of that without delay adjustment), which is just the difference of propagation delay between two input ports of an OR-NAND gate.

Fig. 9. Delay adjustment for DP-BDD

7 Conclusion

In this paper we presented a logic-level DPA countermeasure called DP-BDD. DP-BDD has a dual-rail logic style and can be implemented using CMOS standard cell libraries. Our experimental results showed that DP-BDD can significantly reduce the difference of transition timing at the outputs of the AES S-box compared to WDDL. We consider that DP-BDD is a practical and effective DPA countermeasure for implementations of S-boxes.

At CHES 2006, Homma et al. presented high-resolution waveform matching based on Phase-Only Correlation (POC) techniques and its application to DPA [5]. They claimed that the POC-based techniques can evaluate the displacement between signal waveforms with higher resolution than the sampling resolution. One item of further work is to determine how large a difference of delay time between the input signals leads to DPA leakage in real devices when such techniques are used.

References

1. S.B. Akers, "Binary Decision Diagram," IEEE Trans. on Computers, Vol. C-27, No. 6, pp. 509-516, 1978.
2. R.E. Bryant, "Graph-Based Algorithm for Boolean Function Manipulation," IEEE Trans. on Computers, Vol. C-35, No. 8, pp. 677-691, 1986.
3. D. Canright, "A Very Compact S-Box for AES," CHES 2005, LNCS 3659, pp. 441-455, Springer-Verlag, 2005.

4. Z. Chen and Y. Zhou, "Dual-Rail Random Switching Logic: A Countermeasure to Reduce Side Channel Leakage," CHES 2006, LNCS 4249, pp. 242-254, Springer-Verlag, 2006.
5. N. Homma, S. Nagashima, Y. Imai, T. Aoki, and A. Satoh, "High-Resolution Side-Channel Attack Using Phase-Based Waveform Matching," CHES 2006, LNCS 4249, pp. 187-200, Springer-Verlag, 2006.
6. B. Gierlichs, "DPA-Resistance Without Routing Constraints?," CHES 2007, LNCS 4727, pp. 107-120, Springer-Verlag, 2007.
7. S. Guilley, P. Hoogvorst, Y. Mathieu, and R. Pacalet, "The Backend Duplication Method," CHES 2005, LNCS 3659, pp. 383-397, Springer-Verlag, 2005.
8. P. Kocher, J. Jaffe, and B. Jun, "Differential Power Analysis," Crypto '99, LNCS 1666, pp. 388-397, Springer-Verlag, 1999.
9. S. Mangard, T. Popp, and B.M. Gammel, "Side-Channel Leakage of Masked CMOS Gates," CT-RSA 2005, LNCS 3376, pp. 351-365, Springer-Verlag, 2005.
10. National Institute of Standards and Technology (NIST), "Advanced Encryption Standard (AES)," FIPS Publication 197, 2001.
11. T. Popp and S. Mangard, "Masked Dual-Rail Pre-Charge Logic: DPA-Resistant without Routing Constraints," CHES 2005, LNCS 3659, pp. 172-186, Springer-Verlag, 2005.
12. A. Satoh, S. Morioka, K. Takano, and S. Munetoh, "A Compact Rijndael Hardware Architecture with S-box Optimization," ASIACRYPT 2001, LNCS 2248, pp. 239-254, Springer-Verlag, 2001.
13. P. Schaumont and K. Tiri, "Masking and Dual-Rail Logic Don't Add Up," CHES 2007, LNCS 4727, pp. 95-106, Springer-Verlag, 2007.
14. D. Suzuki, M. Saeki, and T. Ichikawa, "DPA Leakage Models for CMOS Logic Circuits," CHES 2005, LNCS 3659, pp. 366-382, Springer-Verlag, 2005.
15. D. Suzuki and M. Saeki, "Security Evaluations of DPA Countermeasures Using Dual-Rail Pre-Charge Logic Style," CHES 2006, LNCS 4249, pp. 255-269, Springer-Verlag, 2006.
16. D. Suzuki, M. Saeki, and T. Ichikawa, "Random Switching Logic: A New Countermeasure against DPA and Second-Order DPA at the Logic Level," IEICE Transactions 90-A(1), pp. 160-168, 2007.
17. K. Tiri, M. Akmal, and I. Verbauwhede, "A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards," ESSCIRC 2002, pp. 403-406, 2002.
18. K. Tiri and I. Verbauwhede, "A Logic Level Design Methodology for A Secure DPA Resistant ASIC or FPGA Implementation," DATE 2004, pp. 246-251, 2004.
19. K. Tiri and I. Verbauwhede, "Place and Route for Secure Standard Cell Design," CARDIS 2004, pp. 143-158, 2004.
20. E. Trichina, "Combinational Logic Design for AES SubByte Transformation on Masked Data," IACR Cryptology ePrint Archive 2003/236, 2003. http://eprint.iacr.org/2003/236
21. J. Wolkerstorfer, E. Oswald, and M. Lamberger, "An ASIC Implementation of the AES S-boxes," CT-RSA 2002, LNCS 2271, pp. 67-78, Springer-Verlag, 2002.
22. C. Yang, M. Ciesielski, and V. Singhal, "BDS: A BDD Based Logic Optimization System," Proc. of the 37th ACM/IEEE DAC 2000, pp. 92-97, 2000.