A COMPARISON OF CIRCUIT IMPLEMENTATIONS FROM A SECURITY PERSPECTIVE

Master Thesis
Division of Electronic Devices
Department of Electrical Engineering
Linköping University
by Timmy Sundström
LITH-ISY-EX--05/3698--SE
Supervisor: Atila Alvandpour
Examiner: Atila Alvandpour
Linköping, June 8, 2005

Division, Department: Department of Electrical Engineering (Institutionen för systemteknik), 581 83 Linköping
Date: 2005-05-26
Language: English
Report category: Master thesis (Examensarbete)
ISRN: LITH-ISY-EX--05/3698--SE
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3698/
Swedish title: En jämförelse av logik stilar ur ett säkerhetsperspektiv
Title: A comparison of circuit implementations from a security perspective
Author: Timmy Sundström
Keywords: Power analysis, DPA, logic styles, secure implementations, security comparison

ABSTRACT

In the late 90's research showed that all circuit implementations were susceptible to power analysis and that this analysis could be used to extract secret information. Further research to counteract this new threat, by adding countermeasures or modifying the underlying algorithm, only seemed to slow down the attack. There was no objective analysis of how different circuit implementations leak information and by what magnitude. This thesis presents such an objective comparison of five different logic styles. The comparison results are based on simulations performed on transistor level and show that it is possible to implement circuits in a more secure and easier way than what has been previously suggested.

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 Background
  1.2 Aim of the thesis
  1.3 Layout of the report
2 CRYPTOGRAPHY
  2.1 Public and private key cryptography
  2.2 Asymmetric cryptography
  2.3 Symmetric cryptography
  2.4 The Diffie-Hellman key exchange
  2.5 Examples of asymmetric and symmetric cryptography
    2.5.1 RSA
    2.5.2 Elliptic curve cryptography
    2.5.3 AES
  2.6 Cryptography for embedded systems
3 POWER ANALYSIS
  3.1 Power analysis attacks
  3.2 SPA Attack
  3.3 DPA Attack
    3.3.1 How to perform a DPA attack
    3.3.2 Improving the signal to noise ratio
  3.4 Existing countermeasures for power analysis
    3.4.1 Algorithmic countermeasures for SPA
    3.4.2 Algorithmic countermeasures for DPA
    3.4.3 Noise insertion
    3.4.4 A hardware countermeasure
    3.4.5 Using a power independent logic style
  3.5 An example of how to perform a power analysis

4 CIRCUIT STYLES
  4.1 Static CMOS
    4.1.1 Timing of static CMOS
    4.1.2 Information leakage in static CMOS
  4.2 Dynamic CMOS
    4.2.1 Timing in dynamic CMOS
    4.2.2 Information leakage in dynamic CMOS
  4.3 Differential Domino
  4.4 Information leakage in differential domino circuits
  4.5 CRSABL
  4.6 DyCML
5 IMPLEMENTATION
  5.1 Circuit comparison
  5.2 Assumption
  5.3 Test setup
  5.4 Implementations
  5.5 Static CMOS
  5.6 Dynamic CMOS
  5.7 Differential Domino
  5.8 CRSABL
  5.9 DyCML
6 COMPARISON
  6.1 The standard deviation
  6.2 Power consumption and delay
7 CONCLUSIONS
  7.1 The different logic styles
  7.2 What can be done to counteract the leakage
  7.3 Security a trade-off?
8 REFERENCES

1 INTRODUCTION

This master thesis presents a circuit technology comparison from a security perspective. We will see that security is a trade-off for performance. A discussion will follow about the roles of different circuit styles in a system. Because some systems, when not properly implemented, leak information that could lead to the extraction of secure data, we want to minimize this leakage, and this thesis will show how this can be done.

1.1 Background

In 1998, Kocher et al. [1,2] presented an attack on secure systems based on statistical analysis. This provided a way to learn secret information stored in systems such as smart cards. The analysis is made possible because information can be retrieved by studying the power consumption of a system. This thesis presents the information leakage of different circuit styles and what one can do to prevent it.

1.2 Aim of the thesis

The aim of this thesis is to provide a comparative analysis of circuits from a security perspective: how different logic styles leak information and how insecure they are in relation to each other. Finally we will see what we can do to minimize this leakage in an implementation.

1.3 Layout of the report

In this report we will first see how cryptography is used in different systems, so as to provide a basic understanding of how sensitive information is kept safe and the strengths and weaknesses of this when implemented in a real system. After this the different analyses are presented and we will see how leaked information is used to extract useful data, together with some existing countermeasures for these attacks. This is followed by a presentation of the logic styles that will be used in the comparison and some comments about them. After this the test setup and the implementation of the circuits are shown together with the results and a discussion. In the end we will see the conclusions and what one can do to minimize this information leakage.

2 CRYPTOGRAPHY

Cryptography is the mathematical approach to keeping information secure. Encryption and decryption of a plaintext is done using a secret key, and the strength of the crypto lies in maintaining this key's secrecy. There exist various cryptographic schemes; we will start by classifying them and then give some examples of existing schemes used today.

2.1 Public and private key cryptography

Encryption of a text can be done with either public or private key cryptography, also known as asymmetric and symmetric cryptography respectively. Using a symmetric scheme the encryption and decryption is done with the same key, and privacy can only be guaranteed if this key remains secret. In asymmetric cryptography one key is used for encryption and another for decryption. Therefore the encryption key can be made publicly available, and only the person with the secret decryption key is able to read the message.

2.2 Asymmetric cryptography

The strength of asymmetric cryptography is that the key used for encryption does not have to remain secret. Everyone who wishes to send information can do so without the risk of a third party picking up the secret message. The problem with asymmetric cryptos is that they often require complex mathematics, and encrypting a long message in an embedded system often takes a long time.

2.3 Symmetric cryptography

As opposed to asymmetric cryptos, implementing symmetric ones is often easy, and encrypting a message takes virtually no time compared to asymmetric encryption. The downside is that when both sender and receiver must use the same key for encryption and decryption, they have the problem of deciding which key to use, since the same key should not be used in more than one transaction. In order to prevent a third party from picking up this information exchange and learning the secret key, there has to exist a method to select a key and let both parties share this information. Either the two parties must meet or communicate over a non-secure line to decide upon the key, or use a key exchange scheme. One such method for deciding upon a secret key is the Diffie-Hellman key exchange protocol. It uses asymmetric cryptography to decide upon the secret key, which can later be used as the private key for symmetric encryption.

2.4 The Diffie-Hellman key exchange

We assume that the two parties A and B are using the same asymmetric crypto function F that, given a plaintext P and a secret key k, returns the ciphertext C:

$$F(k, P) = C$$

A and B now each decide on a secret key, $k_a$ and $k_b$, preferably random for each new transaction, and given a plaintext P calculate $C_x$ and transmit this to each other. A then uses $C_b$ and B uses $C_a$ as plaintext and encrypts this one more time, resulting in X. This value X is now the secret key which can be used for symmetric cryptography. A third party trying to listen to this exchange will only learn P and the intermediate values $C_a$ and $C_b$. Given these it is computationally hard to derive the keys $k_a$ and $k_b$ and the final key X.

A computes $F(k_a, P) = C_a$ and B computes $F(k_b, P) = C_b$. The values $C_a$ and $C_b$ are exchanged, after which A computes $F(k_a, C_b) = X$ and B computes $F(k_b, C_a) = X$.

Figure 1: The Diffie-Hellman key exchange

This works under the assumption that

$$F(k_1, F(k_2, P)) = F(k_2, F(k_1, P))$$

If this is not the case, we must have another representation of the exchange. This is not the definition of the Diffie-Hellman key exchange but an example of how it can be used in practice.

2.5 Examples of asymmetric and symmetric cryptography

2.5.1 RSA

One scheme used for public key cryptography is RSA. It is named after its inventors Rivest, Shamir, and Adleman. It is based on modular exponentiation and its strength lies in the fact that it is hard to calculate the prime factors of a large integer. In order to remain reasonably safe it is recommended to use a key length of 1024 bits.

2.5.2 Elliptic curve cryptography

Another public key system is elliptic curve cryptography (ECC), proposed in [3] by N. Koblitz in 1987. It is more appropriate for use in embedded systems such as smart cards because of the increased security for a given key length compared to RSA. For ECC to remain as safe as RSA at 1024 bits, only a key length of 163 bits is required.
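The exchange described in section 2.4 can be sketched in a few lines. This is only an illustration under stated assumptions: the thesis does not specify F, so modular exponentiation is assumed here because it satisfies the commutativity condition above, and the modulus `PRIME`, the public value `P`, and the helper name `exchange` are hypothetical toy choices, not recommended parameters.

```python
import secrets

# Assumed public parameters (illustrative only, not from the thesis).
PRIME = 2**127 - 1   # a Mersenne prime used as a toy modulus
P = 5                # the public "plaintext" both parties start from


def F(k, plaintext):
    """Assumed commutative crypto function: plaintext^k mod PRIME."""
    return pow(plaintext, k, PRIME)


def exchange(k_a, k_b):
    """Run the exchange of section 2.4 and return the key seen by A and by B."""
    c_a = F(k_a, P)      # A encrypts P and sends C_a to B
    c_b = F(k_b, P)      # B encrypts P and sends C_b to A
    x_a = F(k_a, c_b)    # A encrypts the value received from B
    x_b = F(k_b, c_a)    # B encrypts the value received from A
    return x_a, x_b


# Fresh random secret keys for this transaction.
k_a = secrets.randbelow(PRIME - 3) + 2
k_b = secrets.randbelow(PRIME - 3) + 2
x_a, x_b = exchange(k_a, k_b)
assert x_a == x_b        # both parties now hold the same X
```

An eavesdropper in this sketch sees only `P`, `c_a` and `c_b`; the shared value is the same on both sides because exponentiation commutes in the exponents.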

2.5.3 AES

In 2001 the Rijndael algorithm was chosen as the new Advanced Encryption Standard (AES) by NIST [4]. AES is a symmetric crypto that uses a variable key length of 128, 192 or 256 bits. It is a block cipher that encrypts 128 bits of plaintext at a time and uses several rounds of encryption. It can easily be implemented in an embedded system and is the standard cryptographic scheme used today.

2.6 Cryptography for embedded systems

In order to implement a secure cryptographic scheme on embedded systems such as smart cards, we need to make sure that the system is capable of handling these ciphers. Using the asymmetric ECC crypto in the Diffie-Hellman key exchange and AES for symmetric encryption is a viable option for implementation in a smart card.

3 POWER ANALYSIS

A crypto system is ideally considered a black box that, given a plaintext and a secret key, outputs a ciphertext. The strength of the system lies in the fact that extracting the key from the output should not be possible within a reasonable amount of time. Unfortunately no implementation of a crypto system is ideal, and additional information from side-channel leakage will be available. Without careful design, a system whose strength should lie in keeping the key secret can easily be broken using this leakage. In [1,2] Kocher et al. present side-channel attacks (SCAs) based on simple and differential power analysis, SPA and DPA. Countermeasures to power analysis have been proposed on different levels of the design, and although these techniques have provided resistance against DPA, improved versions of the attack can still be used to break these crypto systems.

3.1 Power analysis attacks

This section provides an introduction to the side-channel power analysis attacks and examples of countermeasures used for resistance against them. Power analysis attacks are based on the fact that the instantaneous power consumption of a system is correlated with its internal state. This information leakage could, for example, reveal the Hamming weight of a word being processed.

3.2 SPA Attack

The SPA attack is performed by directly observing the power consumption of a system. Take a simple crypto system as an example, in which the actions performed in a round of the encryption are related to a specific bit in the key. If $k_i$ is zero an addition is executed, and if the bit is one a multiplication is performed instead. Since a multiplication consumes more power than an addition, examining the power consumption of the system gives us information about this bit. All conditional execution that depends on the secret key can be extracted using SPA.

3.3 DPA Attack

In a normal hardware implementation the power consumption of a logic gate depends on its inputs. This small difference will not be directly visible on a power trace due to the interfering noise coming from other parts of the system running simultaneously. When looking at the difference averaged over a large number of traces, so that the uncorrelated noise is suppressed, the difference will be seen. This statistical approach is the basis of the differential power analysis attack.

3.3.1 How to perform a DPA attack

In order to successfully perform a DPA attack we need to sample the power consumption for N encryptions. Each run gives a power trace $S_i[j]$, where j is the time of the sample and i identifies the power trace, ranging from 1 to N. A partitioning function D is then used to divide the power traces into two sets $S_0$ and $S_1$:

$$S_0 = \{ S_i[j] \mid D = 0 \}, \qquad S_1 = \{ S_i[j] \mid D = 1 \}$$

The function D should depend on the secret key as well as on known variables such as the plaintext or the ciphertext, depending on which is available. We now calculate the average of these two sets and the difference of the averages:

$$A_0[j] = \frac{1}{|S_0|} \sum_{S_i[j] \in S_0} S_i[j], \qquad A_1[j] = \frac{1}{|S_1|} \sum_{S_i[j] \in S_1} S_i[j]$$

$$T[j] = A_0[j] - A_1[j]$$

Since D is a function of the secret key, the difference T[j] depends on the key as well. If our guess of the secret key is correct, T[j] will be characterized by spikes, while if the key is incorrect the partitioning will be done at random and T[j] will not show anything but noise. The spikes in T[j] are the difference in power dissipation in the logic gates mentioned earlier.

3.3.2 Improving the signal to noise ratio

Depending on the system and which countermeasures are used, we might need to improve the DPA attack to successfully identify the correct key. In [5] Messerges et al. give several methods for improving the signal to noise ratio of the DPA attack.

3.4 Existing countermeasures for power analysis

Various countermeasures exist that try to prevent DPA attacks. Most can still be subjected to a successful DPA attack if the method of the attack is changed according to the specific countermeasure. The most promising countermeasure so far is the use of logic gates that try to eliminate the source that makes power analysis possible, making the power dissipation independent of signal value and sequence.

3.4.1 Algorithmic countermeasures for SPA

Implementing a countermeasure for an SPA attack on the algorithmic level is easily done, usually at the cost of increased execution time and power consumption. Making the instructions executed and their order independent of the secret key provides security against SPA. The only data dependence in the power consumption that then remains is that of the individual gates, and this small variation will drown in the noise caused by other parts of the system.

3.4.2 Algorithmic countermeasures for DPA

The reason DPA works is that the power dissipation is correlated with the secret key. A countermeasure on the algorithmic level would be to mask the secret key so that the same inputs give rise to different states within the system but still give the same output. This would randomize the correlation between a specific intermediate bit and the secret key, reducing the success rate of a DPA attack. Since the strength of a crypto system should lie in the secret key and not in keeping the algorithms secret, we should assume that the attacker has information about the countermeasures used and could choose a different partitioning function to sidestep the masking procedure and similar protections.

3.4.3 Noise insertion

Adding white noise to the power source will corrupt the power traces. Since the averaging method of the DPA attack eliminates this noise, this method will only increase the number of traces required for a successful attack.

3.4.4 A hardware countermeasure

In [6] Clavier et al. show that the hardware countermeasure known as random process interrupts (RPIs), which inserts random dummy instructions, is susceptible to an improved DPA attack called Sliding Window DPA. The RPI countermeasure is used to randomly spread out the spikes that should occur in the difference function T[j]. It is attacked by integrating the power trace over a window centered on the mean shift caused by the RPI, which reconstructs a difference signal. Using the RPI countermeasure will only increase the number of power traces N needed to successfully perform a DPA.

3.4.5 Using a power independent logic style

In [7,8,9] Tiri et al. introduce a new logic style designed to eliminate the source of the power consumption difference.
This is done by making the gate's power dissipation independent of the input value and sequence. If such a gate could be perfectly realized, the source of the DPA attacks would be removed, since there would not be any difference in power consumption to target.

3.5 An example of how to perform a power analysis

To perform a power analysis one needs to acquire enough power traces to have material for analysis. This is done by measuring the current flowing into the device, for example by connecting an oscilloscope, which samples the data at specific times, in series with the device and the power supply. Figure 2 shows two such power traces, acquired by simulating the current of a half adder implemented in differential domino. In the first trace one input is high and the other low, causing the sum to be high and the carry bit to be low, while in the other trace both inputs are high, which means the sum is low and the carry is high.

Figure 2: Two power traces of a half adder (two panels, current in mA versus time in ns)

When looking at these two traces it is hard to see any differences between them. In the beginning of the cycle the two peaks look a little bit different, but that is all. Power analysis extracts information from these small differences. If we look at a plot displaying the difference between the two power traces, these small variations become visible.

Figure 3: The difference between the two power traces (current in mA versus time in ns)

We see that the difference has distinct peaks, which means that the two power traces really are different. If one wants to extract information from a complete circuit, a specific gate must be targeted. When sampling the power one cannot distinguish where the power was consumed. This is what the DPA attack takes care of. If we see this half adder as a part of a big system, the power consumption of this small part will be drowned by everything else going on at the same time. The two traces will be overlaid by the power consumed by all other parts. The reason the DPA attack still works is that the power consumption of all other parts is independent of the half adder. This means that when taking enough power traces into consideration and looking at the difference between the two sets, we will still have something that looks similar to the difference plot. That is, the distinct peaks will still be there.

The hard part is to successfully divide all the power traces into two sets in which the difference has any meaning. This selection is usually based on the given outputs at the time of sampling and a guess of the secret key. If this guess is incorrect, the selection of the two sets will be random and the difference plot will be flat. If on the other hand the guess is correct, we will see a trace with distinct peaks, signifying that our guess was correct.
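The difference-of-means procedure from section 3.3.1 can be tried out on synthetic data. The sketch below is a toy model and not the simulation used in this thesis: the traces are Gaussian noise with a small bump added at one sample point whenever a targeted bit is 1, and the names `simulate_traces`, `difference_trace`, the leak position and the leak amplitude are all assumptions made for illustration. A wrong key guess is modelled as a partition that is unrelated to the targeted bit.

```python
import random


def simulate_traces(n_traces, n_samples, leak_index, leak_amplitude, rng):
    """Return (bits, traces). Each trace is unit Gaussian noise plus a small
    data-dependent bump at leak_index when the targeted bit is 1."""
    bits, traces = [], []
    for _ in range(n_traces):
        bit = rng.randint(0, 1)
        trace = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        trace[leak_index] += leak_amplitude * bit
        bits.append(bit)
        traces.append(trace)
    return bits, traces


def mean_trace(traces):
    """Average a set of traces sample by sample."""
    count = len(traces)
    return [sum(column) / count for column in zip(*traces)]


def difference_trace(traces, partition):
    """T[j] = A_0[j] - A_1[j], where partition[i] plays the role of D."""
    s0 = [t for t, d in zip(traces, partition) if d == 0]
    s1 = [t for t, d in zip(traces, partition) if d == 1]
    a0, a1 = mean_trace(s0), mean_trace(s1)
    return [x - y for x, y in zip(a0, a1)]


rng = random.Random(1)
bits, traces = simulate_traces(4000, 20, leak_index=7, leak_amplitude=0.5, rng=rng)

# Correct guess: the partition matches the bit that actually leaked.
t_correct = difference_trace(traces, bits)
# Incorrect guess: the partition is unrelated to the leaking bit.
random_partition = [rng.randint(0, 1) for _ in bits]
t_wrong = difference_trace(traces, random_partition)

# Sample index with the largest spike for the correct guess.
peak = max(range(len(t_correct)), key=lambda j: abs(t_correct[j]))
```

With the correct partition the spike stands out at the leaking sample although the bump is smaller than the noise in any single trace, while the random partition leaves only residual noise.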

4 CIRCUIT STYLES

This chapter introduces the different logic styles that will be used in the comparison. We will present how they work and give an example of a simple circuit using each style. A short discussion about the pros and cons of each style follows the initial presentation.

4.1 Static CMOS

Static CMOS is the most basic circuit style used when designing circuits. It consists of a pull-up and a pull-down network (PUN and PDN respectively), of which one and only one conducts for any combination of input signals. These mutually exclusive networks provide a path to one rail in steady state.

Figure 4: Static CMOS structure (the inputs $In_i$ drive both the PUN and the PDN; the output is $F(In_i)$)

If one were to build a simple inverter circuit in static CMOS, it would look like this:

Figure 5: Inverter in static CMOS

The advantage of static CMOS logic is its robustness and that it is easy to use. It is reasonably good in speed and area as a general circuit style.

4.1.1 Timing of static CMOS

In static CMOS the timing of signals follows an easy scheme. In the beginning of the clock cycle the data changes value, and the change then ripples through the stages without the need of a clock to enable them.

Figure 6: Timing of static CMOS (data shown in relation to the clock period)

4.1.2 Information leakage in static CMOS

The power consumption of a static CMOS gate varies depending on the input values and sequence. If there is no transition, nothing happens in the gate and there is no power consumption. But if there is a transition on the inputs such that the output changes value, the capacitance at the output node must either charge or discharge. This causes a slight variation in power consumption between the two cases, and we can retrieve that information from the power profile.
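A first-order model makes the leakage argument of section 4.1.2 concrete. The sketch below is an assumption-laden simplification rather than the transistor-level simulation used in this thesis: it counts only the energy $C V_{dd}^2$ drawn from the supply when the output node is charged, ignores short-circuit and leakage currents, and uses a hypothetical load capacitance `C_LOAD`. The function names are illustrative.

```python
VDD = 3.3          # supply voltage in volts (assumed)
C_LOAD = 10e-15    # assumed output load capacitance of 10 fF (illustrative)


def supply_energy(prev_out, next_out, c_load=C_LOAD, vdd=VDD):
    """Energy drawn from V_dd for one output transition, first-order model.

    Only a 0-1 transition charges the output node from the supply; short-circuit
    and leakage currents are deliberately ignored in this sketch."""
    if prev_out == 0 and next_out == 1:
        return c_load * vdd ** 2
    return 0.0


def inverter(x):
    """Logic function of a static CMOS inverter."""
    return 1 - x


def energy_profile(inputs):
    """Supply energy per cycle for an inverter driven by a sequence of input bits."""
    outputs = [inverter(x) for x in inputs]
    return [supply_energy(a, b) for a, b in zip(outputs, outputs[1:])]


# The profile differs from cycle to cycle, so it reveals the output transitions.
profile = energy_profile([1, 0, 0, 1])
```

Even in this coarse model the per-cycle energy is a function of the data sequence, which is exactly the dependence a power analysis exploits.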

Figure 7: Power consumption of different transitions (panels: 0-0 transition, 0-1 transition, 1-0 transition, 1-1 transition)

The only time static CMOS draws power is during the change of state, namely the 0-1 and 1-0 transitions on the output. This causes the two rails to be shorted for a time, and a current flows from $V_{dd}$ to ground. This current varies depending on which transition we have. If we are to charge the output node, more current must come from $V_{dd}$, and if we must discharge it, less current has to flow from $V_{dd}$. It is this small change that we can observe in the power consumption.

4.2 Dynamic CMOS

In dynamic logic, one of the networks is replaced by a clocked precharge transistor, and an evaluation transistor is connected in series with the remaining network. This decreases the area and the evaluation time of the circuit compared to static CMOS. If we replace the pull-up network we get an n-type network, and when replacing the pull-down network we get a p-type network. There are two ways to cascade dynamic gates, using either Domino rules or NP rules. They differ in that with Domino rules we use only n-type networks with a static inverter in between, while in the NP case one alternates between n-type and p-type gates. In this thesis both types will be used.

Figure 8: Dynamic CMOS structure (a clocked precharge transistor, the PDN driven by the inputs $In_i$, and a clocked evaluation transistor; the output is $F(In_i)$)

An example is the simple inverter, which in dynamic CMOS looks like this:

Figure 9: Inverter in dynamic CMOS

The advantage of dynamic CMOS is that it is a very fast implementation style, but it is more sensitive to noise than static CMOS. One way to improve the robustness is to add a keeper, which provides a path to one of the rails at all times, so that the charge does not dissipate or leak away.

Figure 10: Dynamic inverter with an added keeper

4.2.1 Timing in dynamic CMOS

When using dynamic CMOS the input signals are only allowed to make one transition during the evaluation phase. In an N-block the transition is a conditional 0-1 transition and in a P-block it is a 1-0 transition. This causes problems when we implement our ripple carry adder, since the inputs available might change when the carry becomes steady. This forces us to add a restoring circuit, so that this style does not follow the standard Domino or NP rules.

4.2.2 Information leakage in dynamic CMOS

A dynamic gate works differently from a static CMOS gate. By adding a clock we get two intervals, precharge and evaluation. In precharge the gate output is set high (one) independent of the previous value. During evaluation the gate conditionally discharges depending on the input. If the output is low and the gate enters precharge, charge is transferred to the output capacitance from $V_{dd}$. The other case is when the gate enters evaluation and the output is going low; then the output node is discharged to ground. This moving of charge will be seen in a power analysis, and hence the internal state of the gate will be known.

Figure 11: Power consumption at transition in a dynamic gate (panels: 1-1 transition at precharge, 0-1 transition at precharge, 1-0 transition at evaluation, 1-1 transition at evaluation)

4.3 Differential Domino

Differential domino consists of two differential dynamic gates that are cross coupled to provide a stable output. The two pull-down networks are mutually exclusive, so that during evaluation one output node is always pulled down. When this happens the cross coupled inverter pair kicks in, so that when one node is pulled down the other is kept high, removing the need for keepers as in dynamic logic. For differential domino to work, both the input signals and their complements must be available. Since the output of a differential domino gate is both the logic function and its complement, this is no problem when cascading differential gates.

Figure 12: Structure of a differential domino gate (clocked precharge transistors, the two complementary outputs, the differential PDN driven by the inputs $In_i$, and a clocked evaluation transistor)

Differential domino logic is close to dynamic logic in speed. It is more robust but has increased area compared to domino implementations.

4.4 Information leakage in differential domino circuits

Since the dual ended logic can be seen as containing a normal gate and its complement, the internal transitions are independent of the input values and sequence. This is unfortunately only true in the ideal case. In order for the transitions to be identical, the two parts must be electrically identical. That is, the pull-down paths must have the same resistance and the same internal load capacitance at all nodes. Due to process variations and the fact that some functions are hard to implement identically to their complement, the differential domino circuit will also leak some information, although less than the previous logic styles.

4.5 CRSABL

CRSABL (Charge Recycling Sense Amplifier Based Logic) is a circuit style developed by Tiri et al. in [9]. It is based on differential domino logic with certain modifications, intended to lower the data dependency as well as the mean value of the power consumption. CRSABL utilizes a charge recycling scheme which uses the charge already stored at the output nodes to equalize them during precharge. The cross coupled PMOS pull-up pair then charges the nodes to a value of $V_{dd} - V_{th}$, which is less than $V_{dd}$, therefore using less power in precharge than differential domino. Certain restrictions also apply to the design of the differential pull-down network.
All the internal nodes in the differential pull-down network must be connected to one output node for all input combinations. This guarantees that the load capacitance has a constant value during a switching event, making the two networks closer to electrically identical. When it comes to information leakage, the same applies to CRSABL as to differential domino. Ideally the gates would consume the same power independent of input, but process variations and implementation cause the two networks to be slightly imbalanced, allowing information to be available through a power analysis.

Figure 13: Structure of a CRSABL gate and the level restoring circuit

When cascading CRSABL gates we use the approach of differential domino and invert the outputs. But since the precharge phase does not fully charge the nodes to $V_{dd}$, we cannot use static inverters, since we would have a static power consumption in those. This is solved by a level restoring logic on each output.

4.6 DyCML

DyCML, which stands for dynamic current mode logic, is a reduced swing logic that is based upon a combination of MOS current mode logic (MCML) and dynamic logic. It was developed by Allam et al. [10] and was never intended as a solution to the information leakage problem. Since it has a differential network that never provides a direct path between the two rails, it is still interesting to examine whether the power consumption is data dependent. By removing the resistive loads in an MCML gate, replacing them with clocked PMOS transistors and adding a cross coupled pair, we reach a setup which looks very much like differential domino. Instead of discharging the output nodes to ground, DyCML has a virtual ground which is made up of an NMOS transistor connected as a capacitance. This virtual ground is discharged during precharge, and one of the output nodes is connected to the virtual ground, equalizing the voltage at both nodes. Since this voltage will never be zero, DyCML also has a reduced swing.

Figure 14: Structure of a DyCML gate (clocked precharge transistors, the two complementary outputs, the differential PDN driven by the inputs $In_i$, and the clocked virtual ground)

The problem of using DyCML is twofold. First, it is very sensitive to noise. The low output node is floating during evaluation, and the charge here may be destroyed by cross-talk or other noise sources. Second, cascading DyCML gates causes problems. Since we have a reduced swing, the NMOS transistors in the pull-down network will be conducting all the time, either fully or partially. This causes both output nodes to discharge to the virtual ground and the gate to malfunction. Due to this one is forced to use a complex clocking scheme in which the evaluation phase of the next gate starts after the previous one has stable outputs.

Figure 15: Three clocks in DyCML (clock 1, clock 2, clock 3) shown in relation to each other
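The leakage arguments of sections 4.2.2 and 4.4 can be compared with a small charge model. This is a sketch under simplifying assumptions, not the transistor-level comparison carried out in this thesis: each precharge is reduced to the charge needed to refill the node that was discharged in the preceding evaluation, capacitances are given in arbitrary units, and the function names and the 5 % mismatch are hypothetical.

```python
import statistics


def dynamic_precharge_charge(outputs, c_out=1.0):
    """Charge drawn at each precharge by a single-ended dynamic gate.

    The output node is recharged only if it was discharged (evaluated to 0)
    in the preceding evaluation phase."""
    return [c_out if value == 0 else 0.0 for value in outputs]


def differential_precharge_charge(outputs, c_true=1.0, c_false=1.0):
    """Charge drawn at each precharge by a differential gate.

    Exactly one of the two output nodes discharges per evaluation, so one node
    is always recharged: the true node when the output was 0, otherwise the
    complement node."""
    return [c_true if value == 0 else c_false for value in outputs]


outputs = [0, 1, 1, 0, 1, 0, 0, 1]   # evaluated output values over eight cycles

# Spread of the per-cycle charge; a larger spread means more data dependence.
single_ended = statistics.pstdev(dynamic_precharge_charge(outputs))
balanced = statistics.pstdev(differential_precharge_charge(outputs))
mismatched = statistics.pstdev(
    differential_precharge_charge(outputs, c_true=1.0, c_false=1.05)
)
```

In this model the perfectly balanced differential gate shows no spread at all, while a small capacitance mismatch between the two sides brings back a data dependence that is much smaller than in the single-ended case.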

5 IMPLEMENTATION

In order to compare how the different circuits leak information, we built one 8-bit ripple carry adder in a 0.35 µm, 3.3 V CMOS process for each of the chosen circuit styles. The general design was to build one half adder and one full adder cell and to connect them as follows:

Figure 16: Structure of an 8-bit ripple carry adder (one half adder, HA, followed by seven full adders, FA)

5.1 Circuit comparison

The power consumption was studied for 1000 random input data combinations, and each power trace was sampled at 1000 regular points. With a clock period of 10 ns, this gives a sampling period of 10 ps, that is, a sampling rate of 100 GHz. A normal implementation on a smart card usually runs at a speed of 10 MHz [7], and a state of the art oscilloscope is capable of 40 GS/s, which would give a lower resolution than the simulated traces. The characteristics gathered for each logic style were the worst case delay, the average power consumption and the standard deviation of the power consumption.

5.2 Assumption

The main objective of the comparison is to measure the standard deviation of the power consumption. If the power consumption is data dependent, this will show as a larger standard deviation. If the deviation were zero along the entire power trace, this would mean that the power consumption is totally independent of data.
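The sampling arithmetic and the standard deviation metric described above can be written down directly. The sketch below is illustrative: the helper names are hypothetical, and the use of the population standard deviation at each sample point is an assumption, since the text does not state which estimator was used.

```python
import statistics


def sampling_period(clock_period_s, samples_per_trace):
    """Time between two samples when a clock period is sampled at regular points."""
    return clock_period_s / samples_per_trace


def per_sample_std(traces):
    """Standard deviation across all traces at each sample point.

    A value of zero at every point would mean that the power consumption
    does not depend on the input data at all."""
    return [statistics.pstdev(column) for column in zip(*traces)]


period = sampling_period(10e-9, 1000)   # 10 ns clock period, 1000 points
rate = 1.0 / period                     # corresponding sampling rate in Hz

# Two toy traces that differ only at the middle sample point.
toy_std = per_sample_std([[1.0, 2.0, 3.0], [1.0, 4.0, 3.0]])
```

Applied to the 1000 simulated traces of one logic style, a function like `per_sample_std` would give one deviation value per sample point, which is the quantity compared between the styles.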

The assumption that this comparison is based upon is that the standard deviation of the power consumption is a measure of how much information can be extracted, or how easily. That is, any data dependency can be used to extract information, and all information leakage must be considered negative.

5.3 Test setup

When measuring the power we take the following parts of the test bench into account: the input drivers, the clock driver and the entire 8-bit ripple carry adder. To make the simulation as realistic as possible, the input signals come from gates of similar structure, and the output of the adder is connected to similar gates to have realistic loads.

Figure 17: The test setup (inputs, clock and the 8-bit adder)

5.4 Implementations

The following passages present the implementation of each logic style: how the ripple carry adder was designed, the motivation for the transistor sizing, and problems with the specific implementation. Each implementation is followed by the simulations run on the circuit and their results. In all logic styles the structure of the ripple carry adder was the same. The first stage is a half adder, which is followed by seven full adders for a total of eight bits. To clarify what a half and a full adder are: a half adder takes two inputs A and B and outputs the sum and carry, while the full adder also has a carry in as an input. The adders have the following truth table.

Table 1: Truth table for a full adder (the first four rows equal a half adder)

A  B  Carry in  Sum  Carry out
0  0  0         0    0
1  0  0         1    0
0  1  0         1    0
1  1  0         0    1
0  0  1         1    0
1  0  1         0    1
0  1  1         0    1
1  1  1         1    1
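The adder structure and Table 1 can be checked with a short behavioural model. This is only a logic-level sketch for illustration: it says nothing about the transistor-level implementations compared in this thesis, and the function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry out) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Return (sum, carry out) for two input bits and a carry in."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers bit by bit: one half adder for bit 0,
    followed by full adders for the remaining bits.

    Returns (result, carry out of the last stage)."""
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        if i == 0:
            bit, carry = half_adder(a, b)
        else:
            bit, carry = full_adder(a, b, carry)
        result |= bit << i
    return result, carry


total, carry_out = ripple_carry_add(200, 100)   # exceeds 8 bits, so carry_out is set
```

The carry passed from each stage to the next is the ripple that makes the carry chain the critical path of the adder.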

5.5 Static CMOS

Implementation in static CMOS follows specific design rules which are easy to use. The design of the pull-up and pull-down networks comes from basic techniques such as rewriting the desired output function. In a ripple carry adder the carry chain is the vital part, and this is the one that has to be optimized for speed. In all logic styles the sizing is designed to give fall and rise times equal to those of an inverter with the following sizes.

Figure 18: The minimum sized inverter (width label recovered from the figure: 1.2)

This is not the minimum size supported by the design rules, since this is a 0.35 µm process; the minimum sizes chosen are just a reference for all logic styles. The half adder was designed in static CMOS as shown below, with the transistor sizes next to the devices. The size is the transistor width in µm, so a width of 1.2 µm corresponds to a W/L ratio of 1.2/0.35.

Figure 19: Half adder implemented in static CMOS

The first part of the half adder is the carry generation. The carry is then propagated to the next stage while at the same time being used to generate the sum. As a rule, all the sizing should be done so as to get equal fall and rise times in the internal nodes as well as on the output. Looking at the last part prior to the sum output inverter, the pull-down and pull-up both satisfy the 1.2/3.6 minimum size, but the carry generation has to be sized up to compensate for a higher load in the next stage. The carry inverter is triple the size of the sum inverter, so the first networks were sized accordingly to drive it. The full adder has a similar structure, only with three inputs.

Figure 20: Full adder implementation in static CMOS

The same size was used for the two output inverters as in the half adder case, and all the networks in the circuit have to be sized accordingly. Implementation in static CMOS was straightforward, and no problems came up that needed special attention. Each adder was simulated separately and then connected according to the figure in the beginning of the chapter, with the carry output of one stage

being the carry input of the next. This forms the 8-bit ripple carry adder that was simulated and whose properties were measured. Because we want to include the power needed to drive the inputs in the power consumption, the inverters that drive the 8-bit adder are included in the adder block. The test bench has 16 inputs, each passed through an inverter to make the signal more realistic and then fed to the adder block. The outputs are also passed through inverters to simulate a realistic load.

Figure 21: The test bench setup (16 input bits, 8-bit adder, 9 output bits)

The 16 input bits are the eight bits each of inputs A and B, and the nine output bits are the eight sum bits and the carry output of the last stage. A typical sum output signal of one of the more significant bits could look like this.

Figure 22: An output of static CMOS under one clock cycle (axes: Voltage (V) versus Time (ns))

In the beginning of the phase the inputs change value, causing the carry to ripple through the chain. Soon after the phase starts the output goes low, but when the correct carry input is available, that is, when it has rippled through to this stage, the output once again goes high and remains steady. Not all output signals behave this way, only the ones that take on a false value before the correct carry is available as an input. Other output signals may only make the first transition and remain there, while others will not change at all.

To extract the information used for comparison with the other logic styles, the adder was simulated for 1000 cycles with the inputs randomly generated using Matlab. The current drawn from the power source was sampled at 1000 points with regular intervals for each of

[...] differences between this mean power trace and a random power trace, which can be seen in the standard deviation as a larger value.

Figure 24: The mean current and the standard deviation (dashed) of static CMOS (axes: Current (mA) versus Time (ns))

The maximum delay was measured using the worst case scenario, where the carry has to ripple through all stages. The power consumption is calculated by integrating the mean current multiplied by the supply voltage. The total transistor area is used as a comparison between the circuits. It is not the actual area the implementation would take on chip, but a guide to what the ratio between the implementations would probably be. The transistor area is calculated by adding together all transistor widths (one half adder and seven full adders) and multiplying by the transistor length. This gives the following data for the static CMOS circuit style.

Table 2: Static CMOS properties

Delay (ns)    Power consumption (mW)    Transistor area (µm²)
1.81          1.51                      504

5.6 Dynamic CMOS

Implementing the adder in dynamic CMOS was a bit more troublesome than using standard static CMOS. First, a mixture of different dynamic design techniques was used: the carry chain can be seen as domino connected logic, while the carry to sum stages are connected using NP rules. Because the output of a dynamic gate can only make one transition during a clock cycle, the problem where the carry arrives too late must be taken into consideration. Imagine a stage late in the chain where the carry input is precharged low. Both inputs A and B are low, which causes the sum to be low. This implementation precharges the sum high, so when the circuit starts to evaluate it will drive the sum low to the correct value. When the correct carry then ripples to this stage and is high, the output has already

made its transition, and it cannot be pulled back to high unless the dynamic structure is modified with restoring logic.

Figure 25: Implementation of half adder in dynamic CMOS

Figure 26: The carry generation stage of full adder in dynamic CMOS
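The late-carry problem described above can be illustrated with a small behavioral toy model. This is a sketch under strong simplifying assumptions, not a model of the actual circuit: the function `evaluate_sum_output` and its `carry_events` argument are hypothetical, time is reduced to an ordered list of carry values seen by one stage, and the restoring logic is modeled simply as the ability to return a discharged output to high.

```python
def evaluate_sum_output(a, b, carry_events, restoring=False):
    """Toy model of one precharged-high sum output during evaluation.

    `carry_events` lists the carry-in values seen by the stage in time
    order (for example [0, 1] for a carry that is first low and later
    ripples in as high). Returns the final value of the sum output.
    """
    output = 1  # the sum output is precharged high
    for carry_in in carry_events:
        correct_sum = a ^ b ^ carry_in
        if correct_sum == 0:
            # The single transition a dynamic output is allowed to make.
            output = 0
        elif restoring:
            # Restoring logic returns an already discharged output to high.
            output = 1
        # Without restoring logic a discharged output stays low.
    return output


# A = B = 0, carry first seen as 0 and later arriving as 1:
# the correct final sum is 1.
without_restoring = evaluate_sum_output(0, 0, [0, 1], restoring=False)
with_restoring = evaluate_sum_output(0, 0, [0, 1], restoring=True)
```

In this toy model `without_restoring` ends up low, which is the incorrect value, while `with_restoring` returns to high.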

Figure 27: Sum generation stage of full adder in dynamic CMOS

In all three circuits (the latter two form the full adder) we see the special keeper structure within a dashed box. This feedback is used to keep the charge from leaking away: the node is dynamic, and without the keeper nothing guarantees that the charge is maintained. This transistor is sized smaller than the minimum size used elsewhere, because it should not hinder the signal from making a transition, only maintain the precharged signal level. In the sum generation of the full adder we also see the special restoring structure, which is added to prevent the race condition described earlier. This structure was designed by C-J Fang et al. in [11]. If the carry input arrives late and is high, this restoring structure pulls down the internal node and returns the output to high, which is the correct value.

Sizing each pull-down or pull-up network follows the same basic principle as in static CMOS. The test bench has a similar structure to the static case, only with the clocks added. Both the clock and its inverse must be available for the circuit to

function, and these clock drivers are also included in the calculation of total power, together with the input drivers.

Figure 28: The test bench setup of dynamic CMOS (16 input bits, 2 clocks, 9 output bits)

Figure 29: Output and clock (dashed) under one cycle in dynamic CMOS (axes: Voltage (V) versus Time (ns))

This is an example of how one sum output bit of the dynamic CMOS logic could change during one cycle. It is the special case discussed above, where the signal incorrectly pulls low and later, when the carry has rippled through, goes high. During the first half of the cycle, when the clock is low, the circuit is in the precharge state and all sum outputs are high. When the clock goes high the circuit enters evaluation, and signals may make a transition to low. The sum outputs of the ripple carry adder are outputs from a P-network, which makes them internally precharge low, but when the signal passes through an inverter to the outside it is precharged high, as seen in the figure. The delay of the circuit is the worst case time it takes for all outputs to be stable after the clock goes high.
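The power and area figures reported in the property tables of each logic style follow from two simple calculations described in the text: integrating the mean current multiplied by the supply voltage, and summing all transistor widths multiplied by the transistor length. The sketch below is a hypothetical Python rendering of those two steps. The function names are made up, the integral is approximated with a rectangle rule, and dividing the energy by the cycle time to obtain an average power is an assumption based on the "average power consumption" mentioned in section 5.1.

```python
def average_power_w(mean_current_a, supply_voltage_v, sample_period_s):
    """Average power over one cycle from a sampled mean current trace.

    The integral of the current is approximated with a rectangle rule.
    """
    charge_c = sum(mean_current_a) * sample_period_s   # integral of I dt
    energy_j = supply_voltage_v * charge_c             # energy per cycle
    cycle_time_s = len(mean_current_a) * sample_period_s
    return energy_j / cycle_time_s


def transistor_area_um2(widths_um, length_um=0.35):
    """Sum of all transistor widths times the transistor length.

    As noted in the text, this is a comparison figure and not the real
    chip area of the implementation.
    """
    return sum(widths_um) * length_um


# Hypothetical trace: a constant 1 mA for 1000 samples of 10 ps at 3.3 V.
example_power_w = average_power_w([1e-3] * 1000, 3.3, 10e-12)

# Hypothetical width list: one 1.2 µm and one 3.6 µm transistor.
example_area_um2 = transistor_area_um2([1.2, 3.6])
```

With these definitions, a reported area of 504 µm² at a length of 0.35 µm corresponds to a total transistor width of 1440 µm.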

Figure 31: The mean current and the standard deviation (dashed) of dynamic CMOS (axes: Current (mA) versus Time (ns))

The standard deviation of dynamic CMOS looks about the same as in the static case during evaluation. The deviation during precharge is higher and more data dependent. The properties of dynamic CMOS were gathered in the same way as for static CMOS and all the other logic styles.

Table 3: Dynamic CMOS properties

Delay (ns)    Power consumption (mW)    Transistor area (µm²)
1.59          3.19                      469

5.7 Differential Domino

The following three logic styles are the ones that are most interesting when it comes to the comparison of leaked information. Since each style now has a built-in duality, the power consumption would ideally be independent of input values and sequence, but as discussed earlier this is not the case.

The same principle is used for the half and full adder as in the previous logic styles: the carry generated in one stage is used both in the next stage and for the sum generation in the same stage. Both adders are made up of two separate gates, one for carry generation and one for sum generation. All of these are designed in the same general way, with only the pull-down network differing between them. The output inverter is sized to match the next stage, and the entire pull-down network is sized as the NMOS device in the minimum inverter.

In the half adder another special structure is added to the circuit. For example, in the carry gate the inverse of input B is connected to a lone NMOS transistor which functions as a capacitance. This is to make the circuit more independent of input data. The pull-down paths of the network must have the same resistance independent of input, while at the same time the load on a signal and its inverse must be the same, which forces the use of such a structure.

Figure 32: Carry generation of half adder in differential domino

Figure 33: Sum generation stage of half adder in differential domino

Figure 34: Carry generation stage of full adder in differential domino

Figure 35: Sum generation stage of full adder in differential domino

Due to the dual nature of differential domino, both an input and its inverse must be available, causing the number of input and output bits of the adder to increase.

Figure 36: The test bench of differential domino (32 input bits, 1 clock, 18 output bits)

Figure 37: Output and clock (dotted) under one cycle in differential domino (axes: Voltage (V) versus Time (ns))

This is an example of the output of one of the later stages. As in dynamic CMOS, the precharge phase is when the clock is low, which charges the output to low (internally high). When the clock goes high and the circuit enters evaluation, we see the output taking on its value. Because the pull-down network consists of NMOS devices and both an output and its inverse are precharged low, the circuit cannot begin evaluation until one of them is high. As opposed to dynamic and static CMOS, where an incorrect value could appear before the carry has rippled through, differential domino logic does not start to evaluate until the carry has rippled through to this stage. This means that there is a lag between the later sum bits and the clock, which is the delay of the circuit.

Figure 39: The mean current and the standard deviation (dashed) of differential domino (horizontal axis: Time (ns))

One can see the low standard deviation in the precharge and data changing peaks, while the deviation at evaluation is of about the same magnitude as for static and dynamic CMOS.

Table 4: Differential domino properties

Delay (ns)    Power consumption (mW)    Transistor area (µm²)
1.68          3.03                      410

5.8 CRSABL

Charge Recycling Sense Amplifying Based Logic was designed to minimize the information leakage of logic gates. It looks a bit like differential domino but has some added structure to lower the power consumption and further decrease the leakage. One design rule that has to be followed when designing the pull-down networks of CRSABL is that all internal nodes in the network must, during evaluation, be connected to one of the output paths or to ground. This is not a big limitation for the half adder because it only has two inputs, but in the full adder the networks become a lot more complex than the differential domino ones. To have more duality in the pull-down networks of the sum generation, they were modified to not include the carry output.

The sizing of the transistors in the pull-down networks follows the same principle as before: they must have the same pull-down strength as a transistor with a width of 1.2 µm. The sum generation of the full adder is a special case where five transistors will be conducting in series at the same time. This gives them a size of 1.2 µm times 5, which equals 6.0 µm. The pull-up paths were sized with the load of the next stage in mind. The carry in is connected to a total transistor width of 7.2 µm + 4.8 µm + 6.0 µm + 6.0 µm = 24.0 µm. A minimum sized inverter has

a size of 1.2 µm + 3.6 µm = 4.8 µm and can drive three gates of the same kind. The total width driven by the minimum inverter is therefore 14.4 µm. 24/14.4 is 1.67, which is the ratio the paths must be scaled by to have the same fall and rise times. This gives two transistors in series with 7.2 µm × 1.67 = 12.0 µm each, with the pull-down transistor of the external node sized accordingly. There is an NMOS transistor which connects the two pull-down paths, and the signal V which is connected to its gate is generated with the following circuit. For a carry generation circuit the inputs would be the carry output and its inverse instead of the sum.

Figure 40: Generation of voltage V

Figure 41: Carry generation of half adder in CRSABL
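The sizing arithmetic above can be reproduced step by step. The sketch below only restates the numbers given in the text in Python; the variable names are hypothetical, and the 3.6 µm term of the minimum inverter is inferred from the stated total of 4.8 µm.

```python
MIN_NMOS_WIDTH_UM = 1.2                 # reference pull-down width
MIN_INVERTER_WIDTH_UM = 1.2 + 3.6       # 4.8 µm total (3.6 µm is inferred)

# Five transistors conducting in series must match one 1.2 µm device.
series_pull_down_width_um = 5 * MIN_NMOS_WIDTH_UM           # 6.0 µm

# Total transistor width loading the carry in of the next stage.
carry_in_load_um = 7.2 + 4.8 + 6.0 + 6.0                    # 24.0 µm

# A minimum inverter drives three gates of the same kind.
reference_load_um = 3 * MIN_INVERTER_WIDTH_UM               # 14.4 µm

# Scaling ratio needed to keep equal fall and rise times.
scaling_ratio = carry_in_load_um / reference_load_um        # about 1.67

# Width of each of the two series pull-up transistors after scaling
# the 7.2 µm starting width used in the text.
pull_up_width_um = 7.2 * scaling_ratio                      # about 12.0 µm
```

The result agrees with the 12.0 µm labels on the pull-up transistors mentioned in the text.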

Figure 42: Sum generation of half adder in CRSABL

Figure 43: Carry generation stage of full adder in CRSABL (the figure refers to pull-down network 1 and pull-down network 2)

Figure 44: The two pull-down networks of the carry generation stage (pull-down network 1 and pull-down network 2)

The same kinds of peaks that were seen in differential domino are here as well: the evaluation, precharge and data change peaks. The precharge and data change peaks look the same, being compact and not very data dependent. Because the evaluation time is greater for CRSABL than for differential domino, the evaluation peak is more spread out. The first stage follows the same behavior, but the following stages start to evaluate only when data is ready, which varies in time and spreads out the peak.

Figure 49: The mean current and the standard deviation (dashed) of CRSABL (axes: Current (mA) versus Time (ns))

Table 5: CRSABL properties

Delay (ns)    Power consumption (mW)    Transistor area (µm²)
3.28          6.88                      1069

5.9 DyCML

Dynamic Current Mode Logic was never intended to minimize information leakage, but because of the clocking scheme that has to be used it is still interesting in this comparison as a guide to a better solution. DyCML pull-down networks should look the same as those of differential domino, but some changes were made, since more advanced clocking techniques are needed when one gate must start to evaluate after the previous one. The pull-down networks of the sum generation were changed to not use the generated carry signal of the current stage, as in CRSABL. The sizing of all transistors except the virtual ground capacitance was done using the same principle as before. The virtual ground capacitance was sized to give a voltage swing of 0.66 V on the output, which is 20% of V_dd. According to [10], the formula for sizing the transistor is

$$W L = \frac{V_{swing}\, C_L}{C_{ox}\,(V_{dd} - V_{swing})}$$

C_L is the load capacitance, which in this case is the next stage, C_ox is the gate oxide capacitance per unit area, and W and L are the transistor width and length, which are what we seek. Simulations showed that changing the transistor width did not affect the voltage swing much and that a size of 2.5 µm gave good simulation results.

Figure 50: Carry generation of half adder in DyCML
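The sizing relation for the virtual ground transistor can be evaluated numerically. The sketch below assumes the formula reads W·L = V_swing·C_L / (C_ox·(V_dd − V_swing)), with the minus sign in the denominator. The load capacitance and oxide capacitance values are purely hypothetical placeholders and are not taken from the thesis or from the process used.

```python
def virtual_ground_gate_area_m2(v_swing, v_dd, c_load_f, c_ox_f_per_m2):
    """Gate area W*L of the virtual ground transistor, in square meters.

    Implements W*L = V_swing * C_L / (C_ox * (V_dd - V_swing)).
    """
    if not 0.0 < v_swing < v_dd:
        raise ValueError("the swing must lie strictly between 0 and V_dd")
    return (v_swing * c_load_f) / (c_ox_f_per_m2 * (v_dd - v_swing))


V_DD = 3.3               # supply voltage of the process, from the text
V_SWING = 0.2 * V_DD     # 20% of V_dd, about 0.66 V, from the text

# Hypothetical placeholder values, chosen only for illustration.
C_LOAD_F = 20e-15        # assumed load capacitance of the next stage
C_OX_F_PER_M2 = 4.6e-3   # assumed gate oxide capacitance per unit area

gate_area_m2 = virtual_ground_gate_area_m2(V_SWING, V_DD, C_LOAD_F, C_OX_F_PER_M2)
gate_area_um2 = gate_area_m2 * 1e12   # convert square meters to square µm
```

For a swing of 20% of V_dd the factor V_swing / (V_dd − V_swing) equals 0.25, so the required gate capacitance is a quarter of the load capacitance regardless of the placeholder values chosen.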

Figure 51: Sum generation of half adder in DyCML

Figure 52: Carry generation of full adder in DyCML

Figure 53: Sum generation of full adder in DyCML

The test bench of the DyCML simulation looks the same as in the other differential cases, except that the inverters here are not static inverters but inverters implemented in DyCML, to give realistic signals and loads. The different clocks used internally were generated inside the adder block and were included in the power consumption.

Figure 54: The test bench of DyCML (32 input bits, 1 clock, 18 output bits)

Figure 55: Output and clock (dotted) under one cycle in DyCML (axes: Voltage (V) versus Time (ns))

In this typical output signal one can clearly see the big difference between DyCML and the other differential logic styles. Since we have a reduced swing on the output, we use the complex clocking scheme to enable the evaluation of the next stage. We also see a difference in the clock: unlike in the other clocked logic styles it does not have a 50% duty cycle, so as to ensure that the output is stable for a longer time at the end of the evaluation. This does not affect the functionality; it only allows the circuits to evaluate for a longer time.