Vectorless Analysis of Supply Noise Induced Delay Variation

Sanjay Pant*, David Blaauw*, Vladimir Zolotov**, Savithri Sundareswaran**, Rajendran Panda**
{spant,blaauw}@umich.edu, {vladimir.zolotov,savithri.sundareswaran,rajendran.panda}@motorola.com
*University of Michigan, Ann Arbor, MI
**Motorola, Inc., Austin, TX

Abstract

The impact of power supply integrity on a design has become a critical issue, not only for functional verification, but also for performance verification. Traditional analysis has typically applied a worst-case voltage drop at all points along a circuit path, which leads to a very conservative analysis. We also show that in certain cases, the traditional analysis can be optimistic, since it ignores the possibility of voltage shifts between driver and receiver gates. In this paper, we propose a new analysis approach for computing the maximum path delay under power supply fluctuations. Our analysis is based on the use of superposition, both spatially across different circuit blocks, and temporally in time. We first present an accurate model of path delay variations under supply drops, considering both the effect of local supply reduction at individual gates and voltage shifts between driver/receiver pairs. We then formulate the path delay maximization problem as a constrained linear optimization problem, considering the effect of both IR drop and LdI/dt drops. We show how correlations between currents of different circuit blocks can be incorporated in this formulation using linear constraints. The proposed methods were implemented and tested on benchmark circuits, including an industrial power supply grid, and we demonstrate a significant improvement in the worst-case path delay increase.

1 Introduction

Power supply networks are essential in providing the devices on a die with a reliable and constant operating voltage. Due to the interconnect resistance and inductance of the on-chip and package supply networks, the supply voltage delivered to various devices on a die is non-ideal and exhibits both spatial and temporal fluctuations.
These fluctuations in the supplied voltage can result in a reduction in operating frequency and can compromise functional stability. Power supply integrity is therefore a critical concern in high-performance designs.

The voltage drop that develops in a supply network can be broadly classified into IR-drop, which is the voltage drop due to the parasitic resistances of the interconnects, and LdI/dt drop, which is the voltage drop due to the inductance of I/O pads and the parasitic inductance of the supply interconnects. In today's high-end designs, it is not uncommon for the supply network to conduct as much as 50-100 Amperes of total current [1,6]. As semiconductor technology is scaled and the supply voltage is reduced, the total current that must be supplied by the power network is expected to increase even further, making it more difficult to meet stringent supply integrity constraints. In particular, the LdI/dt voltage drop is expected to become more prominent as it worsens with both increasing current demand and clock frequency [2]. Furthermore, IR-drop and LdI/dt drop interact in a non-trivial manner and the total drop is not always the sum of the individual voltage drops.

The voltage fluctuations in a supply network can inject noise in a circuit, leading to functional failures in the design. Extensive work has therefore been focused on modeling and efficient analysis of the worst-case voltage drop in a supply network [2-7]. However, with decreasing supply voltages, the gate delay is becoming increasingly sensitive to supply voltage variation as the headroom between $V_{dd}$ and $V_t$ is consistently reduced [12]. For instance, in 0.13µm technology, a 10% variation in the $V_{dd}$ and Gnd voltages can result in a 30% delay variation for typical gates. With ever diminishing clock cycle times, accurate analysis of the supply voltage impact on circuit performance has therefore become a critical issue. In this paper, we present a new approach for the analysis of supply voltage induced delay variations.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICCAD'03, November 11-13, 2003, San Jose, California, USA. Copyright 2003 ACM 1-58113-762-1/03/0011...$5.00.

Power supply analysis has been complicated by the enormous size of the supply network. For modern processors, it is not uncommon for the supply network to be represented by an RLC circuit requiring more than 60 million elements. Simulation of such a large circuit is extremely challenging, and significant progress has been reported in developing efficient simulation approaches [3,5,7]. However, even with effective acceleration methods, it is typically not possible to simulate a supply network for more than a handful of clock cycles in reasonable time. Selecting the simulation vectors that exhibit the worst-case supply voltage drops is therefore a key issue in supply network verification.

The supply voltage fluctuation is strongly dependent on the simulation vectors that determine the currents drawn by the devices from the supply network. Hence, critical supply integrity problems can go undetected if worst-case simulation vectors are not applied, regardless of the simulation accuracy. A number of methods have therefore been proposed that use Genetic Algorithms or other search methods to automatically find vectors that maximize the total current drawn from the supply network [8,10]. These approaches are typically computationally intensive and are limited to circuit blocks, rather than full-chip analysis. In addition, a number of vectorless approaches for constructing worst-case currents have been proposed using either propagation of timing windows [8] or constraint graph formulations [11].
Vectorless approaches have the advantage that they are conservative, meaning that the supply drop will be overestimated, rather than underestimated. However, these methods address only static IR-drop analysis, and not LdI/dt drop, which has become a key concern in supply integrity analysis. Also, they do not consider the impact of supply fluctuations on delay. Recently, a statistical approach for analyzing the impact of supply noise on delay was also presented [14].

Power supply variation can impact the circuit delay in two ways. First, a reduced supply voltage lessens the gate drive strength, thereby increasing the gate delay. Second, a difference in the supply voltage between a driver and receiver pair creates an offset in the voltage with which the driver/receiver gates reference the signal transition. This has the effect of creating either a positive or negative time shift in the perceived signal transition at the receiver gate, as illustrated in Figure 1. This dual nature of the supply voltage impact on circuit delay was observed in [13], and complicates the generation of simulation vectors that maximize the delay along a particular circuit path. Increasing the voltage drop at a particular location may worsen the delay of one gate while improving the delay of another. Therefore, a vector must be determined that results in an optimal combination of the often conflicting goals of introducing both reduced drive strengths and supply voltage shifts, such that the total delay along a path is maximized.

Traditionally, the impact of supply noise on delay has been accounted for by reducing the operating voltage of all library cells by the expected supply voltage drop during library characterization. This assumes that the worst-case expected voltage drop occurs in all places of the design. This yields a very conservative analysis since, in practice, the worst drop can occur in only a small region at any one point in time. On the other hand, this approach ignores the impact of voltage shifts between driver/receiver pairs, thereby possibly underestimating the worst-case delay in certain situations. Also, it only accounts for static IR-drop.

In this paper, we therefore present a new approach for the analysis of the impact of power supply drops on circuit delay. The proposed approach is vectorless, allowing for efficient analysis, and addresses both IR-drop and LdI/dt drop effects. We develop a linear model that accounts for both the impact of driver strength reduction and voltage shifts between driver/receiver pairs. Based on this model, we formulate the task of determining the worst-case impact of supply noise on a path delay using a constrained linear optimization model where the currents of the different blocks are the optimization variables. We use both spatial and temporal superposition of the voltage drops resulting from currents of individual circuit blocks. Linear constraints are then formulated both for the total power consumption of a chip, as well as for individual block currents. Constraints between currents of different blocks, or of a single block in consecutive clock cycles, can be formulated to express both the spatial and temporal correlations that exist between circuit blocks. The proposed approach has the advantage that accurate constraints can be extracted from extensive gate-level simulation data that is readily available during the design process, thereby significantly improving the accuracy of the analysis while avoiding the need for lengthy and time-consuming power grid simulation.

We implemented the proposed methods and tested them on benchmark circuits, including a power grid from an industrial processor design. We show that the traditional analysis may overestimate the change in delay of a path by more than 50% and demonstrate the effectiveness of our analysis.

The remainder of this paper is arranged as follows. Section 2 describes our model for delay variations with respect to supply voltage fluctuations.
Section 3 presents the problem formulations and optimization method for maximizing the impact of power grid fluctuations on delay. Section 4 presents the results obtained for different power grids. In Section 5, we draw our conclusions.

2 Delay Model for Supply Fluctuations

In this section, we present our approach for modeling the impact of voltage variations on the delay of a circuit path. Since the voltage variations in a power grid are typically very slow compared to the transition time of a switching gate [15], we can make the simplifying assumption that the supply voltages are constant during the switching transitions. From the perspective of the path delay, we are therefore concerned with the impact of fixed voltage offsets from the nominal $V_{dd}$ and $V_{ss}$ voltages on the delay of a circuit path. Note, however, that dynamic IR-drop and LdI/dt drop effects will be the cause of these voltage offsets.

A voltage drop at a power supply point can impact the delay of a gate through one of the following two mechanisms:

1. A decrease in the $V_{dd}$ voltage or an increase in the $V_{ss}$ voltage at the gate under consideration decreases the locally observed supply voltage of the gate and will reduce its drive strength and hence increase its delay. The worst-case voltage drop is typically localized to a small region in the chip, as it requires all currents to be concentrated in that region. Hence, only a few gates in a path will typically be operated with a worst-case drive strength. Gates with higher local supply voltage therefore compensate for the increased delay of gates with reduced local supply voltage in the path, and a global analysis of the impact of supply voltage on the path delay is therefore required.

2. A relative shift in the $V_{dd}$ or $V_{ss}$ voltages between the driver and receiver gates of a signal net can introduce a voltage offset that will impact the delay of a gate. This is illustrated in Figure 1, where the $V_{ss}$ voltage of the receiver gate is increased relative to the $V_{ss}$ voltage of the driver gate. Since the input signal has a rising transition, the NMOS transistor of the receiver gate senses the input voltage relative to the local $V_{ss}$ voltage level. The shown voltage shift therefore results in an effective (negative) noise voltage at the receiver gate input that increases the delay of the receiver gate. Note that a shift in the supply voltage impacts the rising and falling transitions of a gate in opposite ways, meaning that an increase in the $V_{ss}$ voltage from driver to receiver results in an increased delay for a rising input transition while an increase in the $V_{dd}$ voltage improves the delay for a falling input transition. The relative shift between the driver and receiver gates is likely to be larger if the gates are farther apart than if they are close together. Therefore, nets that transmit signals across the chip will have a higher likelihood of shifts in supply voltage between their driver and receiver pair and hence are more susceptible to power grid noise.

Figure 1. A driver-receiver pair in a non-ideal supply network.

The relative magnitude of the above two mechanisms depends on the input slope and output loading of a gate. The sensitivity of gate delay to driver strength reduction will increase with output loading, while the sensitivity to voltage shifts will increase with slower input signal transition times. In order to maximize the delay of a path, it is necessary to induce voltage drops in the supply network such that the delay of each gate is increased through both mechanisms: reduction of driver strength and voltage shifts between successive gates in the path. A possible voltage assignment that maximizes the voltage shift between consecutive gates in a circuit path is shown in Figure 2. However, this assignment does not reduce the driver strength of each gate by the maximum possible amount. Maximizing the delay through reduced drive strength and through voltage shifts therefore requires conflicting voltage assignments that cannot be realized simultaneously. A worst-case realizable voltage assignment that maximizes the overall path delay is therefore not intuitively obvious and will depend on the specific conditions of the gates and their sensitivities to the different voltage drop phenomena.

Figure 2. A path in a power supply network with worst-case voltage shifts.

We now present our model for the dependence of the delay of a single gate on the voltage drops at that gate and at its preceding gate. We then extend this model to the delay of a circuit path.

Individual gate delay model

We consider the delay of a gate G, shown in Figure 3(a), with local supply voltages $V_{dd,g}$ and $V_{ss,g}$ and supply voltages $V_{dd,in}$, $V_{ss,in}$ at the preceding driver gate. As shown in Figure 3(b), the propagation delay $\tau$ between the input and output transitions of a gate is measured at 1/2 the nominal supply voltage point to ensure a common reference between successive gates. The delay of the receiver gate depends on the $V_{dd,g}$ and $V_{ss,g}$ voltages at the receiver gate itself, the voltages $V_{dd,in}$, $V_{ss,in}$ at the preceding driver gate, the input transition time and the output load. For the purpose of our discussion, we consider a fixed output load, although in our actual implementation gates are characterized over a range of output loads.

Figure 3. A driver-receiver pair in a non-ideal supply network: (a) gates F and G with their supply voltages, (b) propagation delay $\tau$, (c) transition time $t_r$.

The input transition time at gate G is a function of the delay of the preceding gate F, which, in turn, is a function of the supply voltages. It is therefore necessary to include the impact of the supply voltage fluctuations on the signal transition times in the delay model. To provide a common reference for transition time, we again define the transition time $t_r$ of a signal as the time between the 10% to 90% crossing of nominal $V_{dd}$ for an equivalent full-swing transition, as shown in Figure 3(c).
Given the signal transition at the output of gate G, and given the local transition time $t_r'$, measured between the 10% to 90% crossing of the local supply voltage $V_{ss,g}$ to $V_{dd,g}$, the equivalent full-swing transition time $t_r$ is computed as follows:

$$t_r = t_r' \, \frac{V_{dd,nominal}}{V_{dd,g} - V_{ss,g}} \qquad \text{(EQ 1)}$$

We now express the delay and transition time at the output of gate G as follows:

$$\tau = f(V_{dd,g}, V_{ss,g}, V_{dd,in}, V_{ss,in}, t_{r,in}) \qquad \text{(EQ 2)}$$

$$t_{r,out} = g(V_{dd,g}, V_{ss,g}, V_{dd,in}, V_{ss,in}, t_{r,in}) \qquad \text{(EQ 3)}$$

In general, f and g are nonlinear functions of their variables. However, the voltage drop in a power grid network is restricted and is typically within the range of ±10% of $V_{dd,nominal}$. We found that within this range, the delay of a gate is close to linear. Figure 4 shows the rise and fall delays of a typical gate as $V_{dd,g}$, $V_{ss,g}$, $V_{dd,in}$ and $V_{ss,in}$ are varied by ±20%.

Figure 4. Variation of rise/fall propagation delays of a gate with respect to (a) $V_{dd,g}$, (b) $V_{ss,g}$, (c) $V_{dd,in}$ and (d) $V_{ss,in}$.

The delay curves in Figure 4 show that f and g can be accurately modeled as linear functions for reasonable supply voltage variations. We therefore express the change in delay, $\Delta\tau$, of a gate with respect to its delay at nominal supply voltages as follows:

$$\Delta\tau = a_1 \Delta V_{dd,g} + a_2 \Delta V_{ss,g} + a_3 \Delta V_{dd,in} + a_4 \Delta V_{ss,in} + a_5 \Delta t_{r,in} \qquad \text{(EQ 4)}$$

where $\Delta V_{dd,g}$, $\Delta V_{ss,g}$, $\Delta V_{dd,in}$, and $\Delta V_{ss,in}$ are the deviations of the four supply voltages from their nominal values and $\Delta t_{r,in}$ is the change in the input transition time from its nominal value. Similarly, we express the change in the transition time $\Delta t_{r,out}$ at the output of a gate with respect to its transition time at nominal supply voltages as follows:

$$\Delta t_{r,out} = b_1 \Delta V_{dd,g} + b_2 \Delta V_{ss,g} + b_3 \Delta V_{dd,in} + b_4 \Delta V_{ss,in} + b_5 \Delta t_{r,in} \qquad \text{(EQ 5)}$$

The constants $a_1$-$a_5$ and $b_1$-$b_5$ are determined using multiple regression analysis where each gate is simulated over a range of supply voltage variations and rise/fall transition changes. Table 1 compares the delay values determined using our linear model with delay values obtained through SPICE simulation for a low-to-high propagation delay of an inverter in 0.13 micron technology with a nominal power supply of 1.2V. Different combinations of maximum supply voltage variations are shown.

Table 1. Low-to-High Propagation Delay Regression Results

ΔV_dd,g   ΔV_ss,g   ΔV_dd,in  ΔV_ss,in  t_r,in rise time (ps)  Regression delay  SPICE delay  % Error
 0.10V     0.10V     0.10V     0.10V     50                     11.19ps           11.72ps      4.5%
 0.05V     0.00V    -0.10V     0.05V     50                     16.71ps           16.34ps      2.3%
 0.00V     0.05V    -0.10V     0.00V     75                     19.12ps           19.48ps      1.85%
 0.00V    -0.10V    -0.05V     0.10V     75                     28.17ps           26.97ps      4.45%
-0.05V    -0.10V     0.10V    -0.10V     100                    33.43ps           33.54ps      0.33%
-0.05V     0.00V     0.05V    -0.05V     100                    27.31ps           27.42ps      0.40%
-0.10V     0.10V    -0.05V    -0.10V     125                    26.89ps           26.17ps      2.75%
-0.20V    -0.05V    -0.10V     0.20V     125                    48.04ps           44.98ps      6.80%

We also compared the accuracy of the proposed delay model for more than 3000 randomly generated voltage and transition time variations of ±10%, which resulted in an average error of 0.74% and a maximum error of 8.1%.

It should be noted that while we linearly model the change in delay due to supply voltage variations, the nominal delay itself is not a linear function of output load and nominal input transition time. We therefore used a non-linear, table-based model, similar to that used in Synopsys Design Compiler, to model the dependence of nominal delay and output transition time on output load and nominal input transition time. For each possible load and input transition time condition, we also determined different linear fitting constants $a_1$-$a_5$ and $b_1$-$b_5$, which are stored in a table along with the nominal delay and output transition time values.

Circuit path delay model

We now consider the variation of the delay, $\Delta\tau_{Path}$, of a circuit path due to supply voltage variations at different supply connections along a path, as shown in Figure 2(a). In general, the change in the delay of the nth gate is given by:

$$\Delta\tau_n = a_{1,n} \Delta V_{dd,n} + a_{2,n} \Delta V_{ss,n} + a_{3,n} \Delta V_{dd,n-1} + a_{4,n} \Delta V_{ss,n-1} + a_{5,n} \Delta t_{r,n-1} \qquad \text{(EQ 6)}$$

and the change in its output transition time is given by:

$$\Delta t_{r,n} = b_{1,n} \Delta V_{dd,n} + b_{2,n} \Delta V_{ss,n} + b_{3,n} \Delta V_{dd,n-1} + b_{4,n} \Delta V_{ss,n-1} + b_{5,n} \Delta t_{r,n-1} \qquad \text{(EQ 7)}$$

where $a_{1,n}, \ldots, a_{5,n}$ and $b_{1,n}, \ldots, b_{5,n}$ are the regression coefficients for gate n; $\Delta V_{dd,n}$, $\Delta V_{ss,n}$ are the supply voltage drops at gate n; and $\Delta V_{dd,n-1}$, $\Delta V_{ss,n-1}$ are the supply voltage drops at its driver gate, n-1. The delay of gate n is therefore defined in terms of the change of the output transition time of gate n-1, leading to a recursive definition of the overall path delay. The total delay change of a circuit path, $\Delta\tau_{Path}$, is the sum of the changes of the gate delays along the path and is expressed as follows:

$$\Delta\tau_{Path} = \sum_{i=1}^{n} \left( a_{1,i} \Delta V_{dd,i} + a_{2,i} \Delta V_{ss,i} + a_{3,i} \Delta V_{dd,i-1} + a_{4,i} \Delta V_{ss,i-1} + a_{5,i} \Delta t_{r,i-1} \right) \qquad \text{(EQ 8)}$$

where

$$\Delta t_{r,i} = b_{1,i} \Delta V_{dd,i} + b_{2,i} \Delta V_{ss,i} + b_{3,i} \Delta V_{dd,i-1} + b_{4,i} \Delta V_{ss,i-1} + b_{5,i} \Delta t_{r,i-1} \qquad \text{(EQ 9)}$$

and where n is the number of gates in the circuit path.
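The recursion in EQ 8 and EQ 9 can be illustrated with a short Python sketch. It is not the authors' implementation: the `GateCoeffs` container, the function name and the list-based indexing (entry 0 holds the deviations at the path input, entries 1..n those at the gates) are assumptions made for illustration, and any coefficient values passed in would be hypothetical rather than taken from the regression in Table 1.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class GateCoeffs:
    """Linear fitting constants of one gate (hypothetical container)."""
    a: Tuple[float, float, float, float, float]  # a1..a5 of EQ 6 (delay)
    b: Tuple[float, float, float, float, float]  # b1..b5 of EQ 7 (transition time)


def path_delay_change(gates: Sequence[GateCoeffs],
                      dvdd: Sequence[float],
                      dvss: Sequence[float]) -> float:
    """Evaluate EQ 8 with the recursion of EQ 9.

    dvdd[i], dvss[i] are the supply deviations at gate i for i = 1..n;
    index 0 is the path input, whose transition-time change is taken as
    zero (the ideal-input assumption).
    """
    n = len(gates)
    if len(dvdd) != n + 1 or len(dvss) != n + 1:
        raise ValueError("need n + 1 voltage deviations for n gates")
    d_tr_prev = 0.0  # change in transition time at the output of gate i-1
    total = 0.0
    for i in range(1, n + 1):
        gate = gates[i - 1]
        terms = (dvdd[i], dvss[i], dvdd[i - 1], dvss[i - 1], d_tr_prev)
        total += sum(c * x for c, x in zip(gate.a, terms))      # EQ 6
        d_tr_prev = sum(c * x for c, x in zip(gate.b, terms))   # EQ 7
    return total
```

Because every step is a weighted sum, the returned value is linear in the voltage deviations, which is the property the optimization in Section 3 relies on.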
For simplicity of our discussion, we assume an ideal transition between 0V and nominal $V_{dd}$ at the input of the path, and hence,

$$\Delta V_{dd,0} = \Delta V_{ss,0} = \Delta t_{r,0} = 0 \qquad \text{(EQ 10)}$$

However, the analysis can be easily extended to account for non-ideal input signal transitions. Equations 8 and 9 model the change in the delay of a path as a linear function of supply voltages at the individual gate connections. In the next section, we propose a method to express these supply voltages as a linear function of block currents and formulate the problem of maximizing delay as a linear optimization problem.

3 Maximum Delay Variation Formulation

We now discuss how the supply voltages can be expressed as a linear function of the current sources using both spatial and temporal superposition and accounting for both IR-drop and LdI/dt drop. We then show how the problem of maximizing delay change for a circuit path can be formulated as a linear optimization problem with linear constraints.

We consider a power supply network composed of RLC elements, current sources and voltage sources. We first consider an independent current source $i_m(t)$, applied at node m, and denote the voltage response generated at any node n due to the current $i_m(t)$ as $V_{m,n}(t)$. Given a set of current sources $i_m(t)$, the response at any node n in the circuit due to this set of current sources acting together is the summation of all the responses at node n caused by the individual current sources:

$$V_n(t) = \sum_{m} V_{m,n}(t) \quad \text{for all } m \qquad \text{(EQ 11)}$$

This is the well-known principle of superposition, applied spatially across the different current sources of a supply network. However, $V_n(t)$ in EQ11 depends on the entire current waveform $i_m(t)$, and requires that the entire current waveform be simulated for each current source. This complicates the formulation of the delay maximization problem since the number of possible current waveforms $i_m(t)$ can be very large and enumerating all possibilities would be impossible.
We therefore approximate an arbitrary current waveform $i_m(t)$ using a piece-wise constant waveform with a discretization of time into time steps $T_s$, as shown in Figure 5(a).

Figure 5. Temporal discretization and superposition approach: (a) piece-wise constant approximation of $i_m(t)$ with time step $T_s$ over duration $T_m$, (b) decomposition into shifted current pulses.

Given the total duration $T_m$ of waveform $i_m(t)$ and the time step size $T_s$, the number of discretizations S is given by $T_m = T_s \cdot S$. If the discretization time step $T_s$ is chosen sufficiently small, the piece-wise constant approximation of the continuous waveform has negligible error. We now represent the piece-wise constant current waveform as the sum of a series of current pulses of duration $T_s$, each shifted in time by one time step, as shown in Figure 5(b) and expressed as follows:

$$i_m(t) = \sum_{i=0}^{S-1} I_{m,i} \, p(t - iT_s) \qquad \text{(EQ 12)}$$

where $p(t) = 1$ if $0 < t < T_s$ and $p(t) = 0$ otherwise, and $I_{m,i}$ is the magnitude of the piece-wise constant approximation of current $i_m(t)$ in the interval $iT_s$ to $(i+1)T_s$. Conceptually, we can therefore replace each current source $i_m(t)$ at node m with a set of S current pulse sources $i_{m,i}(t)$ connected to the same node in the grid. Note that each current pulse $i_{m,i}(t)$ is a scaled and shifted version of the unit current pulse $i_u(t)$ with a unit pulse height and a pulse width of $T_s$:

$$i_u(t) = \begin{cases} 1, & \text{if } 0 < t < T_s \\ 0, & \text{otherwise} \end{cases} \qquad \text{(EQ 13)}$$
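The decomposition of EQ 12 can be made concrete with a small Python sketch. Two choices here are assumptions rather than part of the paper: the pulse magnitude $I_{m,i}$ is taken as the waveform value at the midpoint of each interval, and the pulse is treated as half-open, [0, T_s), so that adjacent pulses neither overlap nor leave gaps at the step boundaries. The function names are hypothetical.

```python
from typing import Callable, List


def pulse_magnitudes(waveform: Callable[[float], float],
                     t_total: float, t_step: float) -> List[float]:
    """Return I_{m,i} for i = 0..S-1, with S = T_m / T_s.

    Assumption: each magnitude is the waveform sampled at the midpoint
    of the interval [i*T_s, (i+1)*T_s).
    """
    steps = round(t_total / t_step)
    return [waveform((i + 0.5) * t_step) for i in range(steps)]


def unit_pulse(t: float, t_step: float) -> float:
    """Unit current pulse of EQ 13, taken as half-open [0, T_s)."""
    return 1.0 if 0.0 <= t < t_step else 0.0


def reconstruct(magnitudes: List[float], t: float, t_step: float) -> float:
    """Evaluate the right-hand side of EQ 12 at time t."""
    return sum(mag * unit_pulse(t - i * t_step, t_step)
               for i, mag in enumerate(magnitudes))
```

For a waveform that varies slowly relative to `t_step`, `reconstruct` tracks the original closely, which reflects the negligible-error argument made above for a sufficiently small $T_s$.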

Due to the nature of a power supply network, the voltage response $V_n^u(t)$ at node n due to a single unit current pulse $i_u(t)$ will reach steady state and approach the nominal supply voltage given sufficient time. The difference of the voltage $V_n^u(t)$ at node n from the nominal supply voltage $V_{dd,nominal}$, denoted $\Delta V_n^u(t)$, therefore approaches zero given sufficient time. We assume that this voltage difference has diminished below a specified error threshold at time $T_k = K \cdot T_s$.

Since any finite-length current waveform $i_m(t)$ can be represented by a finite set of current pulse sources, we can compute the voltage response $V_n(t)$ at node n by summing the response from each of the individual current pulse sources, using linear superposition. However, since the power supply network is linear, the response resulting from each current pulse is simply a shifted and scaled version of the response resulting from a unit current pulse. We can therefore express the change in the voltage response $\Delta V_{m,n}(t)$ from the nominal supply voltage due to the current source $i_m(t)$ as follows:

$$\Delta V_{m,n}(t) = \sum_{i=0}^{K-1} \Delta V_n^u(t - iT_s) \, I_{m,i} \qquad \text{(EQ 14)}$$

where $I_{m,i}$ is the magnitude of the piece-wise constant current waveform approximated in the interval $iT_s$ to $(i+1)T_s$. Using superposition in this temporal manner, we can therefore compute the response of any node in the network due to an arbitrary current source $i_m(t)$ using a single simulation of a unit current pulse and combining scaled and shifted versions of this response, using EQ14.

The only approximations in this approach arise from the piece-wise constant approximation of the current waveform and the finite simulation length of the unit current pulse response. Given a sufficiently fine-grained discretization and sufficient simulation length of the unit current pulse response, arbitrary accuracy can be obtained. Also, the computational complexity grows linearly with respect to the unit pulse response simulation length $T_k$ and the number of discretizations S of the current waveform $i_m(t)$.
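As an illustration of the temporal superposition in EQ 14, the following Python sketch combines scaled and shifted copies of a sampled unit pulse response. It assumes the response has been recorded once per time step $T_s$ as a list of K deviations from nominal; the function name and this sampled representation are assumptions, not the authors' code.

```python
from typing import List, Sequence


def superpose(unit_response: Sequence[float],
              currents: Sequence[float]) -> List[float]:
    """Voltage deviation at one node for one current source.

    unit_response[k] is the deviation k time steps after a unit pulse
    starts (k = 0..K-1); currents[i] is the pulse magnitude I_{m,i}.
    The entry at step j is the sum over i of
    unit_response[j - i] * currents[i].
    """
    out = [0.0] * (len(currents) + len(unit_response) - 1)
    for i, current in enumerate(currents):
        for k, deviation in enumerate(unit_response):
            out[i + k] += deviation * current  # pulse i shifted by i steps
    return out
```

The two nested loops visit each (pulse, response sample) pair once, so the work is proportional to S times K, in line with the linear growth in both quantities noted above.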
Typically, the length $T_m$ of waveforms $i_m(t)$ will be much greater than the unit pulse response time $T_k$. Since the simulation time of the supply network will by far dominate the run time effort, the proposed approach will provide a speedup of approximately $T_m/T_k$ compared to simulating the entire current waveform $i_m(t)$. It should also be noted that the current waveform $i_m(t)$ can be approximated not only by a sequence of square current pulses, but also by other current pulse shapes, using a similar analysis.

Finally, we combine the temporal superposition with spatial superposition to obtain the voltage fluctuation $\Delta V_n(t)$ at a node n due to a set of arbitrary current sources $i_m(t)$ at nodes m as follows:

$$\Delta V_n(t) = \sum_{m=0}^{M-1} \sum_{i=0}^{K-1} \Delta V_{m,n}^u(t - iT_s) \, I_{m,i} \qquad \text{(EQ 15)}$$

where M is the number of current sources and K is the number of discretizations of the unit current voltage response. $\Delta V_{m,n}^u(t)$ is the difference in the voltage response at node n from nominal supply resulting from a unit current source at node m. $I_{m,i}$ is the magnitude of the current source m in time step i. The proposed formulation requires that each current source is simulated, in turn, with a unit current pulse for a simulation period of $T_k$ and the voltage responses $\Delta V_{m,n}^u(t)$ are recorded at all nodes of interest. The formulation of EQ15 has the advantage that it is linear in terms of the current values $I_{m,i}$ and hence allows the delay maximization problem to be cast as a constrained linear optimization problem, as explained in the following section.

Delay maximization formulation

We apply the above formulation to the problem of delay maximization, using a linear optimization formulation with the current values as optimization variables. We first divide the chip into circuit blocks and simulate the minimum and maximum currents of each circuit block using Powermill or Verilog simulations, or estimate them on the basis of a previously fabricated part. In a microprocessor design, these circuit blocks could be, for example, the instruction fetch stage, instruction decode stage, execute stage, caches and the main memory control units.
We make the simplifying assumption that the total current in a circuit block is evenly divided among its power supply points. This has the advantage that the voltage sensitivities $\Delta V_n^u(t)$ can be computed with respect to the total current of a circuit block, instead of with respect to each individual current source point in a circuit block. This greatly reduces the number of optimization variables in our formulation and improves its efficiency.

When selecting circuit blocks, it is therefore important that each block is sufficiently small to ensure that the spatial distribution of the currents within a circuit block does not significantly impact the voltage response. For high-performance processors, with tight and uniform supply grids over multiple layers of metal, the spatial distribution of the total block current is typically not significant for moderate-size blocks [17]. If necessary, however, the proposed approach can be extended for non-uniform current distributions. It is also desirable that circuit blocks are selected such that their currents are independent, reducing the need to incorporate constraints between the currents of different blocks in the delay maximization formulation.

The current waveform for a circuit block typically has an approximately triangular shape within a clock cycle, as shown in Figure 6, reflecting a higher switching activity at the start of the clock cycle than at the end of the clock cycle [16]. We currently approximate the current waveform for a circuit block in a single clock cycle with a trapezoidal waveform, as shown in Figure 6.

Figure 6. Current modeling for circuit blocks (block current versus time in multiples of $T_s$: actual block current waveform and the piece-wise linear current waveform applied).

We then set the step size $T_s$ in the superposition formulation equal to one clock period and approximate the total block current as the sum of shifted and scaled trapezoidal current pulses. This results in a piece-wise linear approximation of the total block current, as shown with darkened lines in Figure 6.
However, our approach is not restricted to a specific current profile, and different current profile approximations could be used as well.

The block current within a clock cycle may vary not only in magnitude but also in shape with different input data. Some input vectors will result in more switching activity at the start of the cycle, while other input vectors may result in more switching activity at the end of the cycle. However, with the scaling of process technology, the clock frequency has increased significantly while the resonance frequency of the supply network has steadily decreased. For a 1-2 GHz processor, typical resonance frequencies of the power supply network are in the range of 30-80 MHz [15]. Any change in the shape of the current waveform within a single clock cycle therefore impacts frequencies that are well above the resonance frequency of the power distribution network and has little impact on the voltage waveforms. This is illustrated in Figure 7, where the voltage response of a node in the grid resulting from two different block current waveform shapes with equal total charge is shown. One waveform uses a triangular current waveform shape and the other waveform uses the trapezoidal approximation, as shown in Figure 6. The simulations show that the voltage responses are nearly indistinguishable. Note that, if necessary, the proposed approach can be extended such that each clock cycle is divided into multiple time steps and is represented with a series of consecutive current pulses, allowing for different waveforms within a clock cycle.

Figure 7. Variation of voltage at a node in the power grid with different clock cycle waveform shapes (response to triangular current waveform and response to trapezoidal current waveform).

Based on Figure 6, we also observe that the voltage response $\Delta V_{m,n}^u(t)$ within a clock cycle is nearly constant and can be approximated with a fixed voltage value $\Delta V_{m,n,i}^u$. Based on EQ15 we now express the voltage variation of a $V_{dd}$ node n as a function of the current $i_m(t)$ of circuit block m as follows:

$$\Delta V_{dd,n} = \sum_{m=0}^{M-1} \sum_{i=0}^{S-1} \Delta V_{m,n,i}^{u,V_{dd}} \, I_{m,S-i} \qquad \text{(EQ 16)}$$

where $I_{m,i}$ is the average current of circuit block m in clock cycle i and $\Delta V_{m,n,i}^{u,V_{dd}}$ is the sensitivity of the $V_{dd}$ voltage at node n with respect to the current of block m after i clock cycles of delay. Similarly, we express the voltage variation of a $V_{ss}$ node as:

$$\Delta V_{ss,n} = \sum_{m=0}^{M-1} \sum_{i=0}^{S-1} \Delta V_{m,n,i}^{u,V_{ss}} \, I_{m,S-i} \qquad \text{(EQ 17)}$$

where $\Delta V_{m,n,i}^{u,V_{ss}}$ is the sensitivity of the $V_{ss}$ node n with respect to the current of block m after i clock cycles of delay.
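The per-cycle form of EQ 16 and EQ 17 can be sketched in Python as follows. The layout of the inputs is an assumption made for illustration: `sens[m][i]` holds the sensitivity of one supply node to block m after i cycles of delay, and `currents[m][c]` holds the average current of block m in cycle c. The sketch evaluates the deviation at an arbitrary cycle `cycle`, which corresponds to EQ 16 when `cycle` equals S; both function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def node_voltage_deviation(sens: Sequence[Sequence[float]],
                           currents: Sequence[Sequence[float]],
                           cycle: int) -> float:
    """Sum over blocks m and delays i of sens[m][i] * currents[m][cycle - i]."""
    total = 0.0
    for m, row in enumerate(sens):
        for i, sensitivity in enumerate(row):
            if cycle - i >= 0:  # ignore cycles before the start of the trace
                total += sensitivity * currents[m][cycle - i]
    return total


def voltage_coefficients(sens: Sequence[Sequence[float]],
                         cycle: int) -> Dict[Tuple[int, int], float]:
    """Coefficient of each current variable (block m, cycle j) in the
    node voltage deviation at `cycle`, i.e. one row of a linear program."""
    return {(m, cycle - i): sensitivity
            for m, row in enumerate(sens)
            for i, sensitivity in enumerate(row)
            if cycle - i >= 0}
```

The second function makes explicit that the node voltage is a fixed linear combination of the block currents, so the sensitivities can be computed once and reused for any number of circuit paths.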
We now formulate the problem of maximizing delay as a linear optimization problem as follows:

Maximize:

$$\Delta\tau_{Path} = \sum_{i=1}^{n} \left( a_{1,i} \Delta V_{dd,i} + a_{2,i} \Delta V_{ss,i} + a_{3,i} \Delta V_{dd,i-1} + a_{4,i} \Delta V_{ss,i-1} + a_{5,i} \Delta t_{r,i-1} \right) \qquad \text{(EQ 18)}$$

such that:

$$\Delta V_{dd,n} = \sum_{m=0}^{M-1} \sum_{i=0}^{S-1} \Delta V_{m,n,i}^{u,V_{dd}} \, I_{m,S-i} \qquad \text{(EQ 19)}$$

$$\Delta V_{ss,n} = \sum_{m=0}^{M-1} \sum_{i=0}^{S-1} \Delta V_{m,n,i}^{u,V_{ss}} \, I_{m,S-i} \qquad \text{(EQ 20)}$$

$$\Delta t_{r,i} = b_{1,i} \Delta V_{dd,i} + b_{2,i} \Delta V_{ss,i} + b_{3,i} \Delta V_{dd,i-1} + b_{4,i} \Delta V_{ss,i-1} + b_{5,i} \Delta t_{r,i-1} \qquad \text{(EQ 21)}$$

$$I_{min,m} \le I_{m,i} \le I_{max,m} \qquad \text{(EQ 22)}$$

$$\sum_{m=0}^{M-1} I_{m,i} \le I_{peak} \qquad \text{(EQ 23)}$$

The constraint in EQ22 expresses that the current of a block must have a value between its maximum and minimum possible value, as determined from Powermill or Verilog simulation. The constraint in EQ23 forces an upper bound on the total current of the chip. This expresses that, while individual blocks may vary dramatically from cycle to cycle, the total power of the chip typically has a well-known maximum current consumption. This upper bound on the total current can be computed using either chip-level Verilog simulation or by scaling the maximum power of a similar design in an older technology. Other constraints expressing dependencies between different circuit blocks or between different clock cycles can be added as well using linear inequalities, as explained in the following section.

To compute $\Delta V_{m,n,i}^{u,V_{dd}}$ and $\Delta V_{m,n,i}^{u,V_{ss}}$, a unit trapezoidal current source waveform is, in turn, applied at each circuit block and the voltage drop of all nodes is measured for S subsequent clock cycles, until the voltage drop becomes insignificant. This is a time-consuming step, but for a typical processor design at most a few tens of circuit blocks are required and the simulation is performed only once for each circuit block, after which the results can be reused for the analysis of any number of circuit paths.

The optimization in EQ18 through EQ23 is implemented using the CPLEX linear optimization package. For typical power grids, the number of variables is of the order of thousands, which can be easily solved using standard linear solution methods.
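To give a feel for the structure of this optimization, the sketch below solves one special case in plain Python. It is not the CPLEX-based implementation described above. It assumes that EQ 19 through EQ 21 have already been substituted into EQ 18, so that the objective is a weighted sum of the block currents with known weights `coeffs[m]` for one clock cycle, and that only the bounds of EQ 22 and the budget of EQ 23 apply. Under those assumptions each cycle decouples and a greedy fill, as in a fractional knapsack, is optimal. Once correlation constraints are added, a general linear programming solver is needed instead. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def maximize_cycle(coeffs: Sequence[float],
                   lo: Sequence[float],
                   hi: Sequence[float],
                   peak: float) -> Tuple[List[float], float]:
    """Maximize sum(coeffs[m] * x[m]) subject to lo[m] <= x[m] <= hi[m]
    and sum(x) <= peak, for the block currents x of one clock cycle."""
    x = list(lo)                      # every block draws at least its minimum
    budget = peak - sum(lo)           # current still available under EQ 23
    if budget < 0:
        raise ValueError("minimum block currents already exceed the peak")
    # Raise the blocks with the largest positive delay sensitivity first.
    order = sorted(range(len(coeffs)), key=lambda m: coeffs[m], reverse=True)
    for m in order:
        if coeffs[m] <= 0 or budget <= 0:
            break                     # raising these currents cannot help
        step = min(hi[m] - lo[m], budget)
        x[m] += step
        budget -= step
    objective = sum(c * v for c, v in zip(coeffs, x))
    return x, objective
```

The returned `x` plays the role of the worst-case block currents for that cycle, mirroring how the full optimization yields both the maximum delay change and the current values that produce it.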
Finally, we note that the optimization solution not only provides the maximum expected increase in the circuit path delay, but also provides the exact current waveforms for each circuit block that produce this delay variation. Such a worst-case block current trace can be simulated by the designer to verify the predicted delay change and can give insight into the operation of the supply grid.

Generation of block current constraints

Equations 22 and 23 express simple constraints on the current of individual blocks or the total current of the processor as a whole. However, in most processor designs, correlations between the currents of different blocks, or between currents of a block in consecutive clock cycles, will also arise. For instance, positive correlation between the currents of two pipeline stages can arise when data is passed from one pipeline stage to the next, or negative correlation may exist between the currents of two circuit blocks that operate mutually exclusively. We therefore incorporate linear constraints in the proposed formulation to express such correlations.

It should be noted that the delay maximization formulation is conservative, meaning that it will overestimate the change in delay due to supply voltage fluctuations. This is the result of the optimization formulation, which automatically maximizes the delay change within the bounds of the provided constraints. Incorporating additional constraints in the analysis is therefore an effective method to reduce the conservatism of the analysis. Any linear constraint can be represented in the proposed formulation and a number of different approaches for automatically generating such constraints can be used. In this paper, we propose the use of gate-level power simulation, such as a Verilog-based simulator, to extract correlation constraints. By simulating a large set of chip-level simulation vectors, the correlation between the currents of different blocks in one clock cycle or between currents of blocks in different clock cycles can be observed and can be represented using linear constraints.
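One simple way to turn simulated block currents into a linear constraint can be sketched as follows, assuming the slope of the bounding line is chosen by hand: the right-hand side is the largest value that the linear form takes over all simulated clock cycles, so every observed point satisfies the constraint. The function name, the coefficients, and the sample data are hypothetical and only illustrate the idea for two mutually exclusive blocks.

```python
def bounding_constraint(samples, a, b):
    """Return c such that a*x + b*y <= c holds for every simulated sample.

    samples is a list of (x, y) block-current pairs, one per clock cycle.
    """
    return max(a * x + b * y for x, y in samples)


def is_feasible(point, a, b, c):
    """Check whether a candidate current pair satisfies a*x + b*y <= c."""
    x, y = point
    return a * x + b * y <= c


# Hypothetical per-cycle currents (A): only one block is active at a time.
samples = [(1.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.0, 0.25), (0.0, 0.0)]
a, b = 1.0, 2.0
c = bounding_constraint(samples, a, b)

# Both blocks at their individual maxima is never observed in simulation,
# and the derived constraint excludes it.
corner = (1.0, 0.5)
print(c, is_feasible(corner, a, b, c))
```

The constraint is only as trustworthy as the simulation coverage: a cycle that was never exercised by the test vectors could violate it.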
In Figure 8, we show an example of the correlation between the currents of a Multiplier and an ALU block in an Alpha processor. The X-axis of the scatter plot corresponds to the current of the Multiplier block and the Y-axis corresponds to the current of the ALU. The entire processor design was simulated, and the currents of the ALU and Multiplier blocks were computed using pre-characterized power data in the cell library. Each point in the scatter plot represents a simulated clock cycle. In total, more than ten thousand clock cycles were simulated using a number of benchmark programs. Note that many of the scatter points coincide. Since the Alpha processor is a single-issue machine and was designed with clock gating for reduced power consumption, the Multiplier and ALU blocks cannot be active in the same clock cycle. This negative correlation is evident from the L-shaped scatter points in Figure 8. To express this correlation in the delay maximization formulation, we generate the linear constraint shown by the solid line in Figure 8 and express it with the following inequality:

$$I_{mult,t} + 1.36\, I_{ALU,t} \le 1.7 \qquad \text{(EQ 24)}$$

[Figure 8. Correlation between Multiplier and ALU block currents. X-axis: Block Current of Multiplier (A); Y-axis: Block Current of ALU (A); the feasible region is marked.]

It is clear that the constraint in EQ24 will reduce the predicted delay increase of the analysis by preventing the Multiplier and the ALU from simultaneously exhibiting their maximum current values.

An example of a correlation between currents in different clock cycles is shown in Figure 9, where the current of the instruction fetch stage in cycle t is plotted against the current of the instruction decode stage in cycle t+1. Since data is passed from the instruction fetch stage to the instruction decode stage, a correlation can arise, as clearly visible from the scatter plot in Figure 9. In this case, the correlation is captured using two constraints, as illustrated in Figure 9 and expressed as follows:

$$1.7\, I_{IF,t} + I_{ID,t+1} \le 3.5 \qquad \text{(EQ 25)}$$

$$9.6\, I_{IF,t} + I_{ID,t+1} \le 14.4 \qquad \text{(EQ 26)}$$

[Figure 9. Correlation between IF stage in cycle t and ID stage in cycle t+1. The feasible region is marked.]

Although in this paper we manually extract constraints from the correlation data, it is clear that such constraints could be easily generated automatically by finding a polyhedron that encompasses all generated current points. The use of gate-level power simulation has the advantage that very extensive suites of test vectors are readily available and block current data can be obtained from them with minimum overhead during the design process. Also, gate-level simulation is typically performed for many millions of clock cycles. The proposed approach allows realistic constraints to be extracted, based on extensive simulation data, while at the same time avoiding the need to evaluate long power grid vectors, which would lead to intractable simulation times.

3.1 Voltage drop formulation

We observe that the proposed method for delay maximization can be easily reformulated to compute the maximum voltage drop at a particular circuit node. In this case, we maximize the voltage drop, again subject to linear constraints and with the block currents as optimization variables, as follows:

Maximize:

$$\Delta V_{dd,n} = \sum_{m=0}^{M-1} \sum_{i=0}^{S-1} V^{uv,dd}_{m,n,i} \, I_{m,S-i} \qquad \text{(EQ 27)}$$

such that:

$$\sum_{m=0}^{M-1} I_{m,i} \le I_{peak} \qquad \text{(EQ 28)}$$

$$I_{\min,m} \le I_{m,i} \le I_{\max,m} \qquad \text{(EQ 29)}$$

for all blocks m and clock cycles i. Note that this formulation accounts for both IR drop and LdI/dt drop.

4 Results

Table 2. Comparison of Worst-Case Voltage Drops Using Different Approaches (WB: wire bond, FC: flip chip)

                                             Worst voltage drops (mV)
Grid    Grid Type  # of nodes  # of Blocks  Peak Curr/Block  Average Curr/Block  Constr. Max
Grid-1  WB         1051        10           258.2            96.8                170.8
Grid-2  WB         1051        16           295.3            105.5               193.3
Grid-3  FC         1691        16           121.9            43.5                109.0
Grid-4  WB         1691        20           195.2            90.1                166.8
Grid-5  FC         2438        20           172.2            57.4                147.7
Grid-6  WB         2438        25           232.8            76.7                141.9
Grid-7  FC         3818        25           149.1            43.9                112.9
Grid-8  WB         3818        30           247.2            81.9                178.3
Grid-9  FC         1,57,180    30           190.3            69.2                134.7

The proposed approaches for determining the worst-case voltage drop and maximum increase in delay of a path were implemented and tested on a number of grids of different sizes for both flip-chip and wire-bond package models. Grid-1 through Grid-8 are grids of different sizes in 9 layers of metal, generated using pitches and widths of an industrial microprocessor design.
Grid-9 is the grid of an industrial processor, extracted using a commercial extraction tool, and consists of over 1 million elements. For each chip, the design was partitioned into a number of blocks. The maximum and minimum current of each block and the total maximum power of the chip were then obtained through either Verilog simulation or chip area estimates.

Table 2 shows the results for worst-case voltage drop computation, using the approach described in Section 3.1. We compare the obtained results with two traditional approaches for voltage drop analysis. In the first approach (Peak Curr) shown in Table 2, all blocks are assigned their maximum switching current, so that they draw peak current simultaneously. In the second approach (Avg. Curr), we assign an average current to each block. The last column shows the voltage drop obtained from the constrained maximization approach, where blocks with low sensitivity switch with lower currents while blocks with higher sensitivity switch with higher currents. The current drawn by each block changes in every clock cycle so as to maximize the voltage drop at a given node due to both IR-drop and LdI/dt drop. Table 2 shows that the peak current approach overestimates the worst-case voltage drop by a maximum of 64% and by 37% on average over all test cases. On the other hand, the average current approach underestimates the worst-case drop by as much as 61% and by 51% on average.
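The percentages quoted above can be reproduced from the Table 2 data. The sketch below assumes that the "on average" figures are ratios of column totals rather than means of per-grid percentages; that choice is an assumption, made because it matches the quoted values after rounding. The function names are illustrative.

```python
# Worst-case voltage drops (mV) from Table 2, Grid-1 through Grid-9.
peak = [258.2, 295.3, 121.9, 195.2, 172.2, 232.8, 149.1, 247.2, 190.3]
avg = [96.8, 105.5, 43.5, 90.1, 57.4, 76.7, 43.9, 81.9, 69.2]
constr = [170.8, 193.3, 109.0, 166.8, 147.7, 141.9, 112.9, 178.3, 134.7]


def max_deviation_pct(values, reference):
    """Largest per-grid deviation of values from reference, in percent."""
    return max(abs(v - r) / r * 100.0 for v, r in zip(values, reference))


def overall_deviation_pct(values, reference):
    """Deviation of the column total from the reference total, in percent."""
    return abs(sum(values) - sum(reference)) / sum(reference) * 100.0


peak_max = max_deviation_pct(peak, constr)
peak_overall = overall_deviation_pct(peak, constr)
avg_max = max_deviation_pct(avg, constr)
avg_overall = overall_deviation_pct(avg, constr)

print(f"peak current:    max {peak_max:.1f}%, overall {peak_overall:.1f}%")
print(f"average current: max {avg_max:.1f}%, overall {avg_overall:.1f}%")
```

The largest overestimate comes from Grid-6 (232.8 mV against 141.9 mV), and the largest underestimate from Grid-5 (57.4 mV against 147.7 mV).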

Table 3 shows the results of the proposed delay maximization approach. It lists the maximum expected delay increase of a critical path for each chip as determined by the proposed constrained optimization approach (Constr. Max). The results are compared with two traditional approaches. In traditional approach 1, the worst-case voltage drop of the power supply network is applied at all voltage supply points of the gates constituting the critical path. This is equivalent to the common practice of lowering the operating voltage of all cells in the library by the worst-case expected voltage drop during timing characterization. Table 3 shows that this approach overestimates the increase in delay compared to the constrained maximization approach by 135% on average. It should be noted, however, that the overestimation depends on the placement of the gates in the path on the chip, giving a worse overestimation of the delay increase for paths that are distributed over a significant area of the die.

In traditional approach 2, the worst voltage drop at each gate location is first determined using the constrained voltage maximization formulation described in Section 3.1. Each local worst-case drop is then applied simultaneously at all gates in the path. This approach is therefore less conservative than traditional approach 1, since many nodes have a local worst-case drop that is less than the worst-case drop of the chip as a whole. Nevertheless, this approach is also conservative, and Table 3 shows that it still overestimates the delay by 44.7% on average compared with the constrained delay maximization approach.

Table 3. Comparison of Increase in Delay Using Different Approaches

                      Increase in delay of a path
Grid    # of Nodes  Traditional Approach 1  Traditional Approach 2  Constr. Max
Grid-1  1051        18.39%                  14.42%                  10.34%
Grid-2  1051        10.37%                  8.07%                   5.25%
Grid-3  1691        15.87%                  7.95%                   4.96%
Grid-4  1691        14.05%                  4.83%                   2.60%
Grid-5  2438        13.50%                  9.64%                   6.41%
Grid-6  2438        10.74%                  6.95%                   3.81%
Grid-7  3818        16.97%                  10.40%                  8.44%
Grid-8  3818        12.82%                  8.94%                   6.54%
Grid-9  1,57,180    16.25%                  8.16%                   6.50%

In Table 4, we demonstrate the effectiveness of incorporating additional constraints between block currents into the formulation. We repeated the analysis of Grid-1 of the Alpha processor, but added several linear constraints expressing correlations between currents of different blocks and between block currents in different clock cycles. The constraints were obtained using extensive Verilog simulation, as described in Section 3. Table 4 shows the increase in delay of 5 critical paths with and without these correlation constraints. Although only a few constraints were added to the analysis, the predicted delay increase was reduced by as much as 21.7%, and by 16.5% on average, showing the effectiveness of this approach.

Table 4. Impact of Correlation Constraints on Increase in Delay

                Increase in delay of critical path
Critical Path  Without Correlation Constraints  With Correlation Constraints  % Improvement
Path 1         8.92%                            6.98%                         21.7%
Path 2         8.40%                            7.03%                         16.3%
Path 3         10.91%                           9.11%                         16.49%
Path 4         10.88%                           9.58%                         11.95%
Path 5         9.36%                            7.85%                         16.13%

In Figure 10, the current waveforms generated by the delay maximization approach for Grid-1 are shown. As can be seen, the currents generated by the analysis are time varying and exploit the time dependence of IR-drop and LdI/dt drop. The run time for the linear optimization was less than 1 sec for all the grids, since the linear optimizer can solve linear maximization problems very quickly. The initial step of computing sensitivities is computationally intensive in this approach, but its cost can be considerably reduced using fast linear solvers.

Figure 10.
Variation of block currents with time to maximize delay

5 Conclusion

In this paper, we have presented a new approach for computing the maximum delay increase of a critical path due to power supply voltage fluctuations. The analysis is vectorless and considers both IR-drop and LdI/dt drop. We presented an accurate model for the path delay as a function of the supply voltages and then formulated the delay maximization problem as a constrained linear optimization problem. We also discussed how linear constraints can be added to the formulation to represent correlations between block currents. The analysis was implemented and tested on a number of benchmark grids, including the power grid of an industrial processor, and we demonstrated the effectiveness of the proposed approach.

6 Acknowledgement

This work was funded by research grants and contracts from SRC, NSF, Intel and IBM.