Resolving Signal Correlations for Estimating Maximum Currents in CMOS Combinational Circuits

Harish Kriplani†, Farid Najm†, Ping Yang†† and Ibrahim Hajj†
† University of Illinois at Urbana-Champaign
†† Texas Instruments Inc., Dallas, TX

Abstract: Currents flowing in the power and ground (P&G) lines of CMOS digital circuits affect both circuit reliability and performance by causing excessive voltage drops. Maximum current estimates are therefore needed in the P&G lines to determine the severity of the voltage drop problems and to properly design the supply lines to eliminate these problems. These currents, however, depend on the specific input patterns that are applied to the circuit. Since it is prohibitively expensive to enumerate all possible inputs, this problem has, for a long time, remained largely unsolved. In [1], we proposed a pattern-independent, linear time algorithm (imax) that estimates an upper bound envelope of all possible current waveforms that result from the application of different input patterns to the circuit. While the bound produced by imax is fairly tight on many circuits, there can be a significant loss in accuracy due to correlations between signals internal to the circuit. In this paper, we present a new partial input enumeration (PIE) algorithm to resolve these correlations and significantly improve the upper bound (in one case, reducing the error by 64% on a circuit with about 1,700 gates). We also show good speed performance, analyzing circuits with more than 20,000 gates in about 2 hours on a SUN ELC. We demonstrate with extensive experimental results that the algorithm represents a good time-accuracy trade-off and is applicable to large VLSI circuits.

1 Introduction

Currents flowing in the power and ground (P&G) lines of CMOS digital circuits affect both circuit reliability and performance by causing excessive voltage drops in the lines. Furthermore, the severity of these voltage drop problems intensifies with the continuing push for denser chips and finer technologies.
Indeed, as is known from the classical scaling theory [2], as the minimum feature size and the supply voltage are scaled down, while the total chip power remains constant, the required supply current increases. With higher currents flowing in narrower lines, the voltage drop in the supply lines goes up and quickly becomes a limiting factor in the design of VLSI chips. Furthermore, a lower supply voltage means that the noise margin for the correct operation of a transistor decreases. In short, in order to avoid logic errors, power and ground lines need to be carefully designed to take care of the increased voltage drops and reduced noise margins. This highlights the need for efficient CAD tools to estimate the power supply and ground currents. Since worst case currents determine worst case voltage drops, our research is focused on the problem of estimating maximum current waveforms in the power or ground lines.

This research is supported by the Semiconductor Research Corporation (92-DP-109) and Texas Instruments Inc.

The current drawn by a CMOS circuit is a complex function of input excitations. For each input pattern applied to the circuit, a different transient current waveform is drawn from the supply lines. An input pattern for a circuit with n inputs consists of a vector of n excitations, where each excitation could be either a stable input state, i.e., low or high, or a transition, i.e., high to low or low to high. In the presence of such input dependent and transient current waveforms, one must carefully define the notion of maximum current waveform. In [1], we proposed the Maximum Envelope Current (MEC) waveform as an estimate of the maximum current.
The MEC waveform at a contact point is the upper bound envelope of all the transient current waveforms that result from the application of different input patterns to the circuit. Accurate estimation of the MEC waveform at every contact point is NP-complete, as this problem can be transformed to a Boolean satisfiability problem [3]. In [1], we proposed a pattern-independent, linear time algorithm, called imax, that provides an upper bound for the MEC waveform (for completeness, this algorithm is summarized in the next section). However, in order to maintain reasonable execution time, the imax algorithm neglects various signal correlations that may exist inside a circuit. This can result in a significant loss in accuracy (i.e., a loose upper bound), even with the simple improvement heuristic used in [1].

The main contribution of this paper is a new partial input enumeration (PIE) algorithm that efficiently resolves these correlations and leads to significant improvement in the upper bound (in one case, reducing the error by 64% on a circuit with about 1,700 gates). This technique is based on (1) intelligently selecting a few critical input nodes and (2) enumerating a limited number of cases at these nodes to produce an overall improvement in the upper bound. It turns out that the choice of these critical nodes is the key, and we will present two heuristics for doing this that have shown good results in practice. While this technique may be slower than the simple imax technique, we still demonstrate good speed performance, solving circuits with more than 20,000 gates in about 2 hours on a SUN ELC. Our algorithm has the attractive property that it does an iterative improvement, so that one can stop it at any time and still obtain a better upper bound than the simple imax result. We will demonstrate with extensive experimental results that the PIE algorithm represents a good time-accuracy trade-off and is applicable to large VLSI circuits.

To appear in 30th ACM/IEEE Design Automation Conference 1993.

This paper is organized as follows. Following the next background section, we discuss the signal correlation problem in section 3. This is followed by a discussion of possible methods that can be used to resolve the signal correlations in section 4. In section 5, we present the partial input enumeration method. In section 6, we present experimental results on several benchmark circuits. Finally, the salient features of this paper are summarized in section 7.

2 Main Ideas of the imax Algorithm

We will briefly review the imax algorithm presented in [1], in which the following simplifying assumptions are made. Firstly, the combinational circuit under consideration is assumed to be part of a synchronous sequential circuit and, therefore, all of its inputs switch only at time zero. Secondly, the delay of each gate in the circuit is assumed to be fixed and is a user-specified number.

We define the excitation at a node¹ at time t as the stimulus (or signal value) present at the node at that time. At any time, a node in the circuit could be either stable at low or high, or could transition from high to low or from low to high. Thus, the excitation could be any single value from the set $X = \{l, h, hl, lh\}$, where l = low, h = high, hl = high to low transition and lh = low to high transition. The set of all possible excitations that a node n can assume at any time t is called the uncertainty set for the node at time t and is denoted by $X_n(t)$. Clearly, $X_n(t) \subseteq X$.

The imax algorithm uses a gate level description of the circuit and, unless specified otherwise, assumes that the uncertainty set for each input at time zero is X. The basic idea of the algorithm is to propagate the "uncertainty" present at the inputs inside the circuit so as to determine the entire range of possible excitations and their associated timing at the output of every logic gate. From this information, the worst case current waveforms are computed. The details of the algorithm can be found in [1].

An example illustrating the algorithm is shown in Fig. 1. In this example, we assume that the uncertainty set for each input at time zero is X. Therefore, each input can transition from low to high or from high to low at time zero, or stay at low or high for all time. Transitions at various nodes of the circuit are represented by intervals. Thus, a transition at a specific time point T can be represented by a closed interval which begins and ends at T. Given the above description at the input of the inverter (i1), the output (n1) can transition from low to high or from high to low at time 1 (assuming the delay of the inverter as 1 unit) or stay at low or high for all time. Similarly, assuming the delay of the NAND gate as 2 units, the output of the gate (o1) can transition lh or hl at time 2 due to the input i2, or transition lh or hl at time 3 due to the output of the inverter (n1); or stay at low or high for all time. In this fashion, the algorithm computes the set of all possible transitions at the output of every logic gate.

¹ A node in a circuit is either a primary input or the output of a gate.

[Figure 1 circuit: an inverter (delay 1) with input i1 and output n1; a NAND gate (delay 2) with inputs n1 and i2 and output o1.]
Input Description: i1, i2 ∈ {l, h, hl, lh} at time 0.
Uncertainty Intervals:
  i1, i2: lh[0, 0], hl[0, 0], l[0, ∞), h[0, ∞)
  n1: lh[1, 1], hl[1, 1], l[0, ∞), h[0, ∞)
  o1: lh[2, 2][3, 3], hl[2, 2][3, 3], l[0, ∞), h[0, ∞)
Key: Excitation[Interval Begin, Interval End]
Figure 1. An example illustrating the imax algorithm.

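
The interval bookkeeping of the Fig. 1 example can be sketched in a few lines of Python. This is only a simplified illustration and not the imax implementation of [1]: it assumes that every node carries the full uncertainty set X, so it tracks only the time points at which a transition may occur. The netlist format and the name `propagate_transition_times` are hypothetical choices made for this sketch.

```python
def propagate_transition_times(gates, input_times):
    """Return, for every node, the set of times at which it may transition.

    gates: list of (output, delay, inputs) tuples in topological order
           (hypothetical netlist format chosen for this sketch).
    input_times: dict mapping each primary input to its possible
                 transition times ({0} under the assumptions of section 2).
    """
    times = {node: set(ts) for node, ts in input_times.items()}
    for output, delay, inputs in gates:
        # A transition on any input at time t may cause an output
        # transition at time t + delay.
        times[output] = {t + delay for node in inputs for t in times[node]}
    return times


# Circuit of Fig. 1: inverter (delay 1) drives n1; NAND (delay 2) drives o1.
FIG1 = [("n1", 1, ["i1"]), ("o1", 2, ["n1", "i2"])]
fig1_times = propagate_transition_times(FIG1, {"i1": {0}, "i2": {0}})
print(fig1_times["n1"], fig1_times["o1"])
```

For the circuit of Fig. 1 this yields a possible transition at time 1 for n1 and at times 2 and 3 for o1, in agreement with the intervals listed in the figure.
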
The current waveform of each gate is calculated from this set of all possible transitions, and then the current waveforms from different gates are combined at the contact point(s). As each gate current is computed from the set of all possible transitions, the current waveform at the contact point is a point-wise upper bound on the MEC waveform (also see [1]).

In order to assess the quality of the upper bound obtained from imax, we need to determine the exact MEC waveform. However, as mentioned earlier, doing so is practically impossible for most circuits with more than about 10 inputs. We therefore use an iterative optimization scheme, namely the simulated annealing (SA) algorithm [4], to calculate a current waveform that is close to the MEC waveform. In SA, different input patterns are selectively applied to the circuit and then a logic simulator is used to calculate the outputs of various gates. From these gate outputs, the supply currents are easily calculated. We use the peak value of the overall current waveform as the objective function to be maximized in the annealing algorithm. Since we cannot afford to examine all input patterns, the result of SA will be only a lower bound for the MEC waveform. By comparing the upper bound obtained from imax to this lower bound, we obtain a measure of the maximum deviation of the imax upper bound from the true MEC [1].

3 The Signal Correlation Problem

In general, signal values at internal nodes of a circuit are correlated. This limits the number of transitions that can possibly occur at the outputs of the gates, an effect that is ignored by the imax algorithm [1]. An example of how signal correlation limits the number of transitions is illustrated in Fig. 2. In this figure, the signal lines x1 and x2 are correlated, i.e., they carry the same signal. Depending upon the specific excitation present at x, only one of the two gates can switch at a time. However, since imax ignores the signal correlation present between x1 and x2, it erroneously concludes that both gates may switch at the same time.
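
The effect can be reproduced with a small Python sketch. It rests on several assumptions that are not stated in this form in the paper: the two gates of Fig. 2 are taken to be a NAND and a NOR gate (as in the later discussion of the figure), their other inputs are assumed to carry the sets {l, lh} and {h, hl} respectively (one reading of the figure labels), and a gate is said to switch when its steady-state output before and after the input transitions differs (a zero-delay view that ignores glitches). All names are hypothetical.

```python
from itertools import product

# Logic value of each excitation before and after time zero.
INITIAL = {"l": 0, "h": 1, "lh": 0, "hl": 1}
FINAL = {"l": 0, "h": 1, "lh": 1, "hl": 0}


def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


def switches(gate, e1, e2):
    """True if the gate output differs before and after the transitions."""
    return gate(INITIAL[e1], INITIAL[e2]) != gate(FINAL[e1], FINAL[e2])


X = ["l", "h", "lh", "hl"]      # uncertainty set of x
NAND_OTHER = ["l", "lh"]        # assumed set on the other NAND input
NOR_OTHER = ["h", "hl"]         # assumed set on the other NOR input


def max_switching(correlated):
    """Largest number of gates that can switch for any allowed pattern."""
    best = 0
    for x1, x2, a, b in product(X, X, NAND_OTHER, NOR_OTHER):
        if correlated and x1 != x2:
            continue            # x1 and x2 carry the same signal x
        best = max(best, switches(nand, x1, a) + switches(nor, x2, b))
    return best


print(max_switching(correlated=False), max_switching(correlated=True))
```

Treating x1 and x2 as independent allows two gates to switch together, while enforcing x1 = x2 allows at most one, which is the over-estimation described above.
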
It is this kind of approximation that contributes to a loose imax upper bound. As is clear from this example, the source of the signal correlation problem, in general, is a gate (or input) whose output fans out to several other gates. Such gates are called multiple fan-out (MFO) gates. The basic imax algorithm ignores all signal correlations and, therefore, overestimates the supply currents. The advantage of ignoring correlations in the algorithm is its very desirable linear time performance.

[Figure 2: signal x fans out as lines x1 and x2 to a NAND gate and a NOR gate. Set labels shown in the figure: {l, lh}, {l, h, lh, hl}, {h, hl}, {h, hl}, {l, lh}.]
Figure 2. The signal correlation problem.

4 Resolving Signal Correlations

The upper bound produced by the imax algorithm can be made exact by doing a brute-force enumeration at the inputs of the circuit and storing the envelope of the current waveforms produced. In enumeration, since unambiguous input patterns are applied to the circuit, there is no uncertainty present at the inputs and therefore signal correlations do not become an issue. In a similar fashion, one can improve the results of the imax algorithm by doing a partial enumeration at a few selected nodes in the circuit.

An example of how partial enumeration helps improve the upper bound can be seen from Fig. 2. In this circuit with no enumeration, imax would assume that the signal lines x1 and x2 are mutually independent and therefore infer that both the NAND and the NOR gates can switch at the same time. However, if we do a partial enumeration at signal line x, then we would generate four cases corresponding to x = l, x = h, x = hl and x = lh. When x = l or hl, only the NOR gate switches. Similarly, when x = h or lh, only the NAND gate switches. Thus, by splitting the problem into four sub-problems, we have improved our results, i.e., found that only one of the two gates may switch at any given time.

While enumerating a node, we need to process only those gates that are affected by a change in excitation at the node. We define the COne of INfluence (COIN) of a node as the set of all the gates that can possibly be affected by a change in excitation at that node. Thus, a gate is in the COIN of a node if it is either directly driven by it or is connected to the output of a gate that is in the COIN.
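
The recursive definition of the COIN amounts to a reachability computation over the fan-out graph. The following Python sketch illustrates this under the assumption that each gate is identified by the name of its output node and that the circuit is given as a fan-out dictionary; this data layout and the name `cone_of_influence` are hypothetical.

```python
from collections import deque


def cone_of_influence(fanout, node):
    """Return the set of gates that a change at `node` can affect.

    fanout: dict mapping a node name to the list of gates it drives,
            each gate being named after its output node (assumed layout).
    """
    coin = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for gate in fanout.get(current, []):
            if gate not in coin:
                # Directly driven, or driven by a gate already in the COIN.
                coin.add(gate)
                queue.append(gate)
    return coin


# Fan-out of the Fig. 1 circuit: i1 -> inverter n1 -> NAND o1 <- i2.
FIG1_FANOUT = {"i1": ["n1"], "i2": ["o1"], "n1": ["o1"]}
print(cone_of_influence(FIG1_FANOUT, "i1"))
print(cone_of_influence(FIG1_FANOUT, "i2"))
```

In this small circuit the COIN of i1 contains both gates, whereas the COIN of i2 contains only the NAND gate. Each gate is visited once, so the computation is linear in the size of the fan-out graph.
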
While enumerating a node, we only need to consider those gates that are in its COIN.

One technique to partially enumerate the internal nodes of a circuit, called Multi-Cone Analysis (MCA), was reported in [1]. The motivation behind such an approach was to be able to enumerate at the outputs of the MFO gates, which are the source of the signal correlation problem. However, the MCA approach offers only modest improvement in results. In the next section, we present a partial input enumeration approach that significantly improves the imax results and represents a good speed-accuracy trade-off.

5 Partial Input Enumeration (PIE)

Firstly, there are usually many more MFO nodes than primary inputs in a circuit. Secondly, as stated in section 2, all the inputs to the circuit switch at most once at time 0. Therefore, there is only one time point at which a primary input needs to be enumerated. This is in contrast to an internal node, which usually needs to be enumerated at several time points. These observations, combined with the fact that imax is an extremely fast algorithm, led us to explore the following partial input enumeration (PIE) method to improve the upper bound.

Let $x_1, x_2, \ldots, x_N$ be the N primary inputs of a circuit under consideration. Let $X_i$ represent the uncertainty set for input $x_i$ at time 0. The input search space for the circuit consists of all valid input patterns that can be applied to it. Mathematically, the input search space is $\{(e_1, e_2, \ldots, e_N) \mid e_1 \in X_1, e_2 \in X_2, \ldots, e_N \in X_N\}$. For brevity, we denote this by $(X_1, X_2, \ldots, X_N)$. Suppose, for the purpose of this illustration, that for a particular input $x_i$, $X_i = X$. Then the input search space $(X_1, X_2, \ldots, X_N)$ for the circuit can be divided into four disjoint parts, namely $(X_1, X_2, \ldots, \{l\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{h\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{hl\}, \ldots, X_N)$ and $(X_1, X_2, \ldots, \{lh\}, \ldots, X_N)$. We can compute the maximum current waveform for each of these four parts by running the imax algorithm.
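
The division of the search space can be made concrete with a short Python sketch. It represents a search space as a tuple of uncertainty sets, one per input; this representation and the names `split` and `patterns` are assumptions made for illustration only.

```python
from itertools import product

X = frozenset(["l", "h", "hl", "lh"])


def split(space, i):
    """Enumerate input i: one sub-space per excitation in its uncertainty set."""
    return [space[:i] + (frozenset([e]),) + space[i + 1:]
            for e in sorted(space[i])]


def patterns(space):
    """All fully specified input patterns contained in a search space."""
    return set(product(*space))


space = (X, X, X)            # three inputs, all fully uncertain
parts = split(space, 1)      # enumerate the second input
print(len(parts), [len(patterns(p)) for p in parts], len(patterns(space)))
```

The four parts are pairwise disjoint and together contain every pattern of the original space, which is why an envelope taken over the four imax runs remains a valid upper bound.
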
Since the four parts combined together constitute the complete search space, by taking an upper bound envelope of the four current waveforms we can still guarantee an upper bound on the MEC waveform. Since, in each of the four runs of imax, specific excitation values are present at input $x_i$, signal correlations due to $x_i$ disappear and the resulting current waveform should be an improvement on the original upper bound. In a similar fashion, the upper bounds for the individual subcases can be improved.

The set of inputs selected for enumeration has a definite influence on the quality of the solution obtained. If all the inputs are selected then the upper bound obtained would be exact. However, doing this is practically impossible for most circuits. From Fig. 2, we observe that some inputs contribute more to signal correlation than others, e.g., enumerating input x is more beneficial than enumerating either of the other two. Hence, by selecting and enumerating inputs in an intelligent fashion, we can significantly improve the imax upper bound without spending too much CPU time.

We have implemented the partial input enumeration approach in the form of a best first search (BFS) algorithm [5]. The various search nodes (simply called nodes below) generated during the search correspond to partial input specifications, as explained above. During the search, we always expand the node which corresponds to the highest peak value (the objective of the search) of the upper bound waveform. Because of this best first strategy, there is a gradual reduction in the peak value of the upper bound. This is a very important feature of the algorithm for large circuits, where an exhaustive exploration of the input space is practically impossible. The BFS algorithm can be stopped at any intermediate stage and the current best upper bound can still be reported.

The BFS algorithm starts with the initial uncertain state, i.e., $(X_1, X_2, \ldots, X_N)$.
During the search, a node with the highest objective value is repeatedly selected and its descendant nodes are generated by enumerating an input, as explained in the following outline. Here, List is an ordered list of nodes, arranged in decreasing order of their objective values.

1. List ← Initial uncertainty state. UB ← its imax value. LB ← obj value for an input pattern.
2. While Stopping Criterion is not satisfied, do
   2.1 Remove top node from List.
   2.2 Calculate next input to enumerate from SC.
   2.3 Generate all (≤ 4) children nodes by enumerating the above input and calculate their obj values.
   2.4 If these children are leaf nodes, then update the LB, else insert them in List, after pruning if any.
   2.5 UB ← obj value of top node in List.
3. Report the best UB, LB found. STOP.

[Figure 3: a node n with objective value obj_n and its four children with objective values obj_l, obj_h, obj_hl and obj_lh.]
Figure 3. The H1 Splitting Criterion.

The following functions are used in the algorithm.

Objective Function: It is the peak value of the upper bound waveform obtained from imax.

Stopping Criterion: We stop the search when the number of nodes expanded during the search exceeds a certain user-specified limit (MaxNoNode).

Pruning Criterion: During the search, a node for which the upper bound does not exceed the lower bound can be deleted from the search, since expanding it cannot change the reported bounds.

Splitting Criterion (SC): The splitting criterion is a very important component of the BFS algorithm. This criterion specifies the input that should be enumerated next from any node during the search. Let us suppose that during the search, we are at a particular node n and we select an input $x_i$ for enumeration. If we assume that the uncertainty set for $x_i$ at time 0 is X, then by enumerating $x_i$ we would generate four children nodes, as shown in Fig. 3. We assume that the objective value of node n is denoted by $\mathrm{obj}_n$ and the objective values of the children nodes are denoted by $\mathrm{obj}_l$, $\mathrm{obj}_h$, $\mathrm{obj}_{hl}$ and $\mathrm{obj}_{lh}$. If
$$\Delta\mathrm{obj}_i = \mathrm{obj}_n - \max\{\mathrm{obj}_l, \mathrm{obj}_h, \mathrm{obj}_{hl}, \mathrm{obj}_{lh}\},$$
then by enumerating $x_i$ we can improve the objective value of node n by an amount $\Delta\mathrm{obj}_i$. Based on this observation, we have come up with the following (more general) heuristic function called $H_1$:
$$H_1 = A\,(\mathrm{obj}_n - \mathrm{obj}_1) + B\,(\mathrm{obj}_n - \mathrm{obj}_2) + C\,(\mathrm{obj}_n - \mathrm{obj}_3) + (\mathrm{obj}_n - \mathrm{obj}_4),$$
where $\mathrm{obj}_1$, $\mathrm{obj}_2$, $\mathrm{obj}_3$ and $\mathrm{obj}_4$ are the objective values of the children nodes arranged in decreasing order and A, B and C are three constants such that $A \geq B \geq C \geq 1$.
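
The outline and the criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the imax run is replaced by a caller-supplied `bound` function that is assumed to be exact on fully specified patterns, the splitting criterion is a static input order passed in as `order`, a child is pruned when its bound does not exceed the current LB (our reading of the pruning criterion), and the weights in `h1` as well as the toy circuit model `toy_bound` are hypothetical.

```python
import heapq
from itertools import count


def h1(obj_n, child_objs, a=4.0, b=2.0, c=1.0):
    """H1 for one input; a >= b >= c >= 1 are illustrative weights."""
    o1, o2, o3, o4 = sorted(child_objs, reverse=True)  # needs four children
    return a * (obj_n - o1) + b * (obj_n - o2) + c * (obj_n - o3) + (obj_n - o4)


def best_first_search(root, bound, order, lower_bound, max_nodes):
    """Best first search over partial input specifications.

    root:  tuple of frozensets, one uncertainty set per primary input.
    bound: callable giving the peak of the upper bound waveform of a state
           (stands in for an imax run; assumed exact on leaf states).
    order: static list of input indices, most critical first.
    """
    ub, lb = bound(root), lower_bound                      # step 1
    if all(len(s) == 1 for s in root):
        return ub, max(lb, ub)                             # nothing to enumerate
    tie = count()                                          # heap tie-breaker
    heap = [(-ub, next(tie), root)]
    expanded = 0
    while heap and expanded < max_nodes:                   # stopping criterion
        _, _, state = heapq.heappop(heap)                  # step 2.1
        i = next(k for k in order if len(state[k]) > 1)    # step 2.2
        expanded += 1
        for e in sorted(state[i]):                         # step 2.3
            child = state[:i] + (frozenset([e]),) + state[i + 1:]
            obj = bound(child)
            if all(len(s) == 1 for s in child):            # step 2.4: leaf
                lb = max(lb, obj)
            elif obj > lb:                                 # otherwise pruned
                heapq.heappush(heap, (-obj, next(tie), child))
        ub = max(lb, -heap[0][0]) if heap else lb          # step 2.5
    return ub, lb


def toy_bound(state):
    """Toy stand-in for imax: number of gates that may switch (Fig. 2 style)."""
    x, a, b = state
    nand_may_switch = bool(x & {"h", "lh"}) and "lh" in a
    nor_may_switch = bool(x & {"l", "hl"}) and "hl" in b
    return int(nand_may_switch) + int(nor_may_switch)


X = frozenset(["l", "h", "hl", "lh"])
ROOT = (X, frozenset(["l", "lh"]), frozenset(["h", "hl"]))
print(best_first_search(ROOT, toy_bound, [0, 1, 2], 0, 1))
```

In this toy model a single expansion on the fan-out input already lowers the upper bound from 2 to 1, and letting the search run to completion closes the gap between UB and LB. A priority queue keeps the node with the highest objective value at the top, so the search can be interrupted after any expansion and still report a valid bound.
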
At every node during the search, we calculate the heuristic value for every input and select the input with the maximum associated heuristic value. However, for large circuits, it is very expensive to repeat this process at every node. Therefore, instead of calculating the heuristic values for every input at every node, the heuristic values for every input are calculated at the beginning of the search. All the inputs are arranged in decreasing order of these heuristic values and during the search, inputs are enumerated in this fixed order. This criterion is called the static ($H_1$) splitting criterion.

The number of gates that are affected by a change in excitation at an input is another good heuristic measure of how much influence the input has on the upper bound waveform. Inputs which affect a larger number of gates (i.e., which have larger COINs) should be enumerated before others. This leads us to another (static) splitting criterion $H_2$, whose value is equal to the size of the COIN associated with the input. As with $H_1$, all the inputs are arranged in decreasing order of $H_2$ values and during the search, inputs are enumerated in this fixed order. We will show in the next section that, while both static $H_1$ and $H_2$ splitting criteria give good results in practice, $H_2$ is much better in terms of speed and has accuracy comparable to $H_1$.

Table 1. Results of PIE for 10 ISCAS-85 circuits.
Columns: Circuit | imax | Static H1 SC: BFS (100), BFS (1k), Time (100) | Static H2 SC: BFS (100), BFS (1k), Time (100)
[Table entries lost in extraction.]

6 Experimental Results

The results of partial input enumeration using the BFS algorithm and both $H_1$ and $H_2$ static splitting criteria for the ISCAS-85 benchmark circuits [6] are documented in Table 1. In the table, under the various imax and BFS columns, we show the ratio of the respective upper bound to the lower bound obtained from simulated annealing.
The numbers in parentheses under the BFS columns indicate the number of nodes that were generated before stopping the search (i.e., the MaxNoNode parameter; 1k stands for 1000). The total CPU times needed by the algorithm on a Sun SPARCstation ELC (with MaxNoNode = 100) are also shown in the table.

From Table 1, we note that for all the circuits, the ratio of the upper bound to the lower bound is at most 1.52, as opposed to a worst case of 2.02 for the simple imax algorithm. This ratio can be further improved by running the algorithm for longer durations. We emphasize that, since we can only compare the upper bound to a lower bound obtained from SA, the numbers in the table are only upper bounds on the error. It is prohibitively expensive to measure the true error. While the improvement over the original imax algorithm is not large in all cases, in those cases where the imax bound was very loose, such as c3540, the new PIE algorithm gives significant improvement: the ratio of 2.02 (maximum over-estimation by 1.02) is now 1.37 (maximum over-estimation by 0.37) with $H_2$, a reduction in the maximum over-estimation by about 64%, since $(1.02 - 0.37)/1.02 \approx 0.64$.

[Figure 4: plot of the ratio UB/LB against Time (min).]
Figure 4. "UB/LB vs Time" plot for c3540.

We also emphasize the following attractive property of the algorithm: a significant amount of improvement in the upper bound occurs in the first few node expansions (about [...]) in the algorithm. This is illustrated in Fig. 4, where the ratio of the upper bound to the lower bound is plotted as a function of CPU time for c3540. The figure clearly indicates that our heuristics are working well to select the most critical nodes first. It also points to the fact that we can stop the search at any intermediate step and still be able to obtain some improvement in results. Similar behavior is observed for most other circuits.

The CPU time needed for generating the input list by the $H_2$ splitting criterion is negligible compared to the time needed by the $H_1$ criterion. For VLSI circuits with several hundred inputs, where the time needed by the $H_1$ criterion may be large, the $H_2$ criterion may be used instead. As can be seen from Table 1, the results produced by using either splitting criterion are quite comparable, especially for those circuits where imax did not produce a good upper bound.

In order to demonstrate the applicability of the partial input enumeration algorithm for VLSI circuits with several thousand gates, we have also experimented with the ISCAS-89 benchmark circuits [7]. For these synchronous sequential circuits, we have extracted the combinational blocks by deleting the flip-flops.
These combinational blocks have gate counts ranging up to 22,000 and numbers of inputs ranging up to [...]. The results of the BFS algorithm on some of the ISCAS-89 circuits using the $H_2$ splitting criterion are summarized in Table 2. It is clear from the table that even for circuits of this size, our algorithm shows good speed and accuracy performance.

Table 2. Results of PIE for ISCAS-89 circuits.
Columns: Circuit | No. Gates | imax | Static H2 SC: BFS (100), BFS (1k), Time (100)
[Table entries lost in extraction.]

7 Conclusions

In this paper, we have described the signal correlation problem which arises while estimating maximum currents in CMOS combinational circuits by the pattern independent approach imax [1]. We have presented a new partial input enumeration (PIE) algorithm to resolve the signal correlations and significantly improve the upper bound obtained from the imax algorithm (in one case, reducing the error by 64% on a circuit with about 1,700 gates). We also show good speed performance, solving circuits with more than 20,000 gates in about 2.25 hours on a SUN ELC. The algorithm is based on the best first search (BFS) technique and represents a good time-accuracy trade-off. The PIE algorithm involves a search procedure, but this search need not be carried too deep to obtain good results. The algorithm is quite applicable to large VLSI circuits, as is demonstrated by the experimental results. In our future research, we plan to extend the study to include better delay models and to estimate worst case voltage drops in supply lines, using RC models, from the maximum current estimates.

References

[1] H. Kriplani, F. Najm, and I. Hajj, "Maximum current estimation in CMOS circuits," in Proceedings of 29th ACM/IEEE Design Automation Conference, pp. 2-7, Anaheim, CA, June 8-12.
[2] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley.
[3] M. R. Garey and D. S. Johnson, Computers and Intractability, A Guide to the Theory of NP-Completeness. New York, NY: W. H. Freeman and Co.
[4] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 13 May.
[5] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA: Addison-Wesley.
[6] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinational benchmark circuits and a target translator in Fortran," in Proceedings of International Symposium on Circuits and Systems, pp. 695-698, June.
[7] F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in Proceedings of International Symposium on Circuits and Systems, pp. 1929-1934, May 1989.

