Accurate Temperature-Dependent Integrated Circuit Leakage Power Estimation is Easy


Yongpan Liu, Robert P. Dick†, Li Shang‡, Huazhong Yang
ypliu99@mails.tsinghua.edu.cn, dickrp@eecs.northwestern.edu, lshang@ee.queensu.ca, yanghz@mail.tsinghua.edu.cn

Electronic Engineering Dept., Tsinghua University, Beijing, 84, China
†EECS Dept., Northwestern University, Evanston, IL 68, U.S.A.
‡ECE Dept., Queen's University, Kingston, ON K7L N6, Canada

Abstract

It has been the conventional assumption that, due to the superlinear dependence of leakage power consumption on temperature, and widely varying on-chip temperature profiles, accurate leakage estimation requires detailed knowledge of the thermal profile. Leakage power depends on integrated circuit (IC) thermal profile and circuit design style. We show that linear models can be used to permit highly-accurate leakage estimation over the operating temperature ranges in real ICs. We then show that for typical IC packages and cooling structures, a given amount of heat introduced at any position in the active layer will have similar impact on the average temperature of the layer. These two observations allow us to prove that, for wide ranges of design styles and operating temperatures, extremely fast, coarse-grained thermal models, combined with linear leakage power consumption models, permit highly-accurate system-wide leakage power consumption estimation. The results of our proofs are further confirmed via comparisons with leakage estimation based on detailed, time-consuming thermal analysis techniques. Experimental results indicate that the proposed technique yields a 59,59,79, speedup in leakage power estimation while maintaining accuracy.

I. INTRODUCTION

As a result of continued IC process scaling, the importance of leakage power consumption is increasing []. Leakage accounts for 4% of the power consumption of today's high-performance microprocessors []. Power consumption, temperature, and performance must now be optimized during the entire design flow.
Leakage power consumption and temperature influence each other: increasing temperature increases leakage and vice versa. Leakage power estimation is frequently used in IC synthesis, within which it may be invoked tens of thousands of times: it must be both accurate and fast. Researchers have developed a variety of techniques to characterize IC leakage power consumption, ranging from architectural level to device level []–[8]. However, most of these techniques neglect the dependence of leakage on temperature. Leakage is a strong function of temperature. Therefore, thermal analysis must be embedded within the IC power analysis flow.

Figure 1 shows a typical temperature-dependent IC leakage power estimation flow. Power consumption, including dynamic power and leakage power, is initially estimated at a reference temperature. The estimated power profile is then provided to a chip-package thermal analysis tool to estimate the circuit thermal profile. This thermal profile is, in turn, used to update the circuit leakage power estimate. This iterative process continues until power and temperature converge.

Fig. 1. Thermal-aware power estimation flow. (Stages: power estimation at reference temperature using PrimePower, HSPICE, etc.; detailed IC power profile from dynamic and leakage power; chip-package thermal analysis; detailed IC thermal profile; leakage power analysis; repeated until leakage power and temperature profiles converge.)

Recent work has considered the impact of temperature on leakage. Zhang et al. developed HotLeakage, a temperature-dependent cache leakage power model [9]. Su et al. proposed a full-chip leakage modeling technique that characterizes the impact of temperature and supply voltage fluctuations []. Liao et al. presented a temperature-dependent microarchitectural power model []. In leakage analysis, one can be confident of an accurate result by using a fine-grained thermal model. However, this is computationally intensive. One can also use a coarse-grained thermal model. Although fast, previous work has not demonstrated that this will permit accurate leakage estimation. Designers may select modeling granularity. However, without an understanding of the requirements necessary for accurate leakage prediction, conservative designers are forced to use slow, fine-grained thermal models. This hinders the use of accurate IC leakage power estimation during IC synthesis.

In this paper, we propose a very fast, accurate method of estimating IC leakage power consumption.

1) We demonstrate that, within the operating temperature ranges of ICs, using a linear leakage model for each functional unit results in less than % error in leakage estimation (Section II).
2) We demonstrate that IC packages and cooling structures have the useful property that a given amount of heat produced within the active layer of an IC will have similar impact on the average temperature of the active layer, regardless of its distribution (Section III).
3) We use the two properties described above to prove that within regions of uniform design style, knowledge of the average temperature is sufficient to accurately determine leakage power consumption. Based on this result, we show that leakage can be predicted using a simple, coarse-grained model without sacrificing accuracy (Section IV).
4) We validate the proposed technique via analytical proofs and simulation results. We demonstrate that for a wide range of ICs, a simplified thermal model in which only one thermal element is used for each functional unit permits a speedup in leakage estimation of 59,59,79, while maintaining accuracy to within % (Section V), when compared with a conventional approach that uses a detailed thermal model.

II. PROPOSED LEAKAGE MODEL

This section introduces IC leakage power consumption and characterizes leakage modeling linearization.

II.A. IC Leakage Sources

IC leakage current consists of various components, such as subthreshold leakage, gate leakage, reverse-biased junction leakage, punch-through leakage, and gate-induced drain leakage. Among these, subthreshold leakage and gate leakage are dominant []. They will be the focus of our analysis.

This work was supported in part by the NSFC under awards 97 and 656; in part by the 86 Program under award 6AAZ4; in part by the NSF under award CNS-4794; and in part by NSERC Discovery Grant 88694-.

978--988--4/DATE7 7 EDAA

Considering weak inversion Drain-Induced Barrier Lowering and body effect, the subthreshold leakage current of a MOS device can be modeled as follows []:

$$I_{subthreshold} = A_s \frac{W}{L} v_T^2 \left(1 - e^{-V_{DS}/v_T}\right) e^{(V_{GS} - V_{th})/(n v_T)} \tag{1}$$

where $A_s$ is a technology-dependent constant, $V_{th}$ is the threshold voltage, $L$ and $W$ are the device effective channel length and width, $V_{GS}$ is the gate-to-source voltage, $n$ is the subthreshold swing coefficient for the transistor, $V_{DS}$ is the drain-to-source voltage, and $v_T$ is the thermal voltage. Since $V_{DS} \gg v_T$ and $v_T = kT/q$, where $k$ is Boltzmann's constant and $q$ is the electron charge, Equation 1 can be reduced to

$$I_{subthreshold} = A_s \frac{W}{L} \left(\frac{kT}{q}\right)^2 e^{q(V_{GS} - V_{th})/(nkT)} \tag{2}$$

The gate leakage of a MOS device results from tunneling between the gate terminal and the other three terminals (source, drain, and body). Gate leakage can be modeled as follows []:

$$I_{gate} = W L A_J \left(\frac{T_{oxr}}{T_{ox}}\right)^{nt} \frac{V_g V_{aux}}{T_{ox}^2} e^{-B T_{ox} (a - b|V_{ox}|)(1 + c|V_{ox}|)} \tag{3}$$

where $A_J$, $B$, $a$, $b$, and $c$ are technology-dependent constants, $nt$ is a fitting parameter with a default value of one, $V_{ox}$ is the voltage across the gate dielectric, $T_{ox}$ is the gate dielectric thickness, $T_{oxr}$ is the reference oxide thickness, $V_{aux}$ is an auxiliary function that approximates the density of tunneling carriers and available states, and $V_g$ is the gate voltage.

II.B. Thermal Dependency Linearization

Equations 2 and 3 demonstrate that subthreshold leakage depends primarily on temperature, supply voltage, and body bias voltage. Gate leakage, in contrast, is primarily affected by supply voltage and gate dielectric thickness, but is insensitive to temperature.
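To make the temperature sensitivity of the reduced subthreshold expression concrete, the following minimal sketch evaluates it at a few temperatures and prints the leakage relative to 25 °C. The parameter values (threshold voltage 0.3 V, swing coefficient 1.5, unit prefactor and unit W/L) are illustrative assumptions, not values taken from the paper or from any particular process.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # electron charge, C


def subthreshold_leakage(temp_k, a_s=1.0, w_over_l=1.0, v_gs=0.0, v_th=0.3, n=1.5):
    """Reduced subthreshold model: A_s (W/L) (kT/q)^2 exp(q (V_GS - V_th) / (n k T)).

    All default parameter values are hypothetical and only serve the illustration.
    """
    v_t = K_BOLTZMANN * temp_k / Q_ELECTRON  # thermal voltage kT/q, in volts
    return a_s * w_over_l * v_t ** 2 * math.exp((v_gs - v_th) / (n * v_t))


# Leakage normalized to its value at 25 C, for equally spaced temperatures.
ref = subthreshold_leakage(298.15)
for t_c in (25, 55, 85, 115):
    ratio = subthreshold_leakage(t_c + 273.15) / ref
    print(f"{t_c:4d} C: {ratio:6.2f}x")
```

With these assumed parameters the increments between equally spaced temperatures grow with temperature, which is the superlinear behavior that the linearization has to approximate.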
Using the Taylor series expansion at a reference temperature $T_{ref}$, the total IC leakage current of a MOS device can be expressed as follows:

$$I_{leakage}(T) = I_{subthreshold} + I_{gate} \tag{4}$$

$$= A_s \frac{W}{L} \left(\frac{k}{q}\right)^2 T^2 e^{q(V_{GS} - V_{th})/(nkT)} + I_{gate} \tag{5}$$

$$= I_{linear}(T) + I_{high\,order}(T) \tag{6}$$

where the linear portion $I_{linear}(T)$ is

$$I_{linear}(T) = I_{gate} + A_s \frac{W}{L} \left(\frac{k}{q}\right)^2 e^{q(V_{GS} - V_{th})/(nkT_{ref})} \left[ T_{ref}^2 + \left( 2T_{ref} - \frac{q(V_{GS} - V_{th})}{nk} \right)(T - T_{ref}) \right] \tag{7}$$

and the high-order portion $I_{high\,order}(T)$ is

$$I_{high\,order}(T) = \frac{1}{2} I''_{leakage}(T_{ref})(T - T_{ref})^2 + O\left((T - T_{ref})^3\right) \tag{8}$$

Therefore, the estimation error resulting from truncation of superlinear terms is bounded as follows:

$$Err = \frac{I_{high\,order}(T)}{I_{leakage}(T)} \tag{9}$$

Fig. 2. Normalized leakages for HSPICE, piece-wise linear, and linear models using the 65 nm process for c755 and SRAM. (Axes: normalized leakage value versus temperature in °C.)

Fig. 3. Linear leakage model errors for c755 and SRAM. (Axes: leakage model error in % versus piece-wise linear leakage model name; series: c755 worst, SRAM worst, c755 avg., SRAM avg.)

Equations 8 and 9 demonstrate that the estimation error of the linear leakage power model is a function of $|T - T_{ref}|$, i.e., the difference between the actual circuit temperature $T$ and the reference temperature $T_{ref}$ at which the linear model is derived. Therefore, to minimize the estimation error, the linear leakage model should be derived as close as possible to the actual subcircuit temperature. This can be intuitively understood from Figure 2, which shows the normalized leakage power consumption of two circuits (a combinational circuit benchmark c755 [4] and SRAM [5]) as a function of temperature. For each circuit, we can compare linear and three-segment piecewise linear (PWL 3) models with HSPICE simulation results for the 65 nm predictive technology model [6]. Within the normal operating temperature ranges of many ICs, 55 °C–85 °C, even a linear model is fairly accurate.
This accuracy can be further improved by using a piece-wise linear model. Accuracy improves with segment count although, in practice, only a few segments are needed. If a continuous leakage function is available, e.g., via curve fitting to measured or simulated results, the first and second terms of its Taylor series expansion at the average temperature of the IC or subcircuit of interest can be used to provide a derivative-based linear model at the reference temperature of interest.

Figure 3 shows average and maximum leakage model error as functions of piece-wise linear model segment count for the same two circuits considered in Figure 2. Comparisons with HSPICE simulation are used to compute error. Leakage was modeled in the IC temperature range of 5 °C– °C. Within each piece-wise linear region, a linear leakage model is derived at the average temperature of this region using Equation 7. The accuracy permitted by the piecewise linear model is determined by the granularity of the regions. Figure 3 shows that modeling error decreases as the number of linear segments increases. For three or more segments, the maximum errors are less than .69% and .47% for c755 and SRAM, respectively. These results demonstrate that coarse-grained piece-wise linear models permit good leakage estimation accuracy. Finer granularity or differentiation of curve-fitted continuous functions will generally further improve accuracy, at the cost of increased complexity.
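The linearization described in this section can be sketched as follows. The code evaluates the leakage of Equation 5 and its first-order expansion of Equation 7, then reports the worst relative error when a temperature range is split into equal segments, each linearized at its midpoint. The device parameters and the `scale` factor that stands in for A_s W/L are hypothetical, and the per-segment tangent model is only one of the fitting options mentioned in the text (the experiments also use least-squares fitting).

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, V/K


def leakage(temp_k, i_gate=0.0, scale=1.0, v_gs=0.0, v_th=0.3, n=1.5):
    """Equation 5, with A_s * (W/L) folded into the assumed `scale` factor."""
    c = (v_gs - v_th) / (n * K_OVER_Q)  # q (V_GS - V_th) / (n k), in kelvin
    return scale * (K_OVER_Q * temp_k) ** 2 * math.exp(c / temp_k) + i_gate


def linear_leakage(temp_k, t_ref, i_gate=0.0, scale=1.0, v_gs=0.0, v_th=0.3, n=1.5):
    """Equation 7: first-order Taylor expansion of the leakage around t_ref."""
    c = (v_gs - v_th) / (n * K_OVER_Q)
    prefactor = scale * K_OVER_Q ** 2 * math.exp(c / t_ref)
    return i_gate + prefactor * (t_ref ** 2 + (2.0 * t_ref - c) * (temp_k - t_ref))


def max_rel_error(t_lo, t_hi, segments, samples=200):
    """Worst relative error when each equal-width segment is linearized at its midpoint."""
    width = (t_hi - t_lo) / segments
    worst = 0.0
    for k in range(samples + 1):
        t = t_lo + (t_hi - t_lo) * k / samples
        seg = min(int((t - t_lo) / width), segments - 1)
        t_ref = t_lo + (seg + 0.5) * width
        exact = leakage(t)
        worst = max(worst, abs(exact - linear_leakage(t, t_ref)) / exact)
    return worst


# Error over 55 C to 85 C for one, three, and five segments (hypothetical device).
for count in (1, 3, 5):
    print(count, f"{100.0 * max_rel_error(328.15, 358.15, count):.3f}%")
```

Because the truncation error grows with the distance from the reference temperature, shrinking the segments reduces the worst-case error, which is the trend the section describes.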

Fig. 4. Steady-state power distribution model.

III. THERMAL MODEL AND PROPERTIES

This section introduces the thermal model typically used in detailed temperature-aware IC leakage estimation and explains the properties of IC cooling solutions that permit use of the proposed leakage analysis technique.

III.A. Thermal Model Introduction

To conduct numerical thermal analysis, the IC chip and package are partitioned into numerous elements. This permits heat flow to be modeled in the same manner as electrical current in a distributed RC network [7], [8]:

$$C \frac{d\vec{T}(t)}{dt} = -A\,\vec{T}(t) + \vec{p}\,U(t) \tag{10}$$

where $C$ is an $n \times n$ diagonal thermal capacitance matrix, $A$ is an $n \times n$ thermal conductance matrix, $\vec{T}(t) = [T_1 - T_A, T_2 - T_A, \ldots, T_n - T_A]^T$ is the temperature vector in which $T_A$ is the ambient temperature, $\vec{p} = [p_1, p_2, \ldots, p_n]^T$ is the power vector, and $U(t)$ is a step function. In steady-state thermal analysis, the thermal profile does not vary with time. Therefore, we can denote $\lim_{t \to \infty} \vec{T}(t)$ as $\vec{T}$, allowing Equation 10 to be simplified as follows:

$$\vec{p} = A\,\vec{T} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \vec{T}$$

The thermal resistance matrix $R$ is the inverse of the thermal conductance matrix, i.e., $R = A^{-1}$.

III.B. Insensitivity to Power Profile Claim and Proof

A typical IC thermal model is shown in Fig. 4. In order to accurately model spatial temperature variation, several layers of thermal elements are generally necessary between the active layer and heat sink. Assuming an IC floorplan within which the active layer is divided into $m$ isothermal blocks, $blk_i$, $i \in 1, 2, \ldots, m$, the temperature, area, and power consumption of $blk_i$ are expressed as $T_i$, $s_i$, and $p_i$. The total power consumption of the chip is $P_{tot} = \sum_{i=1}^{m} p_i$. The matrix $S$ holds the values of vector $\vec{s} = [s_1, s_2, \ldots, s_m, \ldots, s_n]$ along its diagonal.
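As a minimal sketch of the steady-state relation p = A T, the code below builds the conductance matrix of a hypothetical three-node network (two equal-area active-layer blocks joined laterally, both connected to a single heat-sink node that leaks to ambient) and solves for the temperature rises. The topology and conductance values are assumptions made for illustration; they are not taken from the paper or from any thermal simulator.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def conductance_matrix(g_lat, g_vert, g_amb):
    """Nodes 0 and 1 are active-layer blocks; node 2 is the heat sink (assumed topology)."""
    return [
        [g_lat + g_vert, -g_lat, -g_vert],
        [-g_lat, g_lat + g_vert, -g_vert],
        [-g_vert, -g_vert, 2.0 * g_vert + g_amb],
    ]


def steady_state(power, g_lat=0.5, g_vert=2.0, g_amb=1.0):
    """Return the temperature rises above ambient that satisfy p = A T."""
    return solve(conductance_matrix(g_lat, g_vert, g_amb), power)


# Same total power (4 W), two different spatial distributions.
skewed_left = steady_state([3.0, 1.0, 0.0])
skewed_right = steady_state([1.0, 3.0, 0.0])
print([round(t, 3) for t in skewed_left])
print([round(t, 3) for t in skewed_right])
```

In this symmetric toy network the individual block temperatures change with the power distribution, while the sum of the two active-block temperature rises depends only on the total power.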
We now prove that a useful property of IC cooling solutions permits use of the proposed leakage estimation technique.

Theorem 1 (Sum of Products Area-Temperature Conservation): For all IC cooling configurations, as long as the total power input is constant, the sum of the IC area-temperature product in the active layer, $\sum_{i=1}^{m} s_i T_i$, is constant if and only if each power source has the same impact on the average temperature of the active layer. That is, the subblock of the area-weighted thermal resistance matrix $S R$ associated with the active layer should have the equal column sum property. The theorem can also be expressed as follows:

$$\sum_{i=1}^{m} s_i T_i \propto P_{tot} \iff \forall R_j : R_j = R_{const} \tag{11}$$

where $R_j = \sum_{i=1}^{m} s_i r_{ij}$, $r_{ij}$ is the item in the $i$th row and $j$th column of the thermal resistance matrix $R$, and $R_{const}$ is a constant decided by the material and thickness of the chip.

Proof: Assuming the following condition holds, the sufficiency of the theorem can be proven.

$$\forall R_j : R_j = R_{const} \tag{12}$$

$$\sum_{i=1}^{m} s_i T_i = \sum_{i=1}^{m} \sum_{j=1}^{m} s_i r_{ij} p_j = \sum_{j=1}^{m} R_j p_j \tag{13}$$

According to Condition 12, Equation 13 can be rewritten as:

$$\sum_{i=1}^{m} s_i T_i = R_{const} P_{tot} \tag{14}$$

Therefore, if Condition 12 holds, the sum of each block's area-temperature product $\sum_{i=1}^{m} s_i T_i$ in the active layer remains constant, as long as the total power input is constant. In particular, the sum of area-temperature products, $\sum_{i=1}^{m} s_i T_i = S_{tot} T_{avg}$, i.e., the area-average temperature product of the IC, remains constant.

Next, we prove the necessity of the theorem. If Condition 12 does not hold, the sum of each block's area-temperature product $\sum_{i=1}^{m} s_i T_i$ in the active layer does not remain constant with changing power profile, even if total power consumption is constant. Assume, without loss of generality, there are regions with high and low thermal impact on the active layer: $R_{high} = \sum_{i=1}^{m} s_i r_{ij}$, $j \in 1, 2, \ldots, l$, and $R_{low} = \sum_{i=1}^{m} s_i r_{ij}$, $j \in l+1, \ldots, m$. The total power can be divided into two parts accordingly, $P_{tot} = P_{high} + P_{low}$.
Thus, the sum of area-temperature products can be expressed as follows:

$$\sum_{i=1}^{m} s_i T_i = \sum_{j=1}^{l} R_{high}\, p_j + \sum_{j=l+1}^{m} R_{low}\, p_j = R_{high} P_{high} + R_{low} P_{low} \tag{15}$$

Even if $P_{tot}$ is constant, it is clear that a differing ratio between $P_{high}$ and $P_{low}$ makes the sum of area-temperature products different. Necessity is proved.

We will show that for a typical multiple-layer IC and cooling configuration, the sufficient and necessary conditions for Theorem 1 are satisfied, based on the following assumptions: 1) all heat generated in the active layer flows eventually to the ambient through the top of the heatsink or the bottom of the package, i.e., no heat flows through the sides of the silicon, and 2) all layers either have the same area or are isothermal. We will later demonstrate that these assumptions are well satisfied for a wide range of ICs. Due to space constraints, we can only summarize the proof that these assumptions permit the use of Theorem 1. However, this summary illustrates the reasons for the high accuracy indicated by the results in Section V.

We first generate a thermal conductance matrix $A_j$ for each layer $j$. $A_j$ is clearly a real symmetric matrix, in which the sum of items in the $i$th row (or column) equals $k_{con} s_i / t_{die}$, where $k_{con}$ is the silicon thermal conductivity and $t_{die}$ is the thickness of the layer. We transform $A_j$ to $B_j$ by factoring the area of each block $s_i$ out of matrix $A_j$ using matrix $S_j$. We prove that matrix $B_j$ has the equal column sum property and that the sum is $k_{con} / t_{die}$. For matrix $M$, with the equal column sum property, it is easy to prove the following properties.

Given that $\mathcal{M}$ is an arbitrary set of matrices and $\mathcal{E}$ is the set of all matrices having the equal column sum property,

$$\mathcal{M} \subseteq \mathcal{E} \Rightarrow \left( \sum_{M \in \mathcal{M}} M \in \mathcal{E} \right) \wedge \left( \prod_{M \in \mathcal{M}} M \in \mathcal{E} \right) \tag{16}$$

$$\forall M \in \mathcal{E} : \exists M^{-1} \Rightarrow M^{-1} \in \mathcal{E} \tag{17}$$

For the multiple-layer case, we can prove that the subblock of the area-weighted thermal resistance matrix $S R$ associated with the active layer can be expressed as a linear combination of matrices $B_j$ from each layer $j$. In this way, we prove that Condition 12 is satisfied. We will further validate the sufficient and necessary conditions under realistic cooling configurations in Section V.

IV. TEMPERATURE-AWARE LEAKAGE ESTIMATION

This section describes the approach conventionally used for temperature-aware leakage estimation and proposes a new accurate and fast technique.

IV.A. Conventional Approach

In the past, most attempts at temperature-aware leakage power consumption estimation used fine-grained thermal analysis to compute leakage power consumption [], []. It can be surmised that this is due to the superlinear relationship between leakage and temperature. After partitioning the IC into thousands of thermal elements, the leakage current for each thermal element is computed based on the corresponding estimated temperature. The total leakage current is computed by taking the sum of the leakage of all thermal elements. Since the number of thermal elements is large, most computation time is spent estimating the detailed thermal profile in the conventional approach. This prevents efficient leakage estimation for many candidate solutions during synthesis or early design space exploration.

IV.B. Proposed Method

In this section, we propose a fast and accurate temperature-dependent leakage estimation method. Assume the IC is divided into $n$ isothermal homogeneous grid elements, $blk_i$, $i \in 1, 2, \ldots, n$. The temperature, area, and power consumption of each element, $blk_i$, are expressed as $T_i$, $s_i$, and $p_i$, respectively.
Using the linear leakage model developed in Section II, the leakage power of $blk_i$ is expressed as follows:

$$p_{leak}^{blk_i}(T_i) \simeq V_{DD}\, I_{linear}^{blk_i}(T_i) \tag{18}$$

For a subcircuit with uniform design style, the leakage current is proportional to its area, i.e.,

$$I_{linear}^{blk_i}(T_i) \propto F_i s_i \tag{19}$$

yielding the following formula:

$$I_{linear}^{blk_i}(T_i) = F_i s_i (M_i T_i + N_i) \tag{20}$$

where $F_i$ is the leakage current per unit area. This value depends on manufacturing technology, design style, supply voltage, and input pattern. Since input vectors have a great influence on the leakage current, the leakage current should be weighted by input vector probability. $M_i$ and $N_i$ are parameters obtained by curve fitting in the piece-wise linear model. Collectively, $F_i$, $M_i$, and $N_i$ are referred to as leakage coefficients. If the derivative model is used, $M_i$ and $N_i$ are calculated at the estimated $T_i$ using the Taylor series expansion technique developed in Section II.

Uniform Case: $F_i$, $M_i$, and $N_i$ are decided only by the circuit design style, supply voltage, and input pattern. For an IC with uniform design style and supply voltage, such as SRAM and field-programmable gate arrays (FPGAs), these values are the same under specific input patterns for all portions of the IC and can be denoted as $F_{tech}$, $M$, and $N$, respectively. Theorem 1 can be used to show that

$$\sum_{i} I_{linear}^{blk_i}(T_i) = M F_{tech} \sum_{i} (s_i T_i) + F_{tech} N \sum_{i} s_i \tag{21}$$

$$= F_{tech} S_{tot} (M T_{avg} + N) \tag{22}$$

Therefore, as long as the conditions necessary to use Theorem 1 are well satisfied, only a few thermal elements are needed for accurate leakage analysis of the entire IC. This permits highly-efficient leakage estimation.

Nonuniform Case: Many ICs are composed of regions with different design styles, e.g., logic and memory, or with different supply voltages. These regions have different $F_i$, $M_i$, and $N_i$ values. In this case, we divide the chip into regions, within which the leakage coefficients are consistent.
Therefore, the leakage current for region $k$ is expressed as follows:

$$\sum_{blk_i \in reg_k} I_{linear}^{blk_i}(T_i) = M_k F_k \sum_{blk_i \in reg_k} (s_i T_i) + F_k N_k \sum_{blk_i \in reg_k} s_i = F_k S_{tot} (M_k T_{reg_k} + N_k) \tag{23}$$

where $T_{reg_k}$ is the average temperature of region $k$ and, in this equation, $S_{tot}$ denotes the total area of the blocks in region $k$. By summing the leakage current of all regions, the total leakage current is obtained. The use of only one, or a few, thermal elements for each region allows extremely fast thermal and leakage analysis.

Multiple thermal elements may also be used in cases for which the IC leakage coefficients are uniform in order to increase estimation accuracy. Finer thermal model granularity implies smaller temperature variations within each thermal element. Recall that the estimation accuracy of a linear model depends on deviation between the actual temperature and the reference temperature at which the linear model was derived. Decreasing the size of a thermal element decreases the temperature variation within it. Therefore, decreasing thermal element size decreases the truncation error resulting from using a linear approximation of the superlinear leakage function. Our results in Section V indicate that, even given pathological power and temperature profiles, very few thermal elements are required for leakage estimation with less than % error.

Leakage power consumption influences temperature, which in turn influences leakage power consumption. This feedback can be handled by repeating thermal analysis until convergence. This usually requires only a few iterations for most ICs. More advanced techniques to model this feedback loop may also be devised, but are beyond the scope of this article.

V. EXPERIMENTAL RESULTS

In this section, we evaluate the accuracy and efficiency of the proposed temperature-dependent leakage estimation technique, which consists of piece-wise linear leakage modeling and coarse-grained thermal analysis.
We characterize the two sources of leakage estimation error introduced by this technique: truncation error as a result of using a linear leakage model and temperature error as a result of using a coarse-grained thermal model. The base case for comparison is conventional temperature-aware leakage estimation using a superlinear leakage model and fine-grained thermal analysis. Our experiments demonstrate that for a set of FPGA, SRAM, microprocessor, and application-specific integrated circuit (ASIC) benchmarks, the proposed leakage modeling technique is accurate and permits great increases in efficiency. All benchmarks were run on an AMD Athlon-based Linux PC with GB of RAM.

V.A. Experimental Setup

We use the 65 nm predictive technology model [6] for leakage modeling. This model characterizes the impact of temperature on device leakage. We first derive the superlinear leakage model using HSPICE simulation. The piece-wise linear leakage model is then derived using the method described in Section II: partitioning the temperature range into uniform segments and using least-squared error fitting for each segment. The derivative-based model is based on the first two terms of the Taylor series expansion of the superlinear leakage function around the reference temperature of interest.

We use HotSpot. [9] for both coarse-grained and fine-grained steady-state thermal analysis. HotSpot. supports both block-based coarse-grained and grid-based fine-grained steady-state thermal analysis. Previous work [] demonstrated that the coarse-grained block-based method is fast. In contrast, fine-grained grid-based partitioning is slower but permits more accurate thermal analysis. In this work, coarse-grained thermal analysis is based on the block-based method, as only the average block temperature is required. For fine-grained thermal modeling, we partition the IC active layer into elements. This resolution is necessary; decreasing resolution to 5 5 resulted in a 6 °C error in peak temperature for the Alpha 64. A resolution of elements is also sufficient for our benchmarks; we have used resolutions up to ,, to validate our results and have found that increasing resolution beyond has little impact on temperature estimation accuracy.

TABLE I
LEAKAGE ERROR FOR FPGA
Columns: T_avg (°C) | P_tot (W) | DM error Avg. (%) | DM error Max. (%) | CPU time SF (s) | CPU time DM (s) | Speedup (million ×)
Values as extracted: 4..5 6..6 5 4.9.9 4.7.47 6 7..58 6..6 7..65 6..6 8 5.55.96 6. 9.79 9 8.7.5 6. 9.78

V.B. Leakage Power Estimation

Table I shows the accuracy and speedup resulting from using the proposed leakage estimation technique on an FPGA []. We used six sets of random power profiles. Six different total power consumptions (Column 2) resulting in different average temperatures (Column 1) were considered.
Power profiles were generated by assigning uniformly distributed random samples ranging from [, ] to each cell in a 5 5 array overlaying the IC and then adjusting the power values to reach the target total IC power while maintaining the ratios of power consumptions among cells. In Section IV we showed that the leakage power of an IC with uniform leakage coefficients depends only on total power consumption. To evaluate this claim, we compare the superlinear fine-grained model (SF) with the single-element linear derivative-based model (DM). At each total power setting, the average estimation error for the randomized power profiles is shown in Column. As shown in Column 4, the maximum estimation error was never greater than.%. As shown in Columns 5-7, the speedup permitted by our technique ranges from,47,,79,. This speedup results from a reduction in thermal model complexity that greatly accelerates the thermal analysis portion of leakage estimation.

In addition to considering modeling accuracy for uniform leakage coefficients in the presence of randomized power profiles, we designed a power profile to determine the error of the proposed technique under pathological conditions. In this configuration, all of the power in the IC is consumed by a corner block and other blocks consume no power. The total power input is set to 7 W, leading to an extremely unbalanced thermal profile. Temperatures ranged from 5.85 °C to 6.85 °C. This case goes well beyond what can be expected in practice, but serves to establish a bound on the estimation error of the proposed approach.

Figure 5 shows the leakage estimation error as a function of thermal modeling granularity for piece-wise linear leakage models with various numbers of segments and a linear model based on the derivative of the continuous leakage function at the block's predicted temperature. Using the same one-segment linear model for all blocks (PWL 1) results in approximately % estimation error. However, piece-wise linear models with five or more segments, and the derivative-based model, all maintain errors of less than.5%, as long as at least four thermal elements are used. Note that the derivative-based model is not identical to a piece-wise linear model in which the number of segments approaches infinity: the piece-wise linear model is fit to the leakage function using a least-squares error minimizer, while the derivative-based model is based on the Taylor series expansion around a single temperature. Therefore, it is possible for the piece-wise linear model to result in higher accuracy in some cases. From these data, we can conclude that even when faced with extreme power profiles, only a few thermal elements are necessary to permit high leakage power estimation accuracy.

Fig. 5. Leakage estimation error of FPGA under worst-case power profile. (Horizontal axis: thermal modeling granularity. Vertical axis: leakage estimation error (%). Series: piece-wise linear (PWL) models with different segment counts and the derivative-based model.)

In addition to considering ICs with uniform design styles, e.g., FPGAs, we have evaluated the proposed technique when used on the Alpha 21264 processor, an IC having regions with different sets of leakage coefficients, e.g., control logic, datapath, and memory. Power traces were generated using the Wattch power/performance simulator [22] running SPEC programs. One thermal element is used for each functional unit in the processor. Table II shows results for five-segment piece-wise linear (PWL 5) and derivative-based (DM) leakage models. Row 4 shows that reducing thermal model complexity results in leakage estimation speedups ranging from 59,59 8,965. As Rows and show, derivative-based and piece-wise linear model leakage estimation errors are less than % for all benchmarks, compared with an HSPICE-based superlinear leakage model used with fine-grained thermal analysis.

TABLE II
LEAKAGE ERROR FOR ALPHA 21264
Benchmark: gcc, equake, mesa, gzip, art, bzip2, twolf
PWL 5 error (%): .5, .7, .5, .4, .4, .45, .65
DM error (%): .54, .64, .5, .48, .56, .47, .57
Speedup (thousand ×): 59, 67, 65, 8, 66, 67, 66
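The comparison behind Table II, one thermal element per functional unit versus a fine-grained superlinear reference, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `leakage_density` is a hypothetical superlinear curve, `scale` is a hypothetical stand-in for a region's leakage coefficients, and the per-cell temperatures would in practice come from a fine-grained thermal analysis.

```python
import numpy as np

def leakage_density(t_kelvin, scale):
    # Hypothetical superlinear leakage per unit area; `scale` stands in for
    # the leakage coefficients of one region type (logic, datapath, memory).
    return scale * t_kelvin**2 * np.exp(-1500.0 / t_kelvin)

def fine_grained_leakage(cell_areas, cell_temps, scale):
    """Reference: evaluate the superlinear curve at every fine-grained cell."""
    return float(np.sum(cell_areas * leakage_density(cell_temps, scale)))

def coarse_block_leakage(cell_areas, cell_temps, scale):
    """One thermal element per block: only the area-weighted average
    temperature is used. A linear model expanded at that temperature gives
    the same total, because the first-order terms cancel over the block."""
    area = cell_areas.sum()
    t_avg = np.dot(cell_areas, cell_temps) / area
    return float(area * leakage_density(t_avg, scale))

def chip_error_percent(blocks):
    """blocks: list of (cell_areas, cell_temps, scale), one per functional unit."""
    fine = sum(fine_grained_leakage(a, t, s) for a, t, s in blocks)
    coarse = sum(coarse_block_leakage(a, t, s) for a, t, s in blocks)
    return 100.0 * abs(coarse - fine) / fine
```

The remaining difference between the two totals is the truncation error: it grows with the temperature spread inside a block and vanishes when each block is isothermal.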
The small leakage estimation error reported in Table II has two components: truncation error resulting from coarse-grained thermal modeling and slight deviation of real cooling structures from the conditions stated in Theorem. We now discuss the conditions required by Theorem.

V.C. Thermal Model Error Breakdown

In Section III, we showed that the necessary and sufficient conditions for Theorem hold under reasonable assumptions. IC cooling structures approximately conform to the assumptions required for sum-of-products area-temperature conservation to hold, e.g., much more heat leaves an IC and package through the heatsink than through the sides of the silicon die. However, they do not perfectly conform, e.g., some heat can leave the system through the sides of the die. We now evaluate the error resulting from approximating the conditions required to use Theorem.

We use several ICs with differing floorplans: FPGA, SRAM [23], Alpha 21264, and HP, an ASIC benchmark from the MCNC benchmark suite [24], to compare sum of area-temperature products (SATP) values given different power profiles. For each IC, SATP is calculated for randomized power profiles, which are generated in the same way as those for Table I. Each IC has a different area. Therefore, total power consumption values were chosen to produce each of the six reported average temperatures. Table III shows maximum and average differences between the SATP values for the random power profiles and the SATP value for a uniform power profile. From these results, we can conclude that the SATP error is less than.6% for all four benchmark ICs. We also computed SATP error for the unbalanced worst-case FPGA power profile used in Figure 5. The worst-case error is smaller than.5% for all thermal model granularities. We conclude that the conditions required to use Theorem are well-satisfied for a wide range of ICs.

TABLE III
$\sum^{n} s_i t_i$ WITH DIFFERENT POWER PROFILES
Columns: T avg (°C); SATP error Avg. (%) and Max. (%) for each of FPGA, SRAM, EV6, and HP
4.6.9..8....547 5.57.75.97..99.5.85.458 6.99..89.47.8.4.9.6 7.45.69.8.6.6..68.9 8.78.7.7.47.8.89.77.788 9.4.8.44.57.4.54.5.9

Fig. 6. Thermal error breakdown among different types of ICs. (Horizontal axis: average IC temperature (°C). Vertical axis: maximum temperature estimation error (%). Series: EV6, FPGA, SRAM, HP.)

Although we have shown that the properties required to use Theorem are well-approximated for a number of ICs, we have yet to show the implications of this observation upon temperature estimation accuracy. We partition the IC into blocks, each of which corresponds to a region with uniform leakage coefficients, and compare the average block temperatures with those calculated by using a fine-grained thermal model.
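The SATP comparison used in this section can be sketched in a few lines of Python. The sketch is illustrative only: the sampling range `[0, 1)`, the function names, and the linear coefficients `alpha` and `beta` are assumptions, and the element temperatures are expected to come from a separate thermal analysis that is not shown here.

```python
import numpy as np

def random_power_profile(rows, cols, total_power_w, rng):
    """Uniform random per-cell samples rescaled to the target total power.
    A common scale factor preserves the power ratios among cells.
    The sampling range [0, 1) is an assumption."""
    raw = rng.uniform(0.0, 1.0, size=(rows, cols))
    return raw * (total_power_w / raw.sum())

def satp(areas, temps):
    """Sum of area-temperature products over all thermal elements."""
    return float(np.sum(np.asarray(areas) * np.asarray(temps)))

def satp_error_percent(satp_profile, satp_uniform):
    """Relative SATP difference against the uniform-power reference."""
    return 100.0 * abs(satp_profile - satp_uniform) / satp_uniform

def linear_leakage_total(areas, temps, alpha, beta):
    """Total leakage when leakage per unit area is alpha + beta * T
    (hypothetical uniform coefficients). Algebraically this equals
    alpha * sum(areas) + beta * SATP, so it depends on the temperature
    map only through SATP."""
    areas = np.asarray(areas)
    temps = np.asarray(temps)
    return float(np.sum(areas * (alpha + beta * temps)))
```

Under a linear leakage model with uniform coefficients, two power profiles that yield the same SATP therefore yield the same total leakage, which is why a small SATP error translates into a small leakage estimation error.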
Figure 6 shows the maximum temperature estimation error as a function of average IC temperature for the same set of benchmarks shown in Table III. Error is computed on the Kelvin scale. Figure 6 shows that the maximum temperature estimation error over all power profiles is less than.%. For the Alpha 21264 processor we also calculated the temperature differences using power traces from SPEC applications. In all cases, the average temperature difference is less than.6%. From this, we can conclude that using a coarse-grained thermal model is sufficient for IC leakage power consumption estimation.

VI. CONCLUSIONS

This article has presented an extremely fast and accurate method of estimating IC leakage power consumption during design and synthesis. This idea allows a speedup of 59,59,79, while maintaining accuracy compared with a conventional temperature-aware leakage estimation technique using a detailed thermal model. The proposed technique's accuracy is proven based on two observations: (1) leakage may be accurately modeled as a linear function of temperature over the operating temperature ranges of real functional units and (2) given a fixed total power consumption, the average temperature of an IC active layer is mostly independent of the power distribution. Its accuracy is further validated via numerous comparisons with results from detailed thermal modeling. The proposed technique can easily be used in commercial or academic synthesis and design flows in order to accelerate accurate temperature-dependent leakage power consumption estimation.

REFERENCES

[1] International Technology Roadmap for Semiconductors, 5, http://public.itrs.net.
[2] S. Naffziger, et al., "The implementation of a -core, multi-threaded Itanium family processor," J. Solid-State Circuits, vol. 4, no., pp. 97 9, Jan. 6.
[3] J. A. Butts and G. S. Sohi, "A static power model for architects," in Proc. Int. Symp. Microarchitecture, Dec., pp. 9.
[4] S. M. Martin, et al., "Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads," in Proc. Int. Conf. Computer-Aided Design, Nov., pp. 7 75.
[5] S. Narendra, et al., "Full-chip subthreshold leakage power prediction and reduction techniques for sub-.8 CMOS," J. Solid-State Circuits, vol. 9, no., pp. 5 5, Feb. 4.
[6] Y. F. Tsai, et al., "Characterization and modeling of run-time techniques for leakage power reduction," IEEE Trans. VLSI Systems, vol., no., pp., Nov. 4.
[7] A. Abdollahi, F. Fallah, and M. Pedram, "Leakage current reduction in CMOS VLSI circuits by input vector control," IEEE Trans. VLSI Systems, vol., no., pp. 4 54, Feb. 4.
[8] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 9, no., pp. 5 7, Feb..
[9] Y. Zhang, et al., "HotLeakage: A temperature-aware model of subthreshold and gate leakage for architects," Univ. of Virginia, Tech. Rep., May, CS--5.
[10] H. Su, et al., "Full chip leakage estimation considering power supply and temperature variations," in Proc. Int. Symp. Low Power Electronics & Design, Aug., pp. 78 8.
[11] W. P. Liao, L. He, and K. M. Lepak, "Temperature and supply voltage aware performance and power modeling at microarchitecture level," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 4, no. 7, pp. 4 5, July 5.
[12] A. Chandrakasan, W. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuits. IEEE Press,.
[13] K. M. Cao, et al., "BSIM4 gate leakage model including source-drain partition," in IEDM Technology Dig., Dec., pp. 85 88.
[14] ISCAS85 benchmarks suite, http://www.visc.vt.edu/ hsiao/iscas85.html.
[15] F. Zhang, "System-level leakage power modeling methodology," Dept. of Electronics Engg., Tsinghua University, Bachelor's Degree Thesis, July 6.
[16] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45nm design exploration," in Proc. Int. Symp. Quality of Electronic Design, Mar. 6, pp. 585 59.
[17] G. S. Ohm, "The Galvanic circuit investigated mathematically," 87.
[18] J. Fourier, The Analytical Theory of Heat, 8.
[19] K. Skadron, et al., "Temperature-aware microarchitecture," in Proc. Int. Symp. Computer Architecture, June, pp..
[20] W. Huang, et al., "HotSpot: A compact thermal modeling methodology for early-stage VLSI design," IEEE Trans. VLSI Systems, vol. 4, no. 5, pp. 5 54, May 6.
[21] I. C. Kuon, "Automated FPGA design verification and layout," Ph.D. dissertation, Dept. of Electrical and Computer Engg., University of Toronto, July 4.
[22] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in Proc. Int. Symp. Computer Architecture, June, pp. 8 94.
[23] SRAM layout, SRAM link at http://www.eecs.umich.edu/umichmp/ Presentations.
[24] MCNC benchmarks suite, http://www.cse.ucsc.edu/research/surf/GSRC/MCNCbench.html.