Novel Modeling Techniques for RTL Power Estimation


Michael Eiermann, Walter Stechele
Institute for Integrated Circuits, Technical University of Munich
Arcisstr. 21, 80290 Muenchen, Germany
Phone +49 89 2892384, +49 89 28923862
m.eiermann@ei.tum.de, w.stechele@ei.tum.de

ABSTRACT

In this work, we propose efficient macromodeling techniques for RTL power estimation, based only on word- and bit-level switching information of the module inputs. We present practicable combinations of these two properties for the construction of power macromodels. It is demonstrated that our models reduce the estimation error compared to the Hamming-distance model by at least 64%. The total average errors (compared to PowerMill) achieved over a wide range of test modules and input stimuli are less than 4.6%. This is comparable to complex models, which, however, have to make use of several more signal properties.

Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development - modeling methodologies.

General Terms
Design, Experimentation, Verification.

Keywords
Power estimation, power modeling, RTL macromodels, low power.

1. INTRODUCTION

In recent years, power consumption has become a key parameter in the design of integrated circuits (ICs). This is mainly due to the ever-increasing integration density, which enables the functionality and the performance of ICs to improve dramatically. Higher complexity and higher performance inevitably lead to an increase in power consumption if standard design methodologies are applied. In order to extend the run-time of battery-operated portable applications, ICs therefore have to be optimized with respect to power consumption. This also helps to ensure reliable operation and to reduce the cost of packaging and cooling.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED '02, August 12-14, 2002, Monterey, California, USA. Copyright 2002 ACM 1-58113-475-4/02/0008...$5.00.

In order to manage the rising complexity of today's chips, the design process has to start at a very high level of abstraction. In those early design phases, power optimization opportunities are significantly larger than in later steps. Such optimization tasks have to be validated with respect to the achievable power reduction. For this purpose, power estimation tools are needed, but unfortunately standard tools exist only for the gate level and lower levels. Estimating power at gate or transistor level is very time consuming. Therefore, many techniques for high-level power estimation have been proposed in recent years, most of them for the register transfer level (see [8][9][11] for a survey).

Besides characterization-free information-theoretic approaches (based only on the input-output functionality of a module), e.g. [10], the main strategy at RT level is to build power models for the modules used. This means that, for every submodule type of an RTL design, the parameters of a template power model have to be determined by performing a number of simulation experiments at lower levels of abstraction. Once the model is characterized, power estimation can be carried out by weighting the model parameters with the actual signal properties obtained from a behavioral simulation.

A wide range of different approaches for power modeling can be found in the literature [8][9][11]. The model's power properties are either stored in a multi-dimensional look-up table (table-based), e.g. [7], or expressed through an equation (equation-based) by using regression methods, e.g. [3]. Further, the techniques are distinguished according to their application. In some cases cumulative (average) power estimation is insufficient, and power has to be modeled and estimated on a cycle-by-cycle basis [6][12].

The major difference between the approaches, however, lies in the kind and number of signal properties used for characterization and estimation. Nearly all models are activity-sensitive, which means power is expressed as a function of input (and output) switching activity. In order to improve accuracy, some models consider the input signal probability [4], while other methods additionally use the spatial correlation of the input signal [7]. Clearly, the price paid for this improvement is a higher effort for characterization and estimation.

Our approaches take into account only the input switching property, but in a specific way. We consider not only the number of switching inputs (the Hamming distance of two consecutive input vectors), but also the individual inputs that take part in the switching. Experimental results demonstrate that, using these novel models, the estimation accuracy is in the same range as that of models which also consider other signal properties.

The remainder of the paper is structured as follows. In Section 2, our modeling approaches are described in detail. The model characterization and validation processes are presented in Sections 3 and 4, respectively. Results are given and discussed in Section 5. Finally, concluding remarks are provided.

2. THE PROPOSED NOVEL MODELING APPROACH

First, we state some assumptions about the conditions at RT level. In general, combinational modules are surrounded by registers. Thus, the input signals are treated as ideal, which means there is only one transition per bit and clock cycle, of one of the valid types {L-L, L-H, H-L, H-H}. All input transitions of a module occur at the same time and have the same rise and fall times.

2.1 Statement of the Problem

The power consumption of a module can be modeled exactly by assigning an energy value to every possible input transition and storing these values in a look-up table (LUT). If $V_{t-1}$ and $V_t$ represent the input signal vectors in cycles $t-1$ and $t$, respectively, the energy $E_t$ and power $P_t$ in cycle $t$ can be expressed by

$$E_t = E[V_{t-1}, V_t] \quad \text{and} \quad P_t = \frac{E_t}{T}, \tag{1}$$

respectively, where $E[\,]$ denotes the energy LUT and $T$ the clock period. The relationship between the cycle power $P_t$ and the average power $P_{avg}$ for a sequence of $M$ cycles is given by

$$P_{avg} = \frac{1}{M} \sum_{t=1}^{M} P_t. \tag{2}$$

The number of LUT entries $N$ for an $n$-bit input vector is $N = 2^{2n}$. A 16-bit input module, for example, would have $4.3 \cdot 10^9$ entries. Due to the effort of generating and storing the energy data, this can only be done for very small modules. Therefore, the general approach must be to reduce the size of the LUT by developing models based on more abstract signal properties.

2.2 Basic Power Dependencies

When we consider RTL modules, the inputs can be divided into control and data inputs. Concerning power dissipation, the two types of inputs behave very differently. In general, for control inputs the signal state is the decisive value for power consumption, while for data inputs it is their switching activity. Because typical RTL data path modules usually have only a small number of control inputs, we decided to use separate model parameter sets for each valid state of the control signals. Therefore, in the remainder of this paper we confine our investigations to data inputs.
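The size argument of Section 2.1 and the relation between Eqs. (1) and (2) can be checked with a few lines of Python. This is only a minimal sketch: the function names and the 1-bit toy LUT are illustrative assumptions, not data from the paper, and energies in pJ divided by a clock period in ns yield power in mW.

```python
def exact_lut_entries(n_bits: int) -> int:
    """Entries of the exact LUT: one per (previous, current) input vector pair."""
    return 2 ** (2 * n_bits)


def cycle_power(energy_lut, prev_vec: int, curr_vec: int, clock_period: float) -> float:
    """Eq. (1): P_t = E[V_{t-1}, V_t] / T."""
    return energy_lut[(prev_vec, curr_vec)] / clock_period


def average_power(energy_lut, vectors, clock_period: float) -> float:
    """Eq. (2): mean of the cycle powers over the M transitions of a sequence."""
    transitions = list(zip(vectors, vectors[1:]))
    total = sum(cycle_power(energy_lut, a, b, clock_period) for a, b in transitions)
    return total / len(transitions)


# Hypothetical 1-bit module: energies (pJ) for the transitions L-L, L-H, H-L, H-H.
toy_lut = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.0}
```

For a 16-bit input, `exact_lut_entries(16)` evaluates to 4294967296, which is the roughly $4.3 \cdot 10^9$ entries quoted in the text.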
In order to further reduce the number of LUT entries, abstractions are introduced in the following way: instead of considering the real signal changes, we distinguish only whether or not an input bit transition takes place. For an $n$-bit data word, this leads to a switching word in cycle $t$

$$SW_t = (sb_{1,t}, sb_{2,t}, \ldots, sb_{n-1,t}, sb_{n,t}), \tag{3}$$

where each $sb_{i,t}$ represents the switching of bit $i$ in cycle $t$. Possible values for $sb_{i,t}$ are 1 or 0 (switching or not). Using only this information, the number of LUT entries is reduced from $2^{2n}$ to $2^n$, which is still too high. Note that all these abstractions not only reduce the modeling complexity but also decrease the accuracy. Therefore, there will be a trade-off between effort and accuracy. Further abstraction of the switching vector leads to the following two alternative approaches.

2.2.1 Relating the power to the word level switching

A technique to reduce the complexity, used e.g. in [4] (besides other properties), relates the energy to the number of simultaneously switching input bits (the Hamming distance of two consecutive input vectors). The number of entries of the energy LUT $E_w$ is reduced to the maximum number of switching bits plus one, $n + 1$. The energy of cycle $t$ is easily expressed by

$$E_w[sw_t], \quad \text{with} \quad sw_t = \sum_{i=1}^{n} sb_{i,t}, \tag{4}$$

where $sw_t$ is the total number of word-level switching bits in cycle $t$. In principle, the word-level switching can be taken as a useful measure for the average switching energy. The individual values, however, can differ considerably from it, as can be seen by comparing the average, the standard deviation and the total range of the switching energy in Figure 1. This is due to the abstraction, where only the mean of the energy per number of switching bits is stored in the energy LUT, while the behavior of the individual bits involved in the switching remains unconsidered.

Figure 1. Average (bold), deviation (error bars) and total range (dotted) of the switching energy (pJ) as a function of the number of word-level switching bits for an 8x11 bit vector adder
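The switching word of Eq. (3) and the word-level look-up of Eq. (4) map directly onto integer bit operations. The following sketch assumes input vectors are stored as Python integers; the function names and the five-entry LUT `E_W` are hypothetical illustration values.

```python
def switching_word(prev_vec: int, curr_vec: int) -> int:
    """Eq. (3): bit i is 1 exactly when input bit i toggles between two cycles."""
    return prev_vec ^ curr_vec


def word_level_switching(prev_vec: int, curr_vec: int) -> int:
    """Eq. (4): number of toggling bits, i.e. the Hamming distance."""
    return bin(switching_word(prev_vec, curr_vec)).count("1")


def estimate_energy_word_level(e_w, vectors) -> float:
    """Sum E_w[sw_t] over all transitions; e_w has n + 1 entries (sw = 0..n)."""
    return sum(e_w[word_level_switching(a, b)] for a, b in zip(vectors, vectors[1:]))


# Hypothetical energy LUT (pJ) for a 4-bit input: index = number of switching bits.
E_W = [0.0, 1.5, 2.8, 3.9, 4.7]
```

With these values, the sequence `[0b0000, 0b1111, 0b1110]` has the switching numbers 4 and 1, so the estimate is `E_W[4] + E_W[1]`. Which of the four bits toggled in the second transition has no influence on the result, which is exactly the information this model discards.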
The dependence of the energy on the individual input bits is exemplified in Figure 2 for the same module. Both diagrams have been created by performing PowerMill simulations, at least 2 cycles per word level switching number.

Figure 2. Energy contributions (switching energy in pJ) of the individual input bits, plotted over the input pin position, for an 8x11 bit vector adder

2.2.2 Relating the power to the bit level switching

The other alternative is to relate the energy to the switching of the single input bits (e.g. the bitwise data model in [9]). The total energy of cycle $t$ is given by

$$\sum_{i=1}^{n} sb_{i,t} \cdot E_b[i], \tag{5}$$

where $E_b$ and $sb_{i,t}$ denote the bit-level energy LUT and the bit-level switching of bit $i$ in cycle $t$, respectively. The entries of the $E_b$ table are determined from a number of lower-level power experiments, to which the least mean squares fitting method is applied (the values of Figure 2 have been created in this way).

According to our investigations, the estimation accuracy tends to be very sensitive to the switching properties used during that characterization process. In particular, the estimation error will be acceptable if the actual average word-level switching is similar to that applied during characterization. In Figure 3, the characterization is optimized for medium numbers of word-level switching. The estimates for those switching properties are quite good; for lower and higher values, however, there will be an underestimation and an overestimation, respectively. This is due to the assumption in (5) that treats the single energy contributions as independent of the total number of switching pins, which is not accurate.

Figure 3. Estimation error (%) due only to the difference in the average word-level switching between the characterization and the estimation, plotted over the number of word-level switching bits, for an 8x11 bit vector adder

2.3 Our Modeling Approaches

As a consequence, we propose to combine both alternatives in order to overcome their particular deficiencies. Thus, we constructed several models that utilize bit-level as well as word-level switching properties. Due to the limited space, we cannot discuss all investigated combinations, but focus on the most promising techniques.
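For reference, the purely bit-level model of Eq. (5) from Section 2.2.2 can be sketched as follows. The per-bit energies in `E_B` are made-up values (a real table would come from least mean squares fitting of lower-level simulations), and the code counts bits from 0 at the LSB, whereas the equations count from 1.

```python
def estimate_energy_bit_level(e_b, prev_vec: int, curr_vec: int) -> float:
    """Eq. (5): add E_b[i] for every input bit i that toggles in this cycle."""
    toggled = prev_vec ^ curr_vec
    return sum(energy for i, energy in enumerate(e_b) if (toggled >> i) & 1)


# Hypothetical per-bit energies (pJ) of a 4-bit input, index 0 = LSB.
E_B = [0.4, 0.6, 0.9, 1.3]
```

Because the contributions simply add up, the estimate for all four bits toggling is the sum of the four single-bit estimates, no matter how many bits switch together. This additivity is the assumption the text identifies as inaccurate, and it is what the combined models try to correct.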
2.3.1 Subword model

The aim of this approach is to improve the accuracy of the energy model relating to word-level switching by subdividing all input bits into subwords or groups. For every subword, the number of switching bits is evaluated separately. Therefore, each possible configuration of subword switching numbers requires an energy entry in a multidimensional LUT. The estimation consists only of a table look-up according to the subword switching configuration determined for each cycle, i.e. the energy for cycle $t$ is

$$E_{sub}[ssub_{1,t}, ssub_{2,t}, \ldots, ssub_{g-1,t}, ssub_{g,t}], \tag{6}$$

where $E_{sub}$, $ssub_{i,t}$ and $g$ are the subword energy LUT, the number of switching bits of subword $i$ in cycle $t$, and the number of subwords, respectively.

Using the switching activity of two input buses instead of the whole input switching has also been proposed in [5], where the input signal probability was used in addition. According to the results published there and our own results, this modeling approach works well for small, regular modules (in [5], arithmetic modules with two input buses were used). However, for modules with more than 50 input bits and more than two input buses, the estimation error increases.

Further, we investigated how the accuracy depends on the strategy for subdividing the input pins. Among the subdivision criteria

A) by the input-output delay (from static timing analysis),
B) by the bit position on the input buses (LSB..MSB),
C) by the logic input buses,

we found the last one to be the best in our experiments.

2.3.2 Enhanced model relating to single bit switching

Similar to Section 2.2.2, this model relates the energy to the bit-level switching. However, to improve the model's accuracy, an adjusting factor is used, which depends on the word-level switching property. The energy consumption of cycle $t$ is expressed by

$$c_{sgl}[sw_t] \cdot \sum_{i=1}^{n} sb_{i,t} \cdot E_{sgl}[i], \tag{7}$$

where $c_{sgl}$ and $E_{sgl}$ are the LUTs for the adjusting factors and the bit-level energy, while $sw_t$ and $sb_{i,t}$ denote the word-level and bit-level switching properties for cycle $t$, respectively.
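The subword look-up of Eq. (6) in Section 2.3.1 can be sketched as below, assuming subdivision by logic input bus (criterion C) and input vectors packed into one integer with the first bus in the low-order bits. `WIDTHS`, `E_SUB` and all function names are hypothetical; the table is filled with made-up numbers only so that the look-up can be executed.

```python
from itertools import product


def subword_switching(prev_vec, curr_vec, subword_widths):
    """Return (ssub_1, ..., ssub_g): toggling-bit count of each subword.

    subword_widths gives the bit width of every subword, LSB side first.
    """
    toggled = prev_vec ^ curr_vec
    counts = []
    for width in subword_widths:
        mask = (1 << width) - 1
        counts.append(bin(toggled & mask).count("1"))
        toggled >>= width
    return tuple(counts)


def subword_lut_entries(subword_widths):
    """Number of LUT entries: every subword can show 0..width toggling bits."""
    entries = 1
    for width in subword_widths:
        entries *= width + 1
    return entries


def estimate_energy_subword(e_sub, prev_vec, curr_vec, subword_widths):
    """Eq. (6): a single table look-up per cycle."""
    return e_sub[subword_switching(prev_vec, curr_vec, subword_widths)]


# Hypothetical module with two 4-bit input buses, one subword per bus.
WIDTHS = (4, 4)
# Made-up energies (pJ); a real table would come from characterization runs.
E_SUB = {(a, b): 1.0 * a + 1.5 * b + 0.1 * a * b
         for a, b in product(range(5), repeat=2)}
```

Two 4-bit subwords already need 25 entries, and every additional subword multiplies the table size, which is one way to see why this model becomes expensive for modules with many buses.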
Each energy LUT entry is determined during the characterization process as the average of a number of single-bit switching experiments for the corresponding bit, in which the states of the remaining bits differ. Single-bit switching has been chosen because it allows the strongest distinction between the energy contributions of the different bits (cf. Figure 4 vs. Figure 2). The LUT for the adjusting factors is determined, for each word-level switching number, simply by setting (7) equal to the true energy $E_{act,t}$ and solving the equation for the corresponding $c_{sgl}$ entry. The mean over a number of experiments is taken as the coefficient (see Figure 5).

Figure 4. Single-bit switching energies (pJ) as a function of the input bit position for an 8x11 bit vector adder

Figure 5. Adjusting factors as a function of the number of word-level switching bits for an 8x11 bit vector adder

However, it has been found that the bit-level energy contributions have to be adjusted differently, depending on their energy quantity. Considering these observations, the following equation, based on higher-order expressions of the energy coefficients, improves the model:

$$\sum_{o=1}^{k} c_{sgl,o}[sw_t] \cdot \left( \sum_{i=1}^{n} sb_{i,t} \cdot E_{sgl}[i] \right)^{o}. \tag{8}$$

The adjusting factors in the LUTs $c_{sgl,o}$ for the $k$ orders can be found by applying the standard least mean squares method. The accuracy improvements due to the use of higher-order energy coefficients are shown in Table 1. A large number of test modules and test sequences have been used to calculate these average errors. For the test setup, see Section 4. A value of $k = 2$ or $3$ has been found to be satisfactory and leads to improvements of at least 10%.

Table 1: Improvements through the introduction of the higher-order equation

equation order                 1st     2nd     3rd     4th
average errors (%)             5.26    4.73    4.50    4.80
improvement to 1st order        -      10%     14%     9%
maximum errors (%)             8.91    7.12    8.25    8.46

2.3.3 Enhanced model relating to bit pair switching

A slight drawback of the previous model is that no interdependencies between switching bits are considered. These can be captured by relating the energy to input bit pairs as opposed to single bits. We modify the term in (8) to the following formulation:

$$\sum_{o=1}^{k} c_{pair,o}[sw_t] \cdot \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sp_{ij,t} \cdot E_{pair}[i,j] \right)^{o}, \tag{9}$$

where $sp_{ij,t}$ becomes 1 only if bit $i$ and bit $j$ switch at the same time, and $E_{pair}[i,j]$ represents the energy coefficient for that switching pair. These energy coefficients and the adjusting factors in the LUTs $c_{pair,o}$ are determined similarly to the process described in Section 2.3.2. The improvements resulting from the use of higher-order equations are about 10% (cf. Table 2).
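The structure shared by Eqs. (8) and (9), together with the first-order adjusting-factor fit described in Section 2.3.2, can be sketched as follows. This is an illustrative sketch under assumptions: the helper names, the 0-based bit indexing and the representation of the LUTs as Python lists and dicts are not from the paper, and the higher-order factors would in practice come from a least mean squares fit that is not shown here.

```python
from itertools import combinations


def toggled_bits(prev_vec, curr_vec, n_bits):
    """Indices (0 = LSB) of the input bits that switch between two cycles."""
    diff = prev_vec ^ curr_vec
    return [i for i in range(n_bits) if (diff >> i) & 1]


def polynomial_adjust(coeff_luts, sw, base):
    """Sum over the orders o = 1..k of coeff_luts[o - 1][sw] * base ** o."""
    return sum(lut[sw] * base ** o for o, lut in enumerate(coeff_luts, start=1))


def estimate_single_bit(c_sgl, e_sgl, prev_vec, curr_vec):
    """Eq. (8); with a single order (k = 1) this reduces to Eq. (7)."""
    bits = toggled_bits(prev_vec, curr_vec, len(e_sgl))
    base = sum(e_sgl[i] for i in bits)
    return polynomial_adjust(c_sgl, len(bits), base)


def estimate_bit_pair(c_pair, e_pair, prev_vec, curr_vec, n_bits):
    """Eq. (9): e_pair maps (i, j) with i < j to the pair energy coefficient."""
    bits = toggled_bits(prev_vec, curr_vec, n_bits)
    base = sum(e_pair[(i, j)] for i, j in combinations(bits, 2))
    return polynomial_adjust(c_pair, len(bits), base)


def fit_first_order_factors(e_sgl, experiments):
    """Solve Eq. (7) for c_sgl per experiment and average per switching number.

    experiments: iterable of (prev_vec, curr_vec, true_energy) tuples.
    """
    ratios = {}
    for prev_vec, curr_vec, e_act in experiments:
        bits = toggled_bits(prev_vec, curr_vec, len(e_sgl))
        base = sum(e_sgl[i] for i in bits)
        if base > 0:
            ratios.setdefault(len(bits), []).append(e_act / base)
    return {sw: sum(r) / len(r) for sw, r in ratios.items()}
```

Note that in this sketch a cycle with a single toggling bit contains no pair, so `estimate_bit_pair` returns zero for it; such cycles would need separate handling, for instance by falling back to the single-bit energies.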
Table 2: Improvements through the introduction of the higher-order equation

equation order                 1st     2nd     3rd     4th
average errors (%)             4.41    3.89    4.03    3.96
improvement to 1st order        -      12%     8%      10%
maximum errors (%)             8.81    5.30    6.87    6.89

2.3.4 Enhanced regression model

To determine the energy coefficients on which the model is based, we use the well-known linear regression method [2], but in an enhanced manner that additionally considers the word-level switching properties. For each number of simultaneously switching bits we perform a separate least mean squares fitting. The energy equation for cycle $t$ is given by

$$\sum_{i=1}^{n} sb_{i,t} \cdot E_{reg,2D}[sw_t, i], \tag{10}$$

with $E_{reg,2D}[sw_t, i]$ as the two-dimensional LUT for the energy coefficient depending on the word-level switching number $sw_t$ and the switching bit $i$. Since a number of low-level power experiments has to be performed for every value of word-level switching bits, the characterization effort increases. It can be reduced by the following process. First, the energy coefficients $E_{reg,lin}[\,]$ are determined with standard regression methods (LMS fitting). This is done with experiments in which pseudo-random patterns are applied that equally cover the whole range of switching properties, bit level as well as word level. In a second step, the adjusting factor $c_{reg,o}$ is calculated for each word-level switching number $sw_t$ as described in Section 2.3.2. The energy equation

$$\sum_{o=1}^{k} c_{reg,o}[sw_t] \cdot \left( \sum_{i=1}^{n} sb_{i,t} \cdot E_{reg,lin}[i] \right)^{o} \tag{11}$$

has the same structure as (8). Note that the energy coefficients in $E[\,]$ differ from those of the approach in Section 2.3.2, because they are determined differently. The adjusting factors differ as well.

3. MODEL CHARACTERIZATION

As mentioned above, the model coefficients are determined once within the characterization process, in which the modules are stimulated by well-defined characterization patterns. Lower-level power estimators (gate or transistor level) are used to ascertain the energy for each initiated input transition. For that, we use Synopsys PowerMill because of its capability for cycle-based current estimation.
The generation of the characterization patterns is a critical task. For this reason, we constructed a special sequence synthesizer written in C. Its main properties are listed below:

- the number of experiments per coefficient to be characterized can be chosen according to its relevance.
- in order to cut down simulation time, all transitions are arranged in one continuous stream.
- for the models based on LMS methods, the solvability of the system of equations is verified in advance.
- for every number of (sub)word-level switching bits, the single inputs are covered equally.
- the order of the patterns is chosen pseudo-randomly, in order to prevent similar signal probabilities for the same number of (sub)word-level switching bits.

The effort for the characterization is shown in Table 3. It has been found that about 1 experiments are sufficient for most coefficients. For the models applying adjusting factors, which are calculated using the energy LUT (e.g. single bit switching energy), some more experiments per entry for these underlying energy LUTs were performed (1..1).

Table 3: Effort for the characterization

section   model type                          number of coefficients
2.3.1     subword model                       (n/g + 1)^g
2.3.2     relating to single bit sw.          k(n - 2) + n + 2
2.3.3     relating to bit pair sw.            k(n - 2) + n(n - 1)/2 + n + 2
2.3.4     enhanced regression
            using equation (10)               n(n - 1) + 2
            using equation (11)               k(n - 2) + n + 2

n: input number; g: subword number; k: order of equation

4. MODEL VALIDATION

To assess the estimation accuracy of the proposed models, comparisons to PowerMill simulations have been performed for a large number of module types and input sequences. For this, 21 test modules have been taken, partly from real designs and partly from the Synopsys DesignWare (DW) library. The properties of the modules are summarized below:

- gate equivalents: 41..3136
- number of data pins: 16..88
- number of data buses: 2..9
- number of control inputs: 0..2

In order to test the models under extreme conditions, we synthesized a number of different input test streams for every module, each containing 1 input patterns. These input sequences differ completely from those used for the characterization. We also made use of the logic input buses, to which different switching activities were applied. The average switching activities were either the same for all bits of a bus, or they were linearly distributed, ranging from 5(95)% at the LSB to 25(5)% at the MSB (to have realistic test conditions). We also chose very extreme cases where only one, only two or all but one of the buses were switching. Four main types of test streams have been used.
The individual streams of each type were distinguished by their switching properties, which are given below:

- switching activities of the logic input buses are distributed for LSB..MSB: 5%..25% or 95%..5% (2 streams)
- only one or all but one of the logic input buses are switching, the others remain nearly stable; bus activities: 25%, 50% or 75% equal for all bits of a bus, or distributed for LSB..MSB: 5%..25% (8..56 streams)
- only two logic input buses are switching; bus activities distributed for LSB..MSB: 5%..25% (1..36 streams)
- average switching activities are equal for all bits: 10%, 20%, .., 80% or 90% (9 streams)

(LSB/MSB denotes the least/most significant bit of a logic bus.)

The number of streams of each type differs according to the number of input buses of the module and the possible configurations resulting from that. For every stream-module-model combination, we calculated a separate average relative error $\varepsilon_{avg}$ compared to PowerMill using the following formula:

$$\varepsilon_{avg} = \frac{P_{avg} - P_{PowerMill}}{P_{PowerMill}}, \tag{12}$$

where $P_{avg}$ is the average power for the stream calculated with our model equations from Section 2.3. In order to prevent positive and negative errors from compensating each other, the absolute value of the relative error $\varepsilon_{avg}$ of each stream is taken to calculate the mean error over all $S$ streams for every module-model combination,

$$\varepsilon_{mean,abs} = \frac{1}{S} \sum_{i=1}^{S} \left| \varepsilon_{avg,i} \right|. \tag{13}$$

The power estimation effort is shown in Table 4, where the necessary integer and floating point operations are presented. It can be seen that the enhanced model relating to bit pair switching has the highest computational effort, while the others are approximately equal.

Table 4: Effort for the power estimation

section   operations per cycle          only once for a sequence
          (compare, increment)          (floating point add and mult)
2.3.1     (s + 1)                       (n/g + 1)^g
2.3.2     2s                            n(n - 1) + 2
2.3.3     s(s + 1)/2                    (n - 2) n(n - 1)/2 + n + 2
2.3.4     2s                            n(n - 1) + 2

n: input number; g: subword number; s: switching number

5. RESULTS AND DISCUSSION

The functionality, the number of input pins and input buses as well as the size (in gate equivalents) of the test modules are given on the left of Table 5. In the right columns, results are presented for the two basic models from Sections 2.2.1-2 (for reference) and for our proposed hybrid models from Sections 2.3.1-4. The mean estimation errors for each module-model combination correspond to the models in the following manner:

r1   based only on word level switching (for reference)
r2   based only on bit level switching (for reference)
1a   subword model (2 subwords)
1b   subword model (3 subwords)
2    relating to single input bit switching (3rd order equation)
3    relating to input pair switching (2nd order equation)
4    enhanced regression model

Table 5: Design properties of the test modules and estimation results for the different models (estimation errors $\varepsilon_{mean,abs}$ in %)

module type                         input# (bus#)  size   r1     r2     1a     1b     2     3     4
DesignWare ripple-carry adder       16 (2)         41     5.0    15.6   4.4    3.4    5.0   4.3   5.1
                                    24 (2)         62     5.8    16.3   3.6    3.6    4.7   4.2   4.5
                                    32 (2)         83     3.5    15.6   2.0    4.3    3.6   3.2   3.6
DesignWare carry-look-ahead adder   16 (2)         54     5.7    12.6   3.8    4.5    5.3   3.9   3.9
                                    24 (2)         83     5.2    12.7   3.6    3.6    4.6   3.8   3.8
                                    32 (2)         119    4.0    15.0   1.8    3.6    3.6   3.1   3.4
DesignWare carry-save multiplier    16 (2)         436    5.6    16.1   3.8    4.5    4.8   4.5   4.4
                                    24 (2)         135    4.1    15.9   3.6    3.9    3.8   3.9   3.1
                                    32 (2)         1681   5.3    15.3   6.3    7.2    3.1   3.3   2.7
DesignWare wallace-tree multiplier  16 (2)         512    9.1    18.0   3.3    3.6    4.2   3.0   4.0
                                    24 (2)         156    8.5    22     2.2    3.9    6.1   3.4   6.1
                                    32 (2)         179    9.8    24     2.4    4.7    8.3   5.2   7.9
DW duplex-comp.                     64 (4)         173    7.3    8.2    4.8    4.5    2.5   2.1   2.9
9,32 bit accu.                      41 (2)         173    35     9.2    16.3   21     2.1   4.7   2.7
median of 3 words                   48 (3)         413    14.0   10.6   12.7   13.4   3.7   3.4   5.4
median (fast impl.)                 48 (3)         972    12.2   10.0   10.7   11.4   3.0   3.5   4.6
2x2 mux, cmp, inc                   58 (4)         22     74     9.2    76     2.9    5.8   4.2   5.3
2x2 sub_add, cmp                    43 (4)         391    18.6   25     20     9.9    6.6   5.2   6.8
2x2 sub_abs, add                    48 (4)         479    3.3    22     4.0    3.6    3.0   2.7   2.9
8 word vectoradd.                   88 (8)         52     15.0   28     9.7    7.4    6.0   5.3   5.4
min/med/max of 9                    81 (9)         3126   19.3   25     21     20     5.0   4.9   8.5
21 modules average                                        12.9   16.4   10.3   6.7    4.5   3.9   4.6
21 modules maximum                                        74     28     76     21     8.3   5.3   8.5

From the table, it can be seen that, using any of our novel models 2..4, the average estimation error has been reduced by at least 64% compared to the well-known Hamming-distance model (word-level switching from Section 2.2.1) in column r1. This reduction has been achieved only by considering the combination of both bit-level and word-level input switching properties. Furthermore, it can be observed that the improvements are small for regular modules (DW with two input buses), whereas for complex modules (lower part of the table) the refinements are immense. The maximum errors over all modules for our models 2..4 are also very low.
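The error measures of Eqs. (12) and (13) and the reduction figures quoted in the discussion can be reproduced with a short calculation. The function names below are illustrative assumptions; the numbers in `AVERAGE_ERRORS` are the "21 modules average" values of columns r1, 2, 3 and 4 in Table 5.

```python
def relative_error(p_model: float, p_reference: float) -> float:
    """Eq. (12): signed relative error of the average power of one stream."""
    return (p_model - p_reference) / p_reference


def mean_abs_error(model_powers, reference_powers) -> float:
    """Eq. (13): mean of |eps_avg| over all S streams."""
    errors = [abs(relative_error(m, r)) for m, r in zip(model_powers, reference_powers)]
    return sum(errors) / len(errors)


def error_reduction(reference_error: float, model_error: float) -> float:
    """Relative reduction of a model's error with respect to a reference model."""
    return (reference_error - model_error) / reference_error


# Average errors (%) over the 21 modules, taken from the last rows of Table 5.
AVERAGE_ERRORS = {"r1": 12.9, "2": 4.5, "3": 3.9, "4": 4.6}
```

With these averages, `error_reduction(12.9, 4.6)` is about 0.64 and `error_reduction(12.9, 3.9)` is about 0.70, which matches the reductions of "at least 64%" and "up to 70%" relative to the Hamming-distance model. Taking absolute values in `mean_abs_error` matters: one stream overestimated by 10% and one underestimated by 10% give a mean error of 10%, not zero.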
Compared to the second reference r2, the improvements are considerably higher, but this is, as mentioned in Section 2.2.2, because the accuracy of this model based only on bit-level switching is very sensitive to the characterization patterns (a median number of average word-level switching bits has been taken).

The subword models 1a and 1b (cf. Section 2.3.1) bring only slight enhancements and require a high characterization effort. The results can be improved by dividing the inputs into more than three subwords. However, because the number of LUT entries grows exponentially with the number of subwords (see Table 3), the grid of these LUTs then has to be widened.

If we take into account the effort for characterization and estimation, two models (the enhanced model relating to single input bit switching and the enhanced regression model) achieve the best trade-off for all test modules. The estimation error compared to r1 can be reduced by up to 70% using model 3 (relating to input pair switching), at the cost of an increased estimation effort.

In all, these three novel approaches, based on input switching information only, cause estimation errors of less than 4.6% on average over a wide range of test modules and input stimuli. These results are comparable to recently published enhanced models [1][3][5][7], which achieve average estimation errors of about 3..15%, but have to make use of several more signal properties (e.g. output switching, input probabilities, etc.). A combination of our efficient modeling techniques with those additional properties can increase the estimation accuracy, but each property adds a dimension to the LUTs, which would multiply the effort, particularly for the characterization.

6. CONCLUSIONS

It has been shown that, by using both word-level and bit-level switching information of the module inputs for macromodeling, without other signal properties, the estimation error compared to the Hamming-distance model can be reduced by up to 70%. Total errors of less than 4.6% on average have been achieved for a large number of test modules and input stimuli.
This is comparable to complex models based on several more signal properties.

7. REFERENCES

[1] M. Anton, I. Colonescu, E. Macii, M. Poncino, "Fast Characterization of RTL Power Macromodels," in IEEE Proc. of ICECS, pp. 1591-1594, 2001.
[2] L. Benini, A. Bogliolo, M. Favalli, G. De Micheli, "Regression models for behavioral power estimation," in Proc. of PATMOS, pp. 179-187, 1996.
[3] A. Bogliolo, L. Benini, G. De Micheli, "Regression-Based RTL Power Modeling," ACM Trans. on Design Automation of Electronic Systems, vol. 5, no. 3, pp. 337-372, July 2000.
[4] G. Jochens, L. Kruse, W. Nebel, "A New Parameterizable Power Macro-Model for Datapath Components," in Proc. of European Design & Test Conference, DATE, pp. 29-36, 1999.
[5] G. Jochens, L. Kruse, E. Schmidt, A. Stammermann, W. Nebel, "Power Macro-Modelling for Firm-Macro," in Proc. of PATMOS Workshop, Germany, pp. 24-35, Sep. 2000.
[6] S. Gupta, F.N. Najm, "Energy-per-cycle estimation at RTL," in Proc. ISLPED, Monterey, CA, pp. 121-126, 1999.
[7] S. Gupta, F.N. Najm, "Power Modeling for High-Level Power Estimation," in IEEE Trans. on VLSI, vol. 8, no. 1, pp. 18-29, February 2000.
[8] P. Landman, "High-Level Power Estimation," in IEEE Proc. of ISLPED, Monterey, CA, pp. 29-35, June 1996.
[9] E. Macii, M. Pedram, F. Somenzi, "High-Level Power Modeling, Estimation, and Optimization," in IEEE Transactions on CAD, vol. 17, no. 11, pp. 1061-1079, Aug. 1998.
[10] D. Marculescu, R. Marculescu, M. Pedram, "Information theoretic measures for power analysis," in Trans. on CAD, vol. 15, no. 6, pp. 599-610, 1996.
[11] A. Raghunathan, N.K. Jha, S. Dey, High-Level Power Analysis and Optimization, Kluwer Academic Publishers, Boston/Dordrecht/London, 1998.
[12] Q. Wu, Q. Qiu, M. Pedram, C.S. Ding, "Cycle-Accurate Macro-Models for RT-Level Power Analysis," in Trans. on VLSI, vol. 6, no. 4, pp. 520-528, 1998.