Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors

Flavius Gruian
Department of Computer Science, Lund University
Box 118, S-221 00 Lund, Sweden
Tel.: +46 046 2224673
e-mail: Flavius.Gruian@cs.lth.se

ABSTRACT

The work presented in this paper addresses scheduling for reduced energy of hard real-time tasks with fixed priorities assigned in a rate monotonic or deadline monotonic manner. The approach we describe can be exclusively implemented in the RTOS. It targets energy consumption reduction by using both on-line and off-line decisions, taken both at task level and at task-set level. We consider sets of independent tasks running on processors with dynamic voltage supplies (DVS). Taking into account the real behavior of a real-time system, which is often better than the worst case, our methods employ stochastic data to derive energy efficient schedules. The experimental results show that our approach achieves more important energy reductions than other policies from the same class.

Keywords
Low-energy, hard real-time, RTOS, scheduling

1. INTRODUCTION

Low energy consumption is today an increasingly important design requirement for digital systems, with impact on operating time, on system cost, and, of no lesser importance, on the environment. Reducing power and energy dissipation has long been addressed by several research groups, at different abstraction levels. We focus here on methods applicable at system level, where the system to be designed is specified as an abstract set of tasks. Selecting the right architecture has been shown to have a great influence on the system energy consumption [4,5]. Recently, with the advent of dynamic voltage supply (DVS) processors [2,22,25], highly flexible systems can be designed, while still taking advantage of supply voltage scaling to reduce the energy consumption. Since the supply voltage has a direct impact on processor speed, classic task scheduling and supply voltage selection have to be addressed together.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or fee. ISLPED '01, August 6-7, 2001, Huntington Beach, California, USA. Copyright 2001 ACM 1-58113-371-5/01/0008...$5.00.

Scheduling thus offers yet another level of possibilities for achieving energy/power efficient systems, especially when the system architecture is fixed or the system exhibits a very dynamic behavior. For such dynamic systems, various power management techniques exist and are reviewed for example in [1,17]. Yet, these mainly target soft real-time systems, where deadlines can be missed if the Quality of Service is kept. Several scheduling techniques for soft real-time tasks running on DVS processors have already been described [3,18,19,23]. Energy reductions can be achieved even in hard real-time systems, where no deadline can be missed, as shown in [6,7,10,20,24]. In this paper, we also focus on hard real-time scheduling techniques, where every deadline has to be met.

Task-level voltage scheduling decisions can reduce the energy consumption even further. Some of these intra-task scheduling methods use several re-scheduling points inside a task, and are usually compiler assisted [11,16,21]. Alternatively, fixing the schedule before the task starts executing, as in [6,7,8], eliminates the internal scheduling overhead, but with possible effects on energy reduction. Statistics can be used to take full advantage of the dynamic behavior of the system, both at task level [16] and at task-set level [24]. In our approach we employ stochastic data to derive efficient voltage schedules without the overhead of intra-task re-scheduling.

The rest of the paper is organized as follows. In section 2 we describe our hard real-time scheduling strategy, pointing out the related work for each decision we make.
Section 3 contains several experimental results conducted both on real life examples and on randomly generated, large task sets. Finally, we present our conclusions in section 4.

2. RT SCHEDULING FOR LOW-ENERGY

In the work described here, we address independent tasks running on a single processor. The processor has variable speed (supply voltage and energy) adjustable at runtime. The tasks arrive with given periods and have to be executed before certain deadlines. The priorities are fixed, assigned in a rate-monotonic (RM) or deadline monotonic (DM) manner [14]. The runtime scheduling also operates as in RM/DM scheduling, with the difference that each task instance is assigned a maximum allowed execution time. The scheduling strategies we adopt at task level are presented in sub-section 2.1. The allowed execution times are influenced by task group level decisions, taken both off-line and on-line. The off-line phase is presented in sub-section 2.2 and the on-line phase in sub-section 2.3. Sub-section 2.3 also contains a proof that our scheduling method keeps the response times from the original RM/DM scheduling, and thus does not affect the feasibility of the schedule.

2.1 Task-level Scheduling Decisions

Task-level voltage scheduling has captured the attention of the research community rather recently [8]. Fine grain scheduling, where several re-scheduling points are used inside a task, was presented in [11,16]. In [16] statistical data is used to improve the task level schedule, by slowing down different regions of a task according to their average execution time. Our approach produces voltage schedules only when a task starts executing, while using stochastic data more aggressively both at task level and task-set level. At task level we generate voltage schedules that are correlated with the task execution length probability distribution. For task-set level scheduling decisions see sub-section 2.3.

In our model a task $\tau_i$ can be executed in phases, at different available voltages, depending on its allowed execution time $A_i$. In the ideal case, the most energy is saved when the processor uses the voltage for which the task exactly covers its allowed execution time. This corresponds to an ideal voltage which may not coincide with any of the available voltages. A close to optimal solution is to execute the task in two phases at two of the available voltages. These two voltages are the ones bounding the ideal voltage [6,8]. An important observation is that tasks may finish, and in many cases do finish, before their worst case execution time (WCET). Therefore it makes sense to execute first at a low voltage and then accelerate the execution, instead of executing at high voltage first and then decelerating. In this manner, if a task instance is not the worst case, one skips executing the high voltage (and power hungry) regions.

In the following we will distinguish between three modes of execution for a task, as depicted in Figure 1. The ideal case (mode 1) is when the actual execution pattern (the number of clock cycles) becomes known when the task arrives. We can then stretch the actual execution time of the task to exactly fill the allowed time. This mode requires rather accurate execution pattern estimates, depending on the input data, and therefore is rarely achievable in practice. The second mode (mode 2) is the WCET stretching: the voltage schedule for the task is determined as if the task will exhibit its worst case behavior. These two modes use at most two voltage regions, and therefore at most one DC-DC switch.
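To make the two-phase WCET stretching (mode 2) concrete, the sketch below splits the worst case number of cycles between the two available clock lengths that bound the ideal one, running the slower clock first. It is only an illustration under stated assumptions, not the author's implementation: the function name `wcet_stretch`, the choice to express clock lengths in units of the fastest cycle, and the omission of DC-DC switching delay are all hypothetical choices made for this example.

```python
from bisect import bisect_right

def wcet_stretch(wx, allowed, clock_lengths):
    """Split wx cycles between the two clock lengths bounding allowed / wx.

    wx            -- worst case number of clock cycles (hypothetical input)
    allowed       -- allowed execution time, same unit as clock_lengths
    clock_lengths -- available clock cycle lengths, sorted ascending
    Returns [(clock_length, cycles), ...] in execution order, slowest first.
    """
    ideal = allowed / wx                        # ideal, possibly unavailable, length
    j = bisect_right(clock_lengths, ideal) - 1  # clock_lengths[j] <= ideal
    if j < 0:
        raise ValueError("allowed time is below the WCET at the fastest clock")
    fast = clock_lengths[j]
    if j + 1 == len(clock_lengths):             # even the slowest clock fits
        return [(fast, wx)]
    slow = clock_lengths[j + 1]
    # Largest n such that n * slow + (wx - n) * fast <= allowed.
    n_slow = min(wx, int((allowed - wx * fast) // (slow - fast)))
    schedule = [(slow, n_slow), (fast, wx - n_slow)]
    return [(length, n) for length, n in schedule if n > 0]

print(wcet_stretch(100, 250, [1, 2, 3, 4]))
```

With 100 worst case cycles, an allowed time of 250 fastest-clock cycles and clock lengths 1, 2, 3 and 4 (that is, f, f/2, f/3 and f/4), the sketch yields 50 cycles at f/3 followed by 50 cycles at f/2, which uses exactly one voltage switch.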
The third mode (mode 3), described in more detail next, uses stochastic data to build a multiple voltage schedule. The purpose of using stochastic data is to minimize the average case energy consumption. Note that the voltage schedules in all these three modes are decided at a task instance arrival. Unlike in [11,21], no rescheduling is done while the task is executing. The only overhead during task execution is the one given by the changes in the supply voltage. For instance, the lpARM processor [2] needs at most 70 µs to switch from 1.2 to 3.8 V. For closer voltage levels, the switch occurs faster. Depending on the actual task execution time, this delay may have some impact on the schedule. The same goes for the energy lost during the DC-DC switch. Although our discussion does not cover these, the methods presented here can be adapted to accommodate both the DC-DC delay and energy loss whenever the actual processor requires it.

Figure 1. Voltage scheduling modes for tasks: 1) ideal schedule, 2) WCET oriented schedule, 3) stochastic schedule. (Plot of used energy against time for modes 1, 2 and 3; the actual execution time, the WCET and the allowed time are marked on the time axis.)

The stochastic voltage schedule (mode 3 in Figure 1) for a task is obtained using the probability distribution of the execution pattern for a task (the number of clock cycles used). This probability distribution can be obtained off-line, via simulation, or built and improved at runtime. Let us denote by $X$ the random variable associated with the number of clock cycles used by a task instance. We will use the cumulative distribution function, $cdf_x$, associated with the variable $X$, $cdf_x = P(X \le x)$. This function gives the probability that a task instance finishes within a certain number of clock cycles. If $WX$ is the worst case number of clock cycles, $cdf_{WX} = 1$. Deciding a voltage schedule for a task means that for every clock cycle up to $WX$ we decide a specific voltage level (and processor speed). Each cycle $y$, depending on the voltage adopted, will consume a specific energy, $e_y$.
But each of these cycles is executed with a certain probability, so on average the energy consumed by cycle $y$ can be computed as $(1 - cdf_y)\, e_y$. To obtain the average energy for the whole task, we have to consider all the cycles up to $WX$:

$$E = \sum_{0 < y \le WX} (1 - cdf_y)\, e_y \tag{1}$$

This is the value we want to minimize by choosing appropriate voltage levels for each cycle. Since $WX$ may be a large number in practice, in our implementation we group several consecutive clock cycles into equal size groups. For the sake of brevity and clarity we describe here only the simpler case, when the voltage levels are decided clock cycle by clock cycle. A task has to complete its execution during an allowed execution time, $A$. If we denote the clock length associated with clock cycle $y$ by $k_y$, this constraint can be written as:

$$\sum_{0 < y \le WX} k_y \le A \tag{2}$$

The clock cycle length $k$ depends on the supply voltage $V$ and the threshold voltage $V_T$ according to $k \sim V / (V - V_T)^{\beta}$, where $\beta$ is the velocity saturation index. If $V_T$ is small enough or we use a variable threshold technology [22], this dependency is simplified to $k \sim V^{(1 - \beta)}$. The clock cycle energy $e$ is directly dependent on the square of the supply voltage, as in $e \sim V^2$ [6]. Eliminating $V$ from the last two expressions we obtain the dependency between the clock cycle energy and length:

$$e \sim \left(\frac{1}{k}\right)^{\frac{2}{\beta - 1}} \tag{3}$$

For clarity we will now fix $\beta = 2$, but the rest of the calculus can be carried out for any other reasonable value of $\beta$. If we substitute (3) in (1), we obtain:

$$E \sim \sum_{0 < y \le WX} \frac{1 - cdf_y}{k_y^2} \tag{4}$$

which is the value to be minimized. By mathematical induction one can prove that the right hand side of (4) has a lower bound (using also (2)):

$$LB = \frac{\left(\sum_{0 < y \le WX} \sqrt[3]{1 - cdf_y}\right)^3}{\left(\sum_{0 < y \le WX} k_y\right)^2} \ge \frac{1}{A^2} \left(\sum_{0 < y \le WX} \sqrt[3]{1 - cdf_y}\right)^3 \tag{5}$$

This lower bound is attained if and only if:

$$k_y = A \cdot \frac{\sqrt[3]{1 - cdf_y}}{\sum_{0 < z \le WX} \sqrt[3]{1 - cdf_z}}, \quad 0 < y \le WX \tag{6}$$
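The relation between (4), (5) and (6) can be checked numerically. The sketch below computes the ideal cycle lengths of (6) for $\beta = 2$ and verifies that the average energy of (4) at those lengths equals the closed form of the bound, that is, the sum of the cube roots, cubed, divided by $A^2$. It is a minimal illustration under assumptions: the function names are hypothetical, the list `exec_prob` stands for the factors $(1 - cdf_y)$ and is assumed strictly positive, and the proportionality constant between energy and $1/k^2$ is taken as 1.

```python
def ideal_cycle_lengths(exec_prob, allowed):
    """Ideal clock length for every cycle, following (6) with beta = 2.

    exec_prob -- the factors (1 - cdf_y), one per cycle, all assumed > 0
    allowed   -- allowed execution time A
    """
    roots = [p ** (1.0 / 3.0) for p in exec_prob]
    total = sum(roots)
    return [allowed * r / total for r in roots]

def average_energy(exec_prob, lengths):
    """Right hand side of (4), with the proportionality constant set to 1."""
    return sum(p / (k * k) for p, k in zip(exec_prob, lengths))

def lower_bound(exec_prob, allowed):
    """Value of (4) at the lengths given by (6): (sum of cube roots)^3 / A^2."""
    return sum(p ** (1.0 / 3.0) for p in exec_prob) ** 3 / allowed ** 2

probs = [1.0, 0.9, 0.5, 0.1]        # illustrative values only
lengths = ideal_cycle_lengths(probs, 8.0)
print(lengths, average_energy(probs, lengths), lower_bound(probs, 8.0))
```

Cycles that are more likely to execute receive longer clock lengths, which matches the earlier observation that a task should start at a low voltage and accelerate. Stretching all four cycles uniformly to length 2 in this example gives a strictly larger average energy than the ideal lengths.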

These are the optimal values for the clock cycle length in each clock cycle up to $WX$. In practice these values may not coincide with the available clock lengths, so they have to be converted to real clock cycles. This conversion is done in a similar way to deriving a dual level voltage schedule from an ideal one [6,8]. We find the two bounding available clock cycles $CK_j < k_y \le CK_{j+1}$ and distribute the work of the ideal cycle in two such that $k_y = w \cdot CK_j + (1 - w) \cdot CK_{j+1}$, where $w$ is the work given to $CK_j$ and the rest is the work given to $CK_{j+1}$. Thus, each cycle in the task will distribute its work between two of the several available clock lengths. Finally, the accumulated work loads for each available clock cycle are rounded to integers, since one can only execute full clock cycles. Note that the coefficient of $A$ in (6) can be computed off-line or, if the probability distribution is built at runtime, on-line from time to time. Therefore, the on-line computational complexity for obtaining the stochastic voltage schedule is given by the steps subsequent to (6). One has to compute the ideal clock cycle for each of the $WX$ clock cycles. Finding the bounding clock cycles takes logarithmic time in the number of voltage levels, $N_v$. This gives a complexity of $O(WX \log N_v)$.

Two examples of stochastic voltage schedules are given in Figure 2. We assumed a normal probability distribution with a mean of 70 cycles and a standard deviation of 10. $WX$ is 100. Assuming we only have four available clock frequencies f, f/2, f/3, and f/4, we give two voltage schedules obtained for two different values of the allowed execution time. The schedules are given in number of clock cycles executed at each available frequency. The allowed execution time is reported as a percentage of the time needed for executing the worst case behavior ($WX$) at the highest clock frequency (f). Some experimental results on how stochastic voltage schedules contribute to saving energy are presented in section 3.
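The conversion from ideal to available clock lengths can be sketched as follows. This is an illustration under assumptions rather than the implementation used in the paper: the function names are hypothetical, ideal lengths outside the available range are clamped to the nearest available clock (which the text does not discuss), and the final rounding of the accumulated work to whole cycles is left out.

```python
from bisect import bisect_right
from collections import defaultdict

def split_cycle(k_ideal, clock_lengths):
    """Share of one ideal cycle given to each of its two bounding clocks.

    clock_lengths must be sorted ascending. Returns {clock_length: share},
    with shares summing to 1 and share-weighted lengths summing to k_ideal
    whenever k_ideal lies inside the available range.
    """
    if k_ideal <= clock_lengths[0]:          # clamp: assumption, see lead-in
        return {clock_lengths[0]: 1.0}
    if k_ideal >= clock_lengths[-1]:         # clamp: assumption, see lead-in
        return {clock_lengths[-1]: 1.0}
    j = bisect_right(clock_lengths, k_ideal) - 1   # binary search, O(log N_v)
    lo, hi = clock_lengths[j], clock_lengths[j + 1]
    w = (hi - k_ideal) / (hi - lo)           # work given to the shorter clock
    return {lo: w, hi: 1.0 - w}

def accumulate_work(ideal_lengths, clock_lengths):
    """Accumulated (still fractional) work per available clock length."""
    work = defaultdict(float)
    for k in ideal_lengths:
        for ck, share in split_cycle(k, clock_lengths).items():
            work[ck] += share
    return dict(work)

print(accumulate_work([2.5, 2.5, 3.0, 4.5], [1, 2, 3, 4]))
```

Each ideal cycle costs one binary search over the available clock lengths, so a task with $WX$ cycles needs $O(WX \log N_v)$ steps, in line with the complexity stated above.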
2.2 Off-line Task Stretching

The scheduling condition proposed by Liu and Layland [14] is a sufficient one and covers the worst possible case for the task group characteristics. Yet, an exact analysis as proposed in [13] may reveal possibilities for stretching tasks while still keeping the deadlines. Based on this, [20] describes a method to compute the maximum required frequency for a task set (or the minimum stretching factor). In a similar way, we go further and compute minimal stretching factors $\{\alpha_i\}_{1 \le i \le n}$ for each task $\tau_i$ in the task group $\{\tau_i\}_{1 \le i \le n}$. A task is defined by the triple $\tau_i = (C_i, T_i, D_i)$ composed of the WCET, period and deadline for task $\tau_i$. Note that throughout the paper $C_i$ refers to the worst case execution pattern $WX_i$ running at the fastest clock frequency.

Figure 2. Two stochastic voltage schedules for a task with normally distributed execution time and worst case behavior of 100 cycles. (Plot of the 1 - cdf function for a normal distribution with mean 70 and standard deviation 10. When the allowed time is 300% of WX at clock f, the schedule is 47@f/4, 25@f/3, 8@f/2, 20@f. When the allowed time is 200% of WX at clock f, the schedule is 27@f/3, 47@f/2, 26@f.)

We consider that the tasks in the group are indexed according to their priority, computed as in RMS. We compute the stretching factors in an iterative manner, from the higher to the lower priority tasks. An index $q$ points to the latest task which has been assigned a stretching factor. Initially, $q = 0$. Each of the tasks $\tau_i$, $q < i \le n$, has to be executed before one of its scheduling points $S_i$ as defined in [13]: $S_i = \{ k T_j \mid 1 \le j \le i;\ 1 \le k \le \lfloor T_i / T_j \rfloor \}$, if $T_i = D_i$. If $T_i \ne D_i$, we only need to change the set of scheduling points according to $S_i' = \{ t \mid (t \in S_i) \wedge (t < D_i) \} \cup \{ D_i \}$. For each of these scheduling points $S_{ij} \in S_i$, task $\tau_i$ exactly meets its deadline if:

$$\sum_{1 \le r \le q} \alpha_r C_r \left\lceil \frac{S_{ij}}{T_r} \right\rceil + \alpha_{ij} \sum_{q < p \le i} C_p \left\lceil \frac{S_{ij}}{T_p} \right\rceil = S_{ij} \tag{7}$$

Note that for the tasks which have already been assigned a stretching factor we used that one, $\alpha_r$, while for the rest of the tasks we assumed they will all use the same, yet to be computed, stretching factor $\alpha_{ij}$, which depends on the scheduling point.
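Equation (7), together with the iteration over $q$ described in the remainder of this sub-section, can be sketched as below. This is a hedged illustration, not the author's code: the function names are hypothetical, task indices are 0-based, exact rational arithmetic is used to avoid rounding effects, and infeasible task sets (a best factor below 1) are not handled. On the five tasks of Table 1 the sketch yields the factors 10/7, 10/7, 25/14, 25/14 and 33/14, which agree with the tabulated 1.428, 1.428, 1.785, 1.785 and 2.357.

```python
from fractions import Fraction

def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def scheduling_points(i, T, D):
    """Scheduling points of task i; tasks are sorted by decreasing priority."""
    points = {k * T[j] for j in range(i + 1)
              for k in range(1, T[i] // T[j] + 1)}
    if D[i] != T[i]:
        points = {t for t in points if t < D[i]} | {D[i]}
    return sorted(points)

def offline_stretch(C, T, D=None):
    """Assign an off-line stretching factor to every task, following (7)."""
    D = list(T) if D is None else list(D)
    n = len(C)
    alpha = [None] * n
    q = 0                                    # tasks 0..q-1 already have a factor
    while q < n:
        best = {}                            # task -> largest factor over its points
        for i in range(q, n):
            candidates = []
            for s in scheduling_points(i, T, D):
                fixed = sum(alpha[r] * C[r] * ceil_div(s, T[r]) for r in range(q))
                free = sum(C[p] * ceil_div(s, T[p]) for p in range(q, i + 1))
                candidates.append((Fraction(s) - fixed) / free)
            best[i] = max(candidates)
        m = min(best, key=best.get)          # task with the smallest best factor
        for r in range(q, m + 1):            # tasks q..m share that factor
            alpha[r] = best[m]
        q = m + 1
    return alpha

factors = offline_stretch([1, 5, 1, 1, 1], [5, 11, 45, 130, 370])
print([float(a) for a in factors])
```

The utilization after stretching, computed as the sum of $\alpha_i C_i / T_i$ over all tasks, comes out just under 0.995 for this example.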
For the task $\tau_i$ the best scheduling choice, from the energy point of view, is the largest of its $\alpha_{ij}$. At the same time, from (7), this has to be equal for all tasks $\tau_i$, $q < i \le n$. There is a task with index $m$ for which its best stretching factor is the smallest among all other tasks: $\max_j(\alpha_{mj}) = \min_i(\max_j(\alpha_{ij}))$. Note that this is not necessarily the last task, $n$. If $q = 0$, this task sets the minimal clock frequency as computed in [20]. Having the index $m$, all tasks between $q$ and $m$ can be at most stretched (equally) by the stretching factor of $m$. Thus, we assign them stretching factors as $\alpha_r = \max_j(\alpha_{mj})$, $q < r \le m$. With this, an iteration of the algorithm for finding the stretching factors is complete. The next iteration then proceeds for $q = m$. Finally the process ends when $q$ reaches $n$, meaning all tasks have been given their own off-line stretching factors. An example is given in Table 1. Note that tasks 3 and 4 can be stretched off-line more than 1 and 2, while 5 has the largest stretching factor. The processor utilization changes from 0.687 to 0.994. We use the utilization after off-line stretching in computing the energy reduction upper bound in our experiments. For $T_i > D_i$, the difference between the stretching factors grows.

Table 1: Numerical Example for Off-line Stretching

Task No.   WCET (C_i)   Period (T_i)   Stretching factor α_i   Iterations needed
1          1            5              1.428                   1
2          5            11             1.428                   1
3          1            45             1.785                   2
4          1            130            1.785                   2
5          1            370            2.357                   3

2.3 On-line Slack Distribution

At runtime it is important to use the variations in execution length of the various task instances to be able to stretch other tasks and thus consume less energy. In [20] the only situation when a task is stretched is when it is the only one running and has enough time until the next task arrives. In all other situations tasks are executed at the speed dictated by the off-line analysis. In [11] tasks are stretched to their WCET at runtime, independent of other tasks, using several checking/re-scheduling points during a task instance. The work in [10] uses only two voltage levels. The slack produced by finishing a task early is entirely used to run the processor at the low voltage. As soon as this slack is consumed, the task starts running at high voltage. Our method is perhaps closest to the optimal scheduling method OPASTS presented in [7]. Yet, OPASTS performs analysis over task hyperperiods, which may lead to working on a huge number of task instances for certain task sets. Our method keeps a low and constant computational complexity, regardless of the task set characteristics.

We describe next our strategy for slack distribution. In short, an early finishing task may pass on its unused processor time to any of the tasks executing next. But this time slack cannot be used by any task at any time, since deadlines have to be met. We solve this by considering several levels of slack, with different priorities, as in the slack stealing algorithm [12]. If the tasks in the task set $\{\tau_i = (C_i, T_i, D_i)\}_{1 \le i \le n}$ have $m$ different priorities, we use $m$ levels of slack $\{S_j\}_{1 \le j \le m}$. Without great loss of generality consider that the tasks have different priorities, $m = n$. The slack in each level is a cumulative value, the sum of the unused processor times remaining from the tasks with higher priority. The invariant describing the state of the slacks in every level, at any time, is given by (10). Initially, all level slacks $S_j$ are set to zero. To maintain the relation between slack levels, the levels are managed at runtime as follows:

- whenever an instance of a task $\tau_i$ with priority $i$ starts executing, it can use an arbitrary part $\Delta C_i$ of the slack available at level $i$, $S_i$. So the allowed execution time for task $\tau_i$ will be $A_i = C_i + \Delta C_i$. The remaining slack from level $i$ will degrade into level $i+1$ slack. Each level slack will be updated according to:

$$S_j' = \begin{cases} 0, & j \le i \\ S_j - \Delta C_i, & j > i \end{cases} \tag{8}$$

- whenever a task instance finishes its execution, it will generate some slack if it finishes before its allowed time. If $E_i$ is the actual execution time, the generated slack is $\Delta A_i = A_i - E_i$.
This slack can be used by the lower priority tasks. In this case the level slacks are updated according to:

$$S_j'' = \begin{cases} S_j, & j \le i \\ S_j + \Delta A_i, & j > i \end{cases} \tag{9}$$

- idle processor times are subtracted from all slacks. This ensures that the critical instance from the classic RM analysis remains the same.

The computational complexity required by the on-line method is linearly dependent on the number of slack levels. Note that task instances can only use slack generated by higher priority tasks and produce low priority slack. We call this slack degradation. Whenever the lowest priority task starts executing, all level slacks are reset. Note also that not necessarily all slack at one level is used by a single task. Various methods can be used, but we mention here only the two we used in our experiments:

- Greedy: the task gets all the slack available for its level: $\Delta C_i = S_i$.
- Mean proportional: we consider the mean execution time $\mu_j$ for each task instance waiting to execute (in the ready queue). The slack is proportionally distributed according to these: $\Delta C_i = S_i \cdot \mu_i / \sum_{j \in ReadyQ} \mu_j$.

The strategy of managing the slack we just described allows us to keep the critical instance response time for all tasks, as we prove next. The response time $R_i(t)$ for task $\tau_i$ is computed as $R_i(t) = A_i + I_i(t)$, where $A_i$ is its allowed execution time, as before, and $I_i(t)$ is the interference from the other tasks. From the managing strategy given before, the cumulated slack on each level, at a certain time $t$, is of the form:

$$S_i(t) = S_{i-1}(t) - \sum_{k=1}^{\lceil t / T_{i-1} \rceil} \Delta C_{i-1} + \sum_{k=1}^{\lceil t / T_{i-1} \rceil} \Delta A_{i-1} \tag{10}$$

The slack of level $i$ is composed of all slack from level $i-1$, less the slack used by the instances of tasks with priority $i-1$, plus all the slack generated by these. The number of instances executed, $\lceil t / T_{i-1} \rceil$, is determined by the task period. Note that $S_1$ is always zero. Eliminating the iteration in the previous formula:

$$S_i(t) = \sum_{j < i} \sum_{k=1}^{\lceil t / T_j \rceil} (\Delta A_j - \Delta C_j) \tag{11}$$

The task with the highest priority will never receive slack and therefore $\Delta C_1 = 0$.
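The bookkeeping of rules (8) and (9), the idle-time rule, and the two distribution policies can be sketched as below. This is an illustrative sketch under assumptions and not the paper's implementation: the class and method names are hypothetical, levels are 0-based with index 0 as the highest priority, preemption and periodic arrivals are not modeled, slack is clamped at zero when idle time is subtracted, and the ready queue passed to `mean_proportional` is assumed to include the task itself.

```python
class SlackLevels:
    """Cumulative slack per priority level (index 0 = highest priority)."""

    def __init__(self, n_levels):
        self.slack = [0.0] * n_levels

    def start(self, i, wcet, take=None):
        """Task i starts; returns its allowed time A_i = C_i + dC_i.

        take=None means the greedy policy (use all slack of level i).
        """
        available = self.slack[i]
        d_c = available if take is None else min(take, available)
        # Rule (8): levels up to i are cleared, lower priority levels lose dC_i.
        for j in range(len(self.slack)):
            self.slack[j] = 0.0 if j <= i else self.slack[j] - d_c
        return wcet + d_c

    def finish(self, i, allowed, executed):
        """Task i finishes; rule (9) passes dA_i = A_i - E_i to lower levels."""
        d_a = allowed - executed
        for j in range(i + 1, len(self.slack)):
            self.slack[j] += d_a
        return d_a

    def idle(self, dt):
        """Idle time is subtracted from every level (clamping is an assumption)."""
        self.slack = [max(0.0, s - dt) for s in self.slack]


def mean_proportional(level_slack, mu_i, ready_means):
    """Slack share for a task with mean execution time mu_i."""
    return level_slack * mu_i / sum(ready_means)


levels = SlackLevels(3)
a0 = levels.start(0, 2.0)        # highest priority task never receives slack
levels.finish(0, a0, 1.0)        # finishes 1.0 early
a1 = levels.start(1, 3.0)        # greedy: takes the whole level 1 slack
print(a0, a1, levels.slack)
```

In this run the highest priority task gets exactly its WCET, while the second task is allowed 4.0 time units because it inherits the 1.0 unit left unused by the first one.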
The interference from the high priority tasks is the time used to execute all arrived instances of these high priority tasks:

$$I_i(t) = \sum_{j < i} \sum_{k=1}^{\lceil t / T_j \rceil} E_j \tag{12}$$

With the notations from the slack managing algorithm, $E_j = A_j - \Delta A_j = C_j + \Delta C_j - \Delta A_j$. Introducing this in (12):

$$I_i(t) = \sum_{j < i} \sum_{k=1}^{\lceil t / T_j \rceil} (C_j + \Delta C_j - \Delta A_j) \tag{13}$$

The last two terms in the sum give the negated slack of level $i$, as in (11), so we can re-write (13) as:

$$I_i(t) = \sum_{j < i} \left\lceil \frac{t}{T_j} \right\rceil C_j - S_i(t) \tag{14}$$

Note that the maximal response time for a task is obtained when it uses all the slack available at its level: $R_i(t) = C_i + I_i(t) + S_i(t)$. From the last two equations:

$$R_i(t) = C_i + \sum_{j < i} \left\lceil \frac{t}{T_j} \right\rceil C_j \tag{15}$$

which is exactly the response time when all tasks execute at WCET. Thus, if the RM analysis decides that a task set is schedulable, it remains valid when using our on-line policy.

In our implementation we additionally used a method similar to the on-line method presented in [20]. Namely, whenever there are no tasks in the Ready queue, the currently executing task can stretch until the closest arrival time of a task instance. We will refer to this in our experiments as the 1stretch method.

3. EXPERIMENTAL RESULTS

The first experiment examines the energy gains of using a stochastic voltage schedule at task level. For this we considered a single task with execution time varying between a best case (BCET) and a worst case (WCET) according to a normal distribution. The distributions have the mean (BCET+WCET)/2 and standard deviation (WCET-BCET)/6. For several cases ranging from highly flexible execution time (BCET/WCET is 0.1) to almost fixed (BCET/WCET is 0.9) we built stochastic schedules for a range of allowed execution times (from WCET to 3x WCET). We assumed that our processor has 9 different voltage levels, equally distributed between f and f/3. For a large number of task instances generated according to the given distribution we computed both the energy of the stochastic schedule (mode 3 in Figure 1) and the WCET-stretch schedule (mode 2 in Figure 1). We depict in Figure 3 the average energy consumption of the stochastic schedule as a fraction of that of the WCET-stretch schedule. Note that when the allowed time approaches either WCET or 3 times WCET, the energy consumptions become equal. The lowest possible clock frequency is f/3, which anyway means 3 times WCET, so there is no better schedule for these cases. On the other hand, when the allowed time approaches WCET, there is no other way but to use the fastest clock. Somewhere between the slowest and the fastest frequencies (Allowed/WCET = 2) is the largest energy gain, since the stochastic schedule can use the whole spectrum of available frequencies. Note that the energy gains become more important when the task execution time varies much (BCET/WCET approaches 0.1). It is important to notice that WCET-stretch already saves a large amount of energy compared to the non-scaling case. For example, when the allowed time is twice the WCET, the WCET-stretch energy is around 25% of the no-scaling energy. But a stochastic approach contributes even more to these gains, as the figure shows.

Next we took two real-life hard-RT applications [9, 15] and applied several energy reduction strategies. The results are depicted in Figure 4. We assumed tasks with normal distributions, with the same characteristics as in the previous experiment. The 100% energy is the energy obtained by running all tasks as fast as possible and executing NOPs when no tasks are supposed to run. We assumed that the NOP instruction consumes only 20% of the average power, as in [20]. The virtual processor used for these experiments has 14 voltage levels, with clock frequencies varying between f = 100 MHz and 11 MHz. A power-down mode is also available, in which the processor consumes 5% of the highest frequency average energy.
Figure 3. The average energy consumption of a stochastic voltage schedule vs. the energy consumption of a WCET-stretch schedule. (Plot of the stochastic schedule energy relative to WCET-stretch, between 70% and 100%, against Allowed/WCET from 1 to 3, for the cases 0.1, 0.3, 0.5, 0.7 and 0.9.)

The curves named [...] depict the upper bound of the energy reduction possibilities. These were obtained in a post-execution analysis, by considering that the tasks are uniformly stretched up to the maximum processor utilization as computed in sub-section 2.2. This limit is hardly achievable in practice, since the actual execution patterns for all task instances are never available beforehand. Moreover, this optimum obtained by uniformly stretching all instances may violate some deadlines, being therefore useless in practice. A more realistic bound is given by the [...]. The curves named Offline+1stretch were obtained by using only the off-line stretching method and the 1stretch method mentioned in sub-section 2.3. The [...] labeled curves were obtained by using the off-line strategy, the on-line strategy with mean proportional slack distribution (sub-section 2.3), plus the stochastic execution task model (mode 3 in Figure 1). The curves labeled Ideal-stretch were obtained by using the same method as the [...] curves, except using an ideal-stretch task execution model (mode 1 in Figure 1). Note that this method implies knowing the actual execution time at a task arrival, which is unlikely in reality. For the last three methods, Offline+1stretch, [...], and Ideal-stretch, whenever the processor is idle, it goes to a power down mode.

We also tested our scheduling policy on randomly generated task sets of 50 and 100 tasks. The task sets were generated as follows. For each set, the task periods (and deadlines) were selected using a uniform distribution in 100..5000 and 100..10000 respectively. The worst case execution times were then randomly generated such that the task set would yield approximately 0.67 processor utilization, for the fastest clock.
The average utilization after off-line stretching turned out to be 0.92 for the sets of 50 tasks, and 0.85 for the sets of 100 tasks. Using the same processor type as in the previous experiment, we simulated the run-time behavior of several scheduling methods. We also used post-simulation data to obtain the upper bounds, as in the previous experiment. The values depicted in Figure 5 are averages over one hundred sets of tasks. As these experiments show, our policy ([...]) performs best when little information on task execution is available.

Figure 4. The energy reduction for a) an avionics application [15] (17 tasks) and b) a CNC controller [9] (8 tasks). In b) the area between 70-100% is enlarged. (Plots: energy reduction for each strategy, including Offline+1stretch, over a horizontal axis from 0.1 to 0.9.)

4. CONCLUSIONS

We presented and analyzed a scheduling policy for hard real-time tasks running on a dynamic voltage supply processor, with the final purpose of reducing the energy consumption. The policy is designed for sets of tasks with fixed priorities assigned in a rate/deadline monotonic manner. It consists of both off-line and on-line scheduling decisions, taken at both the task and the task set level. The off-line decisions use exact timing analysis to derive off-line voltage scaling factors for each task. The on-line policy distributes the available processor time on a priority basis, using slack levels and statistics. Task-level voltage schedules are built using stochastic data, with the goal of minimizing the average case energy consumption. The paper also contains a proof that our scheduling policy meets all deadlines. Our method can be fully implemented in the RTOS, without resorting to special compilers or changing the software. Yet, combined with the aforementioned methods, our approach may yield even greater energy reductions. The experimental results show that our policy can be successfully used to reduce the energy consumption in a hard real-time system.

Figure 5. The energy reduction using different strategies for sets of 50 tasks (above) and sets of 100 tasks (below). The values are averages over a hundred task sets. (Plots: energy reduction for each strategy, including Offline+1stretch, over a horizontal axis from 0 to 1.)

5. ACKNOWLEDGMENTS

This work was funded by ARTES - A network for Real-Time research and graduate Education in Sweden (http://www.artes.uu.se/). The author would like to thank Petru Eles, Kris Kuchcinski, and Per Larsson-Edefors for their helpful comments.

6. REFERENCES

[1] Benini, L., and De Micheli, G. System-level power optimization: techniques and tools, in ACM Trans. on Design Automation of Electronic Systems, No. 2, Vol. 5, April 2000, 115-192.
[2] Burd, T., Pering, T., Stratakos, A., and Brodersen, W. A dynamic voltage scaled microprocessor system, in IEEE Journal of Solid-State Circuits, No. 11, Vol. 35, November 2000, 1571-1580.
[3] Chandrakasan, A., Gutnik, V., and Xanthopoulos, T. Data driven signal processing: an approach for energy efficient computing, in Proceedings of ISLPED '96, 347-352.
[4] Dave, B.P., Lakshminarayana, G., and Jha, N.K. COSYN: hardware-software co-synthesis of embedded systems, in Proceedings of the 34th DAC, 1997, 703-708.
[5] Gruian, F., and Kuchcinski, K. Low-energy directed architecture selection and task scheduling for system-level design, in Proceedings of the 25th Euromicro Conference, 1999, 296-302.
[6] Gruian, F., and Kuchcinski, K. LEneS: task scheduling for low-energy systems using variable voltage processors, in Proceedings of ASP-DAC 2001, 449-455.
[7] Hong, I., Potkonjak, M., and Srivastava, M.B. On-line scheduling of hard real-time tasks on variable voltage processor, in Digest of Technical Papers of ICCAD '98, 653-656.
[8] Ishihara, T., and Yasuura, H. Voltage scheduling problem for dynamically variable voltage processors, in Proceedings of ISLPED '98, 197-202.
[9] Kim, N., Ryu, M., Hong, S., Saksena, M., Choi, C.-H., and Shin, H.
Visual assessment of a real-time system design: a case study on a CNC controller, in The 17th IEEE Real-Time Systems Symposium, 1996, 300-310.
[10] Lee, Y.-H., and Krishna, C.M. Voltage-clock scaling for low energy consumption in real-time embedded systems, in Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications, 1999, 272-279.
[11] Lee, S., and Sakurai, T. Run-time voltage hopping for low-power real-time systems, in Proceedings of the 37th DAC, 2000, 806-809.
[12] Lehoczky, J., and Ramos-Thuel, S. An optimal algorithm for scheduling soft-aperiodic tasks in fixed-priority preemptive systems, in Proceedings of RTSS '92, 110-123.
[13] Lehoczky, J., Sha, L., and Ding, Y. The rate monotonic scheduling algorithm: exact characterization and average case behavior, in Proceedings of RTSS '89, 166-171.
[14] Liu, C.L., and Layland, J.W. Scheduling algorithms for multiprogramming in a hard real time environment, in JACM 20 (1), 1973, 46-61.
[15] Locke, C.D., Vogel, D.R., and Mesler, T.J. Building a predictable avionics platform in Ada: a case study, in Proceedings of RTSS '91, 181-189.
[16] Mossé, D., Aydin, H., Childers, B., and Melhem, R. Compiler-assisted dynamic power-aware scheduling for real-time applications, in Workshop on Compilers and Operating Systems for Low-Power, October 2000.
[17] Pedram, M. Power optimization and management in embedded systems, in Proceedings of ASP-DAC 2001, 239-244.
[18] Pering, T., Burd, T., and Brodersen, R. The simulation and evaluation of dynamic voltage scaling algorithms, in Proceedings of ISLPED '98, 76-81.
[19] Pering, T., Burd, T., and Brodersen, R. Voltage scheduling in the lpARM microprocessor system, in Proceedings of ISLPED '00, 96-101.
[20] Shin, Y., and Choi, K. Power conscious fixed priority scheduling for hard real-time systems, in Proceedings of the 36th DAC, 1999, 134-139.
[21] Shin, D., Kim, J., and Lee, S. Intra-task voltage scheduling for low-energy hard real-time applications, in Special Issue of IEEE Design and Test of Computers, October 2000.
[22] Suzuki, K., Mita, S., Fujita, T., Yamane, F., Sano, F., Chiba, A., Watanabe, Y., Matsuda, K., Maeda, T., and Kuroda, T. A 300MIPS/W RISC core processor with variable supply-voltage scheme in variable threshold-voltage CMOS, in Proceedings of the CICC '97, 587-590.
[23] Weiser, M., Welch, B., Demers, A., and Shenker, S. Scheduling for reduced CPU energy, in Proceedings of the First Symposium on Operating Systems Design and Implementation, November 1994.
[24] Yao, F., Demers, A., and Shenker, S. A scheduling model for reduced CPU energy, in Proceedings of the 36th Symposium on Foundations of Computer Science, 1995, 374-382.
[25] http://www.transmeta.com