Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control


Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control
Guangyi Cao and Arun Ravindran
Department of Electrical and Computer Engineering
University of North Carolina at Charlotte

Organization of Talk
- Motivation
- Related Work
- Cross-Layer Control Framework
- Evaluation Methodology
- Experimental Results
- Future Directions

Motivation

Data Center Energy Consumption
- In 2012, data centers consumed the equivalent of 30 GW of power.
- Servers typically operate at between 10% and 50% of their maximum utilization level.
- Server idle power is 50%-60% of peak power.
Source: BalticServers, Wikimedia

Energy Efficient Computing
- Resource Allocation
- Feedback Control
- Scheduling

Related Work

What We Mean by Cross-Layer
From a computing systems point of view, the layers are:
- Application
- Operating System
- Hardware

Cross-Layer Optimization and Control
Several works address single-layer feedback control:
- Fu et al. (2011) used Model Predictive Control for cache-aware utilization control.
- Hoffman et al. (2013) proposed a control framework for controlling multiple hardware parameters.
- Reed et al. (2013) proposed an application-level controller for the Apache web server.
Cross-layer approaches that influenced our work:
- Illinois GRACE project (2006): DVFS, CPU budget, frame rate, and dithering for video decoding; hierarchical optimization.
- Cucinotta et al. (2010): cross-layer feedback approach with separate feedback loops, an internal loop for resource allocation by controlling scheduling parameters and an external loop for application quality.

Cross-Layer Control Framework

Control Framework

Soft Real-Time Schedulers
- Multiprocessor Earliest Deadline First (EDF) algorithm
- Previous research (Devi and Anderson) has shown that, for soft real-time tasks, multiprocessor EDF can guarantee bounded tardiness at a total utilization of up to m, where m is the number of cores.
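The utilization condition on this slide can be illustrated with a minimal Python sketch. The `Task` class and the function names are hypothetical, introduced only for illustration; the sketch assumes implicit deadlines (deadline equal to period) and is not the LITMUS-RT scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    exec_time: float  # per-job execution time, in seconds (assumed known)
    period: float     # period, in seconds; deadline assumed equal to period

    @property
    def utilization(self) -> float:
        return self.exec_time / self.period


def total_utilization(tasks):
    return sum(t.utilization for t in tasks)


def bounded_tardiness_possible(tasks, num_cores):
    """Condition from the slide: total utilization of at most m cores.

    Also requires that no single task needs more than one full core.
    """
    return (all(t.utilization <= 1.0 for t in tasks)
            and total_utilization(tasks) <= num_cores)


def gedf_pick(ready_jobs, num_cores):
    """Global EDF dispatch: run the m ready jobs with the earliest deadlines.

    ready_jobs is a list of (job_name, absolute_deadline) pairs.
    """
    return sorted(ready_jobs, key=lambda job: job[1])[:num_cores]
```

For example, eight tasks with utilization 0.5 each satisfy the condition on four cores, while nine such tasks do not.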

System Model
LTI state-space model:

$$x(k+1) = A\,x(k) + B_u\,u(k) + B_v\,v(k) + B_d\,d(k)$$
$$y_m(k) = C_m\,x(k) + D_{vm}\,v(k) + D_{dm}\,d(k)$$

[Block diagram: Gaussian white noise drives the unmeasured disturbance model, which produces $d(k)$; the plant model takes $u(k)$, $v(k)$, and $d(k)$ as inputs and produces $y_m(k)$.]

- $x(k)$ is the $n_x$-dimensional state vector of the plant
- $u(k)$ is the $n_u$-dimensional vector of manipulated variables
- $v(k)$ is the $n_v$-dimensional vector of measured disturbances
- $d(k)$ is the $n_d$-dimensional vector of unmeasured disturbances
- $y_m(k)$ is the $n_y$-dimensional vector of measured outputs
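As a rough illustration of how this model advances by one sample, the sketch below evaluates both equations with plain Python lists. The dictionary keys mirror the matrix names on the slide; the helper names and any numeric values passed in are hypothetical, not the identified plant.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def step(model, x, u, v, d):
    """Return (x(k+1), y_m(k)) for the LTI state-space model on the slide."""
    y_m = vadd(matvec(model["C_m"], x),
               matvec(model["D_vm"], v),
               matvec(model["D_dm"], d))
    x_next = vadd(matvec(model["A"], x),
                  matvec(model["B_u"], u),
                  matvec(model["B_v"], v),
                  matvec(model["B_d"], d))
    return x_next, y_m


# Made-up model with one state, two manipulated variables,
# one measured disturbance, and one unmeasured disturbance.
example_model = {
    "A": [[0.5]], "B_u": [[1.0, 0.5]], "B_v": [[0.25]], "B_d": [[1.0]],
    "C_m": [[1.0]], "D_vm": [[0.0]], "D_dm": [[0.0]],
}
```

Calling `step(example_model, [0.0], [1.0, 2.0], [4.0], [0.0])` advances the made-up plant by one sample.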

Model Predictive Control
Source: Bemporad, Morari, and Ricker, Model Predictive Control Toolbox User's Guide, for use with MATLAB
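The receding-horizon idea behind Model Predictive Control can be sketched as follows. This is a brute-force stand-in that assumes a scalar first-order plant and a small set of discrete actuator levels; the toolbox cited above solves an optimization problem instead, and all names and numbers here are hypothetical.

```python
from itertools import product


def predict(a, b, x0, moves, horizon):
    """Predict the output y = x of x(k+1) = a*x(k) + b*u(k) over the horizon.

    Once the planned moves run out, the last move is held constant.
    """
    x, outputs = x0, []
    for k in range(horizon):
        u = moves[min(k, len(moves) - 1)]
        x = a * x + b * u
        outputs.append(x)
    return outputs


def mpc_step(a, b, x0, setpoint, levels, control_horizon, prediction_horizon):
    """Pick the first move of the move sequence with the lowest tracking cost."""
    best_cost, best_moves = float("inf"), None
    for moves in product(levels, repeat=control_horizon):
        outputs = predict(a, b, x0, moves, prediction_horizon)
        cost = sum((y - setpoint) ** 2 for y in outputs)
        if cost < best_cost:
            best_cost, best_moves = cost, moves
    # Receding horizon: apply only the first move, then re-plan next period.
    return best_moves[0]
```

With `a = 0.5`, `b = 1.0`, a set point of 2.0, and levels `(0.0, 1.0, 2.0)`, the sketch first applies the largest input to reach the set point quickly and then settles on the input that holds it there.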

Evaluation Methodology

Benchmarks
- x264 video encoder (from FFmpeg)
  Application quality control variable: per-frame video resolution
- bodytrack, which tracks human movement (from the PARSEC benchmark suite)
  Application quality control variables: annealing layers and number of particles
  Visual quality is determined by the relative mean square error in the magnitude of the position vectors
- Both benchmarks were modified to satisfy the soft real-time task model and to allow application quality control

Experimental Setup
- Dual-socket Intel Clovertown (X5365) quad-core
- DVFS levels: 2.0 GHz, 2.33 GHz, 2.67 GHz, and 3.0 GHz
- Application quality levels: 4 each for the x264 encoder and bodytrack
- Linux 2.6.36 kernel patched with Litmus-RT-2011

Sensors and Actuators
- DVFS (actuator)
  Low transition latency (~10 µs)
  cpufreq used to dynamically scale the operating frequency
  Modulated using a delta-sigma modulator (uses feedback)
- Application quality (actuator)
  Higher transition latency (~500 µs)
  Global variables protected by an FMLP read-write lock
  Modulated using a pulse-width modulator (no feedback)
- Utilization (sensor)
  Custom system call that aggregates the average per-core execution time, measured using a high-resolution timer, and divides it by the control period
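The two modulation schemes named on this slide can be sketched as below. This assumes a first-order delta-sigma modulator with error feedback and a simple duty-cycle pulse-width modulator; the function names and the modulator order are assumptions, since the slide gives no implementation details.

```python
DVFS_LEVELS_GHZ = (2.0, 2.33, 2.67, 3.0)  # levels from the experimental setup


def delta_sigma(target, levels, steps):
    """First-order delta-sigma: feed the quantization error back each step.

    The running average of the chosen levels tracks the continuous target.
    """
    error, chosen_levels = 0.0, []
    for _ in range(steps):
        desired = target + error
        chosen = min(levels, key=lambda level: abs(level - desired))
        error = desired - chosen  # carried into the next step (the feedback)
        chosen_levels.append(chosen)
    return chosen_levels


def pwm(target, low, high, period_steps):
    """Open-loop PWM: hold `high` for a duty-cycle share of the period."""
    duty = (target - low) / (high - low)
    on_steps = round(duty * period_steps)
    return [high] * on_steps + [low] * (period_steps - on_steps)
```

For instance, `delta_sigma(2.2, DVFS_LEVELS_GHZ, 100)` switches between the available levels so that their average is close to 2.2 GHz, while `pwm(2.25, 2, 3, 4)` holds the higher quality level for one of every four steps.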

Controller Design
- System identification: MATLAB System Identification Toolbox
  First-order model; fit of 84.8% for x264 and 87.4% for bodytrack
  n_x = 1, n_u = 2, n_v = 1, and n_d = 1
- Controller design: MATLAB MPC Toolbox
- C code generation: MATLAB Embedded Coder

Parameter            x264        bodytrack
Control horizon      2           4
Prediction horizon   10          12
Input weight         0, 0        0, 0
Output weight        1           1
Blocking step        5           3
Disturbance model    1/(s + 1)   1/(s + 10)
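To make the first-order fit figures more concrete, here is a minimal sketch of fitting y(k+1) = a*y(k) + b*u(k) by least squares and scoring it. It assumes a single input and that the fit percentage is the normalized root-mean-square measure, 100 * (1 - ||y - y_hat|| / ||y - mean(y)||); the slide does not state which metric was used, and the function names are hypothetical.

```python
import math


def fit_first_order(u, y):
    """Least-squares estimate of a and b in y[k+1] = a*y[k] + b*u[k]."""
    y_now, u_now, y_next = y[:-1], u[:-1], y[1:]
    syy = sum(v * v for v in y_now)
    suu = sum(v * v for v in u_now)
    syu = sum(p * q for p, q in zip(y_now, u_now))
    sny = sum(p * q for p, q in zip(y_next, y_now))
    snu = sum(p * q for p, q in zip(y_next, u_now))
    det = syy * suu - syu * syu  # determinant of the 2x2 normal equations
    a = (sny * suu - snu * syu) / det
    b = (snu * syy - sny * syu) / det
    return a, b


def fit_percent(measured, predicted):
    """Normalized RMS fit: 100% is a perfect match, 0% is no better than the mean."""
    mean = sum(measured) / len(measured)
    err = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)))
    spread = math.sqrt(sum((m - mean) ** 2 for m in measured))
    return 100.0 * (1.0 - err / spread)
```

A fit near 85% under this metric would mean the model's prediction error is roughly 15% of the output's variation around its mean.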

Experimental Results

Avg. FPS vs. Number of Tasks
[Figure: two panels, one for bodytrack and one for x264]

Controller Step Response: Input Step
Step change in the number of tasks from 5 to 9 at t = 50 s for bodytrack
- Steady-state error: 5%
- Peak overshoot: 30%
- Settling time: 3.8 seconds

Controller Step Response: Output Step
Step change in utilization from 4 to 5 at t = 50 s for bodytrack
- Steady-state error: 5%
- Peak overshoot: 22%
- Settling time: 1.8 seconds
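The three step-response figures reported on these slides can be computed from a recorded trace roughly as follows. The definitions are assumptions, since the slides do not state them: overshoot is taken relative to the step size, steady-state error relative to the set point, and settling time as the time until the output stays within a 5% band of its final value. The function name is hypothetical.

```python
def step_metrics(times, outputs, setpoint, initial, band=0.05):
    """Return steady-state error (%), peak overshoot (%), and settling time.

    times[0] is taken as the instant the step is applied.
    """
    step_size = setpoint - initial
    final = outputs[-1]
    steady_state_error = abs(setpoint - final) / abs(setpoint) * 100.0
    peak = max(outputs) if step_size > 0 else min(outputs)
    overshoot = max(0.0, (peak - setpoint) / step_size * 100.0)
    # Find the last sample that lies outside the band around the final value.
    last_outside = -1
    for i, y in enumerate(outputs):
        if abs(y - final) > band * abs(final):
            last_outside = i
    settle_index = min(last_outside + 1, len(outputs) - 1)
    return {
        "steady_state_error_pct": steady_state_error,
        "peak_overshoot_pct": overshoot,
        "settling_time": times[settle_index] - times[0],
    }
```

Other conventions exist (for example, overshoot relative to the final value, or a 2% settling band), so figures computed this way are comparable only when the same definitions are used throughout.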

Other Benefits
- For light task loads, there is potential to save power while meeting performance goals, since $P \propto f^3$.
- To evaluate power savings, we compare cross-layer control against the non-control case for task loads ranging from light to heavy and calculate the average.
- Average power saving is 31% for x264 and 21% for bodytrack, obtained at an average application quality of 70% for x264 and 65% for bodytrack.
- Fault tolerance
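As a worked illustration of the cubic relation, the sketch below evaluates (f / f_max)^3 for the four DVFS levels listed in the experimental setup. It assumes an ideal cubic law for dynamic power only and ignores idle and static power, so it gives an upper bound on what frequency scaling alone could save, not the measured savings reported above. The function name is hypothetical.

```python
DVFS_LEVELS_GHZ = (2.0, 2.33, 2.67, 3.0)


def relative_dynamic_power(freq_ghz, max_freq_ghz=3.0):
    """Dynamic power relative to the maximum frequency, assuming P ~ f^3."""
    return (freq_ghz / max_freq_ghz) ** 3


for freq in DVFS_LEVELS_GHZ:
    share = relative_dynamic_power(freq)
    print(f"{freq:.2f} GHz -> {share:.2f} of dynamic power at 3.0 GHz")
# 2.00 GHz -> 0.30 of dynamic power at 3.0 GHz
# 2.33 GHz -> 0.47 of dynamic power at 3.0 GHz
# 2.67 GHz -> 0.70 of dynamic power at 3.0 GHz
# 3.00 GHz -> 1.00 of dynamic power at 3.0 GHz
```

Under this idealization, running at the lowest level would cut dynamic power to about 30% of its value at the highest level.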

Task Heterogeneity and Scheduling

Number of tasks        FPS of x264         FPS of bodytrack
x264    bodytrack      C-EDF    G-EDF      C-EDF    G-EDF
2       2              25       25         20       20
2       8              25       25         15.8     20
10      2              20.1     25         20       20
8       6              25       23.1       20       18.3

C-EDF vs. G-EDF
- C-EDF: better data locality
- G-EDF: better load balancing
- G-EDF performs better when one application has many more tasks than the other
- C-EDF performs better when both applications are more evenly matched
- Scheduling algorithm potentially another control variable?

How Good Is the LTI Model?

Video index   Video                % steady-state error   Significance level of K-S test
1             music video          8.6%                   31.3%
2             music video          7.5%                   36.7%
3             news report          9.1%                   28.9%
4             photography hacks    22.5%                  0.015%
5             cooking              8.2%                   32.5%
6             sports               25.7%                  0.006%
7             news report          9.7%                   24.3%
8             hiring program       8.9%                   29.4%
9             movie clip           11.2%                  19.4%
10            about champagne      9.5%                   24.1%

- The x264 controller was built with the Hubble video as input
- Controller performance was evaluated against other popular videos drawn from YouTube
- The controller performs well when the Kolmogorov-Smirnov test on the distribution of average execution times returns a high significance level
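The comparison of execution-time distributions can be illustrated with the two-sample Kolmogorov-Smirnov statistic, sketched here in plain Python. The sketch computes only the statistic D, the largest gap between the two empirical distribution functions; the significance level is then derived from D and the sample sizes (for example, `scipy.stats.ks_2samp` returns both). The function name and the sample data in the usage note are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value less than or equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` returns 0.5. A small D, and therefore a high significance level, suggests the new video's execution times resemble those of the video used to build the model.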

Controller Overheads
[Figure: controller overhead for x264 and bodytrack]
About 0.5% of one control period

Future Directions

What Next?
- Non-linear control
- Adaptive control
- Power models
- Increased control variables
- User-space control
- Scalability

Questions and Suggestions?