APPENDIX B PARETO PLOTS PER BENCHMARK


IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER

Appendix B contains all Pareto frontiers for the SPEC CPU benchmarks as calculated by the model (green curve) and as simulated by Sniper (blue curve). The red points are the configurations that our model predicts to be Pareto-optimal, plotted at their simulated performance and power consumption. The difference between the blue and green curves shows the error of the model, while the difference between the red points and the blue curve indicates how well we can predict the actual Pareto-optimal configurations.

In addition to the visual comparison, we show various metrics underneath each figure: the average absolute error for performance and power, as well as sensitivity, specificity and the Hypervolume Ratio (HVR) [31]. Sensitivity is the fraction of actual Pareto-optimal designs that the model predicts to be Pareto-optimal; specificity is the fraction of actual non-Pareto-optimal designs that the model predicts to be non-Pareto-optimal. HVR quantifies how well we can predict the range of solutions across the entire frontier. Put together, these metrics denote how good each predicted Pareto frontier is.

The average values over the whole design space are %, 7% and 97% for specificity, sensitivity and HVR, respectively. Hence, our model is very good at predicting the actual range of the Pareto frontier (HVR) and also at filtering out the non-Pareto-optimal solutions (specificity), but performs less well at detecting all Pareto-optimal designs (sensitivity). However, a visual inspection of the Pareto frontiers shows that one of two things happens. Either we find only a few designs in a large cluster of Pareto-optimal designs that lie very close to each other, which leads to lower sensitivity but which we deem acceptable; or we miss some Pareto-optimal designs that are not useful to implement (e.g., the designs on the left vertical tail of bzip2: a large power increase for only a small performance gain).
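To make the metrics concrete, the following minimal Python sketch computes a Pareto frontier, sensitivity, specificity and a hypervolume ratio for a handful of designs. Everything in it is an illustrative assumption rather than the procedure used in the paper: the design points are made up, both objectives (CPI and power) are taken to be minimized, the reference point is arbitrary, and the ratio is formed as the hypervolume of the predicted designs at their simulated coordinates divided by the hypervolume of the simulated frontier, which may differ from the definition in [31].

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (CPI, power); lower is better for both (assumption)


def pareto_mask(points: Sequence[Point]) -> List[bool]:
    """True for every point that no other point dominates."""
    mask = []
    for i, (cpi_i, pow_i) in enumerate(points):
        dominated = any(
            cpi_j <= cpi_i and pow_j <= pow_i and (cpi_j < cpi_i or pow_j < pow_i)
            for j, (cpi_j, pow_j) in enumerate(points)
            if j != i
        )
        mask.append(not dominated)
    return mask


def sensitivity_specificity(actual: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Fraction of actual optimal designs found, and of non-optimal designs rejected."""
    true_pos = sum(a and p for a, p in zip(actual, predicted))
    true_neg = sum((not a) and (not p) for a, p in zip(actual, predicted))
    positives = sum(actual)
    negatives = len(actual) - positives
    return true_pos / positives, true_neg / negatives


def hypervolume(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by the points and bounded by ref (2-D, minimization)."""
    front = sorted(p for p, keep in zip(points, pareto_mask(points)) if keep)
    area = 0.0
    prev_power = ref[1]
    for cpi, power in front:  # CPI ascending, so power descends along the front
        area += (ref[0] - cpi) * (prev_power - power)
        prev_power = power
    return area


# Hypothetical data: one entry per design, in the same order in both lists.
simulated = [(1.0, 30.0), (1.2, 20.0), (1.5, 15.0), (2.0, 12.0), (1.6, 22.0), (2.2, 18.0)]
modeled = [(1.25, 29.0), (1.2, 21.0), (1.5, 16.0), (2.0, 12.5), (1.4, 19.0), (2.3, 18.0)]

actual = pareto_mask(simulated)    # designs on the simulated (blue) frontier
predicted = pareto_mask(modeled)   # designs the model believes are optimal

sens, spec = sensitivity_specificity(actual, predicted)

# "Red points": predicted-optimal designs placed at their simulated coordinates.
red_points = [s for s, keep in zip(simulated, predicted) if keep]
ref_point = (2.5, 35.0)  # arbitrary reference point, worse than every design
hvr = hypervolume(red_points, ref_point) / hypervolume(simulated, ref_point)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} HVR={hvr:.3f}")
```

In this toy data the model misses the design on the steep low-CPI tail, so sensitivity drops while the hypervolume ratio stays close to one, which mirrors the pattern described above.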
We include additional explanation for some of the Pareto frontiers:

bzip2, h264ref, gobmk and soplex: The model misses the top-left tail of the Pareto frontier, which appears to be almost vertical. However, these designs are less interesting to find because they represent a marginal increase in performance while power increases substantially. Furthermore, this tail always comprises less than % of the total designs.

gromacs: As shown in the phase graph (see Appendix A), we make a systematic error across all configurations. However, this still leads to good relative accuracy when changing the processor configuration. The systematic error is visible in the Pareto frontier: the green curve is a shifted version of the blue curve, with the error for all CPI values indeed around 22% to the left. Due to the good relative accuracy, the designs on the predicted frontier are almost exactly the same as the ones on the simulated frontier (almost all red points are part of the blue curve).

hmmer: There is a tail on the right that we do not predict accurately. This is similar to bzip2 etc., for which we do not accurately predict the tail on the left. We still find most of the Pareto-optimal designs on that right tail, but not all Pareto-optimal designs in the knees of the curve, leading to lower sensitivity.

perlbench: We do not predict the left vertical tail, and we predict two dense clusters of Pareto-optimal designs. However, this is not an issue since there are actually no Pareto-optimal designs in between those clusters. The frontier simply connects the Pareto-optimal clusters.

sjeng: The model does not find any of the designs on the left vertical tail because it cannot properly estimate the decrease in branch misprediction rate from using the gshare branch predictor. The model classifies all branch predictors as performing approximately the same, while in fact the gshare branch predictor outperforms the others for the larger dispatch widths.

sphinx3: We do not see the left vertical tail, which in this case is actually built up out of around designs. However, those designs are all clustered on places on the vertical tail, and are less interesting because they double power consumption for a gain in performance of less than %.

xalancbmk: Here we observe designs which are not actually Pareto-optimal. However, these points are still close to being Pareto-optimal.

Metrics shown underneath the individual Pareto plots:

astar: abs. error performance / power: .% / .%; sensitivity: .%; specificity: 9.%; HVR: 99.3%
bwaves: abs. error performance / power: 7.% / 2.%; sensitivity: 27.%; specificity: 77.7%; HVR: 9.7%

bzip2: abs. error performance / power: .% / 2.%; sensitivity: 1.9%; specificity: .%; HVR: 9.%
calculix: abs. error performance / power: 2.9% / 2.1%; sensitivity: 31.%; specificity: .1%; HVR: 99.%
gamess: abs. error performance / power: . / 1.9%; sensitivity: .%; specificity: 93.%; HVR: 97.9%
cactusADM: abs. error performance / power: .1% / .7%; sensitivity: 2.%; specificity: .%; HVR: 99.%
dealII: abs. error performance / power: .% / 2.1%; sensitivity: .%; specificity: .%; HVR: 9.%
gcc: abs. error performance / power: 13.2% / 2.%; sensitivity: .9%; specificity: .2%; HVR: 99.%
GemsFDTD: abs. error performance / power: .% / .2%; sensitivity: 2.1%; specificity: 9.2%; HVR: 99.%
gromacs: abs. error performance / power: 22.9% / %; sensitivity: .%; specificity: 92.%; HVR: 99.2%
hmmer: abs. error performance / power: .3% / 3.%; sensitivity: 33.3%; specificity: 91.3%; HVR: 9.7%
gobmk: abs. error performance / power: .3% / 2.%; sensitivity: 1.3%; specificity: 93.%; HVR: 9.3%
h264ref: abs. error performance / power: 2.7% / 1.%; sensitivity: .7%; specificity: 92.3%; HVR: 9.%
lbm: abs. error performance / power: .9% / .%; sensitivity: .%; specificity: .%; HVR: 9.3%
leslie3d: abs. error performance / power: 17.% / .3%; sensitivity: 31.1%; specificity: 7.%; HVR: 9.1%
mcf: abs. error performance / power: .9% / 2.%; sensitivity: 2.%; specificity: 7.%; HVR: 9.9%
namd: abs. error performance / power: . / .%; sensitivity: 2.2%; specificity: 97.%; HVR: 99.%
libquantum: abs. error performance / power: 11.% / .1%; sensitivity: 33.3%; specificity: 93.%; HVR: 97.%
milc: abs. error performance / power: .% / 3.1%; sensitivity: .3%; specificity: 9.%; HVR: .%
omnetpp: abs. error performance / power: 9.2% / .2%; sensitivity: .7%; specificity: .9%; HVR: 99.2%
perlbench: abs. error performance / power: 3.% / 2.1%; sensitivity: 3.%; specificity: 79.%; HVR: 93.9%
sjeng: abs. error performance / power: .3% / 3.%; sensitivity: 3.%; specificity: 9.3%; HVR: 7.%
sphinx3: abs. error performance / power: .% / .%; sensitivity: .%; specificity: .1%; HVR: 9.%
povray: abs. error performance / power: .% / 2.%; sensitivity: 9.%; specificity: 9.1%; HVR: 9.%
soplex: abs. error performance / power: .7% / .%; sensitivity: 2.%; specificity: 9.2%; HVR: 99.7%
tonto: abs. error performance / power: .% / 3.%; sensitivity: 7.9%; specificity: 9.%; HVR: 99.%
wrf: abs. error performance / power: 11.7% / .%; sensitivity: 7.%; specificity: 3.3%; HVR: 9.%
xalancbmk: abs. error performance / power: .9% / .%; sensitivity: 7.%; specificity: 1.%; HVR: 9.3%
zeusmp: abs. error performance / power: 7.% / .%; sensitivity: 1.%; specificity: 9.%; HVR: 99.3%
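The gromacs observation, that a systematic CPI error still selects the right frontier designs, follows from a simple property: scaling one objective by the same positive factor for every design does not change which designs dominate which. The sketch below illustrates this with made-up design points, assuming that both CPI and power are minimized and that the model underestimates every CPI by exactly 22%. The data and the exact uniform error are hypothetical and are not taken from the paper's measurements.

```python
from typing import List, Sequence, Tuple

Design = Tuple[float, float]  # (CPI, power); both minimized (assumption)


def pareto_indices(designs: Sequence[Design]) -> List[int]:
    """Indices of the designs that no other design dominates."""
    keep = []
    for i, (cpi_i, pow_i) in enumerate(designs):
        dominated = any(
            cpi_j <= cpi_i and pow_j <= pow_i and (cpi_j < cpi_i or pow_j < pow_i)
            for j, (cpi_j, pow_j) in enumerate(designs)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical simulated results for six configurations.
simulated = [(0.9, 28.0), (1.0, 21.0), (1.1, 24.0), (1.3, 17.0), (1.6, 15.0), (1.7, 19.0)]

# Hypothetical model output: every CPI is 22% too low, power is unchanged.
SYSTEMATIC_ERROR = 0.22
modeled = [(cpi * (1.0 - SYSTEMATIC_ERROR), power) for cpi, power in simulated]

# The frontier curve shifts left, but the set of frontier designs is identical.
same_frontier = pareto_indices(simulated) == pareto_indices(modeled)
print(pareto_indices(simulated), same_frontier)
```

A real model error is never perfectly uniform, so in practice a few designs near the frontier can still swap places, which is consistent with "almost all" rather than all red points lying on the blue curve.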