Characterizing, Optimizing, and Auto-Tuning Applications for Energy Efficiency

PhD Dissertation Proposal
Characterizing, Optimizing, and Auto-Tuning Applications for Energy Efficiency
Wei Wang
The Committee: Chair: Dr. John Cavazos; Member: Dr. Guang R. Gao; Member: Dr. James Clause; Member: Dr. Allan Porterfield
January 28, 2015

HPC Optimization Challenge
Table: Performance, power, and energy efficiency of Top500/Green500 and exascale systems
Columns: System Name, Performance (TFLOP/s), Power (kW), GFLOPS/W
Rows as transcribed: Exascale System,000,000 20, MilkyWay-2 33, ,808.9 Titan 7, , L-CSC
Exascale computing requires more than a 20x improvement in GFLOPS/W.
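The GFLOPS/W column of the table above follows from the other two columns: 1 TFLOP/s is 1000 GFLOP/s and 1 kW is 1000 W, so the unit factors cancel. The following is a minimal sketch of that arithmetic; the function names are illustrative assumptions and do not come from the slides.

```c
#include <assert.h>

/* Energy efficiency in GFLOPS/W from performance (TFLOP/s) and power (kW).
 * The two factors of 1000 cancel, but are kept to make the units explicit. */
double gflops_per_watt(double tflops, double kilowatts)
{
    assert(kilowatts > 0.0);
    return (tflops * 1000.0) / (kilowatts * 1000.0);
}

/* Factor by which efficiency must grow to reach a target efficiency. */
double required_improvement(double current_gflops_w, double target_gflops_w)
{
    assert(current_gflops_w > 0.0);
    return target_gflops_w / current_gflops_w;
}
```

As a hypothetical example, a machine sustaining 1,000,000 TFLOP/s within 20,000 kW would reach 50 GFLOPS/W, which is a 25x improvement over an assumed 2 GFLOPS/W system.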

Outline
1. Auto-Tuning and Optimization in Polyhedral Optimization Space
2. Optimization with Program Characterization and CPU Clock Modulation
3. Proposed Future Work

Motivation
Polyhedral optimization is effective for optimizing computational kernels.
An accurate predictive performance model has been derived.
Effective predictive performance model: effective predictive energy model?

Adapting AutoTuning Framework
Figure: Auto-Tuning framework for energy. Components shown: program characterization (loop pattern, control flow graph), optimization sequences, source-to-source compiler, profiling counters, machine learning algorithms (SVM). (Refs: Park et al. CGO, CGO 2, IJPP 3, ICPP 4)

Measurement using RCRTool
SandyBridge: monitors the MSR counter; 1000Hz+ update frequency; measures energy, computes power.
Knights Corner: built-in power measurement used (/sys/class/micras/power); 20Hz update frequency; measures power, computes energy.
Figure: Simplified view of RCRtool energy monitoring
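The two platforms run the conversion in opposite directions: one reads cumulative energy and derives power, the other samples power and accumulates energy. The sketch below shows both conversions under simple assumptions (fixed sampling rate, trapezoidal integration). The function names are hypothetical and this is not the RCRTool API.

```c
#include <assert.h>
#include <stddef.h>

/* Average power (W) from two cumulative energy readings (J)
 * taken dt_s seconds apart. */
double power_from_energy(double e_start_j, double e_end_j, double dt_s)
{
    assert(dt_s > 0.0);
    return (e_end_j - e_start_j) / dt_s;
}

/* Energy (J) from n power samples (W) taken at a fixed rate (Hz),
 * integrated with the trapezoidal rule. */
double energy_from_power(const double *watts, size_t n, double sample_hz)
{
    assert(sample_hz > 0.0);
    double dt = 1.0 / sample_hz;
    double joules = 0.0;
    for (size_t i = 1; i < n; i++)
        joules += 0.5 * (watts[i - 1] + watts[i]) * dt;
    return joules;
}
```

With a 20Hz power source, each pair of consecutive samples covers 0.05 seconds, so a coarse update rate limits how short a code region can be measured meaningfully.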

Measurement APIs
Figure: Original program
Figure: Program with energy profiling APIs added

Polyhedral Compilers
Generate code variants of programs containing Static Control Parts (SCoP) using PoCC (Polyhedral Compiler Collection): loop transformations; auto parallelization (PLUTO).
Tested applications: Existing: Polybench. New: 2D Cardiac Wave Propagation Simulation, LULESH.

Exposing SCoP
Figure: Simplified version of the original and the transformed loop nest
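The figure itself is not preserved, so the following sketch only illustrates the kind of transformation a polyhedral compiler can apply to a SCoP: loop tiling of a loop nest with affine bounds and subscripts. The kernel (matrix multiply), the sizes `N` and `TILE`, and the function names are assumptions chosen for illustration, not the code from the slide.

```c
#include <assert.h>

#define N 64
#define TILE 16 /* illustrative tile size; must divide N in this sketch */

/* Original loop nest: affine loop bounds and array subscripts (a SCoP). */
void matmul_original(double A[N][N], double B[N][N], double C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Tiled variant: same computation, iterated block by block
 * so that each block's working set can stay in cache. */
void matmul_tiled(double A[N][N], double B[N][N], double C[N][N])
{
    assert(N % TILE == 0);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int j = jj; j < jj + TILE; j++)
                        for (int k = kk; k < kk + TILE; k++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

Each `C[i][j]` still accumulates its products in increasing `k` order, so both variants compute the same values; only the traversal order across elements changes.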

Profiling of Different Program Optimizations
Figure: Workflow of energy-aware polyhedral framework

Experimental Setup
Hardware: Intel Xeon E (dual socket 8-core processor with 20MB cache); Xeon Phi coprocessor (61 cores, 1.09GHz, 512KB cache each)
Software: Polyhedral compilers: PoCC v1.2 and Polyopt v0.2.
Applications: Polybench v3.2 and LULESH v1.0 (OpenMP)
Back-end compilers: GCC v4.4.6 and ICC v14.0.0

Energy Consumption and Execution Time Correlation (Polybench)
Figure: Energy (joules) and execution time (seconds) across program variants for Covariance Polybench and 2mm Polybench.
The best optimizations for time are also the best for energy savings for these two Polybench applications.

Energy Consumption and Execution Time Correlation (Stencil Seidel2D Polybench and LULESH)
Figure: Energy (joules) and execution time (seconds) across program variants for Seidel2D and LULESH.
Jumps in Seidel2D energy usage (and decreased execution time) result from turning parallelization on.

Polyhedral Optimizations on a Realistic Application
2D Cardiac Wave Propagation Simulation: energy/performance improvement on the Sandy Bridge system
Figure: Speedups and normalized energy savings versus problem size. Baseline: manual OpenMP implementation.

Results on Xeon Phi for Cardiac Simulation
Figure: Speedups (Manual, Polyopt) and normalized energy savings versus problem size.
Conclusion: The polyhedral approach is effective in optimizing the 2D Cardiac Wave Propagation Simulation.

Energy Consumption and Execution Time Correlation (2D Cardiac Wave Propagation Simulation)
Figure: Energy (joules) and execution time (seconds) across program variants. Left: Sandy Bridge. Right: Xeon Phi.
Conclusion: Saving energy is consistent with improving performance on both processors.

Conclusion
Tuning for time can be used as a proxy for tuning for energy.
Polyhedral optimizations are possible for realistic applications.
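One simple way to check the proxy claim on a set of measured program variants is to test whether the fastest variant is also the one that consumes the least energy. The sketch below does only that; the function names are hypothetical, and the slides do not state which statistic the author used.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the smallest element of v[0..n-1]; ties go to the first. */
size_t argmin(const double *v, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] < v[best])
            best = i;
    return best;
}

/* Returns 1 if the variant with the lowest execution time is also
 * the variant with the lowest energy consumption, 0 otherwise. */
int fastest_is_most_efficient(const double *seconds, const double *joules,
                              size_t n)
{
    return argmin(seconds, n) == argmin(joules, n);
}
```

When this check holds across benchmarks, an auto-tuner can search on execution time alone, which is cheaper to measure than energy.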

Outline
1. Auto-Tuning and Optimization in Polyhedral Optimization Space
2. Optimization with Program Characterization and CPU Clock Modulation
3. Proposed Future Work

Motivation
HPC energy optimizations focus on DVFS.
DVFS is applied only in coarse-grained cases.
Fine-grained energy control requires faster frequency transition techniques.

CPU Clock Modulation
Write a specific value to the IA32_CLOCK_MODULATION (0x19a) MSR:
Modify /dev/cpu/cpu{0:15}/msr with root privilege
Invoke wrmsr inline assembly from applications using an added system call
Figure: CPU Clock Modulation. Sample modulation with 25% duty cycle. (Source: IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide)

Available Frequencies
Duty Cycle Level   Binary   Decimal   Hexadecimal   Effective Frequency
1                  10001B   17        11H           6.25%
2                  10010B   18        12H           12.5%
3                  10011B   19        13H           18.75%
4                  10100B   20        14H           25%
5                  10101B   21        15H           31.25%
6                  10110B   22        16H           37.5%
7                  10111B   23        17H           43.75%
8                  11000B   24        18H           50%
9                  11001B   25        19H           56.25%
10                 11010B   26        1AH           62.5%
11                 11011B   27        1BH           68.75%
12                 11100B   28        1CH           75%
13                 11101B   29        1DH           81.25%
14                 11110B   30        1EH           87.5%
15                 11111B   31        1FH           93.75%
-                  -        0         00H           100%
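The table's encoding is regular: the value written is the enable bit (bit 4, 0x10) ORed with the duty-cycle level, and writing 0 disables modulation. The sketch below encodes a level and writes it through the Linux `msr` driver, which exposes one device per CPU at `/dev/cpu/N/msr` and uses the file offset as the MSR address. The function names and the use of level 16 to mean "full speed" are conventions of this sketch, not of the slides. It overwrites the whole register for brevity; a more careful version would read, modify, and write back. Odd levels rely on the extended duty-cycle bit, which not every processor supports.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_CLOCK_MODULATION 0x19A
#define CLOCK_MOD_ENABLE 0x10u
#define CLOCK_MOD_FULL_SPEED 16u /* sketch convention: level 16 = 100% */

/* MSR value for a duty-cycle level: levels 1..15 map to 0x11..0x1F,
 * and full speed maps to 0 (modulation disabled). */
uint64_t clock_mod_value(unsigned level)
{
    assert(level >= 1 && level <= CLOCK_MOD_FULL_SPEED);
    if (level == CLOCK_MOD_FULL_SPEED)
        return 0;
    return CLOCK_MOD_ENABLE | level;
}

/* Effective frequency as a percentage of full speed (6.25% per level). */
double effective_percent(unsigned level)
{
    assert(level >= 1 && level <= CLOCK_MOD_FULL_SPEED);
    return level * 6.25;
}

/* Write the value to one CPU's MSR device. Requires root and the msr
 * kernel module. Returns 0 on success, -1 on failure. */
int write_clock_mod(int cpu, uint64_t value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, IA32_CLOCK_MODULATION);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

For example, `write_clock_mod(0, clock_mod_value(8))` would request a 50% duty cycle on CPU 0, and `write_clock_mod(0, clock_mod_value(16))` would restore full speed.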

Benchmarks and Experimental Setup
1. Benchmarks
LULESH: Hydrodynamics
miniFE from the Mantevo Project: implicit finite-element application
brdr2d: 2D Cardiac Wave Propagation Simulation
Polybench: 30 computational kernels
2. Hardware/Software Setup
Intel Xeon E (dual socket, 8-core processor with 20MB LLC, 2.7GHz)
Linux with ACPI and MSR modules
Intel ICC v14.0.2 with -O3
RCRdaemon taskset to core 6

Loops with High Memory Access
Figure: Normalized metrics (including EDP) versus duty cycle (clock skipping) for (a) LULESH with DCM and (b) jacobi-2d Polybench.
Energy is reduced and EDP lowered with very low performance impact.

Loops with Low Memory Access (More Computation)
Figure: Normalized metrics (including EDP) versus duty cycle (clock skipping) for (a) LULESH and (b) miniFE.
Energy is reduced with high performance impact, resulting in higher EDP.

Loops with Balanced Memory Access and Computation
Figure: Normalized metrics (including EDP) versus duty cycle (clock skipping) for (a) LULESH and (b) miniFE.
Energy is reduced and EDP lowered with relatively low performance impact.

Polybench Loops
Figure: Normalized metrics (legend includes EDP and frequency) for Polybench programs running at the best non-full-speed setting. Benchmarks: durbin, adi, jacobi-2d-imper, fdtd-2d, lu, jacobi-1d-imper, cholesky, mvt, floyd-warshall, gemver, gesummv, covariance, gemm, gramschmidt, ludcmp, dynprog, 2mm, 3mm, syrk, trmm, correlation, doitgen, fdtd-apml, trisolv, atax, bicg, seidel-2d, reg-detect, syr2k, symm.
Loops have different energy characteristics in response to frequency changes.

Memory Access Density vs. Three Types of Loops
Figure: Normalized EDP and MAD value for each Polybench benchmark.
Memory Access Density could be used as a loop type indicator.

Loop Characterization Summary
Loop Type   Characteristics
Mem-Int.    Power reduced and EDP significantly lowered.
Comp-Int.   Full frequency required.
Balanced    Saves energy with relatively low performance degradation.
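The loop types above are separated by how the energy-delay product (EDP, energy multiplied by execution time) responds to a lower duty cycle. The sketch below computes EDP and its value normalized to a full-speed baseline; a normalized value below 1 means the reduced setting is a net win. The function names and the sample numbers in the note are illustrative assumptions.

```c
#include <assert.h>

/* Energy-delay product: energy (J) multiplied by execution time (s). */
double edp(double joules, double seconds)
{
    return joules * seconds;
}

/* EDP of a run relative to a baseline run (for example, full speed).
 * Values below 1.0 indicate an improvement over the baseline. */
double normalized_edp(double joules, double seconds,
                      double base_joules, double base_seconds)
{
    double base = edp(base_joules, base_seconds);
    assert(base > 0.0);
    return edp(joules, seconds) / base;
}
```

As a hypothetical case, a loop that runs 10% slower but uses 20% less energy has a normalized EDP of 0.88, while a loop that runs 60% slower for the same energy saving has a normalized EDP of 1.28 and is better left at full frequency.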

Hybrid Execution of Multi-loop Applications
Adding energy control APIs around loops
Fine-grain loop regions require fast machine power-state transitions to avoid overhead

    while (condition) {
        ...
        setlowfrequency();
        for (i=0; i<n; i++) {... }
        resetfrequency();
        for (j=0; j<n; j++) {... }
    }
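The pattern on this slide can be exercised without hardware access by stubbing the two control calls. In the sketch below, `setlowfrequency()` and `resetfrequency()` keep the names from the slide, but their bodies only record the requested level; a real version would write the clock modulation MSR at that point. The chosen levels (8 for 50%, 16 for full speed), the helper names, and the two loop bodies are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FULL_SPEED 16u /* sketch convention: no clock modulation */
#define HALF_SPEED 8u  /* 50% duty cycle, chosen for illustration */

static unsigned current_level = FULL_SPEED;
static unsigned transition_count = 0;

/* Stub: records the request instead of touching hardware. */
static void apply_level(unsigned level)
{
    assert(level >= 1 && level <= FULL_SPEED);
    current_level = level;
    transition_count++;
}

void setlowfrequency(void) { apply_level(HALF_SPEED); }
void resetfrequency(void) { apply_level(FULL_SPEED); }

unsigned get_level(void) { return current_level; }
unsigned get_transitions(void) { return transition_count; }

/* One outer iteration: the first loop stands in for a memory-intensive
 * region run at reduced speed, the second for a compute-intensive
 * region run at full speed. */
double hybrid_iteration(double *a, const double *b, size_t n)
{
    setlowfrequency();
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
    resetfrequency();

    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += a[j] * a[j];
    return sum;
}
```

Every outer iteration costs two transitions, which is why the transition latency of the underlying mechanism decides whether control at this granularity pays off.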

LULESH Results
Table: Comparison of execution time, energy consumption, and EDP for LULESH
Version    Duty Cycle Level
minT       100%
minE       56.25%
minEDP     81.25%
Hybrid 1   100% & 50%
Hybrid 2   100% & 62.5%
Hybrid 3   100% & 68.75%

miniFE Results
Table: Comparison of execution time, energy consumption, and EDP for miniFE
Version    Duty Cycle Level
minT       100%
minE       62.5%
minEDP     81.25%
Hybrid 1   100% & 81.25%
Hybrid 2   100% & 87.5%
Hybrid 3   100% & 93.75%

Fine-grained DVFS vs. Fine-grained DCM (Entire Application)
Figure: Normalized metrics (including EDP) for the hybrid versions H-DCM-50%, H-DCM-62.5%, H-DCM-68.75%, H-DVFS-2.6GHz, H-DVFS-2.5GHz, H-DVFS-2.4GHz, H-DVFS-2.3GHz, H-DVFS-2.2GHz, H-DVFS-2.1GHz.

DVFS vs. DCM: Another Comparison
Measure and compare the execution time and power of Loop1 and Loop2 with and without energy control APIs.

    while (condition) {
        ...
        MemLoop();    // Loop1
        CompLoop();   // Loop2
        OtherLoops();
    }

VS.

    while (condition) {
        ...
        setfrequency();
        MemLoop();
        resetfrequency();
        CompLoop();
        OtherLoops();
    }

Power Transition Overhead
Figure: Normalized time and power of Loop1 (MemLoop) and Loop2 (CompLoop) under DVFS-2.4GHz, DVFS-2.1GHz, DVFS-1.8GHz, DCM-68.75%, DCM-62.5%, and DCM-50%.
DVFS energy control is not synchronized with fine-grain loops, but DCM is.

Fine-Grain vs. Coarse-Grain DVFS/DCM Control of Loop1 (MemLoop)
Figure: Normalized FG-Time, FG-Power, CG-Time, and CG-Power for the control settings DCM-50%, DCM-62.5%, DCM-68.75%, DVFS-1.8GHz, DVFS-2.1GHz, and DVFS-2.4GHz.
DCM fine-grain energy control is almost identical to coarse-grain.
DVFS overhead is larger than DCM.

Clock Skipping with Concurrency Throttling
Concurrency throttling mitigates resource contention.
Clock modulation reduces idle state power.
Figure: fdtd-2d Polybench. Axes: # of threads versus clock skipping (duty cycle). (c) with concurrency throttling and clock skipping; minimum occurs at (75%, 4). (d) with concurrency throttling and clock skipping; minimum occurs at (100%, 6).

LULESH
Figure: # of threads versus clock skipping (duty cycle). (a) results; minimum occurs at (75%, 6). (b) results; minimum occurs at (100%, 8).
Table (Version, # of Threads, Duty Cycle Level, EDP), as transcribed: Default 6 100%; CT 6 100%; CT+CS 6 75%

miniFE
Figure: # of threads versus clock skipping (duty cycle). (a) results; minimum occurs at (75%, 8). (b) results; minimum occurs at (100%, 4).
Table (Version, # of Threads, Duty Cycle Level, EDP), as transcribed: Default 6 100%; CT 0 100%; CT+CS %

When Concurrency Throttling is Not Beneficial
Figure: brdr2d results when applying both concurrency throttling and clock skipping. Axes: # of threads versus clock skipping (duty cycle). (a) results; minimum occurs at (75%, 6). (b) results; minimum occurs at (100%, 6).
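The minima reported on the preceding slides come from evaluating a metric over a grid of thread counts and duty cycles. The sketch below shows an exhaustive search over such a grid stored in row-major order (one row per thread count, one column per duty cycle). The struct and function names are hypothetical, and the measurement step that fills the grid is outside the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Position and value of the best grid point. */
typedef struct {
    size_t thread_idx; /* row: index into the list of thread counts */
    size_t duty_idx;   /* column: index into the list of duty cycles */
    double value;      /* metric at that point (energy or EDP) */
} BestConfig;

/* Exhaustive search for the smallest metric value in a row-major grid
 * of n_threads rows by n_duty columns. Ties go to the first point found. */
BestConfig find_minimum(const double *metric, size_t n_threads, size_t n_duty)
{
    assert(n_threads > 0 && n_duty > 0);
    BestConfig best = {0, 0, metric[0]};
    for (size_t t = 0; t < n_threads; t++)
        for (size_t d = 0; d < n_duty; d++) {
            double v = metric[t * n_duty + d];
            if (v < best.value) {
                best.thread_idx = t;
                best.duty_idx = d;
                best.value = v;
            }
        }
    return best;
}
```

Running the search once per metric shows whether the energy-optimal and EDP-optimal configurations coincide, as they may or may not for a given application.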

Concurrency Throttling and Memory Access Density
Figure: Normalized energy/EDP and number of threads for each Polybench benchmark.
Loops with high Mem-value tend to benefit from concurrency throttling.

Conclusion
1. Hybrid execution of OpenMP loops with Clock Modulation can achieve better energy efficiency.
2. Concurrency throttling can be combined with Clock Modulation to save more energy.

Outline
1. Auto-Tuning and Optimization in Polyhedral Optimization Space
2. Optimization with Program Characterization and CPU Clock Modulation
3. Proposed Future Work

Future Work
1. Build predictive energy/performance model (May 2015)
2. Enhance results with frequency/threads configuration (Aug. 2015)

Future Work
Power Impacts of Polyhedral Optimizations (Dec. 2015)
Figure: Covariance Polybench program variants with and without loop tiling. Power (watts) and execution time (seconds) per program variant, with tiling and without tiling.
Extending DCM Optimization Technique (Dec. 2015): MPI/OpenMP programs; runtime control; automating frequency/number-of-threads configuration.


More information

IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment

IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment 1 2 IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment Manufacturer. Examples are smartphone manufacturers. Tuning

More information

Advances in Antenna Measurement Instrumentation and Systems

Advances in Antenna Measurement Instrumentation and Systems Advances in Antenna Measurement Instrumentation and Systems Steven R. Nichols, Roger Dygert, David Wayne MI Technologies Suwanee, Georgia, USA Abstract Since the early days of antenna pattern recorders,

More information

Message Passing-Aware Power Management on Many-Core Systems

Message Passing-Aware Power Management on Many-Core Systems Copyright 214 American Scientific Publishers All rights reserved Printed in the United States of America Journal of Low Power Electronics Vol. 1, 1 19, 214 Message Passing-Aware Power Management on Many-Core

More information

Dynamic MIPS Rate Stabilization in Out-of-Order Processors

Dynamic MIPS Rate Stabilization in Out-of-Order Processors Dynamic Rate Stabilization in Out-of-Order Processors Jinho Suh and Michel Dubois Ming Hsieh Dept of EE University of Southern California Outline Motivation Performance Variability of an Out-of-Order Processor

More information

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu

More information

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS 6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS Editor: Publisher: Prof. Pece Mitrevski, PhD Faculty of Information and Communication

More information

22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays.

22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays. 22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays www.infoplc.net The Art of Benchmarking Speed PXI Versus Rack-and-Stack Test Equipment Filippo Persia Systems Engineer Automated Test Mediterranean Region

More information

Server Operational Cost Optimization for Cloud Computing Service Providers over

Server Operational Cost Optimization for Cloud Computing Service Providers over Server Operational Cost Optimization for Cloud Computing Service Providers over a Time Horizon Haiyang(Ocean)Qian and Deep Medhi Networking and Telecommunication Research Lab (NeTReL) University of Missouri-Kansas

More information

All Digital on Chip Process Sensor Using Ratioed Inverter Based Ring Oscillator

All Digital on Chip Process Sensor Using Ratioed Inverter Based Ring Oscillator All Digital on Chip Process Sensor Using Ratioed Inverter Based Ring Oscillator 1 G. Rajesh, 2 G. Guru Prakash, 3 M.Yachendra, 4 O.Venka babu, 5 Mr. G. Kiran Kumar 1,2,3,4 Final year, B. Tech, Department

More information

COTSon: Infrastructure for system-level simulation

COTSon: Infrastructure for system-level simulation COTSon: Infrastructure for system-level simulation Ayose Falcón, Paolo Faraboschi, Daniel Ortega HP Labs Exascale Computing Lab http://sites.google.com/site/hplabscotson MICRO-41 tutorial November 9, 28

More information

What can POP do for you?

What can POP do for you? What can POP do for you? Mike Dewar, NAG Ltd EU H2020 Center of Excellence (CoE) 1 October 2015 31 March 2018 Grant Agreement No 676553 Outline Overview of codes investigated Code audit & plan examples

More information

MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng.

MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng. MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng., UCLA - http://nanocad.ee.ucla.edu/ 1 Outline Introduction

More information

Study On Two-stage Architecture For Synchronous Buck Converter In High-power-density Power Supplies title

Study On Two-stage Architecture For Synchronous Buck Converter In High-power-density Power Supplies title Study On Two-stage Architecture For Synchronous Buck Converter In High-power-density Computing Click to add presentation Power Supplies title Click to edit Master subtitle Tirthajyoti Sarkar, Bhargava

More information

Computational Simulations of The World s Biggest Eye on GPUs

Computational Simulations of The World s Biggest Eye on GPUs Computational Simulations of The World s Biggest Eye on GPUs Hatem Ltaief Extreme Computing Research Center King Abdullah University of Science and Technology, Saudi Arabia NVIDIA GTC at San Jose, CA April

More information

Dynamic hardware management of the H264/AVC encoder control structure using a framework for system scenarios

Dynamic hardware management of the H264/AVC encoder control structure using a framework for system scenarios Dynamic hardware management of the H264/AVC encoder control structure using a framework for system scenarios Yahya H. Yassin, Per Gunnar Kjeldsberg, Andrew Perkis Department of Electronics and Telecommunications

More information

Vampir Getting Started. Holger Brunst March 4th 2008

Vampir Getting Started. Holger Brunst March 4th 2008 Vampir Getting Started Holger Brunst holger.brunst@tu-dresden.de March 4th 2008 What is Vampir? Program Monitoring, Visualization, and Analysis 1. Step: VampirTrace monitors your program s runtime behavior

More information

LS-DYNA Performance Enhancement of Fan Blade Off Simulation on Cray XC40

LS-DYNA Performance Enhancement of Fan Blade Off Simulation on Cray XC40 LS-DYNA Performance Enhancement of Fan Blade Off Simulation on Cray XC40 Ting-Ting Zhu, Cray Inc. Jason Wang, LSTC Brian Wainscott, LSTC Abstract This work uses LS-DYNA to enhance the performance of engine

More information

Pilot: Device-free Indoor Localization Using Channel State Information

Pilot: Device-free Indoor Localization Using Channel State Information ICDCS 2013 Pilot: Device-free Indoor Localization Using Channel State Information Jiang Xiao, Kaishun Wu, Youwen Yi, Lu Wang, Lionel M. Ni Department of Computer Science and Engineering Hong Kong University

More information

Recent Advances in Simulation Techniques and Tools

Recent Advances in Simulation Techniques and Tools Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind

More information

Trinity Center of Excellence

Trinity Center of Excellence Trinity Center of Excellence I can t promise to solve all your problems, but I can promise you won t face them alone Hai Ah Nam Computational Physics & Methods (CCS-2) Presented to: Salishan Conference

More information

Monte Carlo integration and event generation on GPU and their application to particle physics

Monte Carlo integration and event generation on GPU and their application to particle physics Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &

More information

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony,

More information

EE382V-ICS: System-on-a-Chip (SoC) Design

EE382V-ICS: System-on-a-Chip (SoC) Design EE38V-CS: System-on-a-Chip (SoC) Design Hardware Synthesis and Architectures Source: D. Gajski, S. Abdi, A. Gerstlauer, G. Schirner, Embedded System Design: Modeling, Synthesis, Verification, Chapter 6:

More information

Multi-core Platforms for

Multi-core Platforms for 20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio

More information

Mohit Arora. The Art of Hardware Architecture. Design Methods and Techniques. for Digital Circuits. Springer

Mohit Arora. The Art of Hardware Architecture. Design Methods and Techniques. for Digital Circuits. Springer Mohit Arora The Art of Hardware Architecture Design Methods and Techniques for Digital Circuits Springer Contents 1 The World of Metastability 1 1.1 Introduction 1 1.2 Theory of Metastability 1 1.3 Metastability

More information

ANALOG-TO-DIGITAL CONVERTER FOR INPUT VOLTAGE MEASUREMENTS IN LOW- POWER DIGITALLY CONTROLLED SWITCH-MODE POWER SUPPLY CONVERTERS

ANALOG-TO-DIGITAL CONVERTER FOR INPUT VOLTAGE MEASUREMENTS IN LOW- POWER DIGITALLY CONTROLLED SWITCH-MODE POWER SUPPLY CONVERTERS ANALOG-TO-DIGITAL CONVERTER FOR INPUT VOLTAGE MEASUREMENTS IN LOW- POWER DIGITALLY CONTROLLED SWITCH-MODE POWER SUPPLY CONVERTERS Aleksandar Radić, S. M. Ahsanuzzaman, Amir Parayandeh, and Aleksandar Prodić

More information

A 32 Gbps 2048-bit 10GBASE-T Ethernet Energy Efficient LDPC Decoder with Split-Row Threshold Decoding Method

A 32 Gbps 2048-bit 10GBASE-T Ethernet Energy Efficient LDPC Decoder with Split-Row Threshold Decoding Method A 32 Gbps 248-bit GBASE-T Ethernet Energy Efficient LDPC Decoder with Split-Row Threshold Decoding Method Tinoosh Mohsenin and Bevan M. Baas VLSI Computation Lab, ECE Department University of California,

More information

Measuring and Evaluating Computer System Performance

Measuring and Evaluating Computer System Performance Measuring and Evaluating Computer System Performance Performance Marches On... But what is performance? The bottom line: Performance Car Time to Bay Area Speed Passengers Throughput (pmph) Ferrari 3.1

More information

Performance Metrics, Amdahl s Law

Performance Metrics, Amdahl s Law ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned

More information

CAMEO: Continuous Analytics for Massively Multiplayer Online Games

CAMEO: Continuous Analytics for Massively Multiplayer Online Games CAMEO: Continuous Analytics for Massively Multiplayer Online Games Alexandru Iosup Parallel and Distributed Systems Group Delft University of Technology 1 MMOGs are a Popular, Growing Market 25,000,000

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

ABSTRACT. GHOLKAR, NEHA. On the Management of Power Constraints for High Performance Systems. (Under the direction of Frank Mueller).

ABSTRACT. GHOLKAR, NEHA. On the Management of Power Constraints for High Performance Systems. (Under the direction of Frank Mueller). ABSTRACT GHOLKAR, NEHA. On the Management of Power Constraints for High Performance Systems. (Under the direction of Frank Mueller). The supercomputing community is targeting exascale computing by 2023.

More information

FILA: Fine-grained Indoor Localization

FILA: Fine-grained Indoor Localization IEEE 2012 INFOCOM FILA: Fine-grained Indoor Localization Kaishun Wu, Jiang Xiao, Youwen Yi, Min Gao, Lionel M. Ni Hong Kong University of Science and Technology March 29 th, 2012 Outline Introduction Motivation

More information

Real-time Concurrent Collection on Stock Multiprocessors

Real-time Concurrent Collection on Stock Multiprocessors RETROSPECTIVE: Real-time Concurrent Collection on Stock Multiprocessors Andrew W. Appel Princeton University appel@cs.princeton.edu 1. INTRODUCTION In 1987, Kai Li of Princeton University was working with

More information

IF ID EX MEM WB 400 ps 225 ps 350 ps 450 ps 300 ps

IF ID EX MEM WB 400 ps 225 ps 350 ps 450 ps 300 ps CSE 30321 Computer Architecture I Fall 2011 Homework 06 Pipelined Processors 75 points Assigned: November 1, 2011 Due: November 8, 2011 PLEASE DO THE ASSIGNMENT ON THIS HANDOUT!!! Problem 1: (15 points)

More information

Document downloaded from:

Document downloaded from: Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th

More information

Trace Based Switching For A Tightly Coupled Heterogeneous Core

Trace Based Switching For A Tightly Coupled Heterogeneous Core Trace Based Switching For A Tightly Coupled Heterogeneous Core Shru% Padmanabha, Andrew Lukefahr, Reetuparna Das, Sco@ Mahlke Micro- 46 December 2013 University of Michigan Electrical Engineering and Computer

More information

HARDWARE ACCELERATION OF THE GIPPS MODEL

HARDWARE ACCELERATION OF THE GIPPS MODEL HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu

More information

FTSP Power Characterization

FTSP Power Characterization 1. Introduction FTSP Power Characterization Chris Trezzo Tyler Netherland Over the last few decades, advancements in technology have allowed for small lowpowered devices that can accomplish a multitude

More information

Transient Temperature Analysis. Rajit Chandra, Ph.D. Gradient Design Automation

Transient Temperature Analysis. Rajit Chandra, Ph.D. Gradient Design Automation Transient Temperature Analysis Rajit Chandra, Ph.D. Gradient Design Automation Trends in mixed signal designs More designs with switching high power drivers (smart power chips, automotive, high-speed communications,

More information

*Engineering and Industrial Services, TATA Consultancy Services Limited **Professor Emeritus, IIT Bombay

*Engineering and Industrial Services, TATA Consultancy Services Limited **Professor Emeritus, IIT Bombay System Identification and Model Predictive Control of SI Engine in Idling Mode using Mathworks Tools Shivaram Kamat*, KP Madhavan**, Tejashree Saraf* *Engineering and Industrial Services, TATA Consultancy

More information

CSE502: Computer Architecture Welcome to CSE 502

CSE502: Computer Architecture Welcome to CSE 502 Welcome to CSE 502 Introduction & Review Today s Lecture Course Overview Course Topics Grading Logistics Academic Integrity Policy Homework Quiz Key basic concepts for Computer Architecture Course Overview

More information

H8238/MCM MODBUS POINT MAP

H8238/MCM MODBUS POINT MAP H8238/MCM MODBUS POINT MAP F O R M A T Int Float R/W NV Description 1 257/258 R/W NV Energy Consumption, kwh, Low-word integer 2 259/260 R/W NV Energy Consumption, kwh, High-word integer Both 257/258 and

More information

Increasing Performance Requirements and Tightening Cost Constraints

Increasing Performance Requirements and Tightening Cost Constraints Maxim > Design Support > Technical Documents > Application Notes > Power-Supply Circuits > APP 3767 Keywords: Intel, AMD, CPU, current balancing, voltage positioning APPLICATION NOTE 3767 Meeting the Challenges

More information

Introduction to Real-Time Systems

Introduction to Real-Time Systems Introduction to Real-Time Systems Real-Time Systems, Lecture 1 Martina Maggio and Karl-Erik Årzén 16 January 2018 Lund University, Department of Automatic Control Content [Real-Time Control System: Chapter

More information

Microarchitectural Attacks and Defenses in JavaScript

Microarchitectural Attacks and Defenses in JavaScript Microarchitectural Attacks and Defenses in JavaScript Michael Schwarz, Daniel Gruss, Moritz Lipp 25.01.2018 www.iaik.tugraz.at 1 Michael Schwarz, Daniel Gruss, Moritz Lipp www.iaik.tugraz.at Microarchitecture

More information

Project 5: Optimizer Jason Ansel

Project 5: Optimizer Jason Ansel Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale

More information

Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS

Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Rizwana Begum, David Werner and Mark Hempstead Drexel University {rb639,daw77,mhempstead}@drexel.edu Guru Prasad, Jerry

More information