Measuring and Evaluating Computer System Performance

Performance Marches On... But what is performance?

The bottom line: Performance
Car         Time to Bay Area   Speed     Passengers   Throughput (pmph)
Ferrari     3.1 hours          160 mph   2            320
Greyhound   7.7 hours          65 mph    60           3900
Time to do the task: execution time, response time, latency
Tasks per day, hour, week, sec, ns...: throughput, bandwidth
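The throughput column is consistent with reading "pmph" as passenger-miles per hour, that is, speed multiplied by passengers. A minimal sketch of that arithmetic, using the slide's figures (the function name `throughput_pmph` is illustrative, not from the slides):

```python
def throughput_pmph(speed_mph: float, passengers: int) -> float:
    """Passenger-miles delivered per hour of travel (speed * passengers)."""
    return speed_mph * passengers


# (time to Bay Area in hours, speed in mph, passengers), as given on the slide
vehicles = {
    "Ferrari": (3.1, 160, 2),
    "Greyhound": (7.7, 65, 60),
}

for name, (hours, mph, seats) in vehicles.items():
    # Latency favors the Ferrari; throughput favors the Greyhound.
    print(f"{name}: latency {hours} h, throughput {throughput_pmph(mph, seats):.0f} pmph")
```

The same pair of vehicles ranks differently depending on whether the metric is latency or throughput, which is the point of the slide.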

How to measure Execution Time?
% time program
... program results ...
90.7u 12.9s 2:39 65%
%
Wall-clock time? User CPU time? User + kernel CPU time?
Answer:
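In the csh-style `time` output above, `90.7u` is user CPU seconds, `12.9s` is system (kernel) CPU seconds, `2:39` is elapsed wall-clock time, and `65%` is CPU time as a fraction of elapsed time. The sketch below checks that last figure and shows one way to collect the same three quantities from inside a Python process. The helper names `cpu_utilization` and `measure` are assumptions made for illustration.

```python
import os
import time


def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of wall-clock time the program spent on the CPU."""
    return (user_s + system_s) / elapsed_s


def measure(fn):
    """Run fn() once and return wall-clock, user CPU and system CPU seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    fn()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "wall": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
    }


# The slide's sample: 90.7 s user, 12.9 s system, 2:39 elapsed.
elapsed = 2 * 60 + 39
print(f"CPU utilization: {cpu_utilization(90.7, 12.9, elapsed):.0%}")

# A small CPU-bound workload, chosen arbitrarily for the demonstration.
print(measure(lambda: sum(i * i for i in range(200_000))))
```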

Our definition of Performance
$$\text{Performance}_X = \frac{1}{\text{Execution Time}_X}, \quad \text{for program } X$$
Performance only has meaning in the context of a program or workload.
It is not very intuitive as an absolute measure, but most of the time we're more interested in relative performance.

Relative Performance can be confusing
A runs in 12 seconds
B runs in 20 seconds
A/B = 0.6, so A is 40% faster, or 1.4X faster, or B is 40% slower
B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower
Relative performance needs a precise definition.

Relative Performance, the Definition
$$\text{Relative Performance}(X/Y) = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$
"X is n times faster than Y"
"X is n times as fast as Y"
"From Y to X, speedup is n"

Example
Machine A runs program C in 9 seconds; Machine B runs the same program in 6 seconds. What is the speedup we see if we move to Machine B from Machine A?
Machine B gets a new compiler, and can now run the program in 3 seconds. What is the speedup now?
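The definition of relative performance can be applied directly to these numbers. A minimal sketch, assuming a helper named `relative_performance` (not from the slides) that takes the execution time of Y and then of X. The slide does not say which baseline the second question uses, so both readings are shown.

```python
def relative_performance(time_y: float, time_x: float) -> float:
    """Return n such that X is n times as fast as Y (time_Y / time_X)."""
    return time_y / time_x


# Earlier slide: A runs in 12 s, B in 20 s, so A is about 1.67 times as fast as B.
print(f"A vs B: {relative_performance(20, 12):.2f}")

# This slide: moving from Machine A (9 s) to Machine B (6 s).
print(f"A -> B: {relative_performance(9, 6):.2f}")

# After the new compiler, B takes 3 s. The baseline is not stated on the slide:
print(f"old B -> new B: {relative_performance(6, 3):.2f}")
print(f"A -> new B:     {relative_performance(9, 3):.2f}")
```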

What is Time?
CPU Execution Time = CPU clock cycles * Clock cycle time
Every conventional processor has a clock with an associated clock cycle time or clock rate.
Every program runs in an integral number of clock cycles.
Cycle Time
MHz = millions of cycles/second, GHz = billions of cycles/second
X MHz = 1000/X nanoseconds cycle time
Y GHz = 1/Y nanoseconds cycle time
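The two conversion rules can be checked with a short sketch. The function names are illustrative, and the sample clock rates are arbitrary values chosen for the demonstration.

```python
def cycle_time_ns_from_mhz(mhz: float) -> float:
    """X MHz corresponds to a cycle time of 1000/X nanoseconds."""
    return 1000.0 / mhz


def cycle_time_ns_from_ghz(ghz: float) -> float:
    """Y GHz corresponds to a cycle time of 1/Y nanoseconds."""
    return 1.0 / ghz


print(cycle_time_ns_from_mhz(500))   # 500 MHz clock
print(cycle_time_ns_from_ghz(2))     # 2 GHz clock
# 2000 MHz and 2 GHz are the same clock, so both rules must agree.
print(cycle_time_ns_from_mhz(2000) == cycle_time_ns_from_ghz(2))
```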

How many clock cycles?
Number of CPU cycles = Instructions executed * Average Clock Cycles per Instruction (CPI)
Computer A runs program C in 3.6 billion cycles. Program C consists of 2 billion dynamic instructions. What is the CPI?

How many clock cycles?
Number of CPU cycles = Instructions executed * Average Clock Cycles per Instruction (CPI)
A computer is running a program with CPI = 2.0 and executes 24 million instructions. How long will it run?
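Both cycle-count questions follow from the one relation, cycles = instructions * CPI. A minimal sketch with illustrative function names follows. The second slide gives no clock rate, so the run length can only be stated in cycles; the 500 MHz figure used for the conversion to seconds is an assumption, not part of the exercise.

```python
def cpi(cycles: float, instructions: float) -> float:
    """Average clock cycles per instruction."""
    return cycles / instructions


def total_cycles(instructions: float, cpi_value: float) -> float:
    """Number of CPU cycles = instructions executed * CPI."""
    return instructions * cpi_value


# First question: 3.6 billion cycles over 2 billion dynamic instructions.
print(f"CPI = {cpi(3.6e9, 2e9):.1f}")

# Second question: CPI = 2.0 and 24 million instructions.
cycles = total_cycles(24e6, 2.0)
print(f"cycles = {cycles:,.0f}")

# ASSUMPTION: a 500 MHz clock, used only to turn cycles into seconds.
assumed_clock_hz = 500e6
print(f"time at assumed 500 MHz = {cycles / assumed_clock_hz:.3f} s")
```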

All Together Now
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
Units: seconds = instructions * cycles/instruction * seconds/cycle

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
IC = 1 billion, 500 MHz processor, execution time of 3 seconds. What is the CPI for this program?
Suppose we reduce CPI to 1.2 (through an architectural improvement). What is the new execution time?
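The exercise can be worked by rearranging the execution-time equation, with clock cycle time written as 1 / clock rate. A minimal sketch, with function names chosen for illustration:

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI * cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(time_s: float, instruction_count: float, clock_rate_hz: float) -> float:
    """The same equation solved for CPI."""
    return time_s * clock_rate_hz / instruction_count


ic = 1e9           # 1 billion instructions
clock_hz = 500e6   # 500 MHz

original_cpi = cpi_from_time(3.0, ic, clock_hz)
print(f"CPI = {original_cpi:.2f}")

new_time = cpu_time_s(ic, 1.2, clock_hz)
print(f"new execution time = {new_time:.2f} s")
print(f"speedup = {3.0 / new_time:.2f}")
```

With the instruction count and clock rate held fixed, execution time scales linearly with CPI, so the speedup equals the ratio of the old CPI to the new one.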

Who Affects Performance?
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
programmer
compiler
instruction-set architect
machine architect
hardware designer
materials scientist/physicist/silicon engineer

Performance Variation
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
Terms compared: Number of instructions, CPI, Clock Cycle Time
Cases compared:
Same machine, different programs
Same programs, different machines, same ISA
Same programs, different machines

Other Performance Metrics: MIPS, MFLOPS

MIPS
$$\text{MIPS} = \text{Millions of Instructions Per Second} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
Program-independent?
Deceptive
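A short sketch can show why MIPS is neither program-independent nor a safe proxy for execution time. The two "builds" below are hypothetical numbers invented for illustration (only the 500 MHz clock echoes an earlier exercise), and the function names are not from the slides.

```python
def mips_from_time(instruction_count: float, time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


clock_hz = 500e6

# HYPOTHETICAL builds of one program on the same machine:
# build_a uses fewer, slower instructions; build_b uses more, faster ones.
builds = {
    "build_a": {"ic": 1.0e9, "cpi": 1.5},
    "build_b": {"ic": 1.8e9, "cpi": 1.0},
}

for name, b in builds.items():
    time_s = b["ic"] * b["cpi"] / clock_hz
    print(f"{name}: time = {time_s:.1f} s, MIPS = {mips_from_cpi(clock_hz, b['cpi']):.0f}")
    # Both forms of the definition give the same rating.
    assert abs(mips_from_time(b["ic"], time_s) - mips_from_cpi(clock_hz, b["cpi"])) < 1e-6
```

Under these assumed numbers, the build with the higher MIPS rating is also the one that takes longer to finish.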

FLOPS
FLOPS = FLoating-point Operations Per Second
Program-independent?
Which operations?
Useful, sometimes
"Theoretical peak" FLOPS, peak FLOPS, sustained FLOPS
How does execution time depend on FLOPS?

Which Programs?
Peak throughput measures (simple programs)?
Synthetic benchmarks (Whetstone, Dhrystone, ...)?
"Kernels" of useful computation (LAPACK, FFTW, ...)
Real applications
SPEC (best of both worlds, but with problems of their own)
System Performance Evaluation Cooperative
Provides a common set of real applications along with strict guidelines for how to run them.
Provides a relatively unbiased means to compare machines.

Danger in Benchmark-Specific Performance Measures
Measures compiler as much as architecture (what about kernels?)

SPEC Performance on Pentium III and Pentium 4

Amdahl's Law
The impact of a performance improvement is limited by the percent of execution time affected by the improvement.
$$\text{Execution time after improvement} = \frac{\text{Execution Time Affected}}{\text{Amount of Improvement}} + \text{Execution Time Unaffected}$$
Make the common case fast!!
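The formula is easy to explore numerically. In the sketch below the workload (80 seconds affected by the improvement, 20 seconds unaffected) is a hypothetical example, not a figure from the slides, and the function name is illustrative.

```python
def time_after_improvement(affected_s: float, unaffected_s: float, improvement: float) -> float:
    """Amdahl's Law: affected time shrinks by `improvement`; the rest is unchanged."""
    return affected_s / improvement + unaffected_s


# HYPOTHETICAL workload: 100 s in total, of which 80 s benefits from the improvement.
affected, unaffected = 80.0, 20.0
before = affected + unaffected

for improvement in (2, 4, 10, 1_000_000):
    after = time_after_improvement(affected, unaffected, improvement)
    print(f"improvement {improvement:>9}x: time = {after:7.2f} s, speedup = {before / after:.2f}")
```

However large the improvement factor becomes, the overall speedup in this example stays below 100 / 20 = 5, because the unaffected 20 seconds never shrinks.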

Key Points
Be careful how you specify performance
Execution time = instructions * CPI * cycle time
Use real applications
Use standards, if possible
Make the common case fast