CS 6290 Evaluation & Metrics

Performance
- Two common measures
  - Latency (how long to do X)
    - Also called response time and execution time
  - Throughput (how often can it do X)
- Example of car assembly line
  - Takes 6 hours to make a car (latency is 6 hours)
  - A car leaves every 5 minutes (throughput is 12 cars per hour)
  - Overlap results in Throughput > 1/Latency
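The assembly-line numbers can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the "serial" case assumes that a car must be finished before the next one is started.

```python
# Latency vs. throughput for the car assembly line on the slide.
# Function names are illustrative, not taken from the lecture.

def throughput_per_hour(minutes_between_completions: float) -> float:
    """Cars finished per hour when one car leaves the line every N minutes."""
    return 60.0 / minutes_between_completions


def serial_throughput_per_hour(latency_hours: float) -> float:
    """Throughput with no overlap: one car must finish before the next starts."""
    return 1.0 / latency_hours


latency_hours = 6.0                      # time to build one car
pipelined = throughput_per_hour(5.0)     # a car leaves every 5 minutes
serial = serial_throughput_per_hour(latency_hours)

print(f"pipelined throughput: {pipelined:.1f} cars/hour")
print(f"1 / latency:          {serial:.3f} cars/hour")
print(f"throughput > 1/latency: {pipelined > serial}")
```

The gap between the two numbers comes entirely from overlap: many cars are on the line at the same time, so the latency of one car does not limit how often a car leaves.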

Measuring Performance
- Peak (MIPS, MFLOPS)
  - Often not useful: unachievable in practice, or unsustainable

Measuring Performance
- Benchmarks
  - Real applications and application suites
    - E.g., SPEC CPU2000, SPEC2006, TPC-C, TPC-H
  - Kernels
    - Representative parts of real applications
    - Easier and quicker to set up and run
    - Often not really representative of the entire app
  - Toy programs, synthetic benchmarks, etc.
    - Not very useful for reporting
    - Sometimes used to test/stress specific functions/features

SPEC CPU (integer)
- The set of representative applications keeps growing with time!

SPEC CPU (floating point)

Price-Performance

TPC Benchmarks
- Measure transaction-processing throughput
- Benchmarks for different scenarios
  - TPC-C: warehouses and sales transactions
  - TPC-H: ad-hoc decision support
  - TPC-W: web-based business transactions
- Difficult to set up and run on a simulator
  - Requires full OS support, a working DBMS
  - Long simulations to get stable results

Throughput-Server Perf/Cost
- High performance
- Very expensive!

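CPU Performance Equation (1)

$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$

$$\text{CPU time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

- Instructions/Program: ISA, Compiler Technology
- Clock cycles/Instruction: Organization, ISA
- Seconds/Clock cycle: Hardware Technology, Organization
- A.K.A. the "iron law" of performance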

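Car Analogy
- Need to drive from Klaus to CRC
  - Clock Speed = 3500 RPM
  - CPI = 5250 rotations/km, or 0.19 m/rot
  - Insts = 800 m

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

$$800\ \text{m} \times \frac{1\ \text{rotation}}{0.19\ \text{m}} \times \frac{1\ \text{minute}}{3500\ \text{rotations}} \approx 1.2\ \text{minutes}$$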

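CPU Version
- Program takes 33 billion instructions to run
- CPU processes insts at 2 cycles per inst
- Clock speed of 3 GHz
  - Sometimes clock cycle time is given instead (ex. cycle = 333 ps)
  - IPC is sometimes used instead of CPI

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

$$33 \times 10^{9}\ \text{insts} \times \frac{2\ \text{cycles}}{\text{inst}} \times \frac{1\ \text{second}}{3 \times 10^{9}\ \text{cycles}} = 22\ \text{seconds}$$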
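As a quick check of the 22-second result, here is a minimal Python sketch of the iron law in the three forms the slide mentions (clock frequency, clock cycle time, and IPC). The function names are illustrative only.

```python
# Iron law of performance: CPU time = instructions x CPI x cycle time.
# Function names are illustrative, not from the lecture.

def cpu_time_from_clock_rate(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds when the clock is given as a frequency."""
    return instructions * cpi / clock_hz


def cpu_time_from_cycle_time(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds when the clock is given as a cycle time."""
    return instructions * cpi * cycle_time_s


def cpu_time_from_ipc(instructions: float, ipc: float, clock_hz: float) -> float:
    """CPU time in seconds when IPC (= 1 / CPI) is given instead of CPI."""
    return instructions / ipc / clock_hz


INSTRUCTIONS = 33e9   # 33 billion instructions
CPI = 2.0             # cycles per instruction
CLOCK_HZ = 3e9        # 3 GHz

t_rate = cpu_time_from_clock_rate(INSTRUCTIONS, CPI, CLOCK_HZ)
t_cycle = cpu_time_from_cycle_time(INSTRUCTIONS, CPI, 333e-12)  # 333 ps cycle
t_ipc = cpu_time_from_ipc(INSTRUCTIONS, 1.0 / CPI, CLOCK_HZ)

print(f"from clock rate: {t_rate:.3f} s")
print(f"from cycle time: {t_cycle:.3f} s")
print(f"from IPC:        {t_ipc:.3f} s")
```

The cycle-time version comes out slightly under 22 s only because 333 ps is a rounded value of 1/(3 GHz).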

CPU Performance Equation (2)

$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$

$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$

- For each kind of instruction $i$:
  - $CPI_i$: how many cycles it takes to execute an instruction of this kind
  - $IC_i$: how many instructions of this kind there are in the program

CPU performance w/ different instructions

Instruction Type   Frequency   CPI
Integer            40%         1.0
Branch             20%         4.0
Load               20%         2.0
Store              10%         3.0

Total Insts = 50B, Clock speed = 2 GHz

$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$
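The per-class form of the equation can be applied to this table mechanically. The sketch below uses the rows exactly as transcribed; note that the listed frequencies add up to 90%, not 100%, so the result covers only the listed instruction classes. The sketch does not guess what the remaining 10% might be. Variable and function names are illustrative.

```python
# CPU time from an instruction mix: sum(IC_i * CPI_i) * clock cycle time.
# The mix is copied from the table as transcribed; its frequencies sum to 0.9,
# so the cycle count below covers only the listed instruction classes.

MIX = {
    # type: (frequency, CPI)
    "Integer": (0.40, 1.0),
    "Branch": (0.20, 4.0),
    "Load": (0.20, 2.0),
    "Store": (0.10, 3.0),
}
TOTAL_INSTS = 50e9   # 50 billion instructions
CLOCK_HZ = 2e9       # 2 GHz


def cycles_for_mix(mix: dict, total_insts: float) -> float:
    """Sum of IC_i * CPI_i over the instruction classes present in `mix`."""
    return sum(freq * total_insts * cpi for freq, cpi in mix.values())


covered_fraction = sum(freq for freq, _ in MIX.values())
cycles = cycles_for_mix(MIX, TOTAL_INSTS)
seconds = cycles / CLOCK_HZ

print(f"fraction of instructions covered by the table: {covered_fraction:.2f}")
print(f"cycles for listed classes: {cycles:.3e}")
print(f"CPU time for listed classes: {seconds:.2f} s")
```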

Comparing Performance
- "X is n times faster than Y":

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

- "Throughput of X is n times that of Y":

$$\frac{\text{Tasks per unit time}_X}{\text{Tasks per unit time}_Y} = n$$

If Only it Were That Simple
- "X is n times faster than Y on A":

$$\frac{\text{Execution time of app A on machine Y}}{\text{Execution time of app A on machine X}} = n$$

- But what about different applications (or even parts of the same application)?
  - X is 10 times faster than Y on A, and 1.5 times on B, but Y is 2 times faster than X on C, and 3 times on D, and so on
- So does X have better performance than Y?
- Which would you buy?

Summarizing Performance
- Arithmetic mean
  - Average execution time
  - Gives more weight to longer-running programs
- Weighted arithmetic mean
  - More important programs can be emphasized
  - But what do we use as weights?
  - Different weights will make different machines look better

Speedup

            Machine A   Machine B
Program 1   5 sec       4 sec
Program 2   3 sec       6 sec

- What is the speedup of A compared to B on Program 1?
- What is the speedup of A compared to B on Program 2?
- What is the average speedup?
- What is the speedup of A compared to B on Sum(Program1, Program2)?
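The four questions can be worked through numerically. This sketch assumes the convention from the "Comparing Performance" slide, where the speedup of A compared to B is B's execution time divided by A's. It also takes "average speedup" to mean the arithmetic mean of the two per-program speedups, which is one possible reading.

```python
# Speedup of machine A compared to machine B, using the table on the slide.
# Convention (assumed from the "Comparing Performance" slide):
#   speedup of A over B = time on B / time on A.

TIMES = {
    "Program 1": {"A": 5.0, "B": 4.0},
    "Program 2": {"A": 3.0, "B": 6.0},
}


def speedup(time_other: float, time_self: float) -> float:
    """How many times faster `self` is than `other`."""
    return time_other / time_self


per_program = {name: speedup(t["B"], t["A"]) for name, t in TIMES.items()}
mean_of_speedups = sum(per_program.values()) / len(per_program)

total_a = sum(t["A"] for t in TIMES.values())
total_b = sum(t["B"] for t in TIMES.values())
speedup_of_sums = speedup(total_b, total_a)

for name, s in per_program.items():
    print(f"{name}: A is {s:.2f}x B")
print(f"arithmetic mean of speedups: {mean_of_speedups:.2f}")
print(f"speedup on Sum(Program1, Program2): {speedup_of_sums:.2f}")
```

The last two numbers disagree (1.40 vs. 1.25), which is why the way results are summarized matters.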

Normalizing & the Geometric Mean
- Speedup of arithmetic means != arithmetic mean of speedups
- Use the geometric mean:

$$\sqrt[n]{\prod_{i=1}^{n} \text{Normalized execution time on } i}$$

- Neat property of the geometric mean: consistent whatever the reference machine
- Do not use the arithmetic mean for normalized execution times
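The "consistent whatever the reference machine" property can be seen by normalizing the same two machines to two different references. In this sketch the Machine A and Machine B times are the ones from the Speedup slide, and the helper names are illustrative.

```python
import math

# Geometric vs. arithmetic mean of normalized execution times.
# Times for machines A and B come from the Speedup slide.

def geometric_mean(values):
    """n-th root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


def normalized(times, reference):
    """Per-program execution time divided by the reference machine's time."""
    return [t / r for t, r in zip(times, reference)]


A = [5.0, 3.0]   # Program 1, Program 2 on machine A
B = [4.0, 6.0]   # Program 1, Program 2 on machine B


def ratio_a_to_b(mean, reference):
    """Summary of A divided by summary of B, both normalized to `reference`."""
    return mean(normalized(A, reference)) / mean(normalized(B, reference))


gm_ref_a = ratio_a_to_b(geometric_mean, A)
gm_ref_b = ratio_a_to_b(geometric_mean, B)
am_ref_a = ratio_a_to_b(arithmetic_mean, A)
am_ref_b = ratio_a_to_b(arithmetic_mean, B)

print(f"geometric mean,  reference A: {gm_ref_a:.4f}   reference B: {gm_ref_b:.4f}")
print(f"arithmetic mean, reference A: {am_ref_a:.4f}   reference B: {am_ref_b:.4f}")
```

With the geometric mean the A-to-B ratio is the same for both references; with the arithmetic mean it changes with the choice of reference.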

CPI/IPC
- Often when making comparisons in computer architecture studies:
  - Program (or set of programs) is the same for two CPUs
  - The clock speed is the same for two CPUs
- So we can just directly compare CPIs, and often we use IPCs

Average CPI vs. Average IPC
- Average CPI = $(CPI_1 + CPI_2 + \dots + CPI_n)/n$
- A.M. of IPC = $(IPC_1 + IPC_2 + \dots + IPC_n)/n$
  - Not equal to 1 / (A.M. of CPI)!!!
- Must use the Harmonic Mean of IPC to remain proportional to runtime

Harmonic Mean

$$H.M.(x_1, x_2, x_3, \dots, x_n) = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_n}}$$

- What in the world is this?
  - Average of inverse relationships

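A.M.(CPI) vs. H.M.(IPC)

$$\text{Average IPC} = \frac{1}{A.M.(CPI)} = \frac{1}{\dfrac{CPI_1}{n} + \dfrac{CPI_2}{n} + \dfrac{CPI_3}{n} + \dots + \dfrac{CPI_n}{n}}$$

$$= \frac{n}{CPI_1 + CPI_2 + CPI_3 + \dots + CPI_n} = \frac{n}{\dfrac{1}{IPC_1} + \dfrac{1}{IPC_2} + \dfrac{1}{IPC_3} + \dots + \dfrac{1}{IPC_n}} = H.M.(IPC)$$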
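A small numeric check of this identity is sketched below. The three CPI values are hypothetical, picked only so the arithmetic is easy to follow, and the programs are weighted equally as in the derivation.

```python
# Check that 1 / A.M.(CPI) equals H.M.(IPC), and that A.M.(IPC) does not.
# The CPI values below are hypothetical example data.

def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


cpis = [1.0, 2.0, 4.0]             # hypothetical per-program CPIs
ipcs = [1.0 / c for c in cpis]     # IPC is the reciprocal of CPI

inverse_am_cpi = 1.0 / arithmetic_mean(cpis)
hm_ipc = harmonic_mean(ipcs)
am_ipc = arithmetic_mean(ipcs)

print(f"1 / A.M.(CPI) = {inverse_am_cpi:.4f}")
print(f"H.M.(IPC)     = {hm_ipc:.4f}")
print(f"A.M.(IPC)     = {am_ipc:.4f}  (overstates the average IPC)")
```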

Amdahl's Law (1)

$$\text{Speedup} = \frac{\text{Execution Time without Enhancement}}{\text{Execution Time with Enhancement}} = \frac{\text{Execution Time}_{old}}{\text{Execution Time}_{new}}$$

- What if the enhancement does not enhance everything?

$$\text{Speedup} = \frac{\text{Execution Time without using Enhancement at all}}{\text{Execution Time using Enhancement when Possible}}$$

$$\text{Execution Time}_{new} = \text{Execution Time}_{old} \times \left[ (1 - \text{Fraction}_{Enhanced}) + \frac{\text{Fraction}_{Enhanced}}{\text{Speedup}_{Enhanced}} \right]$$

- Caution: fraction of What?

$$\text{OverallSpeedup} = \frac{1}{(1 - \text{Fraction}_{Enhanced}) + \dfrac{\text{Fraction}_{Enhanced}}{\text{Speedup}_{Enhanced}}}$$

Amdahl's Law (2)
- Make the Common Case Fast

$$\text{OverallSpeedup} = \frac{1}{(1 - \text{Fraction}_{Enhanced}) + \dfrac{\text{Fraction}_{Enhanced}}{\text{Speedup}_{Enhanced}}}$$

- $\text{Speedup}_{Enhanced} = 20$, $\text{Fraction}_{Enhanced} = 0.1$:

$$\text{Speedup} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{20}} = 1.105$$

- VS $\text{Speedup}_{Enhanced} = 1.2$, $\text{Fraction}_{Enhanced} = 0.9$:

$$\text{Speedup} = \frac{1}{(1 - 0.9) + \dfrac{0.9}{1.2}} = 1.176$$

- Important: Principle of locality
  - Approx. 90% of the time is spent in 10% of the code
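The two cases on this slide can be reproduced with a direct transcription of the formula. The function name `overall_speedup` is illustrative.

```python
# Amdahl's Law: overall speedup from enhancing a fraction of execution time.

def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """1 / ((1 - F) + F / S), where F is a fraction of the ORIGINAL execution time."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


big_speedup_rare_case = overall_speedup(0.1, 20.0)     # 20x on 10% of the time
small_speedup_common_case = overall_speedup(0.9, 1.2)  # 1.2x on 90% of the time

print(f"20x on 10% of the time:  {big_speedup_rare_case:.3f}")
print(f"1.2x on 90% of the time: {small_speedup_common_case:.3f}")
```

Even a modest 1.2x improvement wins when it applies to 90% of the execution time, which is the point of "make the common case fast".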

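Amdahl's Law (3)
- Diminishing Returns

[Figure: total execution time of Generation 1, Generation 2, and Generation 3, each split into a Green phase and a Blue phase; only the Green phase is sped up from one generation to the next]

- Generation 2: $\text{Speedup}_{Green} = 2$, $\text{Fraction}_{Green} = \frac{1}{2}$, so $\text{Speedup}_{Overall} = 1.33$ over Generation 1
- Generation 3: $\text{Speedup}_{Green} = 2$, $\text{Fraction}_{Green} = \frac{1}{3}$, so $\text{Speedup}_{Overall} = 1.2$ over Generation 2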
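The diminishing-returns effect can be sketched by repeatedly doubling the speed of the green phase only. The starting split (green and blue each taking half of Generation 1's time) is an assumption, chosen because it reproduces the overall speedups of 1.33 and 1.2 quoted on the slide. The fourth generation is an extrapolation that is not on the slide.

```python
# Diminishing returns under Amdahl's Law: keep doubling the speed of the
# green phase while the blue phase stays the same.
# Assumption: Generation 1 spends equal time in green and blue (1.0 each),
# which reproduces the slide's overall speedups of 1.33 and 1.2.

def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


green, blue = 1.0, 1.0      # time units in Generation 1 (assumed split)
GREEN_SPEEDUP = 2.0

results = []                # (new generation, green fraction before, overall speedup)
for new_generation in (2, 3, 4):   # generation 4 extrapolates beyond the slide
    fraction_green = green / (green + blue)
    gain = overall_speedup(fraction_green, GREEN_SPEEDUP)
    results.append((new_generation, fraction_green, gain))
    green /= GREEN_SPEEDUP  # green phase now takes half as long

for new_generation, fraction_green, gain in results:
    print(f"Generation {new_generation}: green fraction {fraction_green:.3f}, "
          f"overall speedup {gain:.3f} over the previous generation")
```

Each generation applies the same 2x improvement, yet the overall gain shrinks because the green phase is a smaller share of the remaining time.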

Yet Another Car Analogy
- From GT to Mall of Georgia (35 mi)
- You've got a Turbo for your car, but can only use it on the highway
- Spaghetti Junction to Mall of GA (23 mi)
  - avg. speed of 60 mph
  - avg. speed of 120 mph with Turbo
- GT to Spaghetti Junction (12 mi)
  - stuck in bad rush hour traffic
  - avg. speed of 5 mph
- Turbo gives 100% speedup across 66% of the distance, but only results in a <10% reduction in total trip time (which is a <11% speedup)
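The trip numbers can be checked directly, and the same result falls out of Amdahl's Law once the enhanced fraction is measured in time rather than distance. The sketch below uses only the distances and speeds from the slide, with illustrative helper names.

```python
# Turbo example: a 2x speedup on 66% of the DISTANCE is a small share of the TIME.

def trip_hours(legs):
    """Total travel time for a list of (miles, mph) legs."""
    return sum(miles / mph for miles, mph in legs)


CITY = (12.0, 5.0)             # GT to Spaghetti Junction, rush hour
HIGHWAY = (23.0, 60.0)         # Spaghetti Junction to Mall of GA
HIGHWAY_TURBO = (23.0, 120.0)  # same leg with Turbo

without_turbo = trip_hours([CITY, HIGHWAY])
with_turbo = trip_hours([CITY, HIGHWAY_TURBO])

time_reduction = 1.0 - with_turbo / without_turbo
speedup = without_turbo / with_turbo

# Same answer via Amdahl's Law, using the fraction of TIME spent on the highway.
highway_time_fraction = (HIGHWAY[0] / HIGHWAY[1]) / without_turbo
amdahl = 1.0 / ((1.0 - highway_time_fraction) + highway_time_fraction / 2.0)

print(f"trip without Turbo: {without_turbo:.3f} h, with Turbo: {with_turbo:.3f} h")
print(f"reduction in trip time: {time_reduction:.1%}")
print(f"overall speedup: {speedup:.3f} (Amdahl: {amdahl:.3f})")
print(f"highway share of distance: {23.0 / 35.0:.0%}, of time: {highway_time_fraction:.0%}")
```

This is the "fraction of what?" caution in practice: the highway is about two thirds of the distance but well under a fifth of the original trip time.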

Now Consider Price-Performance
- Without Turbo
  - Car costs $8,000 to manufacture
  - Selling price is $12,000: $4K profit per car
  - If we sell 10,000 cars, that's $40M in profit
- With Turbo
  - Car costs an extra $3,000
  - Selling price is $16,000: $5K profit per car
  - But only a few gear heads buy the car: we only sell 400 cars and make $2M in profit

CPU Design is Similar
- What does it cost me to add some performance enhancement?
- How much effective performance do I get out of it?
  - 100% speedup for a small fraction of the time wasn't a big win for the car example
- How much more do I have to charge for it?
  - Extra development, testing, marketing costs
- How much more can I charge for it?
  - Does the market even care?
- How does the price change affect volume?