CS Computer Architecture Spring Lecture 04: Understanding Performance
|
|
- Mildred Parker
- 5 years ago
- Views:
Transcription
1 CS Computer Architecture Spring 2008 Lecture 04: Understanding Performance Taken from Mary Jane Irwin ( and Kevin Schaffer [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] CS35101 L04 Understanding Performance.1
2 Airplane Example Airplane Passenger capacity Cruising range (mi) Cruising speed (MPH) Passenger throughput (passengers MPH) Boeing ,750 Boeing ,700 Concorde ,200 DC ,424 CS35101 L04 Understanding Performance.2
3 Performance Metrics Purchasing perspective given a collection of machines, which has the - best performance? - least cost? - best cost/performance? Design perspective faced with design options, which has the - best performance improvement? - least cost? - best cost/performance? Both require basis for comparison metric for evaluation Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors CS35101 L04 Understanding Performance.3
4 Defining (Speed) Performance Normally interested in reducing Response time (aka execution time) the time between the start and the completion of a task - Important to individual users (desktop computers, user-interactive) Thus, to maximize performance, need to minimize execution time Performance X = 1 Execution Time X Performance 1 Execution Time Execution Time X X X > > < Performance Y 1 Execution Time Execution Time Y Y CS35101 L04 Understanding Performance.4
5 Relative Performance Commonly we want to compare performance of two different computers If X is n times faster than Y, then... Performance Performance X Y = Execution Time Execution Time Y X = n Throughput the total amount of work done in a given time - Important to data center managers (non-user-interactive systems, servers) Decreasing response time almost always improves throughput CS35101 L04 Understanding Performance.5
6 Measuring Time Wall-clock time is the total time to complete a task including OS overhead, I/O time, etc. CPU time is the time spent by the CPU and excluding time spent waiting for I/O devices User CPU time is the time the CPU spends executing a program excluding time performing OS tasks, which is called system CPU time Time can also be measured in clock cycles CS35101 L04 Understanding Performance.6
7 Clock Cycles Clock cycle time (clock period) is the length of one clock cycle in seconds (ps, ns) Clock rate (clock frequency) is the number of clock cycles per second measured in Hertz (GHz, MHz) Clock rate = 1 / Clock cycle time Make sure units match CS35101 L04 Understanding Performance.7
8 Review: Machine Clock Rate CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec clock cycle => 1 GHz clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate CS35101 L04 Understanding Performance.8
9 Performance Factors Time = Seconds Program = Instructions Program! Clock Cycles Instruction! Seconds Clock Cycle Program is a sequence of instructions Each instruction requires a some number of clock cycles to execute CS35101 L04 Understanding Performance.9
10 Performance Factors continued Want to distinguish elapsed time and the time spent on our task CPU execution time (CPU time) time the CPU spends working on a task Does not include time waiting for I/O or running other programs CPU execution time for a program CPU execution time for a program = # CPU clock cycles x clock cycle time for a program = # CPU clock cycles for a program clock rate Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program or CS35101 L04 Understanding Performance.10
11 Clock Cycles per Instruction Not all instructions take the same amount of time to execute One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction # CPU clock cycles # Instructions Average clock cycles for a program = for a program x per instruction Clock cycles per instruction (CPI) the average number of clock cycles each instruction takes to execute A way to compare two different implementations of the same ISA CPI CPI for this instruction class A B C CS35101 L04 Understanding Performance.11
12 Effective CPI Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging Overall effective CPI = Σ (CPI i x IC i ) i = 1 Where IC i is the count (percentage) of the number of instructions of class i executed CPI i is the (average) number of clock cycles per instruction for that instruction class n is the number of instruction classes n The overall effective CPI varies by instruction mix a measure of the dynamic frequency of instructions across one or many programs CS35101 L04 Understanding Performance.12
13 THE Performance Equation Our basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle Instruction_count x CPI CPU time = clock_rate These equations separate the three key factors that affect performance or Can measure the CPU execution time by running the program The clock rate is usually given Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details CPI varies by instruction type and ISA implementation for which we must know the implementation details CS35101 L04 Understanding Performance.13
14 Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Algorithm Instruction_ count CPI clock_cycle Programming language Compiler ISA Processor organization Technology CS35101 L04 Understanding Performance.14
15 A Simple Example Op Freq CPI i Freq x CPI i ALU 50% 1. Load 20% 5 Store 10% 3 Branch 20% Σ = How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to shave a cycle off the branch time? What if two ALU instructions could be executed at once? 2 CS35101 L04 Understanding Performance.16
16 Evaluating Performance Set of programs run on a computer is a workload A benchmark is a workload specifically designed to measure a computer's performance The best benchmarks are made up of real programs Synthetic benchmarks, on the other hand, try to measure low-level performance by repeating short blocks of code CS35101 L04 Understanding Performance.18
17 Comparing and Summarizing Performance How do we summarize the performance for benchmark set with a single number? The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM) n AM = 1/n Σ Time i i = 1 Where Time i is the execution time for the i th program of a total of n programs in the workload A smaller mean indicates a smaller average execution time and thus improved performance Guiding principle in reporting performance measurements is reproducibility list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.)) CS35101 L04 Understanding Performance.19
18 SPEC Standard sets of benchmarks for modern computers based on real programs Covers a number of application areas including graphics, file servers, web servers, etc. CPU benchmarks measure CPU performance on integer and floating-point programs CS35101 L04 Understanding Performance.20
19 SPEC Benchmarks Integer benchmarks gzip compression vpr FPGA place & route gcc GNU C compiler mcf Combinatorial optimization crafty Chess program parser Word processing program eon Computer visualization perlbmk perl application gap Group theory interpreter vortex Object oriented database bzip2 compression twolf Circuit place & route wupwise swim mgrid applu mesa galgel art equake facerec ammp lucas fma3d sixtrack apsi FP benchmarks Quantum chromodynamics Shallow water model Multigrid solver in 3D fields Parabolic/elliptic pde 3D graphics library Computational fluid dynamics Image recognition (NN) Seismic wave propagation simulation Facial image recognition Computational chemistry Primality testing Crash simulation fem Nuclear physics accel Pollutant distribution CS35101 L04 Understanding Performance.21
20 Example SPEC Ratings CS35101 L04 Understanding Performance.22
21 Pentium M Performance CS35101 L04 Understanding Performance.23
22 Power Efficiency Power consumption especially in the embedded market where battery life is important (and passive cooling) For power-limited applications, the most important metric is energy efficiency Modern mobile processors implement features to reduce power usage such as dynamic clock scaling Goal is to maximize performance/power ratio CS35101 L04 Understanding Performance.24
23 Performance/Power CS35101 L04 Understanding Performance.25
24 Speedup Speedup tells us how many times faster our system is after making some improvement That is, a speedup of a 2 means the new version is twice as fast as the old one Speedup = Time before improvement Time after improvement CS35101 L04 Understanding Performance.26
25 Amdahl's Law Amdahl's law provides a limit on the improvement in system performance from an improvement in one part of the system Demonstrates the law of diminishing returns Speedup = f s + 1 (1! f ) f is the fraction of the computation that is improved s is the speedup of the improvement CS35101 L04 Understanding Performance.27
26 Amdahl's Law (2) Speedup = Time before improvement Time affected + Time unaffected Amount of improvement Alternate form of Amdahl's law based on actual execution time instead of fractions This is the form used by the book CS35101 L04 Understanding Performance.28
27 Poor Performance Measures Clock rate Instructions per second (IPS) MIPS (million instructions per second) - MIPS = Instruction count / (Execution time x 10 6 ) Floating-point operations per second (FLOPS) CS35101 L04 Understanding Performance.29
28 Summary: Evaluating ISAs Design-time metrics: Can it be implemented, in how long, at what cost? Can it be programmed? Ease of compilation? Static Metrics: How many bytes does the program occupy in memory? Dynamic Metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? How many clocks are required per instruction? CPI How "lean" a clock is practical? Best Metric: Time to execute the program! depends on the instructions set, the processor organization, and compilation techniques. Inst. Count Cycle Time CS35101 L04 Understanding Performance.30
29 Next Lecture and Reminders Next lecture MIPS non-pipelined datapath/control path review - Reading assignment PH, Chapter 5 Reminders Choose one of the first 100 listed at the most current list at top500.org CS35101 L04 Understanding Performance.31
Performance Metrics, Amdahl s Law
ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned
More informationMeasuring and Evaluating Computer System Performance
Measuring and Evaluating Computer System Performance Performance Marches On... But what is performance? The bottom line: Performance Car Time to Bay Area Speed Passengers Throughput (pmph) Ferrari 3.1
More informationMetrics How to improve performance? CPI MIPS Benchmarks CSC3501 S07 CSC3501 S07. Louisiana State University 4- Performance - 1
Performance of Computer Systems Dr. Arjan Durresi Louisiana State University Baton Rouge, LA 70810 Durresi@Csc.LSU.Edu LSUEd These slides are available at: http://www.csc.lsu.edu/~durresi/csc3501_07/ Louisiana
More informationMemory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors
Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors STIJN EYERMAN and LIEVEN EECKHOUT Ghent University A thread executing on a simultaneous multithreading (SMT) processor
More informationCS 6290 Evaluation & Metrics
CS 6290 Evaluation & Metrics Performance Two common measures Latency (how long to do X) Also called response time and execution time Throughput (how often can it do X) Example of car assembly line Takes
More informationPerformance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics
Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare
More informationMLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor Kenzo Van Craeynest, Stijn Eyerman, and Lieven Eeckhout Department of Electronics and Information Systems (ELIS), Ghent University,
More informationMLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor Kenzo Van Craeynest, Stijn Eyerman, and Lieven Eeckhout Department of Electronics and Information Systems (ELIS), Ghent University,
More information7/11/2012. Single Cycle (Review) CSE 2021: Computer Organization. Multi-Cycle Implementation. Single Cycle with Jump. Pipelining Analogy
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10 CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan CSE-2021 July-12-2012 2 Single Cycle with Jump Multi-Cycle Implementation
More informationAdministrative Issues
dministrative Issues Text book ($56.69 in mazon.com) Scanned problem set Email list Homework 1 announced, due 01/13/10 Quiz, 01/15/10 Graduate students meeting Relevant chapters in textbook? Technology
More informationCS4617 Computer Architecture
1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement
More informationCS152 Computer Architecture and Engineering Lecture 3: ReviewTechnology & Delay Modeling. September 3, 1997
CS152 Computer Architecture and Engineering Lecture 3: ReviewTechnology & Delay Modeling September 3, 1997 Dave Patterson (httpcsberkeleyedu/~patterson) lecture slides: http://www-insteecsberkeleyedu/~cs152/
More informationComputer Architecture
Computer Architecture Lecture 01 Arkaprava Basu www.csa.iisc.ac.in Acknowledgements Several of the slides in the deck are from Luis Ceze (Washington), Nima Horanmand (Stony Brook), Mark Hill, David Wood,
More informationHeat-and-Run: Leveraging SMT and CMP to Manage Power Density Through the Operating System
To appear in the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2004) Heat-and-Run: Leveraging SMT and CMP to Manage Power Density Through
More informationChapter 4. Pipelining Analogy. The Processor. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop:
Chapter 4 The Processor Part II Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup p = 2n/(0.5n + 1.5) 4 =
More informationPerformance Evaluation of Recently Proposed Cache Replacement Policies
University of Jordan Computer Engineering Department Performance Evaluation of Recently Proposed Cache Replacement Policies CPE 731: Advanced Computer Architecture Dr. Gheith Abandah Asma Abdelkarim January
More informationPipeline Damping: A Microarchitectural Technique to Reduce Inductive Noise in Supply Voltage
Pipeline Damping: A Microarchitectural Technique to Reduce Inductive Noise in Supply Voltage Michael D. Powell and T. N. Vijaykumar School of Electrical and Computer Engineering, Purdue University {mdpowell,
More informationExploiting Resonant Behavior to Reduce Inductive Noise
To appear in the 31st International Symposium on Computer Architecture (ISCA 31), June 2004 Exploiting Resonant Behavior to Reduce Inductive Noise Michael D. Powell and T. N. Vijaykumar School of Electrical
More informationBalancing Resource Utilization to Mitigate Power Density in Processor Pipelines
Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines Michael D. Powell, Ethan Schuchman and T. N. Vijaykumar School of Electrical and Computer Engineering, Purdue University
More informationMitigating Inductive Noise in SMT Processors
Mitigating Inductive Noise in SMT Processors Wael El-Essawy and David H. Albonesi Department of Electrical and Computer Engineering, University of Rochester ABSTRACT Simultaneous Multi-Threading, although
More informationCombined Circuit and Microarchitecture Techniques for Effective Soft Error Robustness in SMT Processors
Combined Circuit and Microarchitecture Techniques for Effective Soft Error Robustness in SMT Processors Xin Fu, Tao Li and José Fortes Department of ECE, University of Florida xinfu@ufl.edu, taoli@ece.ufl.edu,
More informationBest Instruction Per Cycle Formula >>>CLICK HERE<<<
Best Instruction Per Cycle Formula 6 Performance tuning, 7 Perceived performance, 8 Performance Equation, 9 See also is the average instructions per cycle (IPC) for this benchmark. Even. Click Card to
More informationCSE 305: Computer Architecture
CSE 305: Computer Architecture Tanvir Ahmed Khan takhandipu@gmail.com Department of Computer Science and Engineering Bangladesh University of Engineering and Technology. September 6, 2015 1/16 Recap 2/16
More informationWavelet Analysis for Microprocessor Design: Experiences with Wavelet-Based di/dt Characterization
Wavelet Analysis for Microprocessor Design: Experiences with Wavelet-Based di/dt Characterization Russ Joseph Dept. of Electrical Eng. Princeton University rjoseph@ee.princeton.edu Zhigang Hu T.J. Watson
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: November 8, 2017 at 09:27 CS429 Slideset 14: 1 Overview What s wrong
More informationUsing ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors
Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors Anys Bacha Computer Science and Engineering The Ohio State University bacha@cse.ohio-state.edu Radu Teodorescu Computer Science
More informationECE473 Computer Architecture and Organization. Pipeline: Introduction
Computer Architecture and Organization Pipeline: Introduction Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 11.1 The Laundry Analogy Student A,
More informationAssessing and. Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.
Assessing and Understanding Performance Rui Wang, Assistant professor Dept. of Information and Communication Tongji University it Email: ruiwang@tongji.edu.cn 4.1 Introduction Pi Primary reason for examining
More informationMitigating the Effects of Process Variation in Ultra-low Voltage Chip Multiprocessors using Dual Supply Voltages and Half-Speed Stages
Mitigating the Effects of Process Variation in Ultra-low Voltage Chip Multiprocessors using Dual Supply Voltages and Half-Speed Stages Timothy N. Miller, Renji Thomas, Radu Teodorescu Department of Computer
More informationFinal Report: DBmbench
18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally
More informationCSE502: Computer Architecture Welcome to CSE 502
Welcome to CSE 502 Introduction & Review Today s Lecture Course Overview Course Topics Grading Logistics Academic Integrity Policy Homework Quiz Key basic concepts for Computer Architecture Course Overview
More informationMLP-aware Instruction Queue Resizing: The Key to Power-Efficient Performance
MLP-aware Instruction Queue Resizing: The Key to Power-Efficient Performance Pavlos Petoumenos 1, Georgia Psychou 1, Stefanos Kaxiras 1, Juan Manuel Cebrian Gonzalez 2, and Juan Luis Aragon 2 1 Department
More informationIF ID EX MEM WB 400 ps 225 ps 350 ps 450 ps 300 ps
CSE 30321 Computer Architecture I Fall 2010 Homework 06 Pipelined Processors 85 points Assigned: November 2, 2010 Due: November 9, 2010 PLEASE DO THE ASSIGNMENT ON THIS HANDOUT!!! Problem 1: (25 points)
More informationCS61c: Introduction to Synchronous Digital Systems
CS61c: Introduction to Synchronous Digital Systems J. Wawrzynek March 4, 2006 Optional Reading: P&H, Appendix B 1 Instruction Set Architecture Among the topics we studied thus far this semester, was the
More informationCMOS Process Variations: A Critical Operation Point Hypothesis
CMOS Process Variations: A Critical Operation Point Hypothesis Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign jhpatel@uiuc.edu Computer Systems
More informationMLP-aware Instruction Queue Resizing: The Key to Power- Efficient Performance
MLP-aware Instruction Queue Resizing: The Key to Power- Efficient Performance Pavlos Petoumenos 1, Georgia Psychou 1, Stefanos Kaxiras 1, Juan Manuel Cebrian Gonzalez 2, and Juan Luis Aragon 2 1 Department
More informationCS 110 Computer Architecture Lecture 11: Pipelining
CS 110 Computer Architecture Lecture 11: Pipelining Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on
More informationParallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir
Parallel Computing 2020: Preparing for the Post-Moore Era Marc Snir THE (CMOS) WORLD IS ENDING NEXT DECADE So says the International Technology Roadmap for Semiconductors (ITRS) 2 End of CMOS? IN THE LONG
More informationInherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance
Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy, Eric Rotenberg Center for Efficient, Secure and Reliable Computing, ECE, North Carolina State University
More informationREVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
December 3-6, 2018 Santa Clara Convention Center CA, USA REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND. https://tmt.knect365.com/risc-v-summit @risc_v ACCELERATING INFERENCING ON THE EDGE WITH RISC-V
More informationChapter 16 - Instruction-Level Parallelism and Superscalar Processors
Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 16 - Superscalar Processors 1 / 78 Table of Contents I 1 Overview
More informationComputer Science 246. Advanced Computer Architecture. Spring 2010 Harvard University. Instructor: Prof. David Brooks
Advanced Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Instruction-Level Parallelism Scoreboarding (A.8) Instruction Level Parallelism
More informationLec 24: Parallel Processors. Announcements
Lec 24: Parallel Processors Kavita ala CS 3410, Fall 2008 Computer Science Cornell University P 3 out Hack n Seek nnouncements The goal is to have fun with it Recitations today will talk about it Pizza
More informationOutline Simulators and such. What defines a simulator? What about emulation?
Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies
More informationEECS 473. Review etc.
EECS 473 Review etc. Nice job folks Projects went well. Was nervous until the last minute, but things came out well. Same thing in 470 btw. Still have a demo to do due to snow delay, but otherwise all
More informationDeCoR: A Delayed Commit and Rollback Mechanism for Handling Inductive Noise in Processors
DeCoR: A Delayed Commit and Rollback Mechanism for Handling Inductive Noise in Processors Meeta S. Gupta, Krishna K. Rangan, Michael D. Smith, Gu-Yeon Wei and David Brooks School of Engineering and Applied
More informationEECS 473. Review etc.
EECS 473 Review etc. Nice job folks Projects went well. Last groups demoed on Sunday. Due date issues Assignment 2 and the Final Report are both due today. There was some communication issues with due
More informationLeveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors
Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors Guihai Yan a) Key Laboratory of Computer System and Architecture, Institute of Computing
More informationControl Techniques to Eliminate Voltage Emergencies in High Performance Processors
Control Techniques to Eliminate Voltage Emergencies in High Performance Processors Russ Joseph David Brooks Margaret Martonosi Department of Electrical Engineering Princeton University rjoseph,mrm @ee.princeton.edu
More informationChapter 1 Introduction
Chapter 1 Introduction 1.1 Introduction There are many possible facts because of which the power efficiency is becoming important consideration. The most portable systems used in recent era, which are
More informationLecture Topics. Announcements. Today: Pipelined Processors (P&H ) Next: continued. Milestone #4 (due 2/23) Milestone #5 (due 3/2)
Lecture Topics Today: Pipelined Processors (P&H 4.5-4.10) Next: continued 1 Announcements Milestone #4 (due 2/23) Milestone #5 (due 3/2) 2 1 ISA Implementations Three different strategies: single-cycle
More informationIF ID EX MEM WB 400 ps 225 ps 350 ps 450 ps 300 ps
CSE 30321 Computer Architecture I Fall 2011 Homework 06 Pipelined Processors 75 points Assigned: November 1, 2011 Due: November 8, 2011 PLEASE DO THE ASSIGNMENT ON THIS HANDOUT!!! Problem 1: (15 points)
More informationAsanovic/Devadas Spring Pipeline Hazards. Krste Asanovic Laboratory for Computer Science M.I.T.
Pipeline Hazards Krste Asanovic Laboratory for Computer Science M.I.T. Pipelined DLX Datapath without interlocks and jumps 31 0x4 RegDst RegWrite inst Inst rs1 rs2 rd1 ws wd rd2 GPRs Imm Ext A B OpSel
More informationDr. D. M. Akbar Hussain
Course Objectives: To enable the students to learn some more practical facts about DSP architectures. Objective is that they can apply this knowledge to map any digital filtering algorithm and related
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science !!! Basic MIPS integer pipeline Branches with one
More informationRamon Canal NCD Master MIRI. NCD Master MIRI 1
Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/
More informationEECE 321: Computer Organiza5on
EECE 321: Computer Organiza5on Mohammad M. Mansour Dept. of Electrical and Compute Engineering American University of Beirut Lecture 21: Pipelining Processor Pipelining Same principles can be applied to
More informationInstructor: Dr. Mainak Chaudhuri. Instructor: Dr. S. K. Aggarwal. Instructor: Dr. Rajat Moona
NPTEL Online - IIT Kanpur Instructor: Dr. Mainak Chaudhuri Instructor: Dr. S. K. Aggarwal Course Name: Department: Program Optimization for Multi-core Architecture Computer Science and Engineering IIT
More informationInstruction Scheduling for Low Power Dissipation in High Performance Microprocessors
Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors Abstract Mark C. Toburen Thomas M. Conte Department of Electrical and Computer Engineering North Carolina State University
More informationConventional 4-Way Set-Associative Cache
ISLPED 99 International Symposium on Low Power Electronics and Design Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption Koji Inoue, Tohru Ishihara, and Kazuaki Murakami
More informationOverview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture
Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of
More informationStatistical Simulation of Multithreaded Architectures
Statistical Simulation of Multithreaded Architectures Joshua L. Kihm and Daniel A. Connors University of Colorado at Boulder Department of Electrical and Computer Engineering UCB 425, Boulder, CO, 80309
More informationBus-Switch Encoding for Power Optimization of Address Bus
May 2006, Volume 3, No.5 (Serial No.18) Journal of Communication and Computer, ISSN1548-7709, USA Haijun Sun 1, Zhibiao Shao 2 (1,2 School of Electronics and Information Engineering, Xi an Jiaotong University,
More informationPipelining A B C D. Readings: Example: Doing the laundry. Ann, Brian, Cathy, & Dave. each have one load of clothes to wash, dry, and fold
Pipelining Readings: 4.5-4.8 Example: Doing the laundry Ann, Brian, Cathy, & Dave A B C D each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes Folder takes
More informationCombinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #14
Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #14 Ziad Matni Dept. of Computer Science, UCSB Administrative Remaining on the calendar This supersedes
More informationΕΠΛ 605: Προχωρημένη Αρχιτεκτονική
ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Presentation of UniServer Horizon 2020 European project findings: X-Gene server chips, voltage-noise characterization, high-bandwidth voltage measurements,
More informationDesign Challenges in Multi-GHz Microprocessors
Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor Development www.compaq.com Introduction Moore s Law ( Law (the trend that the demand for IC functions and the
More informationREAL TIME DIGITAL SIGNAL PROCESSING. Introduction
REAL TIME DIGITAL SIGNAL Introduction Why Digital? A brief comparison with analog. PROCESSING Seminario de Electrónica: Sistemas Embebidos Advantages The BIG picture Flexibility. Easily modifiable and
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Out-of-Order Schedulers Data-Capture Scheduler Dispatch: read available operands from ARF/ROB, store in scheduler Commit: Missing operands filled in from bypass Issue: When
More informationAREA EFFICIENT DISTRIBUTED ARITHMETIC DISCRETE COSINE TRANSFORM USING MODIFIED WALLACE TREE MULTIPLIER
American Journal of Applied Sciences 11 (2): 180-188, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.180.188 Published Online 11 (2) 2014 (http://www.thescipub.com/ajas.toc) AREA
More informationDynamic Scheduling I
basic pipeline started with single, in-order issue, single-cycle operations have extended this basic pipeline with multi-cycle operations multiple issue (superscalar) now: dynamic scheduling (out-of-order
More informationEN164: Design of Computing Systems Lecture 22: Processor / ILP 3
EN164: Design of Computing Systems Lecture 22: Processor / ILP 3 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationDesign of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm
Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,
More informationCLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time
CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time Jorgen Peddersen, Sri Parameswaran School of Computer Science and Engineering The University of New South Wales & National ICT Australia
More informationPower Signal Processing: A New Perspective for Power Analysis and Optimization
Power Signal Processing: A New Perspective for Power Analysis and Optimization Quming Zhou, Lin Zhong and Kartik Mohanram Department of Electrical and Computer Engineering Rice University, Houston, TX
More informationArchitecture Performance Prediction Using Evolutionary Artificial Neural Networks
Architecture Performance Prediction Using Evolutionary Artificial Neural Networks P.A. Castillo 1,A.M.Mora 1, J.J. Merelo 1, J.L.J. Laredo 1,M.Moreto 2, F.J. Cazorla 3,M.Valero 2,3, and S.A. McKee 4 1
More informationIMPROVED THERMAL MANAGEMENT WITH RELIABILITY BANKING
IMPROVED THERMAL MANAGEMENT WITH RELIABILITY BANKING USING A FIXED TEMPERATURE FOR THERMAL THROTTLING IS PESSIMISTIC. REDUCED AGING DURING PERIODS OF LOW TEMPERATURE CAN COMPENSATE FOR ACCELERATED AGING
More informationLecture 4: Introduction to Pipelining
Lecture 4: Introduction to Pipelining Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder
More informationPower Signal Processing: A New Perspective for Power Analysis and Optimization
Power Signal Processing: A New Perspective for Power Analysis and Optimization Quming Zhou, Lin Zhong and Kartik Mohanram Department of Electrical and Computer Engineering Rice University, Houston, TX
More informationIJCSIET--International Journal of Computer Science information and Engg., Technologies ISSN
An efficient add multiplier operator design using modified Booth recoder 1 I.K.RAMANI, 2 V L N PHANI PONNAPALLI 2 Assistant Professor 1,2 PYDAH COLLEGE OF ENGINEERING & TECHNOLOGY, Visakhapatnam,AP, India.
More informationPipelined Beta. Handouts: Lecture Slides. Where are the registers? Spring /10/01. L16 Pipelined Beta 1
Pipelined Beta Where are the registers? Handouts: Lecture Slides L16 Pipelined Beta 1 Increasing CPU Performance MIPS = Freq CPI MIPS = Millions of Instructions/Second Freq = Clock Frequency, MHz CPI =
More informationLow Power Design Part I Introduction and VHDL design. Ricardo Santos LSCAD/FACOM/UFMS
Low Power Design Part I Introduction and VHDL design Ricardo Santos ricardo@facom.ufms.br LSCAD/FACOM/UFMS Motivation for Low Power Design Low power design is important from three different reasons Device
More informationContents. Basic Concepts. Histogram of CPU-burst Times. Diagram of Process State CHAPTER 5 CPU SCHEDULING. Alternating Sequence of CPU And I/O Bursts
Contents CHAPTER 5 CPU SCHEDULING Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Basic Concepts Maximum CPU utilization obtained with multiprogramming
More informationAI Application Processing Requirements
AI Application Processing Requirements 1 Low Medium High Sensor analysis Activity Recognition (motion sensors) Stress Analysis or Attention Analysis Audio & sound Speech Recognition Object detection Computer
More informationWhat this paper is about:
The Impact of Pipelining on Energy per Operation in Field-Programmable Gate Arrays Steve Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, Canada Su-Shin
More informationA B C D. Ann, Brian, Cathy, & Dave each have one load of clothes to wash, dry, and fold. Time
Pipelining Readings: 4.5-4.8 Example: Doing the laundry A B C D Ann, Brian, Cathy, & Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes Folder takes
More informationEvolution of DSP Processors. Kartik Kariya EE, IIT Bombay
Evolution of DSP Processors Kartik Kariya EE, IIT Bombay Agenda Expected features of DSPs Brief overview of early DSPs Multi-issue DSPs Case Study: VLIW based Processor (SPXK5) for Mobile Applications
More informationUSING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS
USING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS DENIS F. WOLF, ROSELI A. F. ROMERO, EDUARDO MARQUES Universidade de São Paulo Instituto de Ciências Matemáticas e de Computação
More informationEfficiently Exploiting Memory Level Parallelism on Asymmetric Coupled Cores in the Dark Silicon Era
28 Efficiently Exploiting Memory Level Parallelism on Asymmetric Coupled Cores in the Dark Silicon Era GEORGE PATSILARAS, NIKET K. CHOUDHARY, and JAMES TUCK, North Carolina State University Extracting
More informationLow-Power Design for Embedded Processors
Low-Power Design for Embedded Processors BILL MOYER, MEMBER, IEEE Invited Paper Minimization of power consumption in portable and batterypowered embedded systems has become an important aspect of processor
More informationSuggested Readings! Lecture 12" Introduction to Pipelining! Example: We have to build x cars...! ...Each car takes 6 steps to build...! ! Readings!
1! CSE 30321 Lecture 12 Introduction to Pipelining! CSE 30321 Lecture 12 Introduction to Pipelining! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 12"
More informationExploring Heterogeneity within a Core for Improved Power Efficiency
Computer Engineering Exploring Heterogeneity within a Core for Improved Power Efficiency Sudarshan Srinivasan Nithesh Kurella Israel Koren Sandip Kundu May 2, 215 CE Tech Report # 6 Available at http://www.eng.biu.ac.il/segalla/computer-engineering-tech-reports/
More informationEECS 427 Lecture 22: Low and Multiple-Vdd Design
EECS 427 Lecture 22: Low and Multiple-Vdd Design Reading: 11.7.1 EECS 427 W07 Lecture 22 1 Last Time Low power ALUs Glitch power Clock gating Bus recoding The low power design space Dynamic vs static EECS
More informationDigital Systems Design
Digital Systems Design Clock Networks and Phase Lock Loops on Altera Cyclone V Devices Dr. D. J. Jackson Lecture 9-1 Global Clock Network & Phase-Locked Loops Clock management is important within digital
More informationAuthor: Yih-Yih Lin. Correspondence: Yih-Yih Lin Hewlett-Packard Company MR Forest Street Marlboro, MA USA
4 th European LS-DYNA Users Conference MPP / Linux Cluster / Hardware I A Correlation Study between MPP LS-DYNA Performance and Various Interconnection Networks a Quantitative Approach for Determining
More informationProject 5: Optimizer Jason Ansel
Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale
More informationLeveraging Simultaneous Multithreading for Adaptive Thermal Control
Leveraging Simultaneous Multithreading for Adaptive Thermal Control James Donald and Margaret Martonosi Department of Electrical Engineering Princeton University {jdonald, mrm}@princeton.edu Abstract The
More informationLow-Power Digital CMOS Design: A Survey
Low-Power Digital CMOS Design: A Survey Krister Landernäs June 4, 2005 Department of Computer Science and Electronics, Mälardalen University Abstract The aim of this document is to provide the reader with
More informationDesign of Adjustable Reconfigurable Wireless Single Core
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 6, Issue 2 (May. - Jun. 2013), PP 51-55 Design of Adjustable Reconfigurable Wireless Single
More informationArchitectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance
Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University
More information