CSE502: Computer Architecture Welcome to CSE 502


Welcome to CSE 502: Introduction & Review

Today's Lecture
Course Overview, Course Topics, Grading, Logistics, Academic Integrity Policy, Homework, Quiz.
Key basic concepts for Computer Architecture.

Course Overview (1/3)
Caveat 1: I'm (kind of) new here. Caveat 2: This is a (somewhat) new course.
Computer Architecture is the science and art of selecting and interconnecting hardware and software components to create computers.

Course Overview (2/3)
Ever wonder what's inside that box, anyway?
Computer Architecture is an umbrella term. Architecture: the software-visible interface. Micro-architecture: the internal organization of components.
This course is mostly about micro-architecture: what's inside the processor (CPU), and what implications this has on software.

Course Overview (3/3)
This course is hard, roughly like CSE 506. In CSE 506, you learn what's inside an OS. In CSE 502, you learn what's inside a CPU.
This is a project course. Learn why things are the way they are, first hand. We will build emulators of CPU components.

Hardware Design Process
[Figure: design-flow diagram with stages labeled Conceptual Design, Behavioral Implementation, Structural Implementation, Layout, Manufacturing, Packaging, and Evaluation; a "CSE502" label marks part of the flow]

Course Topics
Intro/Review; Instruction Decode; Pipelining; Memory Hierarchy; Processor Front-end; Execution Core; Multi-[socket (SMP, DSM), thread (SMT, CMT), core (CMP)]; Vector Processing and GPUs.
Will devote most attention to items in bold.

Grading (Standard Option)

Item                  Due Date       Points   Grading          Required?
1 Quiz                Today          0        Binary           Yes
1 Homework            Mar 31         10       Curve 0 to 100   No
2 Warm-up Projects    Feb 17/Mar 3   20       Absolute value   No
1 Course Project      Last class     100      See below        Yes
1 Final                              40       Absolute value   No
Participation                        10       Curve 0 to 100   No

Course Project                                                                 Points
5+ stage, Direct-mapped Caches                                                 50
5+ stage, Set-Associative Caches                                               60
Super-Scalar, Set-Associative Caches                                           70
Super-Scalar, Out-of-order, Set-Associative Caches                             80
Super-Scalar, Out-of-order, Set-Associative Caches, Branch Predictor           90
Super-Scalar, Out-of-order, Set-Associative Caches, Branch Predictor, SMT      100

Without curve, need 100 points to get an A.

Grading (Research Option)
If you are pursuing a PhD, pursuing an MS thesis, or planning to take 523/524 with me, you may select a research option for the grade. Only available with instructor's approval.
When selecting this option, you must work alone on everything and attain at least 60 points of the Standard Option. Grade will be based on subjective research progress.
Note: Of the two, this is the harder option.

Logistics (1/3)
Project milestones: There are no official project milestones. If you need milestones, send me a milestone schedule. I will deduct 5 points for each milestone you miss.
Books: Recommended for reference, not required. Does not mean you shouldn't get them. Do not pirate books. Computer Organization and Embedded Systems; Computer Architecture (Hennessy & Patterson).

Logistics (2/3)
Working in groups: Permitted on everything except Quiz and Final. Groups may be of any size. Points deducted on group work are multiplied by group size. Great opportunity or rope to hang yourself: you pick.
Attendance: Optional (but highly advised). No laptop, tablet, or phone use in class. Don't test me: I will deduct grade points.

Logistics (3/3)
Blackboard: Grades will be posted there, nothing else.
Course Mailing List: Subscription is required. http://lists.cs.stonybrook.edu/mailman/listinfo/cse502
Quiz: Completion is required. If you missed the 1st class, come to office hours for it.

Academic Integrity Policy
You may: discuss assignment, design, techniques.
You may not: share code outside your group, or use any code not distributed as part of project handouts. Exceptions are possible, but must receive explicit permission.
You must declare group composition: explicitly via email to TA and instructor, explicitly for each assignment, at most five days after assignment handout.

Questions?

Homework
Independent hacking projects, mostly on QEMU and related software.
If interested, pick up the assignment during office hours. Come with all group members. If you can't make it during office hours, schedule an appointment.

Quiz

Review
Understanding and Measuring Performance; Memory Locality; Power and Energy; Parallelism and Critical Paths; Instruction Set Architecture; Basic Processor Organization.
This is intended to be a review!

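Amdahl's Law
Speedup = time without enhancement / time with enhancement.
An enhancement speeds up fraction $f$ of a task by factor $S$:
$\text{time}_{new} = \text{time}_{orig} \cdot \left( (1-f) + \frac{f}{S} \right)$
$S_{overall} = \frac{1}{(1-f) + \frac{f}{S}}$
[Figure: two bars. $\text{time}_{orig}$ is split into segments $(1-f)$ and $f$; $\text{time}_{new}$ is split into segments $(1-f)$ and $f/S$]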
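As a quick check on the formula above, here is a minimal Python sketch. The function name `amdahl_speedup` and the sample values are illustrative assumptions, not part of the lecture.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when fraction f of a task is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    # time_new / time_orig = (1 - f) + f / s, and speedup is its reciprocal
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical example: half of the task gets twice as fast.
print(round(amdahl_speedup(0.5, 2.0), 3))

# Even a huge factor on 90% of the task cannot beat 1 / (1 - f) = 10.
print(round(amdahl_speedup(0.9, 1e12), 3))
```

The second call illustrates the usual reading of the law: the part that is not enhanced bounds the overall speedup.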

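The Iron Law of Processor Performance
$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$
Instructions/Program: total work in the program. Determined by algorithms, compilers, ISA extensions.
Cycles/Instruction: CPI or 1/IPC. Determined by microarchitecture.
Time/Cycle: 1/f (frequency). Determined by microarchitecture, process tech.
Architects target CPI, but must understand the others.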

Performance
Latency (execution time): time to finish one task. Throughput (bandwidth): number of tasks per unit time.
Throughput can exploit parallelism, latency can't. Sometimes complementary, often contradictory.
Example: move people from A to B, 10 miles.
Car: capacity = 5, speed = 60 miles/hour. Bus: capacity = 60, speed = 20 miles/hour.
Latency: car = 10 min, bus = 30 min.
Throughput: car = 15 PPH (count return trip), bus = 60 PPH.
No right answer: pick the metric for your goals.

Performance Improvement
Processor A is X times faster than processor B if
Latency(P,A) = Latency(P,B) / X
Throughput(P,A) = Throughput(P,B) * X
Processor A is X% faster than processor B if
Latency(P,A) = Latency(P,B) / (1+X/100)
Throughput(P,A) = Throughput(P,B) * (1+X/100)
Car/bus example: Latency? Car is 3 times (200%) faster than bus. Throughput? Bus is 4 times (300%) faster than car.
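The car/bus numbers can be reproduced with a short Python sketch. The helper names are hypothetical, and throughput counts the empty return trip, as the example states.

```python
def latency_minutes(distance_miles: float, speed_mph: float) -> float:
    """One-way trip time in minutes."""
    return 60.0 * distance_miles / speed_mph


def throughput_pph(capacity: int, distance_miles: float, speed_mph: float) -> float:
    """People per hour, counting the return trip needed before the next load."""
    round_trip_hours = 2.0 * distance_miles / speed_mph
    return capacity / round_trip_hours


car_lat = latency_minutes(10, 60)
bus_lat = latency_minutes(10, 20)
car_tp = throughput_pph(5, 10, 60)
bus_tp = throughput_pph(60, 10, 20)

# "X times faster": ratio of latencies (lower is better) or throughputs (higher is better).
latency_ratio = bus_lat / car_lat
throughput_ratio = bus_tp / car_tp

print(f"latency: car {car_lat:.0f} min, bus {bus_lat:.0f} min, car is {latency_ratio:.0f}x faster")
print(f"throughput: car {car_tp:.0f} PPH, bus {bus_tp:.0f} PPH, bus is {throughput_ratio:.0f}x faster")
print(f"as percentages: {100 * (latency_ratio - 1):.0f}% and {100 * (throughput_ratio - 1):.0f}%")
```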

Partial Performance Metrics Pitfalls
Which processor would you buy? Processor A: CPI = 2, clock = 2.8 GHz. Processor B: CPI = 1, clock = 1.8 GHz.
Probably A, but B is faster (assuming same ISA/compiler).
Classic example: 800 MHz Pentium III faster than 1 GHz Pentium 4. Same ISA and compiler.
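Plugging the two processors into the Iron Law makes the pitfall concrete. This is a sketch: the instruction count of one billion is an arbitrary assumption, and it cancels out because both processors run the same program with the same ISA and compiler.

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron Law: time = instructions * (cycles / instruction) * (seconds / cycle)."""
    return instructions * cpi / clock_hz


INSTRUCTIONS = 1e9  # assumed program size, identical for both processors

time_a = execution_time(INSTRUCTIONS, cpi=2.0, clock_hz=2.8e9)
time_b = execution_time(INSTRUCTIONS, cpi=1.0, clock_hz=1.8e9)

print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")
print(f"B is {time_a / time_b:.2f}x faster despite the lower clock")
```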

Averaging Performance Numbers (1/2)
Latency is additive, throughput is not.
Latency(P1+P2,A) = Latency(P1,A) + Latency(P2,A)
Throughput(P1+P2,A) != Throughput(P1,A) + Throughput(P2,A)
Example: 180 miles @ 30 miles/hour + 180 miles @ 90 miles/hour, that is, 6 hours at 30 miles/hour + 2 hours at 90 miles/hour.
Total latency is 6 + 2 = 8 hours.
Total throughput is not 60 miles/hour. Total throughput is only 45 miles/hour! (360 miles / (6 + 2 hours))

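Averaging Performance Numbers (2/2)
Arithmetic mean, for times (quantities proportional to time, e.g., latency): $\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$
Harmonic mean, for rates (quantities inversely proportional to time, e.g., throughput): $\frac{n}{\sum_{i=1}^{n} \frac{1}{\text{Rate}_i}}$
Geometric mean, for ratios (unit-less quantities, e.g., speedups): $\sqrt[n]{\prod_{i=1}^{n} \text{Ratio}_i}$
Memorize these to avoid looking them up later.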
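The three means are available in Python's standard `statistics` module (Python 3.8+ for `geometric_mean`). The sketch below reuses the two-leg driving example; the speedup values are made up for illustration. Note that the plain harmonic mean matches the true average speed here only because both legs cover the same distance.

```python
from statistics import geometric_mean, harmonic_mean, mean

leg_hours = [6.0, 2.0]      # times: arithmetic mean is meaningful
leg_speeds = [30.0, 90.0]   # rates over equal distances: harmonic mean
speedups = [2.0, 8.0]       # hypothetical unit-less ratios: geometric mean

avg_hours = mean(leg_hours)
avg_speed = harmonic_mean(leg_speeds)
avg_speedup = geometric_mean(speedups)

# Cross-check the rate average against total distance / total time.
direct_speed = (180.0 + 180.0) / sum(leg_hours)

print(f"mean leg time: {avg_hours:.1f} h")
print(f"average speed: {avg_speed:.1f} mph (direct: {direct_speed:.1f} mph)")
print(f"average speedup: {avg_speedup:.1f}x")
```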

Locality Principle
Recent past is a good indication of near future.
Temporal Locality: If you looked something up, it is very likely that you will look it up again soon.
Spatial Locality: If you looked something up, it is very likely you will look up something nearby soon.

What uses power in a chip?
Power vs. Energy (1/2)
Power: instantaneous rate of energy transfer. Expressed in Watts. In Architecture, implies conversion of electricity to heat.
Power(Comp1+Comp2) = Power(Comp1) + Power(Comp2)
Energy: measure of using power for some time. Expressed in Joules: power * time (joules = watts * seconds).
Energy(OP1+OP2) = Energy(OP1) + Energy(OP2)

Power vs. Energy (2/2) Does this example help or hurt?

Why is energy important?
Because electricity consumption has costs. Impacts battery life for mobile. Impacts electricity costs for tethered.
Delivering power for buildings, countries. Gets worse with larger data centers ($7M for 1000 racks).

Why is power important?
Because power has a peak. All power spent is converted to heat. Must dissipate the heat: need heat sinks and fans.
What if the fans are not fast enough? The chip powers off (if it's smart enough). Melts otherwise.
Thermal failures occur even when fans are OK: 50% server reliability degradation for +10°C, 50% decrease in hard disk lifetime for +15°C.

Power: The Basics
Dynamic power vs. Static power. Static: leakage power. Dynamic: switching power.
Static power: steady, constant energy cost.
Dynamic power: transitions from 0→1 and 1→0.

Dynamic Power Dissipation (Capacitive)
$\text{Power} \approx \frac{1}{2} C V^2 A f$
Capacitance ($C$): function of wire length, transistor size.
Supply voltage ($V$): function of technology and operating frequency.
Activity factor ($A$): average fraction of all possible transitions (0→1 and 1→0) per cycle.
Clock frequency ($f$): function of desired performance.

Lowering Dynamic Power
Reducing Voltage (V) has quadratic effect. Has a negative (~linear) effect on frequency. Limited by technology (insufficient difference between 1 and 0).
Lowering Capacitance (C) has linear effect. May improve frequency. Limited by technology (small transistors, short wires).
Reducing switching Activity (A) has linear effect. A function of signal transition stats. Turn off idle units to reduce activity. Impacted by logic and architecture decisions.
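The scaling effects listed above follow directly from the dynamic power expression ½·C·V²·A·f. The sketch below uses made-up component values (1 nF of switched capacitance, 1.0 V, 10% activity, 2 GHz); only the ratios matter.

```python
def dynamic_power(c_farads: float, v_volts: float, activity: float, f_hz: float) -> float:
    """Capacitive switching power: 0.5 * C * V^2 * A * f, in watts."""
    return 0.5 * c_farads * v_volts ** 2 * activity * f_hz


# Hypothetical baseline values, chosen only for illustration.
base = dynamic_power(1e-9, 1.0, 0.1, 2e9)

half_voltage = dynamic_power(1e-9, 0.5, 0.1, 2e9) / base      # quadratic effect
half_capacitance = dynamic_power(0.5e-9, 1.0, 0.1, 2e9) / base  # linear effect
half_activity = dynamic_power(1e-9, 1.0, 0.05, 2e9) / base      # linear effect

print(f"baseline: {base:.3f} W")
print(f"half V: {half_voltage:.2f}x, half C: {half_capacitance:.2f}x, half A: {half_activity:.2f}x")
```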

Leakage Power (1/3)
[Figure: transistor diagrams labeled Gate, Source, Drain, Applied Voltage, Threshold Voltage, and Current]

Leakage Power (2/3)
[Figure labels: Gate Leakage, Channel Leakage, Sub-threshold Conductance]

Leakage Power (3/3)
Gate leakage: $I_{ox} = K_2 W \left( \frac{V}{T_{ox}} \right)^2 e^{-\alpha T_{ox}/V}$
Gate oxide thickness keeps shrinking (faster transistors), so the probability of quantum tunneling increases (leakage increases).
Sub-threshold leakage: $I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left( 1 - e^{-V/V_\theta} \right)$, where $V_\theta$ is the thermal voltage.
Channel length keeps shrinking (faster transistors), so channel resistance decreases (leakage increases).
(The important take-away is on the next slide.)

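Thermal Runaway
Leakage is a function of temperature: $I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left( 1 - e^{-V/V_\theta} \right)$, where the thermal voltage $V_\theta$ rises with temperature.
Higher temperature leads to more leakage, which burns more power, which leads to higher temperature, which leads to still more leakage.
The positive feedback loop will melt your chip.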
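The feedback loop can be illustrated with a toy fixed-point iteration. Everything in this sketch is an assumption made for illustration: leakage is modeled as doubling every 20 °C (a stand-in for the exponential dependence, not the slide's equation), die temperature is ambient plus a thermal resistance times total power, and the 150 °C cutoff is arbitrary.

```python
def settle_temperature(dynamic_w, leak_ref_w, r_th_c_per_w,
                       t_ambient=25.0, t_ref=25.0, t_double=20.0,
                       t_max=150.0, steps=1000):
    """Iterate temp -> leakage -> power -> temp.

    Returns the settled temperature in degrees C, or None if the loop
    exceeds t_max (treated here as thermal runaway).
    """
    temp = t_ambient
    for _ in range(steps):
        # Toy model: leakage power doubles for every t_double degrees of heating.
        leakage_w = leak_ref_w * 2.0 ** ((temp - t_ref) / t_double)
        new_temp = t_ambient + r_th_c_per_w * (dynamic_w + leakage_w)
        if new_temp > t_max:
            return None
        if abs(new_temp - temp) < 1e-6:
            return new_temp
        temp = new_temp
    return temp


cool_chip = settle_temperature(dynamic_w=10.0, leak_ref_w=1.0, r_th_c_per_w=0.5)
hot_chip = settle_temperature(dynamic_w=50.0, leak_ref_w=10.0, r_th_c_per_w=1.0)

print("well-cooled chip settles at:", cool_chip)
print("poorly cooled chip:", "runaway" if hot_chip is None else hot_chip)
```

With good cooling the loop converges because each extra degree adds very little leakage; with poor cooling each iteration adds more heat than the last.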

Power Management in Processors
Clock gating: Stop switching in unused components. Done automatically in most designs. Near instantaneous on/off behavior.
Power gating: Turn off power to unused cores/caches. High latency for on/off: saving SW state, flushing dirty cache lines, turning off clock tree. Carefully done to avoid voltage spikes or memory bottlenecks.
Issue: area and power consumption of the power gate.
Opportunity: use thermal headroom for other cores.

DVFS: Dynamic Voltage/Frequency Scaling
Set frequency to the lowest needed: Execution time = IC * CPI / F.
Scale back V to the lowest for that frequency: lower voltage means slower transistors.
$\text{Power} \propto C \cdot V^2 \cdot F$
Provides P-states for power management. Heavy load: frequency, voltage, power high. Light load: frequency, voltage, power low.
Trade-off: power savings vs. overhead of scaling. Effectiveness limited by voltage range.
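A rough sketch of what DVFS buys, in relative units. It assumes that voltage can be scaled in proportion to frequency (a simplification of the "~linear" relation mentioned earlier), that C, IC, and CPI stay fixed, and it ignores static power entirely.

```python
def relative_power(freq_scale: float, volt_scale: float) -> float:
    """Dynamic power relative to baseline: P ~ C * V^2 * F with C fixed."""
    return volt_scale ** 2 * freq_scale


def relative_time(freq_scale: float) -> float:
    """Execution time relative to baseline: time = IC * CPI / F."""
    return 1.0 / freq_scale


def relative_energy(freq_scale: float, volt_scale: float) -> float:
    """Energy for the same work: power * time."""
    return relative_power(freq_scale, volt_scale) * relative_time(freq_scale)


# Halve the frequency and (by assumption) the voltage with it.
p = relative_power(0.5, 0.5)
t = relative_time(0.5)
e = relative_energy(0.5, 0.5)
print(f"power x{p}, time x{t}, energy x{e}")

# Frequency scaling alone saves power but not energy for the same work.
print("frequency only:", relative_power(0.5, 1.0), relative_energy(0.5, 1.0))
```

Under these assumptions the energy saving comes from the voltage reduction; lowering frequency alone only stretches the same energy over a longer time.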

Parallelism: Work and Critical Path
Parallelism: number of independent tasks available.
Work ($T_1$): time on a sequential system.
Critical Path ($T_\infty$): time on an infinitely-parallel system.
Average Parallelism: $P_{avg} = T_1 / T_\infty$
For a p-wide system: $T_p \ge \max\{ T_1/p, T_\infty \}$; if $P_{avg} \gg p$, then $T_p \approx T_1/p$.
Example: x = a + b; y = b * 2; z = (x-y) * (x+y)
Can trade off frequency for parallelism.
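Work and critical path can be computed for the small example by treating each arithmetic operation as a unit-time node in a dependence graph. This is a sketch: the node names `diff` and `total` for the two subexpressions, and the unit-time assumption, are mine.

```python
from functools import lru_cache

# Each operation maps to the operations it depends on (inputs a, b are free).
DEPS = {
    "x": (),                    # x = a + b
    "y": (),                    # y = b * 2
    "diff": ("x", "y"),         # x - y
    "total": ("x", "y"),        # x + y
    "z": ("diff", "total"),     # z = diff * total
}


def work(deps) -> int:
    """T1: total unit-time operations on a sequential machine."""
    return len(deps)


def critical_path(deps) -> int:
    """T_inf: longest dependence chain, in unit-time operations."""
    @lru_cache(maxsize=None)
    def depth(node):
        return 1 + max((depth(d) for d in deps[node]), default=0)

    return max(depth(node) for node in deps)


def time_lower_bound(deps, p: int) -> float:
    """T_p >= max(T1 / p, T_inf) for a p-wide machine."""
    return max(work(deps) / p, critical_path(deps))


t1 = work(DEPS)
t_inf = critical_path(DEPS)
print(f"T1 = {t1}, T_inf = {t_inf}, P_avg = {t1 / t_inf:.2f}")
print("lower bound on T_2:", time_lower_bound(DEPS, 2))
```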

ISA: A contract between HW and SW
ISA: Instruction Set Architecture. A well-defined hardware/software interface.
The contract between software and hardware: a functional definition of operations supported by hardware, and a precise description of how to invoke all features.
No guarantees regarding: how operations are implemented; which operations are fast and which are slow (and when); which operations take more energy (and which take less).

Components of an ISA
Programmer-visible states: program counter, general purpose registers, memory, control registers.
Programmer-visible behaviors: what to do, when to do it.
Example register-transfer-level description of an instruction: if imem[rip] == "add rd, rs, rt" then rip ← rip+1; gpr[rd] ← gpr[rs] + gpr[rt]
A binary encoding.
ISAs last forever, don't add stuff you don't need.
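The register-transfer description of `add` maps almost directly onto a tiny interpreter step. This is a sketch only: instructions are stored as already-decoded tuples rather than in a binary encoding, and the dictionary layout of the architectural state is an assumption for illustration.

```python
def step(state: dict) -> None:
    """Execute one instruction, updating programmer-visible state in place."""
    op, rd, rs, rt = state["imem"][state["rip"]]
    if op != "add":
        raise NotImplementedError(f"unsupported instruction: {op}")
    # gpr[rd] <- gpr[rs] + gpr[rt]
    state["gpr"][rd] = state["gpr"][rs] + state["gpr"][rt]
    # rip <- rip + 1
    state["rip"] += 1


# Programmer-visible state: program counter, registers, instruction memory.
state = {
    "rip": 0,
    "gpr": [0, 5, 7, 0],
    "imem": [("add", 3, 1, 2)],  # add r3, r1, r2
}

step(state)
print(state["gpr"][3], state["rip"])  # prints: 12 1
```

Nothing here says how fast or how energy-hungry the add is, which matches the point that the ISA defines behavior and leaves implementation open.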

RISC vs. CISC
Recall the Iron Law: (instructions/program) * (cycles/instruction) * (seconds/cycle)
CISC (Complex Instruction Set Computing): Improve instructions/program with complex instructions. Easy for assembly-level programmers, good code density.
RISC (Reduced Instruction Set Computing): Improve cycles/instruction with many single-cycle instructions. Increases instructions/program, but hopefully not as much. Help from a smart compiler. Perhaps improve clock cycle time (seconds/cycle) via aggressive implementation allowed by simpler instructions.
Today's x86 chips translate CISC into ~RISC.

Prototypical Processor Organization
[Figure: pipeline stages Addr-gen., Fetch, Decode, Issue, Execute, Memory, (Write-back); components labeled PC (+4), Instruction Access, Register File, ALU, Data Access]

Conclusion
Know the topics from today's lecture. If you don't, you need to catch up.
So far, we had intro + review potpourri.
The rest of this course will be very unlike this lecture: very few new terms, practically no formulas, lots of new material.
Questions?