CS4617 Computer Architecture

1/26 CS4617 Computer Architecture
Lecture 2
Dr J Vaughan
September 10, 2014

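2/26 Amdahl's Law
$\text{Speedup} = \dfrac{\text{Execution time for entire task without using enhancement}}{\text{Execution time for entire task using enhancement when possible}}$
$\text{Speedup}_{\text{overall}} = \dfrac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$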

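3/26 Example
Processor enhancement: the new CPU is ten times faster.
If the original CPU is busy 40% of the time and waits for I/O 60% of the time, what is the overall speedup?
$\text{Fraction}_{\text{enhanced}} = 0.4$, $\text{Speedup}_{\text{enhanced}} = 10$
$\text{Speedup}_{\text{overall}} = \dfrac{1}{0.6 + \frac{0.4}{10}} = \dfrac{1}{0.64} \approx 1.56$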

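4/26 Example
Floating point square root (FPSQR) enhancement
Suppose FPSQR is responsible for 20% of the execution time of a graphics benchmark.
Suppose FP instructions are responsible for 50% of the execution time of the benchmark.
Proposal 1: Speed up the FPSQR hardware by a factor of 10
Proposal 2: Make all FP instructions run 1.6 times faster
$\text{Speedup}_{\text{FPSQR}} = \dfrac{1}{(1 - 0.2) + \frac{0.2}{10}} = \dfrac{1}{0.82} \approx 1.22$
$\text{Speedup}_{\text{FP}} = \dfrac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \dfrac{1}{0.8125} \approx 1.23$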
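The Amdahl's Law examples on the preceding slides can be checked with a few lines of Python. This is only a minimal sketch: the function name `amdahl_speedup` and its parameter names are illustrative choices, not part of the lecture; the numeric inputs are the ones given on the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is enhanced.

    fraction_enhanced: share of the ORIGINAL execution time that benefits.
    speedup_enhanced:  how much faster that share runs after the change.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Inputs taken from the slide examples.
cpu_upgrade = amdahl_speedup(0.4, 10)   # CPU 10x faster, busy 40% of the time
fpsqr_only = amdahl_speedup(0.2, 10)    # Proposal 1: FPSQR hardware 10x faster
all_fp = amdahl_speedup(0.5, 1.6)       # Proposal 2: all FP 1.6x faster

print(f"{cpu_upgrade:.2f} {fpsqr_only:.2f} {all_fp:.2f}")  # prints "1.56 1.22 1.23"
```

Note that even an unbounded speedup of the enhanced part cannot push the overall speedup beyond 1 / (1 - fraction_enhanced).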

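5/26 The Processor Performance Equation
$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$
Number of instructions executed = Instruction count (IC)
$\text{CPI} = \dfrac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$
Thus, $\text{Clock cycles} = \text{CPI} \times \text{IC}$ and
$\text{CPU time} = \text{CPI} \times \text{IC} \times \text{Clock cycle time}$
$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$
where $\text{IC}_i$ is the number of times instruction $i$ is executed in a program, $\text{CPI}_i$ is the average number of clock cycles for instruction $i$, and the sum gives the total processor clock cycles in a program.
Therefore
$\text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time}$
$\text{CPI} = \dfrac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \dfrac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$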

6/26 Example
Frequency of FP operations = 25%
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2%
CPI of FPSQR = 20
Proposal 1: Decrease CPI of FPSQR to 2
Proposal 2: Decrease average CPI of all FP operations to 2.5

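7/26 Comparing the proposals
$\text{CPI}_{\text{original}} = \sum_{i=1}^{n} \dfrac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$
$\text{CPI}_{\text{with new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{new FPSQR only}}) = 2.0 - 2\% \times (20 - 2) = 1.64$
$\text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) = 1.625$
So the FP enhancement gives marginally better performance.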
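The comparison can be reproduced as a weighted average of per-class CPIs. The sketch below uses the instruction mix from the example; the helper name `average_cpi` and the constant names are illustrative assumptions. Because the slide rounds intermediate values (1.9975 becomes 2.0), the unrounded results differ slightly in the last digits, but the ranking of the two proposals is the same.

```python
# Instruction mix from the example slide: (frequency, CPI) per instruction class.
FP_FREQ, FP_CPI = 0.25, 4.0
OTHER_FREQ, OTHER_CPI = 0.75, 1.33
FPSQR_FREQ, FPSQR_CPI = 0.02, 20.0


def average_cpi(mix):
    """Weighted CPI: sum over classes of (IC_i / instruction count) * CPI_i."""
    return sum(freq * cpi for freq, cpi in mix)


cpi_original = average_cpi([(FP_FREQ, FP_CPI), (OTHER_FREQ, OTHER_CPI)])

# Proposal 1: only FPSQR improves, from CPI 20 to CPI 2.
cpi_new_fpsqr = cpi_original - FPSQR_FREQ * (FPSQR_CPI - 2.0)

# Proposal 2: the average CPI of all FP operations drops to 2.5.
cpi_new_fp = average_cpi([(OTHER_FREQ, OTHER_CPI), (FP_FREQ, 2.5)])

# With instruction count and clock cycle time unchanged,
# speedup is simply the ratio of old CPI to new CPI.
print(f"original CPI   = {cpi_original:.4f}")
print(f"new FPSQR CPI  = {cpi_new_fpsqr:.4f}, speedup = {cpi_original / cpi_new_fpsqr:.3f}")
print(f"new FP CPI     = {cpi_new_fp:.4f}, speedup = {cpi_original / cpi_new_fp:.3f}")
```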

8/26 Addressing modes
MIPS: Register, Immediate, Displacement (constant offset + register content)
80x86: Absolute, Base + index + displacement, Base + scaled index + displacement, etc.
ARM: MIPS addressing modes, PC-relative, Sum of two registers, Autoincrement, Autodecrement

9/26 Types and sizes of operands
80x86, ARM, MIPS:
- 8-bit ASCII character
- 16-bit Unicode character or half-word
- 32-bit integer or word
- 64-bit double word or long integer
- IEEE 754 floating point: 32-bit (single precision) and 64-bit (double precision)
80x86 only: 80-bit floating point (extended double precision)

10/26 Operations
- Data transfer
- Arithmetic and logic
- Control
- Floating point

11/26 Control flow
- Conditional jumps
- Unconditional jumps
- Procedure call and return
- PC-relative addressing
- MIPS conditional branches test the contents of registers
- 80x86 and ARM branches test condition flags
- ARM and MIPS put the return address in a register
- An 80x86 call puts the return address on the stack in memory

12/26 Encoding an ISA
Fixed vs. variable length instructions
- 80x86: variable length, 1 to 18 bytes
- ARM/MIPS: fixed length, 32 bits
- ARM/MIPS reduced instruction size: 16 bits (ARM: Thumb; MIPS: MIPS16)

13/26 Computer Architecture
- ISA
- Organisation or microarchitecture
- Hardware

14/26 Five rapidly-changing technologies
1. IC logic
   - Transistor count on a chip doubles every 18 to 24 months (Moore's Law)
2. Semiconductor DRAM
   - Capacity per DRAM chip doubles every 2-3 years, but this rate is slowing
3. Semiconductor Flash (EEPROM)
   - Standard for personal mobile devices (PMDs)
   - Capacity per chip doubles approximately every 2 years
   - 15-20 times cheaper per bit than DRAM
4. Magnetic disk technology
   - Density doubles approximately every 3 years
   - 15-20 times cheaper per bit than flash
   - 300-500 times cheaper per bit than DRAM
   - Central to server and warehouse-scale storage
5. Network technology
   - Depends on performance of switches
   - Depends on performance of the transmission system

15/26 Technology
Continuous technology improvement can lead to a step-change in effect.
Example: once MOS density reached 25K-50K transistors/chip, it became possible to design a single-chip 32-bit microprocessor
...then microprocessors + L1 cache
...then multicores + caches
Cost and energy savings can occur for a given performance.

16/26 Energy and Power in a Microprocessor
For transistors used as switches, the dynamic energy dissipated is
$\text{Energy}_{\text{dynamic}} \propto \text{Capacitive load} \times \text{Voltage}^2$
The power dissipated in a transistor is
$\text{Power}_{\text{dynamic}} \propto \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
- Slowing the clock reduces power, but not the energy needed to complete a fixed task
- Reducing voltage decreases both energy and power, so supply voltages have dropped from 5 V to under 1 V
- Capacitive load is a function of the number of transistors, the transistor and interconnection capacitance, and the layout

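17/26 Example
15% reduction in voltage
The dynamic energy change is
$\dfrac{\text{Energy}_{\text{new}}}{\text{Energy}_{\text{old}}} = \dfrac{(\text{Voltage} \times 0.85)^2}{\text{Voltage}^2} = 0.85^2 = 0.72$
Some microprocessors are designed to reduce switching frequency when voltage drops. Assuming the frequency also falls by 15%, the dynamic power change is
$\dfrac{\text{Power}_{\text{new}}}{\text{Power}_{\text{old}}} = 0.72 \times \dfrac{\text{Frequency switched} \times 0.85}{\text{Frequency switched}} = 0.61$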
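The same ratios follow directly from the proportionalities for dynamic energy and power: the capacitive load cancels, leaving only the voltage and frequency scale factors. A minimal sketch, with function names chosen here purely for illustration:

```python
def dynamic_energy_ratio(voltage_scale: float) -> float:
    """Energy_new / Energy_old when voltage is multiplied by voltage_scale.

    Dynamic energy is proportional to capacitive load * voltage^2,
    and the capacitive load is assumed unchanged.
    """
    return voltage_scale ** 2


def dynamic_power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """Power_new / Power_old when voltage and switching frequency both scale.

    Dynamic power is proportional to capacitive load * voltage^2 * frequency.
    """
    return voltage_scale ** 2 * frequency_scale


# 15% reduction in voltage, with a matching 15% reduction in frequency.
energy = dynamic_energy_ratio(0.85)
power = dynamic_power_ratio(0.85, 0.85)
print(f"energy ratio = {energy:.2f}, power ratio = {power:.2f}")
# prints "energy ratio = 0.72, power ratio = 0.61"
```

Setting `frequency_scale` below 1 while leaving `voltage_scale` at 1 lowers power but leaves the energy ratio at 1, which is the point that slowing the clock alone does not save energy for a fixed task.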

18/26 Power
Power consumption increases as processor complexity increases:
- Number of transistors increases
- Switching frequency increases
Early microprocessors consumed about 1 W
80386 microprocessors consumed about 2 W
A 3.3 GHz Intel Core i7 consumes about 130 W
This heat must be dissipated from a chip that is about 1.5 cm × 1.5 cm

19/26 Managing power for further expansion
- Voltage cannot be reduced further
- Power per chip cannot be increased because the air cooling limit has been reached
- Therefore, clock frequency growth has slowed
- Heat dissipation is now the major constraint on using transistors

20/26 Energy efficiency strategies
1. Do nothing well
   - Turn off the clock of inactive modules, e.g., FP unit or idle cores, to save energy
2. Dynamic Voltage-Frequency Scaling (DVFS)
   - Reduce clock frequency and/or voltage when highest performance is not needed
   - Most microprocessors now offer a range of operating frequencies and voltages
3. Design for the typical case
   - PMDs and laptops are often idle
   - Use low-power mode DRAM to save energy
   - Spin the disk at a lower rate
   - PCs use emergency slowdown if program execution causes overheating

21/26 Energy efficiency strategies (continued)
4. Overclocking
   - Run at a higher clock rate on a few cores until the temperature rises
   - A 3.3 GHz Core i7 can run in short bursts at 3.6 GHz
5. Power gating
   - $\text{Power}_{\text{static}} \propto \text{Current}_{\text{static}} \times \text{Voltage}$
   - Current flows in transistors even when idle: leakage current
   - Leakage ranges from 25% to 50% of total power
   - Power gating turns off power to inactive modules
6. Race-to-halt
   - The processor is only part of the system cost
   - Use a faster, less energy-efficient processor to allow the rest of the system to halt

22/26 Effect of power on performance measures
Old: Performance per mm² of Si
New: Performance per watt; tasks per joule
Approaches to parallelism are affected

23/26 Cost of an Integrated Circuit
PMDs rely on systems on a chip (SOC)
Cost of PMD ≈ Cost of IC
Si manufacture: wafer, test, chop into dies, package, test
$\text{Cost of IC} = \dfrac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$
$\text{Cost of die} = \dfrac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$
This cost equation is sensitive to die size
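The two cost relations (cost of die = cost of wafer / (dies per wafer × die yield); cost of IC = (die + die test + packaging and final test) / final test yield) can be sketched as below. Every numeric input here is hypothetical and chosen only to exercise the formulas; the lecture gives no cost figures. The function names are likewise illustrative.

```python
def cost_of_die(wafer_cost: float, dies_per_wafer: float, die_yield: float) -> float:
    """Cost of one good die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * die_yield)


def cost_of_ic(die_cost: float, die_test_cost: float,
               packaging_and_final_test_cost: float,
               final_test_yield: float) -> float:
    """Cost of one finished IC, allowing for parts lost at final test."""
    total = die_cost + die_test_cost + packaging_and_final_test_cost
    return total / final_test_yield


# HYPOTHETICAL inputs, for illustration only (not from the lecture).
die = cost_of_die(wafer_cost=5000.0, dies_per_wafer=270, die_yield=0.40)
ic = cost_of_ic(die, die_test_cost=2.0,
                packaging_and_final_test_cost=3.0, final_test_yield=0.95)
print(f"cost per good die = {die:.2f}, cost per IC = {ic:.2f}")

# Sensitivity to die size: fewer dies per wafer and lower yield both
# raise the cost of each good die.
bigger_die = cost_of_die(wafer_cost=5000.0, dies_per_wafer=135, die_yield=0.20)
print(f"cost per good die with a larger die = {bigger_die:.2f}")
```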

24/26 Cost of an Integrated Circuit (2)
$\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \dfrac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$
The first term is the wafer area divided by the die area.
However, the wafer is circular and the die is rectangular.
So the second term divides the circumference ($2\pi R$) by the diagonal of a square die to give the approximate number of dies along the rim of the wafer.
Subtracting the partial dies along the rim gives the maximum number of dies per wafer.

25/26 Die yield
Fraction of good dies on wafer = die yield
$\text{Die yield} = \text{Wafer yield} \times \dfrac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$
This is the Bose-Einstein formula: an empirical model.
- Wafer yield accounts for wafers that are completely bad, with no need for testing
- Defects per unit area accounts for random manufacturing defects: 0.016 to 0.057 per cm²
- N = process complexity factor, which measures manufacturing difficulty: 11.5 to 15.5 for a 40 nm process (in 2010)
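The yield model, die yield = wafer yield × 1 / (1 + defects per unit area × die area)^N, is easy to explore numerically. In this sketch the defect density (0.031 per cm²) and complexity factor (13.5) are assumed values picked from inside the ranges quoted on the slide, the wafer yield of 100% is also an assumption, and the function name is illustrative.

```python
def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, n: float) -> float:
    """Bose-Einstein (empirical) die yield model.

    wafer_yield:     fraction of wafers that are not completely bad
    defects_per_cm2: random manufacturing defects per unit area
    die_area_cm2:    die area in cm^2
    n:               process complexity factor
    """
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


# ASSUMED parameters: mid-range values from the slide, wafer yield of 100%.
DEFECTS = 0.031
N = 13.5

large = die_yield(1.0, DEFECTS, die_area_cm2=2.25, n=N)   # 1.5 cm x 1.5 cm die
small = die_yield(1.0, DEFECTS, die_area_cm2=1.00, n=N)   # 1.0 cm x 1.0 cm die
print(f"yield for 2.25 cm^2 die = {large:.2f}")
print(f"yield for 1.00 cm^2 die = {small:.2f}")
```

Under these assumed parameters the 2.25 cm² die yields roughly 0.40, noticeably less than the smaller die, which illustrates why die cost is so sensitive to die size.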

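26/26 Yield Example
Find the number of dies per 300 mm wafer for a die that is 1.5 cm square.
Solution
$\text{Die area} = 1.5 \times 1.5 = 2.25\ \text{cm}^2$
$\text{Dies per wafer} = \dfrac{\pi \times (30/2)^2}{2.25} - \dfrac{\pi \times 30}{\sqrt{2 \times 2.25}} \approx 314.2 - 44.4 \approx 270$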
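The dies-per-wafer estimate (wafer area over die area, minus an allowance for partial dies along the rim) can be checked with a short calculation. This is a sketch; the function name `dies_per_wafer` is an illustrative choice, and the inputs are the 300 mm wafer and 1.5 cm square die from the example.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated whole dies per wafer.

    First term:  wafer area divided by die area.
    Second term: wafer circumference divided by the diagonal of a square die,
                 an estimate of the partial dies lost along the rim.
    """
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    rim_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - rim_loss


die_side_cm = 1.5
estimate = dies_per_wafer(wafer_diameter_cm=30.0, die_area_cm2=die_side_cm ** 2)
print(f"dies per wafer = {estimate:.1f}")  # about 269.7, which the slide rounds to 270
```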