Pipelining and ISA Design

Pipelined Instruction Execution

Pipelining and ISA Design
The MIPS instruction set was designed for pipelining.
All instructions are 32 bits.
  Easier to fetch and decode in one cycle.
  By contrast, x86 instructions are 1 to 15 bytes long (x86 hardware actually translates them into internal RISC-like instructions!).
Few and regular instruction formats, with the 2 source register fields always in the same place.
  Can decode and read registers in one step.
Memory operands appear only in loads and stores.
  Can calculate the address in the 3rd stage and access memory in the 4th stage.
Memory operands are aligned.
  A memory access takes only one cycle.
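To illustrate why fixed field positions help, here is a small Python sketch that slices a 32-bit MIPS word into its fields. The function name `decode_fields` is hypothetical, and the two encodings were assembled by hand using the standard MIPS register numbers ($t0 = 8, $t1 = 9, $t2 = 10). Note that rs and rt are extracted the same way for both the R-format add and the I-format lw, which is what lets the decode stage read registers before it knows the instruction type.

```python
# Sketch: MIPS32 fields sit at fixed bit positions, so both source register
# numbers can be pulled out before the instruction format is known.

def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source (or load target)
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination (R-format only)
        "imm": word & 0xFFFF,           # bits 15..0, immediate (I-format only)
    }

add_word = 0x012A4020  # add $t0, $t1, $t2  (R-format, hand-assembled)
lw_word = 0x8D280000   # lw  $t0, 0($t1)    (I-format, hand-assembled)

for name, word in [("add", add_word), ("lw", lw_word)]:
    f = decode_fields(word)
    print(name, "opcode", f["opcode"], "rs", f["rs"], "rt", f["rt"])
```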

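Pipelined Control
(figure not reproduced)

Pipelined Execution Representation
(figure: instructions flowing through the pipeline stages over time)
Every instruction must take the same number of steps, also called pipeline stages, so some stages will sometimes go idle.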
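One way to see the idle stages is to list which of the five stages each instruction class actually does work in. This Python sketch assumes the stage usage implied by the per-instruction latency table later in this lecture (sw has no register write, R-format has no memory access, beq has neither); the names `USES` and `idle_stages` are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Stages each instruction class actually needs (assumed from the latency table):
# sw skips the register write, R-format skips memory, beq skips both.
USES = {
    "lw": {"IF", "ID", "EX", "MEM", "WB"},
    "sw": {"IF", "ID", "EX", "MEM"},
    "R-format": {"IF", "ID", "EX", "WB"},
    "beq": {"IF", "ID", "EX"},
}

def idle_stages(kind: str) -> list:
    """Stages an instruction still passes through in the pipeline but does no work in."""
    return [s for s in STAGES if s not in USES[kind]]

for kind in USES:
    print(f"{kind:9} occupies {len(STAGES)} stages, idle in {idle_stages(kind)}")
```

Every class still spends five cycles in the pipeline; the idle stages are the price of a uniform stage count.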

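Graphical Pipeline Diagrams
Use the datapath figure below to represent the pipeline, divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.
(figure: datapath with PC, instruction memory, register file with rd, rs, and rt fields, +4 adder, immediate, and data memory)

Graphical Pipeline Representation
(figure: Load, Add, Store, Sub, and Or in instruction order, each passing through I$, Reg, D$, Reg and each starting one clock cycle after the previous one; horizontal axis: time in clock cycles)
In the Reg box, a highlighted right half means a register read and a highlighted left half means a register write.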
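A text version of such a diagram can be generated mechanically, because each instruction enters every stage exactly one cycle after the instruction before it. This is a hypothetical Python helper (not from the lecture) that uses the stage names IF, ID, EX, MEM, WB in place of the I$/Reg/D$/Reg icons and assumes no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instrs: list) -> list:
    """One text row per instruction; each row is shifted one 4-character
    column (one clock cycle) to the right of the row above it."""
    rows = []
    for i, name in enumerate(instrs):
        cells = ["    "] * i + [f"{stage:<4}" for stage in STAGES]
        rows.append(f"{name:<6}" + "".join(cells).rstrip())
    return rows

for row in pipeline_diagram(["Load", "Add", "Store", "Sub", "Or"]):
    print(row)
```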

Pipeline Performance
Assume the stage times are 100 ps for a register read or write and 200 ps for the other stages.
What is the pipelined clock rate? Compare the pipelined datapath with the single-cycle datapath.

Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw         200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw         200 ps       100 ps         200 ps  200 ps                         700 ps
R-format   200 ps       100 ps         200 ps                 100 ps          600 ps
beq        200 ps       100 ps         200 ps                                 500 ps

Fall 2011, Lecture #31, 3/2/15

Pipeline Performance
(figure: single-cycle execution with Tc = 800 ps vs. pipelined execution with Tc = 200 ps)
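The question above can be answered directly from the table: a single-cycle clock must fit the slowest whole instruction, while a pipelined clock only has to fit the slowest stage. A minimal Python sketch of that arithmetic, with the latencies copied from the table (in ps) and `None` marking a stage the instruction does not use; the function names are illustrative.

```python
# Stage latencies in ps, in the order: instr fetch, register read, ALU op,
# memory access, register write (None = stage not used by that instruction).
LATENCY = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, None],
    "R-format": [200, 100, 200, None, 100],
    "beq":      [200, 100, 200, None, None],
}

def single_cycle_period(table: dict) -> int:
    """The single-cycle clock must fit the slowest complete instruction."""
    return max(sum(t for t in row if t is not None) for row in table.values())

def pipelined_period(table: dict) -> int:
    """The pipelined clock must fit the slowest individual stage."""
    return max(t for row in table.values() for t in row if t is not None)

for label, tc in [("single-cycle", single_cycle_period(LATENCY)),
                  ("pipelined", pipelined_period(LATENCY))]:
    print(f"{label}: Tc = {tc} ps, clock rate = {1000 / tc} GHz")  # 1000 ps = 1 ns
```

This gives Tc = 800 ps (1.25 GHz) for the single-cycle design and Tc = 200 ps (5 GHz) for the pipeline.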

Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If the stages are not balanced, the speedup is less.
The speedup comes from increased throughput; the latency (time for each instruction) does not decrease.

Hazards
Situations that prevent starting the next logical instruction in the next clock cycle:
1. Structural hazard: a required resource is busy.
2. Data hazard: need to wait for a previous instruction to complete its data read/write (e.g., a pair of socks in different loads).
3. Control hazard: deciding on a control action depends on a previous instruction (e.g., how much detergent to use based on how clean the prior load turns out).
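Plugging the numbers from the performance example into a simple timing model shows both points about speedup. The sketch assumes 5 stages, Tc = 800 ps single-cycle and Tc = 200 ps pipelined, and ignores hazards entirely; the function names are illustrative.

```python
def single_cycle_time(n: int, tc: int = 800) -> int:
    """Total time in ps for n instructions, one per 800 ps clock."""
    return n * tc

def pipelined_time(n: int, stages: int = 5, tc: int = 200) -> int:
    """Total time in ps: the first instruction needs `stages` cycles to finish,
    and each later one completes one cycle after the previous (no hazards)."""
    return (stages + n - 1) * tc

for n in (1, 3, 1_000_000):
    s, p = single_cycle_time(n), pipelined_time(n)
    print(f"n={n}: single-cycle {s} ps, pipelined {p} ps, speedup {s / p:.3f}")
```

Because the stages are unbalanced (100 ps to 200 ps), the speedup approaches 800/200 = 4 rather than 5, and a single instruction actually takes longer when pipelined (1000 ps vs. 800 ps), so latency is not improved.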

1. Structural Hazards
Conflict for use of a resource.
In a MIPS pipeline with a single memory:
  Load/store requires a memory access for data.
  Instruction fetch would have to stall for that cycle, causing a pipeline "bubble".
Hence, pipelined datapaths require separate instruction and data memories.
  In reality, provide separate L1 instruction (I$) and L1 data (D$) caches.

1. Structural Hazard #1: Single Memory
(figure: Load, Inst 1, Inst 2, Inst 3, and Inst 4 in instruction order over time in clock cycles)
The same memory is read twice in the same clock cycle: Load accesses data memory in its 4th stage while Inst 3 is being fetched.
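The conflict can be located by listing, cycle by cycle, who needs the memory. This hypothetical Python helper assumes the 5-stage pipeline with no stalls and a single memory shared by instruction fetch (IF) and data access (MEM by loads and stores).

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(instrs: list, data_mem_ops: set) -> list:
    """Return (cycle, users) for every cycle in which a single shared memory
    would be needed twice: once for an instruction fetch (IF) and once for a
    data access (MEM) by a load or store."""
    users = {}  # cycle -> [(instruction, stage), ...]
    for i, name in enumerate(instrs):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i occupies stage s in cycle i+s+1
            if stage == "IF" or (stage == "MEM" and name in data_mem_ops):
                users.setdefault(cycle, []).append((name, stage))
    return [(c, u) for c, u in sorted(users.items()) if len(u) > 1]

program = ["Load", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
print(memory_conflicts(program, data_mem_ops={"Load"}))
# [(4, [('Load', 'MEM'), ('Inst 3', 'IF')])]
```

With separate I$ and D$ the two accesses in cycle 4 go to different memories, so the list of conflicts becomes irrelevant.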

1. Structural Hazard #2: Registers (1/2)
(figure: sw, Inst 1, Inst 2, Inst 3, and Inst 4 in instruction order over time in clock cycles, each passing through I$, Reg, D$, Reg)
Can we read and write registers simultaneously?

1. Structural Hazard #2: Registers (2/2)
Two different solutions have been used:
1) RegFile access is VERY fast: it takes less than half the time of a stage.
   Write to registers during the first half of each clock cycle.
   Read from registers during the second half of each clock cycle.
2) Build the RegFile with independent read and write ports.
Result: can perform a read and a write during the same clock cycle.
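Solution 1) can be modeled in a few lines: within one call representing a clock cycle, the write-back happens first and the reads second, so a read of the register being written sees the new value. The class `HalfCycleRegFile` is a hypothetical teaching model, not real hardware description code; it also assumes the MIPS convention that register 0 ($zero) always reads as 0.

```python
class HalfCycleRegFile:
    """Register file that writes in the first half of a clock cycle and reads
    in the second half, so a read in the same cycle sees the new value."""

    def __init__(self, n: int = 32):
        self.regs = [0] * n

    def cycle(self, write=None, reads=()):
        # First half of the cycle: the WB stage's write, if any.
        if write is not None:
            reg, value = write
            if reg != 0:  # $zero is hard-wired to 0 in MIPS
                self.regs[reg] = value
        # Second half of the cycle: the ID stage's reads.
        return [self.regs[r] for r in reads]

rf = HalfCycleRegFile()
# In one clock cycle, WB writes $t0 (register 8) = 42 while ID reads $t0 and $t1.
print(rf.cycle(write=(8, 42), reads=(8, 9)))  # [42, 0]
```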

Data Hazards (1/2)
Consider the following sequence of instructions:
  add $t0, $t1, $t2
  sub $t4, $t0, $t3
  and $t5, $t0, $t6
  or  $t7, $t0, $t8
  xor $t9, $t0, $t10

Data Hazards (2/2)
Data flow backward in time is a hazard.
(figure: the five instructions above in instruction order over time in clock cycles, each passing through IF, ID/RF, EX, MEM, WB, drawn as I$, Reg, D$, Reg)
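Which of these instructions actually read $t0 too early can be checked with a little cycle arithmetic. This Python sketch assumes no forwarding, instruction i (0-based) in ID during cycle i+2 and in WB during cycle i+5, and the split-cycle register file from the structural-hazard slide; `parse` and `raw_hazards` are hypothetical helpers that only handle three-register ALU instructions.

```python
import re

def parse(instr: str):
    """Return (destination, sources) for a three-register ALU instruction."""
    regs = re.findall(r"\$\w+", instr)  # e.g. ['$t0', '$t1', '$t2']
    return regs[0], regs[1:]

def raw_hazards(program: list) -> list:
    """Return (producer, consumer) pairs where the consumer reads a register
    in ID before the producer has written it back in WB."""
    hazards = []
    for i, producer in enumerate(program):
        dest, _ = parse(producer)
        for j in range(i + 1, len(program)):
            consumer_dest, sources = parse(program[j])
            # Read happens in cycle j+2, write in cycle i+5 (write-first, read-second).
            if dest in sources and j + 2 < i + 5:
                hazards.append((producer, program[j]))
            if consumer_dest == dest:
                break  # later readers depend on this newer write instead
    return hazards

program = [
    "add $t0, $t1, $t2",
    "sub $t4, $t0, $t3",
    "and $t5, $t0, $t6",
    "or  $t7, $t0, $t8",
    "xor $t9, $t0, $t10",
]
for producer, consumer in raw_hazards(program):
    print(producer, "->", consumer)
```

For this sequence it reports sub and and. The or reads $t0 in the same cycle that add writes it, which the split-cycle register file handles, and xor reads it one cycle later.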

Data Hazard Solution: Forwarding
Forward the result from one stage to another.
(figure: add $t0,$t1,$t2; sub $t4,$t0,$t3; and $t5,$t0,$t6; or $t7,$t0,$t8; xor $t9,$t0,$t10 passing through IF, ID/RF, EX, MEM, WB)
The "or" hazard is solved by the register hardware (write in the first half of the cycle, read in the second half).

Data Hazard: Load/Use (1/4)
Dataflow backward in time is a hazard.
(figure: lw $t0,0($t1) followed by sub $t3,$t0,$t2, passing through IF, ID/RF, EX, MEM, WB)
Can't solve all cases with forwarding:
  Must stall the instruction dependent on the load, then forward (more hardware).
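The difference between an ALU result and a load result comes down to when the value exists. This Python sketch assumes full forwarding into the consumer's EX stage of the classic 5-stage pipeline: an ALU result is available at the end of EX, load data only at the end of MEM. The function `stalls_needed` is hypothetical and counts cycles relative to the producer's IF cycle.

```python
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}

def stalls_needed(producer_op: str, distance: int) -> int:
    """Stall cycles for a consumer that is `distance` instructions after its
    producer, with full forwarding. Cycle 0 is the producer's IF cycle."""
    # An ALU result is produced at the end of EX; load data at the end of MEM.
    ready_end = STAGE["MEM"] if producer_op == "lw" else STAGE["EX"]
    # Without stalls, the consumer starts EX in cycle distance + 2.
    consumer_ex = distance + STAGE["EX"]
    # The value can be forwarded from cycle ready_end + 1 onward.
    return max(0, ready_end + 1 - consumer_ex)

for op in ("add", "lw"):
    print(op, [stalls_needed(op, d) for d in (1, 2, 3)])
# add [0, 0, 0]
# lw [1, 0, 0]
```

Forwarding covers every ALU-to-ALU case, but an instruction immediately after a load still needs one stall, which is exactly the load/use case on this slide.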

Data Hazard: Load/Use (2/4)
The hardware stalls the pipeline (called an "interlock").
(figure: lw $t0, 0($t1); sub $t3,$t0,$t2; and $t5,$t0,$t4; or $t7,$t0,$t6 passing through IF, ID/RF, EX, MEM, WB, drawn as I$, Reg, D$, Reg)
Not in the original MIPS design: MIPS = Microprocessor without Interlocked Pipeline Stages.

Data Hazard: Load/Use (3/4)
The instruction slot after a load is called the load delay slot.
If that instruction uses the result of the load, the hardware interlock will stall it for one cycle.
Alternative: if the compiler puts an unrelated instruction in that slot, there is no stall.
Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except that the latter uses more code space).
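Without an interlock, the compiler (or assembler) has to fill the load delay slot itself. As a minimal sketch, assuming only the lw/sw/ALU forms used in these slides, the hypothetical pass below inserts a nop after any lw whose result the very next instruction uses; a real compiler would first try to move an unrelated instruction into the slot.

```python
import re

def dest_and_sources(instr: str):
    """Return (destination, sources) for the MIPS forms used in these slides."""
    parts = instr.split()
    regs = re.findall(r"\$\w+", instr)
    if not parts or parts[0] == "nop" or not regs:
        return None, []
    if parts[0] == "sw":              # sw rt, off(rs): reads both registers
        return None, regs
    return regs[0], regs[1:]          # lw rt, off(rs) / add rd, rs, rt

def fill_load_delay_slots(program: list) -> list:
    """Insert a nop after each lw whose result the very next instruction uses,
    i.e. what software must do if the hardware has no load interlock."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        if instr.split()[0] == "lw" and i + 1 < len(program):
            dest, _ = dest_and_sources(instr)
            _, next_sources = dest_and_sources(program[i + 1])
            if dest in next_sources:
                out.append("nop")
    return out

program = ["lw $t0, 0($t1)", "sub $t3, $t0, $t2", "and $t5, $t0, $t4", "or $t7, $t0, $t6"]
print("\n".join(fill_load_delay_slots(program)))
```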

Data Hazard: Load/Use (4/4)
A stall is equivalent to a nop:
  lw  $t0, 0($t1)
  nop
  sub $t3, $t0, $t2
  and $t5, $t0, $t4
  or  $t7, $t0, $t6

Data Hazards: Code Scheduling to Avoid Stalls
Reorder code to avoid using a load result in the next instruction.
C code: A = B + E; C = B + F;

Original order (13 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
        (stall)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
        (stall)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)

Reordered (11 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
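The 13- and 11-cycle totals follow from a simple count: 5 cycles for the first instruction, one more per remaining instruction, plus one per load-use stall. This Python sketch assumes full forwarding and a simple load-use detector that stalls whenever an instruction reads (either source register) the destination of the lw immediately before it; `total_cycles` is a hypothetical name.

```python
import re

def dest_and_sources(instr: str):
    """(destination, sources) for lw/sw/add as written in these slides."""
    op = instr.split()[0]
    regs = re.findall(r"\$\w+", instr)
    if op == "sw":
        return None, regs              # sw reads both of its registers
    return regs[0], regs[1:]

def total_cycles(program: list, stages: int = 5) -> int:
    """Cycles on a 5-stage pipeline with forwarding, where the only stall is
    one cycle when an instruction uses the result of the lw just before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        dest, _ = dest_and_sources(prev)
        _, sources = dest_and_sources(cur)
        if prev.split()[0] == "lw" and dest in sources:
            stalls += 1
    # Fill time for the first instruction, one cycle per later one, plus stalls.
    return stages + (len(program) - 1) + stalls

original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
            "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
             "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]
print(total_cycles(original), total_cycles(reordered))  # 13 11
```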

Data Hazards: Code Scheduling
(-- marks a stall cycle: the instruction stays in its current stage.)

Original order (13 cycles):
Cycle               1   2   3   4   5   6   7   8   9   10  11  12  13
lw  $t1, 0($t0)     IF  ID  EX  MEM WB
lw  $t2, 4($t0)         IF  ID  EX  MEM WB
add $t3, $t1, $t2           IF  ID  --  EX  MEM WB
sw  $t3, 12($t0)                IF  --  ID  EX  MEM WB
lw  $t4, 8($t0)                         IF  ID  EX  MEM WB
add $t5, $t1, $t4                           IF  ID  --  EX  MEM WB
sw  $t5, 16($t0)                                IF  --  ID  EX  MEM WB

Reordered (11 cycles):
Cycle               1   2   3   4   5   6   7   8   9   10  11
lw  $t1, 0($t0)     IF  ID  EX  MEM WB
lw  $t2, 4($t0)         IF  ID  EX  MEM WB
lw  $t4, 8($t0)             IF  ID  EX  MEM WB
add $t3, $t1, $t2               IF  ID  EX  MEM WB
sw  $t3, 12($t0)                    IF  ID  EX  MEM WB
add $t5, $t1, $t4                       IF  ID  EX  MEM WB
sw  $t5, 16($t0)                            IF  ID  EX  MEM WB