RISC Design: Pipelining


Virendra Singh
Associate Professor, Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering, Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
CP-226: Computer Architecture, Lecture 10 (20 Feb 2013)

Combined Datapaths
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extension, ALU, data memory and the CONTROL unit; control signals shown include Jump, Branch, RegDst, MemtoReg, MemRead and MemWrite.]
20 Feb 2013 Computer Architecture@MNIT

Pipelining in a Computer
- Divide the datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.
- Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.
- Synchronize tasks with a clock whose cycle time just exceeds the time required by the longest task.
- Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion.

Single-Cycle Datapath

    Instruction class                    IF     ID     EX     MEM    WB     Total time
    lw                                   2 ns   1 ns   2 ns   2 ns   1 ns   8 ns
    sw                                   2 ns   1 ns   2 ns   2 ns   -      8 ns
    R-format (add, sub, and, or, slt)    2 ns   1 ns   2 ns   -      1 ns   8 ns
    B-format (beq)                       2 ns   1 ns   2 ns   -      -      8 ns

IF = instruction fetch; ID = instruction decode (also register file read); EX = execution (ALU operation); MEM = data access; WB = write back (register file write).
"-" marks a stage with no operation on data; idle time equalizes instruction length to a fixed clock period.

Execution Time: Single-Cycle
[Figure: timeline in ns. Each instruction passes through IF, ID, EX, MEM and WB before the next one starts.]

    lw $1, 100($0)
    lw $2, 200($0)
    lw $3, 300($0)

Clock cycle time = 8 ns
Total time for executing three lw instructions = 24 ns

Pipelined Datapath

    Instruction class                    IF     ID     EX     MEM    WB     Total time
    lw                                   2 ns   1 ns   2 ns   2 ns   1 ns   10 ns
    sw                                   2 ns   1 ns   2 ns   2 ns   1 ns   10 ns
    R-format (add, sub, and, or, slt)    2 ns   1 ns   2 ns   2 ns   1 ns   10 ns
    B-format (beq)                       2 ns   1 ns   2 ns   2 ns   1 ns   10 ns

IF = instruction fetch; ID = instruction decode (also register file read); EX = execution (ALU operation); MEM = data access; WB = write back (register file write).
Where an instruction performs no operation on data in a stage, idle time is inserted to equalize instruction lengths. Every instruction occupies five clock cycles of 2 ns each, for a total of 10 ns.
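The link between stage delays and the two clock periods can be sketched in Python. This is an illustrative sketch only: the per-stage delays are assumptions chosen to agree with the 8 ns single-cycle total and the 2 ns pipelined clock quoted in these slides, the function names are made up for this example, and register overhead (the reason the real clock must "just exceed" the longest task) is ignored.

```python
# Assumed per-stage delays in ns (hypothetical values consistent with the
# 8 ns single-cycle total and the 2 ns pipelined clock in the slides).
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_period(delays):
    """One clock must cover every stage in sequence, so the period is the sum."""
    return sum(delays.values())


def pipelined_period(delays):
    """Each stage gets its own cycle, so the period is set by the slowest stage."""
    return max(delays.values())


def pipelined_latency(delays):
    """An instruction still visits every stage, one full cycle per stage."""
    return len(delays) * pipelined_period(delays)


print(single_cycle_period(STAGE_DELAY_NS))  # 8
print(pipelined_period(STAGE_DELAY_NS))     # 2
print(pipelined_latency(STAGE_DELAY_NS))    # 10
```

Note that the latency of a single instruction grows from 8 ns to 10 ns; the gain from pipelining comes from overlapping instructions, not from finishing any one of them sooner.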

Execution Time: Pipeline

    Time (ns):        0    2    4    6    8    10   12   14
    lw $1, 100($0)    IF   ID   EX   MEM  WB
    lw $2, 200($0)         IF   ID   EX   MEM  WB
    lw $3, 300($0)              IF   ID   EX   MEM  WB

Clock cycle time = 2 ns, four times faster than the single-cycle clock
Total time for executing three lw instructions = 14 ns
Performance ratio = Single-cycle time / Pipeline time = 24 / 14 = 1.7

Pipeline Performance
Clock cycle time = 2 ns

1,003 lw instructions:
    Total time for executing 1,003 lw instructions = 2,014 ns
    Performance ratio = Single-cycle time / Pipeline time = 8,024 / 2,014 = 3.98

10,003 lw instructions:
    Performance ratio = 80,024 / 20,014 = 3.998, close to the clock cycle ratio (4)

Pipeline performance approaches the clock-cycle ratio for long programs.
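The figures above follow from a simple count of clock cycles, sketched here in Python. The 8 ns and 2 ns clock periods and the five stages come from the slides; the function names are illustrative, and the model assumes an ideal pipeline with no stalls.

```python
STAGES = 5             # IF, ID, EX, MEM, WB
SINGLE_CYCLE_NS = 8    # single-cycle clock period
PIPELINE_CYCLE_NS = 2  # pipelined clock period


def single_cycle_time(n):
    """Every instruction takes one long clock cycle."""
    return n * SINGLE_CYCLE_NS


def pipeline_time(n):
    """The first instruction needs STAGES cycles; each later one
    completes one cycle after its predecessor (no stalls assumed)."""
    return (STAGES + n - 1) * PIPELINE_CYCLE_NS


def performance_ratio(n):
    return single_cycle_time(n) / pipeline_time(n)


for n in (3, 1003, 10003):
    print(n, single_cycle_time(n), pipeline_time(n), round(performance_ratio(n), 3))
# 3 24 14 1.714
# 1003 8024 2014 3.984
# 10003 80024 20014 3.998
```

As n grows, the fixed STAGES - 1 fill cycles become negligible and the ratio tends to 8 / 2 = 4.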

Single-Cycle Datapath
    IF:  instruction fetch
    ID:  instruction decode, register file read
    EX:  execute, address calculation
    MEM: memory access
    WB:  write back
[Figure: single-cycle datapath divided into the five stages listed above, with control signals RegDst, RegWrite, Branch, ALUSrc, ALUOp, MemWrite, MemRead and MemtoReg.]

Pipelining of RISC Instructions
Steps: Fetch Instruction, Examine Opcode, Fetch Operands, Perform Operation, Store Result

    IF     Instruction fetch
    ID     Instruction decode and fetch operands
    EX     Execute
    MEM    Memory operation
    WB     Write back to register file

Although an instruction takes five clock cycles, one instruction is completed every cycle.

Pipeline Registers
[Figure: single-cycle datapath with four pipeline registers, IF/ID, ID/EX, EX/MEM and MEM/WB, inserted at the stage boundaries.]
This requires a CONTROL not too different from single-cycle.

Pipeline Register Functions
Four pipeline registers are added:

    Register name    Data held
    IF/ID            PC+4, instruction word (IW)
    ID/EX            PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
    EX/MEM           PC+4, zero, Result, R2, IW(11-15) or IW(16-20)
    MEM/WB           M[Result], Result, IW(11-15) or IW(16-20)
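As a rough illustration of how a pipeline register carries data between stages, the first two registers can be modeled as Python dataclasses. This is a sketch under assumptions, not the lecture's implementation: the class and field names are invented here, the register file is modeled as a plain list of 32 integers, and R1 and R2 are taken to be the values read from the registers named by IW(21-25) and IW(16-20).

```python
from dataclasses import dataclass


@dataclass
class IFID:
    pc_plus_4: int
    iw: int        # 32-bit instruction word


@dataclass
class IDEX:
    pc_plus_4: int
    r1: int        # value read from the register named by IW(21-25)
    r2: int        # value read from the register named by IW(16-20)
    imm: int       # IW(0-15), sign-extended
    rd_field: int  # IW(11-15)


def sign_extend_16(value):
    """Interpret a 16-bit field as a two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def decode_stage(if_id, regs):
    """Fill ID/EX from IF/ID and the register file (a list of 32 ints)."""
    iw = if_id.iw
    return IDEX(
        pc_plus_4=if_id.pc_plus_4,
        r1=regs[(iw >> 21) & 0x1F],
        r2=regs[(iw >> 16) & 0x1F],
        imm=sign_extend_16(iw & 0xFFFF),
        rd_field=(iw >> 11) & 0x1F,
    )


# Hypothetical register contents: $s1 (17) = 5, $s2 (18) = 7.
regs = [0] * 32
regs[17], regs[18] = 5, 7
id_ex = decode_stage(IFID(pc_plus_4=4, iw=0x02324020), regs)  # add $t0, $s1, $s2
print(id_ex.r1, id_ex.r2, id_ex.rd_field)  # 5 7 8
```

Everything a later stage needs must travel with the instruction in these registers, which is why PC+4 and the destination register field are copied forward even though ID itself does not use them.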

Pipelined Datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. The destination register number comes from bits 11-15 for R-type instructions and from bits 16-20 for I-type lw.]

Five-Cycle Pipeline

    Cycle    Unit             Following pipeline register
    CC1      IM               IF/ID
    CC2      ID, REG. READ    ID/EX
    CC3      ALU              EX/MEM
    CC4      DM               MEM/WB
    CC5      REG. WRITE

Add Instruction
add $t0, $s1, $s2

Machine instruction word:

    000000    10001    10010    01000    00000    100000
    opcode    $s1      $s2      $t0               function

    Cycle    Stage    Action
    CC1      IF       fetch instruction
    CC2      ID       read $s1, read $s2
    CC3      EX       add $s1+$s2
    CC4      MEM      (no operation)
    CC5      WB       write $t0
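The instruction word above can be reproduced by packing the R-format fields, as in this Python sketch. The helper name encode_r_type is illustrative; the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and the add function code follow the standard MIPS convention and match the bit fields on the slide.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack MIPS R-format fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Register numbers in the MIPS convention.
T0, S1, S2 = 8, 17, 18
ADD_FUNCT = 0b100000

# add $t0, $s1, $s2: sources in rs and rt, destination in rd.
word = encode_r_type(rs=S1, rt=S2, rd=T0, shamt=0, funct=ADD_FUNCT)
bits = format(word, "032b")
print(bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:26], bits[26:32])
# 000000 10001 10010 01000 00000 100000
```

The destination register sits in bits 11-15, which is the field the pipelined datapath forwards to the write-back stage for R-type instructions.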

Pipelined Datapath Executing add
[Figure: pipelined datapath annotated for add. Register numbers s1 and s2 (bits 21-25 and 16-20) address the register file, the values $s1 and $s2 feed the ALU, and the destination t0 is taken from bits 11-15.]

Load Instruction
lw $t0, 1200($t1)

Machine instruction word:

    100011    01001    01000    0000 0100 1011 0000
    opcode    $t1      $t0      1200

    Cycle    Stage    Action
    CC1      IF       fetch instruction
    CC2      ID       read $t1, sign-extend 1200
    CC3      EX       add $t1+1200
    CC4      MEM      read M[addr]
    CC5      WB       write $t0
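The I-format word for this load can be checked with a short Python sketch. The helper name encode_i_type is illustrative; the register numbers ($t0 = 8, $t1 = 9) and the lw opcode follow the standard MIPS convention. The offset 1200 is 0x04B0, that is 0000 0100 1011 0000 as a 16-bit field.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack MIPS I-format fields: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW_OPCODE = 0b100011
T0, T1 = 8, 9  # register numbers in the MIPS convention

# lw $t0, 1200($t1): base register in rs, destination register in rt.
word = encode_i_type(LW_OPCODE, rs=T1, rt=T0, imm=1200)
bits = format(word, "032b")
print(bits[0:6], bits[6:11], bits[11:16], bits[16:32])
# 100011 01001 01000 0000010010110000
```

For lw the destination register is in bits 16-20 rather than bits 11-15, which is why the datapath needs a multiplexer to choose between the two fields.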

Pipelined Datapath Executing lw
[Figure: pipelined datapath annotated for lw. Register number t1 (bits 21-25) addresses the register file, the ALU adds $t1 and the sign-extended offset 1200 to form the data memory address, and the loaded data is written to t0, taken from bits 16-20.]

Store Instruction
sw $t0, 1200($t1)

Machine instruction word:

    101011    01001    01000    0000 0100 1011 0000
    opcode    $t1      $t0      1200

    Cycle    Stage    Action
    CC1      IF       fetch instruction
    CC2      ID       read $t1, sign-extend 1200
    CC3      EX       add $t1+1200 (addr)
    CC4      MEM      write $t0 to M[addr]
    CC5      WB       (no operation)
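The stage-by-stage behaviour of the store can be mimicked in a few lines of Python. This is a simplified sketch: registers and memory are modeled as dictionaries, the register contents are hypothetical values chosen for the example, and run_sw is an illustrative name.

```python
def run_sw(regs, memory, rt, rs, offset):
    """Walk sw rt, offset(rs) through the ID, EX and MEM stages."""
    # ID: read the base register and the register holding the data to store.
    base = regs[rs]
    data = regs[rt]
    # EX: the ALU adds the base and the sign-extended offset to form the address.
    addr = base + offset
    # MEM: write the data to memory at that address.
    memory[addr] = data
    # WB: nothing to do, since a store writes no register.
    return addr


regs = {"$t0": 42, "$t1": 4000}  # hypothetical register contents
memory = {}
addr = run_sw(regs, memory, rt="$t0", rs="$t1", offset=1200)
print(addr, memory[addr])  # 5200 42
```

The value of $t0 is read in ID but not used until MEM, so it has to be carried through the ID/EX and EX/MEM pipeline registers.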

Pipelined Datapath Executing sw
[Figure: pipelined datapath annotated for sw. Register numbers t1 and t0 (bits 21-25 and 16-20) address the register file, the ALU adds $t1 and the sign-extended offset 1200 to form the address, and $t0 is the data written to data memory.]

Executing a Program
Consider a five-instruction segment:

    lw $10, 20($1)
    sub $11, $2, $3
    add $12, $3, $4
    lw $13, 24($1)
    add $14, $5, $6

Program Execution

                       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
    lw $10, 20($1)     IF   ID   EX   MEM  WB
    sub $11, $2, $3         IF   ID   EX   MEM  WB
    add $12, $3, $4              IF   ID   EX   MEM  WB
    lw $13, 24($1)                    IF   ID   EX   MEM  WB
    add $14, $5, $6                        IF   ID   EX   MEM  WB

IF uses the instruction memory (IM), ID the register file read, EX the ALU, MEM the data memory (DM) and WB the register file write. Program instructions run top to bottom; time runs left to right.

CC5

    IF:  add $14, $5, $6
    ID:  lw $13, 24($1)
    EX:  add $12, $3, $4
    MEM: sub $11, $2, $3
    WB:  lw $10, 20($1)

[Figure: pipelined datapath with each stage labeled by the instruction it holds in clock cycle CC5.]
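Which instruction occupies which stage in a given cycle can be computed directly, as in this Python sketch. It assumes an ideal pipeline with no stalls, in which the instruction at position i (counting from 0) enters IF in cycle i + 1; stage_occupancy is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_occupancy(program, cycle):
    """Return {stage: instruction} for a 1-based clock cycle, assuming no stalls."""
    occupancy = {}
    for index, instruction in enumerate(program):
        stage_index = cycle - 1 - index  # instruction i enters IF in cycle i + 1
        if 0 <= stage_index < len(STAGES):
            occupancy[STAGES[stage_index]] = instruction
    return occupancy


snapshot = stage_occupancy(PROGRAM, 5)
for stage in STAGES:
    print(f"{stage}: {snapshot.get(stage, '-')}")
# IF: add $14, $5, $6
# ID: lw $13, 24($1)
# EX: add $12, $3, $4
# MEM: sub $11, $2, $3
# WB: lw $10, 20($1)
```

CC5 is the first cycle in which all five stages are busy; calling stage_occupancy with other cycle numbers shows the pipeline filling (CC1 to CC4) and draining (CC6 to CC9).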

Advantages of Pipeline
- After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
- Pipeline latency is defined as the number of stages in the pipeline, or the number of clock cycles after which the first instruction is completed.
- The clock cycle time is about four times shorter than that of the single-cycle datapath and about the same as that of the multicycle datapath.
- For the multicycle datapath, CPI = 3..
- So, pipelined execution is faster, but...
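How quickly the CPI approaches 1 can be seen from a short Python calculation. It assumes an ideal five-stage pipeline with no stalls, so that n instructions take 5 + n - 1 cycles; effective_cpi is an illustrative name.

```python
def effective_cpi(n, stages=5):
    """Cycles per instruction for n instructions on an ideal pipeline (no stalls)."""
    return (stages + n - 1) / n


for n in (5, 100, 10003):
    print(n, round(effective_cpi(n), 4))
# 5 1.8
# 100 1.04
# 10003 1.0004
```

The pipeline latency contributes a fixed 4 extra cycles, so its share of the CPI shrinks in proportion to 1/n.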

Thank You