ECE 2300 Digital Logic & Computer Organization. More Pipelined Microprocessor


ECE 2300 Digital Logic & Computer Organization
Spring 2018
More Pipelined Microprocessor
Lecture 18

Announcements
No instructor office hour today
  Rescheduled to Monday April 16, 4:00-5:30pm
Prelim 2 review sessions
  Friday April 13, 4:30-6:00pm, PH 219
  Monday April 16, 7:00-8:30pm, PH 203

Pipelined Microprocessor
[Datapath diagram: pipelined microprocessor with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
Key idea: Keep all resources fully utilized

Data Hazard Problem
[Pipeline timing diagram over clock cycles CC1 to CC9]
  ADD  R1,R2,R3
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3
The OR, SUB, and AND instructions are data dependent on the ADD instruction
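
The dependences named on this slide can be found mechanically. The following is a minimal Python sketch, not part of the lecture: the tuple encoding, the function name `find_data_hazards`, and the three-instruction hazard window are illustrative assumptions. The window assumes registers are read in ID and written in WB, with no same-cycle write-then-read in the register file, which matches the three dependent instructions listed above.

```python
# Each instruction is (opcode, destination register, source registers).
# This encoding is an illustrative assumption, not part of the lecture.
PROGRAM = [
    ("ADD",  "R1", ("R2", "R3")),
    ("OR",   "R4", ("R1", "R3")),
    ("SUB",  "R5", ("R2", "R1")),
    ("AND",  "R6", ("R1", "R2")),
    ("ADDI", "R7", ("R7",)),
]

# Assumed hazard window: the three instructions right after a producer
# read the register file before the producer has written it back.
HAZARD_WINDOW = 3


def find_data_hazards(program, window=HAZARD_WINDOW):
    """Return (consumer index, register, producer index) per RAW hazard."""
    hazards = []
    for i, (_, _, srcs) in enumerate(program):
        for src in srcs:
            # Search backwards for the most recent writer inside the window.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j][1] == src:
                    hazards.append((i, src, j))
                    break
    return hazards


for consumer, reg, producer in find_data_hazards(PROGRAM):
    print(f"{PROGRAM[consumer][0]} reads {reg} before "
          f"{PROGRAM[producer][0]} writes it back")
```

The ADDI is not reported: it reads only R7, which none of the three preceding instructions writes.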

Example: Data Hazards
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
  SUB  R4, R2, R3
  ADDI R1, R1, 1
  ADDI R2, R4, 1
  ADDI R3, R4, 1
  SUB  R5, R3, R4

Review: Compiler Inserts NOPs (Solution 1)
[Pipeline timing diagram over clock cycles CC1 to CC9]
  ADD  R1,R2,R3
  NOP
  NOP
  NOP
  OR   R4,R1,R3
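
A compiler pass of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's tool: the tuple encoding, the name `insert_nops`, and the window of three instructions (taken from the three NOPs on the slide) are all hypothetical choices.

```python
# (opcode, destination register, source registers); illustrative encoding.
NOP = ("NOP", None, ())

PROGRAM = [
    ("ADD",  "R1", ("R2", "R3")),
    ("OR",   "R4", ("R1", "R3")),
    ("SUB",  "R5", ("R2", "R1")),
    ("AND",  "R6", ("R1", "R2")),
    ("ADDI", "R7", ("R7",)),
]


def insert_nops(program, window=3):
    """Pad with NOPs so no consumer sits within `window` of its producer."""
    out = []
    for instr in program:
        srcs = instr[2]
        needed = 0
        # Inspect only the last `window` instructions already emitted;
        # back == 1 is the instruction immediately before this one.
        for back, (_, dest, _) in enumerate(reversed(out[-window:]), start=1):
            if dest is not None and dest in srcs:
                needed = max(needed, window - back + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


print([op for op, _, _ in insert_nops(PROGRAM)])
```

Once the three NOPs separate ADD from OR, the SUB and AND that follow are already far enough from the ADD, so no further padding is added.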

Review: HW Stalls the Pipeline (Solution 2)
[Pipeline timing diagram over clock cycles CC1 to CC9 with three bubbles]
  ADD  R1,R2,R3
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3
The pipeline is stalled for three cycles
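
The three-cycle stall can be reproduced with a small scheduling sketch in Python. It rests on assumptions not stated on the slide: a five-stage pipeline in which WB comes three stages after ID, a written value that becomes readable in ID only in the cycle after WB, and hypothetical names (`id_schedule`, `total_stalls`).

```python
# (opcode, destination register, source registers); illustrative encoding.
PROGRAM = [
    ("ADD",  "R1", ("R2", "R3")),
    ("OR",   "R4", ("R1", "R3")),
    ("SUB",  "R5", ("R2", "R1")),
    ("AND",  "R6", ("R1", "R2")),
    ("ADDI", "R7", ("R7",)),
]


def id_schedule(program):
    """Return the clock cycle in which each instruction completes ID."""
    ready = {}      # register -> first cycle in which ID may read it
    cycles = []
    next_id = 2     # first instruction: IF in cycle 1, ID in cycle 2
    for _, dest, srcs in program:
        # Hold the instruction in ID until all of its sources are readable.
        cycle = max([next_id] + [ready.get(s, 0) for s in srcs])
        cycles.append(cycle)
        if dest is not None:
            # Assumed: WB is 3 stages after ID, readable one cycle later.
            ready[dest] = cycle + 3 + 1
        next_id = cycle + 1
    return cycles


def total_stalls(program):
    """Stall cycles compared with an ideal one-instruction-per-cycle flow."""
    cycles = id_schedule(program)
    return cycles[-1] - (len(cycles) + 1)


print(id_schedule(PROGRAM), total_stalls(PROGRAM))
```

Only the OR waits; the instructions behind it are delayed by the same three cycles and then find R1 already written.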

Solution 3: HW Forwarding (Bypassing)
[Pipeline timing diagram over clock cycles CC1 to CC9]
  ADD  R1,R2,R3
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3

Pipeline Modifications for Forwarding?
[Diagram: pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Pipelined Microprocessor w/o Forwarding
[Datapath diagram: pipelined microprocessor with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Pipelined Processor with Forwarding
[Datapath diagram: pipelined processor with forwarding paths added]

Forwarding in Action
[Datapath diagram with forwarding; instructions in flight: OR R4,R1,R3 and ADD R1,R2,R3]

Forwarding in Action
[Datapath diagram with forwarding; instructions in flight: SUB R5,R2,R1, OR R4,R1,R3, and ADD R1,R2,R3]

Forwarding in Action
[Datapath diagram with forwarding; instructions in flight: AND R6,R1,R2, SUB R5,R2,R1, OR R4,R1,R3, and ADD R1,R2,R3]

HW Forwarding
[Datapath diagram: pipelined processor with forwarding]
Trade-off between performance and cost

Example: Data Hazards w/o Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
  ADD  R1, R2, R3
  OR   R4, R1, R3
  SUB  R5, R2, R1
  SUB  R6, R1, R2

Example: Data Hazards w/ Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
  ADD  R1, R2, R3
  OR   R4, R1, R3
  SUB  R5, R2, R1
  SUB  R6, R1, R2
Data hazards resolved by R-type to R-type forwarding
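
The selection logic behind forwarding can be sketched as a small Python function. This is a generic textbook-style illustration, not the lecture's actual control logic: the function name, the source labels, and the restriction to two forwarding paths into EX are assumptions. How the lecture's datapath serves a consumer three instructions behind the producer is not modelled here.

```python
def forward_select(src, ex_mem_dest, mem_wb_dest):
    """Choose where an operand needed in EX should come from.

    src          register the instruction in EX wants to read
    ex_mem_dest  destination of the instruction one ahead (None if it has none)
    mem_wb_dest  destination of the instruction two ahead (None if it has none)
    """
    if ex_mem_dest is not None and src == ex_mem_dest:
        return "EX/MEM"   # newest value wins: result of the previous instruction
    if mem_wb_dest is not None and src == mem_wb_dest:
        return "MEM/WB"   # result from two instructions back
    return "RF"           # no match: use the value read from the register file


# OR R4,R1,R3 in EX while ADD R1,R2,R3 is one stage ahead:
print(forward_select("R1", ex_mem_dest="R1", mem_wb_dest=None))   # EX/MEM
# SUB R5,R2,R1 in EX: OR (writes R4) is one ahead, ADD (writes R1) two ahead:
print(forward_select("R1", ex_mem_dest="R4", mem_wb_dest="R1"))   # MEM/WB
```

Checking the EX/MEM register first matters when both older instructions write the same register, because the younger of the two holds the value the program expects.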

Another Example: Data Hazards w/ Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
  LW   R1, 0(R2)
  OR   R4, R1, R3
  SUB  R5, R2, R1
Data hazard not resolved by R-type to R-type forwarding

Data Hazards Caused by Load
[Pipeline timing diagram over clock cycles CC1 to CC9]
  LW   R1,0(R2)
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3

Load Instructions and Forwarding
[Pipeline timing diagram over clock cycles CC1 to CC9]
  LW   R1,0(R2)
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3

Solution 1: Compiler Inserts NOP Instruction
[Pipeline timing diagram over clock cycles CC1 to CC9]
  LW   R1,0(R2)
  NOP
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2

Solution 2: HW Stalls the Pipeline
[Pipeline timing diagram over clock cycles CC1 to CC9, with bubble markers next to the OR and SUB rows]
  LW   R1,0(R2)
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3
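
With forwarding in place, the only remaining data stall in these slides is a load followed immediately by an instruction that uses the loaded register. A minimal Python sketch of that check follows. The one-cycle penalty is taken from the single NOP in Solution 1; the encoding and the name `load_use_stalls` are illustrative assumptions.

```python
# (opcode, destination register, source registers); illustrative encoding.
PROGRAM = [
    ("LW",   "R1", ("R2",)),        # LW R1,0(R2): R2 is the base register
    ("OR",   "R4", ("R1", "R3")),
    ("SUB",  "R5", ("R2", "R1")),
    ("AND",  "R6", ("R1", "R2")),
    ("ADDI", "R7", ("R7",)),
]


def load_use_stalls(program):
    """Count one bubble per load whose very next instruction reads its result."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "LW" and dest in cur[2]:
            stalls += 1
    return stalls


print(load_use_stalls(PROGRAM))
```

The SUB and AND also read R1, but by the time they need it the loaded value is available, so only the OR causes a bubble in this model.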

Solution 3: Delay Slots
A delay slot is a location in the program where the compiler is required to insert an instruction between dependent instructions
The ISA defines the delay slots
The compiler can fill delay slots with NOPs
Even better: Move a non-dependent instruction from elsewhere in the program into the delay slot
  Doing so must not change the function of the program

Filling the Load Delay Slot
[Pipeline timing diagram over clock cycles CC1 to CC9]
  LW   R1,0(R2)
  ADDI R7,R7,3    (moved into the delay slot in place of the NOP)
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R7,3    (original position)

Filling the Load Delay Slot?
[Pipeline timing diagram over clock cycles CC1 to CC9]
  LW   R1,0(R2)
  NOP
  OR   R4,R1,R3
  SUB  R5,R2,R1
  AND  R6,R1,R2
  ADDI R7,R6,3
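
The rule that filling a delay slot must not change the function of the program can be checked mechanically. The Python sketch below is one conservative way to do it, under assumptions: the encoding and the name `can_fill_load_delay_slot` are hypothetical, and the check looks only at register dependences (it ignores memory and branches).

```python
# (opcode, destination register, source registers); illustrative encoding.
def make_program(last_addi_src):
    return [
        ("LW",   "R1", ("R2",)),
        ("OR",   "R4", ("R1", "R3")),
        ("SUB",  "R5", ("R2", "R1")),
        ("AND",  "R6", ("R1", "R2")),
        ("ADDI", "R7", (last_addi_src,)),
    ]


def can_fill_load_delay_slot(program, load_idx, cand_idx):
    """True if program[cand_idx] may move to just after program[load_idx]."""
    load_dest = program[load_idx][1]
    _, c_dest, c_srcs = program[cand_idx]
    if load_dest in c_srcs:
        return False                  # would read the load result too early
    for _, dest, srcs in program[load_idx + 1:cand_idx]:
        if dest is not None and dest in c_srcs:
            return False              # candidate needs a value computed later
        if c_dest is not None and (c_dest in srcs or c_dest == dest):
            return False              # candidate would clobber a value in use
    return True


print(can_fill_load_delay_slot(make_program("R7"), 0, 4))  # ADDI R7,R7,3
print(can_fill_load_delay_slot(make_program("R6"), 0, 4))  # ADDI R7,R6,3
```

ADDI R7,R7,3 touches only R7 and can move up. ADDI R7,R6,3 reads R6, which the AND produces, so moving it ahead of the AND would change the result.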

The Problem with Branches
[Datapath diagram: pipelined processor with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
If the condition is met, the PC is updated at the end of EX, after we've already fetched the next two instructions

The Problem with Branches
[Pipeline timing diagram over clock cycles CC1 to CC9]
  BEQ  R2,R3,[label]
  OR   R4,R1,R3
  SUB  R5,R2,R1
[label]:
  AND  R6,R1,R2
  ADDI R7,R7,3
  ...
OR and SUB are fetched before the branch condition is evaluated

Control Hazard
Occurs when instructions following a branch are fetched before the branch outcome is known
[Pipeline timing diagram: BEQ, OR, and SUB enter the pipeline in consecutive cycles]
  BEQ  R2, R3, [label]
  OR   R4, R1, R3
  SUB  R5, R2, R1
[label]:
  AND  R6,R1,R2
What should happen
  If branch is not taken, next fetched instruction should be at address PC+2 (OR)
  If branch is taken, next fetched instruction should be at address [label] (AND)
What actually happens
  Instructions at PC+2 and PC+4 are fetched before branch outcome is known

Branch Delay Slot
If the ISA defines a branch delay slot, the instruction immediately following a branch is always executed after the branch
The compiler finds an instruction to put there, or puts in a NOP
The hardware must execute the instruction immediately following the branch, regardless of whether the branch is taken or not

Reducing the Branch Delay
We already calculate the branch target in ID
Put dedicated hardware to also evaluate the condition in ID
Hence only 1 branch delay slot needed
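
The link between the stage that resolves a branch and the number of instructions fetched behind it can be written down directly. This Python sketch assumes the five stage names IF, ID, EX, MEM, WB and a 2-byte instruction step (from the PC+2 and PC+4 addresses in the slides); the function name and the example address are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
INSTRUCTION_BYTES = 2   # the slides advance the PC by 2 per instruction


def fetched_before_resolution(branch_pc, resolve_stage):
    """Addresses fetched sequentially while the branch is still unresolved."""
    # One younger instruction is fetched for every cycle the branch spends
    # in a stage after IF, up to and including the resolving stage.
    count = STAGES.index(resolve_stage)
    return [branch_pc + INSTRUCTION_BYTES * k for k in range(1, count + 1)]


print([hex(a) for a in fetched_before_resolution(0x100, "EX")])  # PC+2, PC+4
print([hex(a) for a in fetched_before_resolution(0x100, "ID")])  # PC+2 only
```

Resolving in EX leaves two instructions in flight behind the branch, while resolving in ID leaves one, which is why a single branch delay slot is enough.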

Evaluating Branch Condition in ID
[Datapath diagram with an equality comparator (=?) and sign-bit inputs added for branch evaluation]

Filling the Branch Delay Slot with a NOP
[Pipeline timing diagram over clock cycles CC1 to CC9]
  BEQ  R2,R3,[label]
  NOP
  OR   R4,R1,R3
  SUB  R5,R2,R1
[label]:
  AND  R6,R1,R2
  ADDI R7,R7,3
  ...

Filling the Branch Delay Slot
[Pipeline timing diagram over clock cycles CC1 to CC9]
  BEQ  R2,R3,[label]
  ADDI R7,R7,3    (moved into the delay slot in place of the NOP)
  OR   R4,R1,R3
  SUB  R5,R2,R1
[label]:
  AND  R6,R1,R2
  ADDI R7,R7,3    (original position)
  ...
The ADDI is always executed after the BEQ. If the BEQ is taken, executing the ADDI must not cause incorrect behavior
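
The "always executed" behavior of the delay slot can be illustrated by listing the order in which instructions execute for each branch outcome. In this Python sketch the label name `target` is a hypothetical stand-in (the label itself did not survive in the transcript), and the encoding and the function name are illustrative.

```python
# (label or None, instruction text); "target" is a hypothetical label name.
PROGRAM = [
    (None,     "BEQ R2,R3,target"),
    (None,     "ADDI R7,R7,3"),      # branch delay slot
    (None,     "OR R4,R1,R3"),
    (None,     "SUB R5,R2,R1"),
    ("target", "AND R6,R1,R2"),
]


def execution_order(program, branch_idx, target_label, taken):
    """Instructions executed, given one delay slot after the branch."""
    slot_idx = branch_idx + 1
    # The branch and its delay slot execute regardless of the outcome.
    order = [text for _, text in program[:slot_idx + 1]]
    if taken:
        next_idx = next(i for i, (label, _) in enumerate(program)
                        if label == target_label)
    else:
        next_idx = slot_idx + 1
    order += [text for _, text in program[next_idx:]]
    return order


print(execution_order(PROGRAM, 0, "target", taken=True))
print(execution_order(PROGRAM, 0, "target", taken=False))
```

The ADDI appears in both traces, so it is only a safe filler if running it on the taken path leaves the program's results unchanged.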

Branch Target Address (PC+2+OFF)
[Datapath diagram: pipelined processor with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
Branch delay slot is accounted for in the branch target calculation
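
The PC+2+OFF calculation can be sketched in Python as follows. Only the formula comes from the slide. The 8-bit offset field, the 16-bit address width, and the treatment of OFF as an unscaled signed byte offset are assumptions made for illustration, as are the function names.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc, off_field, off_bits=8, addr_bits=16):
    """Target = PC + 2 + OFF, wrapped to the assumed address width."""
    off = sign_extend(off_field, off_bits)      # assumed field width
    return (pc + 2 + off) & ((1 << addr_bits) - 1)


print(hex(branch_target(0x0100, 0x04)))   # forward branch
print(hex(branch_target(0x0100, 0xFC)))   # backward branch, OFF = -4
```

The +2 makes the offset relative to the instruction after the branch, which is the delay slot instruction that executes either way.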

Before Next Class
Next Time
  More Pipelined Microprocessor