COSC4201. Scoreboard


COSC4201 Scoreboard
Prof. Mokhtar Aboelaze, York University
Based on slides by Prof. L. Bhuyan (UCR) and Prof. M. Shaaban (RIT)

Overcoming Data Hazards with Dynamic Scheduling
In the pipeline, if a fetched instruction has a data dependence on an instruction already in the pipe, and forwarding cannot hide that dependence, the pipeline stalls. That is known as static scheduling.
In dynamic scheduling, the hardware rearranges the instructions to reduce stalls.
It simplifies the compiler and deals with dependences that were not known at compile time.
With dynamic scheduling the processor cannot remove a true data dependence; it only tries to avoid the stalls that the dependence would cause.

Dynamic Scheduling
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
ADDD depends on DIVD and cannot proceed. SUBD is stalled as well, although it does not depend on any other instruction. The reason is that both structural and data hazards are checked in the ID stage: once we detect that ADDD depends on DIVD, the pipeline stalls and no further instructions are fetched.
To proceed with SUBD, we must separate the issue process into 2 parts: checking for structural hazards, and waiting for the absence of data hazards.
We check for structural hazards when we issue, so we still use in-order issue. However, we want each instruction to begin execution as soon as its data operands are ready.

Dynamic Scheduling
That of course creates out-of-order completion, which creates problems with exception handling. For the time being, we assume imprecise exceptions.
We split the ID stage into two parts:
Issue: decode the instruction and check for structural hazards.
Read operands: wait until there are no data hazards, then read the operands.
An instruction fetch stage precedes the ID stage and may fetch into a single-entry latch or a queue.
The EX stage follows.
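As a rough illustration of the dependences in the DIVD/ADDD/SUBD sequence above, the following Python sketch classifies the hazards between an earlier and a later instruction. The names `Instr` and `hazards` are hypothetical helpers introduced here for illustration; they do not come from the slides.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-operand instruction: op dest, src1, src2."""
    op: str
    dest: str
    srcs: tuple


def hazards(earlier: Instr, later: Instr) -> set:
    """Return the hazard types that `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # later reads what earlier writes (true dependence)
    if later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites a register earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register
    return found


divd = Instr("DIVD", "F0", ("F2", "F4"))
addd = Instr("ADDD", "F10", ("F0", "F8"))
subd = Instr("SUBD", "F12", ("F8", "F14"))

print(hazards(divd, addd))  # ADDD needs F0 from DIVD
print(hazards(divd, subd))  # SUBD is independent of DIVD
print(hazards(addd, subd))  # SUBD is independent of ADDD
```

Under these assumptions only the DIVD/ADDD pair reports a hazard, which is why SUBD could start early if issue were decoupled from operand reading.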

Dynamic Scheduling with Scoreboard
All instructions pass through the issue stage in order.
The scoreboard was first used in the CDC 6600.
It has to check for WAW and RAW hazards.
Assume we have one integer unit, two multipliers, one adder, and one divide unit.
The scoreboard keeps information about every instruction from fetch to execute.
The scoreboard controls when an instruction can read its operands, when it can start execution, and when it can write its result.

Dynamic Scheduling with Scoreboard
There is no forwarding.
Notice that there is no specific stage for write back, which means a result can be written back in the cycle after execution completes. This costs one cycle of delay, but the overall performance is better.

Dynamic Scheduling with Scoreboard
Ignoring memory, there are 4 stages (NO forwarding):
Issue: if a functional unit (FU) is free and no other active instruction has the same destination register, the instruction is issued. Otherwise, instruction issue stalls and no further instructions can be fetched. (This replaces a portion of the ID stage in MIPS.)
Read Operands: the scoreboard monitors the availability of the source operands. When they are available (no earlier active instruction will write to the source registers), the instruction can read its operands and execute, possibly out of order.

Dynamic Scheduling with Scoreboard
Execution: the FU starts execution; after completion it notifies the scoreboard.
Write Result: the scoreboard checks for WAR hazards and stalls if necessary.
DIV F0,F2,F4
ADD F10,F0,F8
SUB F8,F8,F14
The scoreboard stalls SUB in its write-result stage until ADD reads its operands (note that because DIV takes a long time, ADD stalls waiting for F0).
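The stage conditions above can be sketched as three small predicates. This is a minimal sketch, not the CDC 6600 logic: the `Active` record, the function names, and the unit names `Add1` and `Add2` are assumptions made here. ADD and SUB are placed in separate hypothetical units so that the WAR check, rather than the structural check, is what holds SUB back.

```python
from dataclasses import dataclass


@dataclass
class Active:
    """Hypothetical record for an issued instruction that has not yet written."""
    name: str
    unit: str
    dest: str
    srcs: tuple
    has_read: bool = False  # True once the instruction has read its operands


def can_issue(unit, dest, active):
    """Issue: the unit is free and no active instruction has the same destination."""
    return all(a.unit != unit and a.dest != dest for a in active)


def can_read_operands(instr, active):
    """Read operands: no earlier active instruction still has to write a source."""
    earlier = active[:active.index(instr)]
    return all(a.dest not in instr.srcs for a in earlier)


def can_write_result(instr, active):
    """Write result: no earlier instruction still has to read the destination (WAR)."""
    earlier = active[:active.index(instr)]
    return all(a.has_read or instr.dest not in a.srcs for a in earlier)


# State while DIV is executing; program order is preserved in the list.
div = Active("DIV", "Divide", "F0", ("F2", "F4"), has_read=True)
add = Active("ADD", "Add1", "F10", ("F0", "F8"))
sub = Active("SUB", "Add2", "F8", ("F8", "F14"), has_read=True)
active = [div, add, sub]

print(can_read_operands(add, active))       # ADD waits for DIV to write F0
print(can_write_result(sub, active))        # SUB waits until ADD has read F8
print(can_issue("Mult1", "F10", active))    # blocked: ADD will also write F10
```

Once `add.has_read` becomes true, `can_write_result(sub, active)` no longer reports a WAR hazard and SUB may write F8.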

Scoreboard
1. Instruction Status: indicates which of the 4 steps the instruction is in.
2. Functional Unit Status: indicates the state of the FU. There are nine fields:
Busy: whether the unit is busy or not
Op: operation to be performed in the unit
Fi: destination register
Fj, Fk: source register numbers
Qj, Qk: FUs producing source registers Fj and Fk
Rj, Rk: flags indicating whether Fj and Fk are ready or not
3. Register Result Status: indicates which FU will write to each register.
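One possible way to hold the functional unit status and register result status in code is sketched below. The `FUStatus` class and the `issue` helper are hypothetical names; the rule for filling in Qj, Qk, Rj, and Rk at issue time (look up the pending producer in the register result table) follows the usual textbook bookkeeping and is an assumption here, since the slides only list the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """The nine functional unit status fields listed above."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # unit producing fj, if any
    qk: Optional[str] = None   # unit producing fk, if any
    rj: bool = False           # fj ready?
    rk: bool = False           # fk ready?


def issue(units, reg_result, unit, op, fi, fj, fk):
    """Record an issued instruction in both tables (assumed bookkeeping)."""
    qj = reg_result.get(fj)    # None means no unit is about to write fj
    qk = reg_result.get(fk)
    units[unit] = FUStatus(True, op, fi, fj, fk, qj, qk, qj is None, qk is None)
    reg_result[fi] = unit      # this unit will write the destination


# Units assumed by the slides: one integer, two multipliers, one adder, one divider.
units = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
reg_result = {}                # register name -> unit that will write it

issue(units, reg_result, "Integer", "Load", "F2", None, "R3")   # LD F2,45(R3)
issue(units, reg_result, "Mult1", "Mult", "F0", "F2", "F4")     # MULTD F0,F2,F4

print(units["Mult1"])
print(reg_result)
```

With these two instructions issued, Mult1 records that F2 is still being produced by the Integer unit (Qj set, Rj false) while F4 is ready.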

Dynamic Scheduling with Scoreboard
Consider the following example:
LD F6,34(R2)
LD F2,45(R3)
MULTD F0,F2,F4
SUBD F8,F6,F2
DIVD F10,F0,F6
ADDD F6,F8,F2
Add takes 2 cycles, multiply takes 10, and divide takes 40.

Scoreboard Example Cycle 1
Instruction status: LD F6,34(R2) issues in cycle 1.
Functional unit status: Integer busy (Load, Fi = F6, Fk = R2, Rk = Yes); Divide not busy.
Register result status: F6 will be written by Integer.

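Scoreboard Example Cycle 2
Instruction status: LD F6,34(R2): issue 1, read operands 2.
Functional unit status: Integer busy (Load, Fi = F6, Fk = R2, Rk = Yes); Divide not busy.
Register result status: F6 will be written by Integer.
Issue 2nd LD? No: the Integer unit is still busy.

Scoreboard Example Cycle 3
Instruction status: LD F6,34(R2): issue 1, read operands 2, execution complete 3.
Functional unit status: Integer busy (Load, Fi = F6, Fk = R2, Rk = Yes); Divide not busy.
Register result status: F6 will be written by Integer.
Issue MULT? No: instructions issue in order, and the second LD has not issued yet.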

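Scoreboard Example Cycle 4
Instruction status: LD F6,34(R2) writes its result in cycle 4.
Functional unit status: Integer busy (Load, Fi = F6, Fk = R2, Rk = Yes); Divide not busy.
Register result status: F6 will be written by Integer.

Scoreboard Example Cycle 5
Instruction status: LD F2,45(R3): issue 5.
Functional unit status: Integer busy (Load, Fi = F2, Fk = R3, Rk = Yes); Divide not busy.
Register result status: F2 will be written by Integer.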

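Scoreboard Example Cycle 6
Instruction status: LD F2,45(R3): issue 5, read operands 6. MULT F0,F2,F4: issue 6.
Functional unit status: Integer busy (Load, Fi = F2, Fk = R3, Rk = Yes); Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4, Rk = Yes); Divide not busy.
Register result status: F0 will be written by Mult1, F2 by Integer.

Scoreboard Example Cycle 7
Instruction status: LD F2,45(R3): issue 5, read operands 6, execution complete 7. MULT F0,F2,F4: issue 6. SUBD F8,F6,F2: issue 7.
Functional unit status: Integer busy (Load, Fi = F2, Fk = R3, Rk = Yes); Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4, Rk = Yes); Add busy (Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No); Divide not busy.
Register result status: F0 will be written by Mult1, F2 by Integer, F8 by Add.
Read multiply operands? No: F2 has not yet been written by the Integer unit.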

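Scoreboard Example Cycle 8a
Instruction status: LD F2,45(R3): issue 5, read operands 6, execution complete 7. MULT F0,F2,F4: issue 6. SUBD F8,F6,F2: issue 7. DIVD F10,F0,F6 issues in cycle 8.
Functional unit status: Integer busy (Load, Fi = F2, Fk = R3, Rk = Yes); Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4, Rk = Yes); Add busy (Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No).
Register result status: F0 will be written by Mult1, F2 by Integer, F8 by Add, F10 by Divide.

Scoreboard Example Cycle 8b
Instruction status: LD F2,45(R3) writes its result in cycle 8. MULT F0,F2,F4: issue 6. SUBD F8,F6,F2: issue 7.
Functional unit status: Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes); Add busy (Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Yes, Rk = Yes).
Register result status: F0 will be written by Mult1, F8 by Add, F10 by Divide.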

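Scoreboard Example Cycle 9
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. SUBD F8,F6,F2: issue 7, read operands 9.
Functional unit status: Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4), 10 cycles remaining; Add busy (Sub, Fi = F8, Fj = F6, Fk = F2), 2 cycles remaining.
Register result status: F0 will be written by Mult1, F8 by Add, F10 by Divide.
Read operands for MULT and SUBD? Yes: F2 was written in cycle 8.
Issue ADDD? No: the Add unit is busy with SUBD.

Scoreboard Example Cycle 11
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. SUBD F8,F6,F2: issue 7, read operands 9, execution complete 11.
Functional unit status: Mult1 busy, 8 cycles remaining; Add busy, 0 cycles remaining.
Register result status: F0 will be written by Mult1, F8 by Add, F10 by Divide.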

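Scoreboard Example Cycle 12
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. SUBD F8,F6,F2 writes its result in cycle 12, which frees the Add unit.
Functional unit status: Mult1 busy (Mult, Fi = F0, Fj = F2, Fk = F4), 7 cycles remaining.
Register result status: F0 will be written by Mult1, F10 by Divide.
Read operands for DIVD? No: F0 has not yet been written by Mult1.

Scoreboard Example Cycle 13
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13.
Functional unit status: Mult1 busy, 6 cycles remaining; Add busy (Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes).
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.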

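Scoreboard Example Cycle 14
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13, read operands 14.
Functional unit status: Mult1 busy, 5 cycles remaining; Add busy (Add, Fi = F6, Fj = F8, Fk = F2), 2 cycles remaining.
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.

Scoreboard Example Cycle 15
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13, read operands 14.
Functional unit status: Mult1 busy, 4 cycles remaining; Add busy, 1 cycle remaining.
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.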

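Scoreboard Example Cycle 16
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Mult1 busy, 3 cycles remaining; Add busy, 0 cycles remaining.
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.

Scoreboard Example Cycle 17
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Mult1 busy, 2 cycles remaining; Add busy (Add, Fi = F6, Fj = F8, Fk = F2).
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.
Write result of ADDD? No: DIVD has not yet read F6, so writing F6 now would be a WAR hazard.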

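Scoreboard Example Cycle 18
Instruction status: MULT F0,F2,F4: issue 6, read operands 9. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Mult1 busy, 1 cycle remaining; Add busy (Add, Fi = F6, Fj = F8, Fk = F2).
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.

Scoreboard Example Cycle 19
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Mult1 busy, 0 cycles remaining; Add busy (Add, Fi = F6, Fj = F8, Fk = F2).
Register result status: F0 will be written by Mult1, F6 by Add, F10 by Divide.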

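Scoreboard Example Cycle 20
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19, write result 20. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Add busy (Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes); Divide busy (Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes).
Register result status: F6 will be written by Add, F10 by Divide.

Scoreboard Example Cycle 21
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19, write result 20. DIVD F10,F0,F6: read operands 21. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16.
Functional unit status: Add busy (Add, Fi = F6, Fj = F8, Fk = F2); Divide busy (Div, Fi = F10, Fj = F0, Fk = F6).
Register result status: F6 will be written by Add, F10 by Divide.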

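Scoreboard Example Cycle 22
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19, write result 20. DIVD F10,F0,F6: read operands 21. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16, write result 22.
Functional unit status: Divide busy (Div, Fi = F10, Fj = F0, Fk = F6), 40 cycles remaining.
Register result status: F10 will be written by Divide.

Scoreboard Example Cycle 61
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19, write result 20. DIVD F10,F0,F6: read operands 21, execution complete 61. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16, write result 22.
Functional unit status: Divide busy (Div, Fi = F10, Fj = F0, Fk = F6), 0 cycles remaining.
Register result status: F10 will be written by Divide.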

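Scoreboard Example Cycle 62
Instruction status: MULT F0,F2,F4: issue 6, read operands 9, execution complete 19, write result 20. DIVD F10,F0,F6: read operands 21, execution complete 61, write result 62. ADDD F6,F8,F2: issue 13, read operands 14, execution complete 16, write result 22.
Functional unit status: Divide not busy.
Register result status: no pending writes.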
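The cycle-by-cycle example can be cross-checked with a small simulation. The sketch below is a simplified model written for this purpose, not the author's tool: the names `Instr`, `UNIT`, `LATENCY`, and `simulate` are hypothetical, MULTD is always mapped to Mult1 (only one multiply appears), the load latency of 1 cycle is inferred from the trace (read in cycle 2, complete in cycle 3), and every decision in a cycle is assumed to see only events from earlier cycles.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


# Assumed mapping of operations to functional units and execution latencies.
UNIT = {"LD": "Integer", "MULTD": "Mult1", "SUBD": "Add", "ADDD": "Add", "DIVD": "Divide"}
LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}

PROGRAM = [
    Instr("LD", "F6", ("R2",)),
    Instr("LD", "F2", ("R3",)),
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]


def simulate(program):
    """Return (issue, read, complete, write) cycles for each instruction."""
    n = len(program)
    issue, read, done, write = ([None] * n for _ in range(4))
    t = 0

    def before(x):
        # True if the event happened in a cycle strictly earlier than t.
        return x is not None and x < t

    while any(w is None for w in write):
        t += 1
        # Issue (in order, one per cycle): unit free and no pending write to dest.
        k = next((i for i in range(n) if issue[i] is None), None)
        if k is not None:
            cur = program[k]
            blocked = any(
                not before(write[j])
                and (UNIT[program[j].op] == UNIT[cur.op] or program[j].dest == cur.dest)
                for j in range(k)
            )
            if not blocked:
                issue[k] = t
        for i, ins in enumerate(program):
            if before(issue[i]) and read[i] is None:
                # Read operands: every earlier producer of a source has written.
                raw = any(program[j].dest in ins.srcs and not before(write[j])
                          for j in range(i))
                if not raw:
                    read[i] = t
                    done[i] = t + LATENCY[ins.op]
            elif before(done[i]) and write[i] is None:
                # Write result: no earlier instruction still has to read dest (WAR).
                war = any(ins.dest in program[j].srcs and not before(read[j])
                          for j in range(i))
                if not war:
                    write[i] = t
    return list(zip(issue, read, done, write))


timeline = simulate(PROGRAM)
for ins, row in zip(PROGRAM, timeline):
    print(ins.op, ins.dest, row)
```

Under these assumptions the simulation matches the timings in the slides: for example DIVD comes out as (8, 21, 61, 62), and ADDD as (13, 14, 16, 22), with its write delayed until the cycle after DIVD reads F6.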