CSEN 601: Computer System Architecture Summer 2014

Practice Assignment 7 Solution

Exercise 7-1: Based on the MIPS pipeline implementation you studied, what are the control signals that have to be stored in the ID/EX pipeline register? Group them based on the stage they are needed in.

Solution:

Control signals needed in the EX phase: ALUSrc (1 bit), RegDest (1 bit), ALUOp (2 bits)
Control signals needed in the MEM phase: MemRead (1 bit), MemWrite (1 bit), Branch (1 bit)
Control signals needed in the WB phase: RegWrite (1 bit), MemToReg (1 bit)

Exercise 7-2: Based on the MIPS pipeline implementation you studied, what are the sizes of the pipeline registers? Justify your answer. Ignore any bits required to detect or handle hazards.

Solution:

The IF/ID pipeline register has 64 bits:
    32 bits   instruction
    32 bits   incremented PC

The ID/EX pipeline register has 147 bits:
    32 bits   incremented PC
    32 bits   read register 1 value
    32 bits   read register 2 value
    32 bits   sign-extended offset
     5 bits   Rt field
     5 bits   Rd field
     2 bits   WB control signals
     3 bits   MEM control signals
     4 bits   EX control signals

The EX/MEM pipeline register has 107 bits:
    32 bits   branch address
     1 bit    zero flag
    32 bits   ALU result/address
    32 bits   register value to write to memory
     5 bits   Rd field (writereg)
     2 bits   WB control signals
     3 bits   MEM control signals

The MEM/WB pipeline register has 71 bits:
    32 bits   ALU result
    32 bits   memory word read
     5 bits   Rd field (writereg)
     2 bits   WB control signals

Exercise 7-3: For the following sequences of instructions:

1.         lw  $1, 40($6)
           beq $2, $0, Label     # Assume $2 == $0
           sw  $6, 50($2)
    Label: add $2, $3, $4
           sw  $3, 50($4)

2.         lw  $5, -16($5)
           sw  $4, -16($4)
           lw  $3, -20($4)
           beq $2, $0, Label     # Assume $2 != $0
           add $5, $1, $4

Assuming the following latencies for the individual pipeline stages:

      IF      ID      EX      MEM     WB
1.    100ps   120ps   90ps    130ps   60ps
2.    180ps   100ps   170ps   220ps   60ps

a. Assume that all branches are perfectly predicted (eliminating control hazards). If we have only one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the five-stage pipeline that only has one memory? Data hazards can be eliminated by adding nops to the code. Can structural hazards be eliminated in the same way? Why?

b. Assume that all branches are perfectly predicted (eliminating control hazards). If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only four stages. Change this code to accommodate this changed ISA. Assuming this change doesn't affect clock cycle time, what speed-up is achieved in this instruction sequence?

c. Repeat the speed-up calculation of part b, but take into account the possible change in clock cycle time and the provided pipeline stage latencies. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the combined EX/MEM stage has a latency that is the larger of the original two plus 20ps needed for the work that couldn't be done in parallel.

d. Assuming stall-on-branch, what speed-up is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?

e. Assume the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10ps when branch outcome resolution is moved to ID. Repeat the speed-up calculation of part d, but take into account the possible change in clock cycle time and the provided pipeline stage latencies.

f. Assuming stall-on-branch, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speed-up in this case? Assume that the latency of the EX stage is reduced by 20ps and the latency of the MEM stage remains unchanged.

Solution:

a. Perfect branch prediction leads to no stalls. In the pipelined execution, *** represents a stall when an instruction can't be fetched because a load or store instruction is using the memory in that cycle. We can't add nops to eliminate structural hazards, as nops need to be fetched just like any other instructions, so this hazard must be addressed with a hardware hazard detection unit in the processor.

1. Total: 9 cycles

   Cycle                1    2    3    4    5    6    7    8    9
   lw  $1, 40($6)       IF   ID   EX   MEM  WB
   beq $2, $0, Label         IF   ID   EX   MEM  WB
   add $2, $3, $4                 IF   ID   EX   MEM  WB
   sw  $3, 50($4)                      ***  IF   ID   EX   MEM  WB

2. Total: 12 cycles

   Cycle                1    2    3    4    5    6    7    8    9    10   11   12
   lw  $5, -16($5)      IF   ID   EX   MEM  WB
   sw  $4, -16($4)           IF   ID   EX   MEM  WB
   lw  $3, -20($4)                IF   ID   EX   MEM  WB
   beq $2, $0, Label                   ***  ***  ***  IF   ID   EX   MEM  WB
   add $5, $1, $4                                          IF   ID   EX   MEM  WB

b. This change only saves one cycle in an entire execution without data hazards. If there were data hazards from loads to other instructions, the change would help eliminate some stall cycles.

      Instructions executed   Cycles with 5 stages   Cycles with 4 stages   Speed-up
1.    4                       4 + 4 = 8              3 + 4 = 7              8/7 = 1.14
2.    5                       4 + 5 = 9              3 + 5 = 8              9/8 = 1.13

c. The clock cycle time is equal to the latency of the longest-latency stage. Combining the EX and MEM stages affects clock cycle time only if the combined EX/MEM stage becomes the longest-latency stage.

      Cycle time with 5 stages   Cycle time with 4 stages   Speed-up
1.    130ps (MEM)                150ps (MEM + 20ps)         (8*130)/(7*150) = 0.99
2.    220ps (MEM)                240ps (MEM + 20ps)         (9*220)/(8*240) = 1.03
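The cycle counts in part a can be reproduced with a small simulation. This is a minimal sketch, not part of the original solution: the function name `simulate_single_memory` and its parameters are hypothetical, and it assumes an in-order five-stage pipeline with MEM as the fourth stage, no data or control stalls, and only the executed instructions of each sequence (in sequence 1 the taken branch skips `sw $6, 50($2)`).

```python
def simulate_single_memory(uses_data_memory, stages=5, mem_stage=4):
    """Return (fetch_cycles, total_cycles) for an in-order pipeline in which
    instruction fetch and data access share one memory.

    uses_data_memory: one bool per executed instruction, True for lw/sw.
    Assumes no data or control stalls, as in part a.
    """
    busy = set()            # cycles in which a load/store occupies the memory
    fetch_cycles = []
    cycle = 0
    for is_mem in uses_data_memory:
        cycle += 1          # earliest possible fetch: one cycle after the previous fetch
        while cycle in busy:
            cycle += 1      # the data access wins, so this fetch stalls
        fetch_cycles.append(cycle)
        if is_mem:
            busy.add(cycle + mem_stage - 1)   # this instruction reaches MEM here
    total_cycles = fetch_cycles[-1] + stages - 1
    return fetch_cycles, total_cycles


# Executed instructions only: in sequence 1 the taken branch skips sw $6, 50($2).
seq1 = [True, False, False, True]          # lw, beq, add, sw
seq2 = [True, True, True, False, False]    # lw, sw, lw, beq, add

for name, seq in (("1", seq1), ("2", seq2)):
    fetches, total = simulate_single_memory(seq)
    print(f"sequence {name}: fetch cycles {fetches}, total {total} cycles")
```

Under these assumptions the sketch gives 9 cycles for sequence 1 (the last sw is fetched one cycle late) and 12 cycles for sequence 2 (the beq is fetched three cycles late), matching the totals in part a.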

d. Stall-on-branch delays the fetch of the next instruction until the branch is executed. When branches execute in the EX stage, each branch causes two stall cycles; when branches execute in the ID stage, each branch causes only one.

      Instructions executed   Branches executed   Cycles with branch in EX   Cycles with branch in ID   Speed-up
1.    4                       1                   4 + 4 + 1*2 = 10           4 + 4 + 1*1 = 9            10/9 = 1.11
2.    5                       1                   4 + 5 + 1*2 = 11           4 + 5 + 1*1 = 10           11/10 = 1.1

e.

      New ID latency   New EX latency   New cycle time   Old cycle time   Speed-up
1.    180ps            80ps             180ps (ID)       130ps (MEM)      (10*130)/(9*180) = 0.8
2.    150ps            160ps            220ps (MEM)      220ps (MEM)      (11*220)/(10*220) = 1.1

f. The cycle time remains unchanged; a 20ps reduction in EX latency has no effect on clock cycle time because EX is not the longest-latency stage. The change does affect the execution time: it adds one additional stall cycle to each branch, so the number of cycles increases while the clock cycle time doesn't improve.

      Cycles with branch in EX   Execution time (branch in EX)   Cycles with branch in MEM   Execution time (branch in MEM)   Speed-up
1.    4 + 4 + 1*2 = 10           10*130 = 1300ps                 4 + 4 + 1*3 = 11            11*130 = 1430ps                  0.91
2.    4 + 5 + 1*2 = 11           11*220 = 2420ps                 4 + 5 + 1*3 = 12            12*220 = 2640ps                  0.92
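The speed-up figures in parts d to f all follow the same arithmetic: execution time is the cycle count multiplied by the clock period, and the clock period is the latency of the slowest stage. The sketch below reproduces them under that model. The helper names (`cycle_time`, `total_cycles`, `speedup`) and the `sequences` dictionary are hypothetical, introduced only for illustration, and the stage order IF, ID, EX, MEM, WB for the given latencies is an assumption consistent with the values used in the solution.

```python
def cycle_time(latencies):
    """Clock period in ps: the slowest stage sets the clock."""
    return max(latencies.values())


def total_cycles(instructions, branches, stalls_per_branch, stages=5):
    """Cycles to run the sequence with stall-on-branch and no other stalls."""
    return (stages - 1) + instructions + branches * stalls_per_branch


def speedup(old_cycles, old_period, new_cycles, new_period):
    """Ratio of old execution time to new execution time."""
    return (old_cycles * old_period) / (new_cycles * new_period)


sequences = {
    "1": dict(instructions=4, branches=1,
              latencies=dict(IF=100, ID=120, EX=90, MEM=130, WB=60)),
    "2": dict(instructions=5, branches=1,
              latencies=dict(IF=180, ID=100, EX=170, MEM=220, WB=60)),
}

for name, s in sequences.items():
    base = s["latencies"]
    n, b = s["instructions"], s["branches"]
    in_ex = total_cycles(n, b, stalls_per_branch=2)    # branch resolved in EX
    in_id = total_cycles(n, b, stalls_per_branch=1)    # branch resolved in ID
    in_mem = total_cycles(n, b, stalls_per_branch=3)   # branch resolved in MEM

    # Part e: ID latency grows by 50%, EX latency shrinks by 10 ps.
    lat_e = dict(base, ID=base["ID"] * 3 // 2, EX=base["EX"] - 10)
    # Part f: EX latency shrinks by 20 ps, MEM unchanged.
    lat_f = dict(base, EX=base["EX"] - 20)

    d = speedup(in_ex, 1, in_id, 1)   # part d: same clock, so the periods cancel
    e = speedup(in_ex, cycle_time(base), in_id, cycle_time(lat_e))
    f = speedup(in_ex, cycle_time(base), in_mem, cycle_time(lat_f))
    print(f"sequence {name}: d = {d:.2f}, e = {e:.2f}, f = {f:.2f}")
```

A speed-up below 1 means the modified design is slower, which is what happens for sequence 1 in part e (the longer ID stage becomes the critical stage) and for both sequences in part f (one extra stall cycle per branch with no gain in clock period).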