Pipelined Architecture (2A) Young Won Lim 4/7/18

Pipelined Architecture (2A)

Copyright (c) 2014-2018 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". Please send corrections (or suggestions) to youngwlim@hotmail.com. This document was produced by using LibreOffice.

Based on ARM System-on-Chip Architecture, 2nd ed., Steve Furber

3-stage Pipeline
[Figure: datapath block diagram. Labels: A[31:0], MAR, +1, Register Bank, Instruction Decoder, mult, Shifter, ALU, MDO, MDI, D[31:0]; buses: A bus, B bus, ALU bus.]

Register-Register Operations
[Figure: the same datapath, with Rd, Rn, Rm, PC and I. Pipe labeled.]

Register-Immediate Operations
[Figure: the same datapath, with Rd, Rn, Rm, PC and I. Pipe labeled.]

STR - 1st Cycle
[Figure: the same datapath, with Rd, Rn, Rm, PC and I. Pipe labeled.]

STR - 2nd Cycle
[Figure: the same datapath, with Rd, Rn, Rm, PC and I. Pipe labeled.]

B - 1st Cycle
[Figure: the same datapath, with Rd, Rm, PC and I. Pipe labeled.]

B - 2nd Cycle
[Figure: the same datapath, with Rd, Rn, Rm, PC and I. Pipe labeled.]

ARM Instruction Set

  The load-store architecture
  3-address data processing instructions (2 source registers + 1 destination register)
  Conditional execution of every instruction
  Multiple data transfer instructions
  Single-cycle execution of shift and ALU operations
  Open instruction set for coprocessors
  A very dense 16-bit compressed instruction set (Thumb)
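Conditional execution can be illustrated with a minimal Python sketch. The condition mnemonics and their flag tests (EQ, NE, MI, PL, GE, LT, AL) are standard ARM condition codes, shown here as a subset only; the names `CONDITIONS`, `condition_passed` and `execute_if`, and the dictionary representation of flags and registers, are assumptions made for illustration.

```python
# Subset of ARM condition codes, each expressed as a test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,        # equal
    "NE": lambda f: f["Z"] == 0,        # not equal
    "MI": lambda f: f["N"] == 1,        # negative
    "PL": lambda f: f["N"] == 0,        # positive or zero
    "GE": lambda f: f["N"] == f["V"],   # signed greater than or equal
    "LT": lambda f: f["N"] != f["V"],   # signed less than
    "AL": lambda f: True,               # always
}


def condition_passed(cond, flags):
    """Return True if the instruction's condition holds for the given flags."""
    return CONDITIONS[cond](flags)


def execute_if(cond, flags, operation, registers):
    """Apply operation to the register dict only when the condition passes."""
    if condition_passed(cond, flags):
        operation(registers)
    return registers


def add_r0_r1_r2(regs):
    # Hypothetical 3-address operation: r0 = r1 + r2
    regs["r0"] = regs["r1"] + regs["r2"]


if __name__ == "__main__":
    flags = {"N": 0, "Z": 1, "C": 0, "V": 0}
    regs = {"r0": 0, "r1": 2, "r2": 3}
    print(execute_if("EQ", flags, add_r0_r1_r2, dict(regs)))  # condition holds
    print(execute_if("NE", flags, add_r0_r1_r2, dict(regs)))  # instruction skipped
```

When the condition fails, the instruction behaves as a no-op: the register dictionary is returned unchanged.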

3-stage Pipeline

fetch
  the instruction is fetched from memory
  it is placed in the instruction pipeline
decode
  the instruction is decoded
  the control signals for the next cycle are prepared
  the decode logic, but not the datapath, is dedicated to the instruction
execute
  the datapath is dedicated to the instruction
  the register bank is read
  an operand is shifted
  the ALU operation is performed
  the result is written back into the register bank

3-stage pipeline: single-cycle instructions

  cycle      1      2       3        4        5
  instr 1    fetch  decode  execute
  instr 2           fetch   decode   execute
  instr 3                   fetch    decode   execute
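The overlap of single-cycle instructions can be sketched in Python as follows. This is an idealized model that assumes every instruction spends exactly one cycle in each stage and nothing stalls; the function names `schedule` and `total_cycles` are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def schedule(n_instructions):
    """Map each instruction index to {cycle: stage} in an ideal 3-stage pipeline."""
    # Instruction i enters fetch in cycle i + 1 and advances one stage per cycle.
    return {
        i: {i + 1 + offset: stage for offset, stage in enumerate(STAGES)}
        for i in range(n_instructions)
    }


def total_cycles(n_instructions):
    """Cycles until the last instruction leaves the execute stage."""
    if n_instructions == 0:
        return 0
    # The first instruction needs one cycle per stage; each later one adds one cycle.
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    n = 3
    for i, slots in schedule(n).items():
        row = "".join(f"{slots.get(c, ''):<9}" for c in range(1, total_cycles(n) + 1))
        print(f"instr {i + 1}  {row}")
```

Under these assumptions n instructions complete in n + 2 cycles, compared with 3n cycles if each instruction had to finish before the next one was fetched.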

3-stage pipeline: multi-cycle instructions

  1  ADD  fetch, decode, execute
  2  STR  fetch, decode, calc address, data transfer
  3  ADD  fetch, decode, execute
  4  ADD  fetch, decode, execute
  5  ADD  fetch, decode, execute

the decode logic is involved in
  every decode cycle
  the address calculation
the datapath is involved in
  every execute cycle
  the address calculation
  the data transfer
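The resource usage listed above can be sketched with a small occupancy model. It rests on two assumptions that are not stated on the slide: the datapath is used back to back with no idle cycles, and the decode logic is busy in the cycle immediately before each datapath cycle. The names `DATAPATH_CYCLES` and `occupancy` are hypothetical.

```python
# Datapath cycles needed by each instruction type (only ADD and STR are modeled).
DATAPATH_CYCLES = {
    "ADD": ["execute"],
    "STR": ["calc address", "data transfer"],
}


def occupancy(program):
    """Return (decode_logic, datapath): dicts mapping cycle -> (instr index, activity)."""
    decode_logic, datapath = {}, {}
    cycle = 3  # first instruction: fetch in cycle 1, decode in cycle 2, datapath from 3
    for index, op in enumerate(program):
        steps = DATAPATH_CYCLES[op]
        for step, activity in enumerate(steps):
            datapath[cycle] = (index, activity)
            # The decode logic prepares the control signals one cycle ahead, so for
            # the second datapath cycle it is busy during the first one.
            previous = "decode" if step == 0 else steps[step - 1]
            decode_logic[cycle - 1] = (index, previous)
            cycle += 1
    return decode_logic, datapath


if __name__ == "__main__":
    decode_logic, datapath = occupancy(["ADD", "STR", "ADD", "ADD", "ADD"])
    for c in range(1, max(datapath) + 1):
        print(c, decode_logic.get(c), datapath.get(c))
```

In this model the STR keeps the decode logic for two cycles (decode, then the address calculation) and the datapath for two cycles (address calculation, then data transfer), so every instruction after it finishes one cycle later than it would behind a single-cycle instruction.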

3-stage pipeline: multi-cycle instructions

  decode logic:  decode, calc address
  datapath:      execute, calc address, data transfer

3-stage pipeline: multi-cycle instructions
[Figure: timing of the i-th, (i+1)-th and (i+2)-th instructions, showing the fetch, decode logic and datapath cycles of each.]

ARM Exception Handling

  Bit     31  30  29  28  27-0
  Field   N   Z   C   V   R
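Assuming the layout above describes a 32-bit word with the N, Z, C and V flags in bits 31 to 28, the flags can be extracted as in this minimal sketch. The names `FLAG_BITS` and `decode_flags` are hypothetical, and the R bits are left uninterpreted.

```python
# Bit positions of the condition flags, taken from the layout above.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28}


def decode_flags(word):
    """Extract the N, Z, C, V flag bits from a 32-bit word."""
    return {name: (word >> bit) & 1 for name, bit in FLAG_BITS.items()}


if __name__ == "__main__":
    # The top nibble 0x6 is binary 0110, so Z and C are set while N and V are clear.
    print(decode_flags(0x60000000))
```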
