Pipelined Instruction Execution

Pipelining and ISA Design
- MIPS instruction set designed for pipelining
- All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - x86: 1- to 17-byte instructions (x86 HW actually translates to internal RISC instructions!)
- Few and regular instruction formats, 2 source register fields always in same place
  - Can decode and read registers in one step
- Memory operands only in loads and stores
  - Can calculate address in 3rd stage, access memory in 4th stage
- Alignment of memory operands
  - Memory access takes only one cycle
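To make the "source register fields always in same place" point concrete, here is a minimal Python sketch of field extraction from a 32-bit MIPS instruction word. The function name `decode_fields` and the two example constants are illustrative assumptions, not part of the lecture; the bit positions follow the standard MIPS32 encoding.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields.

    rs and rt sit at the same bit positions in every format, so the
    register file can be read before the opcode is fully decoded.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source (or load target)
        "rd": (word >> 11) & 0x1F,      # bits 15..11, only meaningful for R-format
        "imm": word & 0xFFFF,           # bits 15..0, only meaningful for I-format
    }


# Hand-assembled examples (assumed encodings; $t0 = 8, $t1 = 9, $t2 = 10).
ADD_T0_T1_T2 = 0x012A4020  # add $t0, $t1, $t2
LW_T0_0_T1 = 0x8D280000    # lw  $t0, 0($t1)

add_fields = decode_fields(ADD_T0_T1_T2)
lw_fields = decode_fields(LW_T0_0_T1)
```

In both words the same shifts and masks pull out `rs`, which is why decode and register read can share one stage.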
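Pipelined Control
[Figure: pipelined control datapath]

Pipelined Execution Representation
[Figure: instructions overlapped along a Time axis]
- Every instruction must take the same number of steps, also called pipeline "stages", so some stages will go idle sometimes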
Graphical Pipeline Diagrams
[Figure: datapath with PC, +4 adder, instruction memory, registers (rd, rs, rt), imm, ALU, and data memory]
  1. Instruction Fetch
  2. Decode / Register Read
  3. Execute
  4. Memory
  5. Write Back
- Use datapath figure below to represent pipeline

Graphical Pipeline Representation
(In Reg, right half highlighted means read, left half means write)

  Instr. order        Time (clock cycles) ->
  Load     I$   Reg  ALU  D$   Reg
  Add           I$   Reg  ALU  D$   Reg
  Store              I$   Reg  ALU  D$   Reg
  Sub                     I$   Reg  ALU  D$   Reg
  Or                           I$   Reg  ALU  D$   Reg
Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- What is the pipelined clock rate?
- Compare pipelined datapath with single-cycle datapath

  Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
  lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
  sw         200 ps        100 ps          200 ps   200 ps                           700 ps
  R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
  beq        200 ps        100 ps          200 ps                                    500 ps

3/2/15, Fall 2011 -- Lecture #31

Pipeline Performance
[Figure: timing of the same instructions on two datapaths]
- Single-cycle (Tc = 800 ps)
- Pipelined (Tc = 200 ps)
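The two clock periods on the slide can be rederived from the stage times in the table. The sketch below is only an illustration: the stage labels (`IF`, `ID`, `EX`, `MEM`, `WB`) and all variable names are assumed here, and it takes the single-cycle clock to be set by the slowest instruction and the pipelined clock by the slowest stage.

```python
# Stage latencies in picoseconds, taken from the table above.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Which stages each instruction class actually uses.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def instruction_time_ps(kind):
    """Sum of the stage latencies an instruction class needs."""
    return sum(STAGE_PS[stage] for stage in USES[kind])


# Single-cycle: the clock must fit the slowest instruction (lw).
single_cycle_tc_ps = max(instruction_time_ps(kind) for kind in USES)

# Pipelined: the clock must fit the slowest stage.
pipelined_tc_ps = max(STAGE_PS.values())

# 1 / (200 ps) expressed in GHz.
pipelined_clock_ghz = 1000 / pipelined_tc_ps
```

Under these assumptions the pipelined clock rate asked for on the slide comes out to 5 GHz, against 1.25 GHz for the 800 ps single-cycle design.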
Pipeline Speedup
- If all stages are balanced, i.e., all take the same time:

    Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages

- If not balanced, speedup is less
- Speedup is due to increased throughput
  - Latency (time for each instruction) does not decrease

Hazards
Situations that prevent starting the next logical instruction in the next clock cycle
1. Structural hazards
   - Required resource is busy
2. Data hazard
   - Need to wait for previous instruction to complete its data read/write (e.g., pair of socks in different loads)
3. Control hazard
   - Deciding on control action depends on previous instruction (e.g., how much detergent based on how clean prior load turns out)
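Returning to the speedup relation above, a small sketch can show why the 100 ps / 200 ps example falls short of the ideal factor of 5. It assumes an idealized 5-stage pipeline with no stalls, a single-cycle clock of 800 ps, and a pipelined clock of 200 ps; the function names are hypothetical.

```python
def total_time_ps(n_instructions, tc_ps, stages=1):
    """Time to finish n instructions.

    A pipeline needs (stages - 1) cycles to fill, then completes one
    instruction per cycle. With stages=1 this is the single-cycle case.
    """
    return (stages + n_instructions - 1) * tc_ps


def speedup(n_instructions):
    """Single-cycle time divided by pipelined time for the same program."""
    single = total_time_ps(n_instructions, tc_ps=800)
    pipelined = total_time_ps(n_instructions, tc_ps=200, stages=5)
    return single / pipelined


# Latency of one instruction: 5 stages x 200 ps, longer than the 800 ps single cycle.
pipelined_latency_ps = total_time_ps(1, tc_ps=200, stages=5)

small_program_speedup = speedup(3)
long_program_speedup = speedup(1_000_000)
```

For long programs the ratio approaches 800 / 200 = 4 rather than 5, because the 100 ps register stages are padded out to the 200 ps clock; the latency of a single instruction actually grows to 1000 ps.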
1. Structural Hazards
- Conflict for use of a resource
- In a MIPS pipeline with a single memory
  - Load/Store requires memory access for data
  - Instruction fetch would have to stall for that cycle
    - Causes a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - In reality, provide separate L1 I$ and L1 D$

1. Structural Hazard #1: Single Memory
[Figure: Time (clock cycles) versus Instr. order: Load, Instr 1, Instr 2, Instr 3, Instr 4]
- Read same memory twice in same clock cycle
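The single-memory conflict in the figure can be located with a few lines of Python. This is a sketch under stated assumptions: instruction i (counting from 0) is fetched in cycle i + 1 and reaches its memory stage in cycle i + 4, there are no stalls, and only `lw` and `sw` touch data memory. The function name is made up for illustration.

```python
def memory_conflict_cycles(kinds):
    """Cycles in which a data access and an instruction fetch both need memory.

    kinds: list of instruction mnemonics in program order.
    """
    fetch_cycles = {i + 1 for i in range(len(kinds))}
    data_cycles = {i + 4 for i, kind in enumerate(kinds) if kind in ("lw", "sw")}
    return sorted(fetch_cycles & data_cycles)


# Load followed by four instructions that do not access data memory.
conflicts = memory_conflict_cycles(["lw", "add", "add", "add", "add"])
```

With one shared memory, the load's data access in cycle 4 collides with the fetch of the fourth instruction in the list (Instr 3), which is the double read shown on the slide. Separate instruction and data memories make the two sets of accesses independent.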
1. Structural Hazard #2: Registers (1/2)
[Figure: Time (clock cycles) versus Instr. order: sw, Instr 1, Instr 2, Instr 3, Instr 4]
- Can we read and write to registers simultaneously?

1. Structural Hazard #2: Registers (2/2)
Two different solutions have been used:
1) RegFile access is VERY fast: takes less than half the time of a stage
   - Write to registers during first half of each clock cycle
   - Read from registers during second half of each clock cycle
2) Build RegFile with independent read and write ports
- Result: can perform read and write during same clock cycle
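Solution 1 can be modeled as "write first, then read" within a single clock call. The `RegFile` class below is a toy model written for this note, not a description of real hardware; the register numbers are arbitrary.

```python
class RegFile:
    """Toy register file: write in the first half-cycle, read in the second."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def clock(self, write=None, reads=()):
        # First half of the cycle: perform the write, if any.
        # Register 0 is hardwired to zero in MIPS, so writes to it are ignored.
        if write is not None:
            reg, value = write
            if reg != 0:
                self.regs[reg] = value
        # Second half of the cycle: perform the reads.
        return [self.regs[r] for r in reads]


rf = RegFile()
# One instruction writes register 8 while another reads registers 8 and 9.
same_cycle_reads = rf.clock(write=(8, 42), reads=(8, 9))
```

Because the write lands before the read in the same cycle, the reader already sees the new value of register 8.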
Data Hazards (1/2)
Consider the following sequence of instructions:

    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $t10

Data Hazards (2/2)
- Dataflow arrows that point backward in time are hazards

  Instr. order            Time (clock cycles) ->
  add $t0,$t1,$t2    IF   ID/RF  EX   MEM  WB
  sub $t4,$t0,$t3
  and $t5,$t0,$t6
  or  $t7,$t0,$t8
  xor $t9,$t0,$t10
  (each instruction occupies I$, Reg, ALU, D$, Reg in successive cycles)
Data Hazard Solution: Forwarding
- Forward result from one stage to another

  add $t0,$t1,$t2    IF   ID/RF  EX   MEM  WB
  sub $t4,$t0,$t3
  and $t5,$t0,$t6
  or  $t7,$t0,$t8
  xor $t9,$t0,$t10

- "or" hazard solved by register hardware

Data Hazard: Load/Use (1/4)
- Dataflow backward in time is a hazard

  lw  $t0,0($t1)     IF   ID/RF  EX   MEM  WB
  sub $t3,$t0,$t2

- Can't solve all cases with forwarding
- Must stall instruction dependent on load, then forward (more hardware)
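For the add/sub/and/or/xor sequence on the forwarding slide, the way each consumer of `$t0` gets its value depends only on how far it sits behind the `add`. The sketch below encodes that rule for the classic 5-stage pipeline with a register file that writes in the first half-cycle and reads in the second. The tuple layout `(op, dest, sources)` and the function name are assumptions made for illustration, and the code ignores the case where `$t0` is overwritten in between.

```python
PROGRAM = [
    ("add", "$t0", ("$t1", "$t2")),
    ("sub", "$t4", ("$t0", "$t3")),
    ("and", "$t5", ("$t0", "$t6")),
    ("or", "$t7", ("$t0", "$t8")),
    ("xor", "$t9", ("$t0", "$t10")),
]


def classify_consumers(program, producer=0):
    """Map each later instruction that reads the producer's result to how it is resolved."""
    dest = program[producer][1]
    resolution = {}
    for i in range(producer + 1, len(program)):
        op, _, sources = program[i]
        if dest not in sources:
            continue
        distance = i - producer
        if distance <= 2:
            # Consumer reaches EX before the producer reaches WB.
            resolution[op] = "forward"
        elif distance == 3:
            # Consumer's register read shares a cycle with the producer's WB.
            resolution[op] = "register file"
        else:
            # Producer has already written back.
            resolution[op] = "no hazard"
    return resolution


result = classify_consumers(PROGRAM)
```

This gives `sub` and `and` a forwarded value, resolves `or` in the register file, and leaves `xor` with no hazard, matching the slide.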
Data Hazard: Load/Use (2/4)
- Hardware stalls pipeline (called "interlock")

  lw  $t0, 0($t1)    IF   ID/RF  EX   MEM  WB
  sub $t3,$t0,$t2
  and $t5,$t0,$t4
  or  $t7,$t0,$t6

- Not in MIPS: (MIPS = Microprocessor without Interlocked Pipeline Stages)

Data Hazard: Load/Use (3/4)
- Instruction slot after a load is called "load delay slot"
- If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle
- Alternative: if the compiler puts an unrelated instruction in that slot, then no stall
- Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
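The "stall equals nop" equivalence can be sketched as a tiny pass that pads load delay slots. Everything here is illustrative: instructions are modeled as `(op, dest, sources)` tuples, `insert_load_delay_nops` is a hypothetical helper, and only `lw` is treated as a load.

```python
NOP = ("nop", None, ())


def insert_load_delay_nops(program):
    """Insert a nop after every lw whose result is used by the very next instruction."""
    padded = []
    for i, instr in enumerate(program):
        padded.append(instr)
        op, dest, _ = instr
        following = program[i + 1] if i + 1 < len(program) else None
        if op == "lw" and following is not None and dest in following[2]:
            padded.append(NOP)
    return padded


PROGRAM = [
    ("lw", "$t0", ("$t1",)),          # lw  $t0, 0($t1)
    ("sub", "$t3", ("$t0", "$t2")),   # uses $t0 in the load delay slot
    ("and", "$t5", ("$t0", "$t4")),
    ("or", "$t7", ("$t0", "$t6")),
]

padded = insert_load_delay_nops(PROGRAM)
padded_ops = [instr[0] for instr in padded]
```

The padded program takes the same number of cycles as the interlocked one, but the explicit nop occupies an extra instruction word, which is the code-space cost the slide mentions.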
Data Hazard: Load/Use (4/4)
- Stall is equivalent to nop

    lw  $t0, 0($t1)
    nop
    sub $t3, $t0, $t2
    and $t5, $t0, $t4
    or  $t7, $t0, $t6

Data Hazards: Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

  Original order (13 cycles):

          lw  $t1, 0($t0)
          lw  $t2, 4($t0)
    stall add $t3, $t1, $t2
          sw  $t3, 12($t0)
          lw  $t4, 8($t0)
    stall add $t5, $t1, $t4
          sw  $t5, 16($t0)

  Reordered (11 cycles):

          lw  $t1, 0($t0)
          lw  $t2, 4($t0)
          lw  $t4, 8($t0)
          add $t3, $t1, $t2
          sw  $t3, 12($t0)
          add $t5, $t1, $t4
          sw  $t5, 16($t0)
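The 13-cycle and 11-cycle figures can be checked with a short cycle counter. This is a minimal sketch, assuming a 5-stage pipeline with full forwarding in which the only stall is the one-cycle load-use stall; the `(op, dest, sources)` tuples and the function name are introduced here for illustration.

```python
def count_cycles(program, stages=5):
    """Cycles to run the program: fill time + one per instruction + load-use stalls."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )
    return len(program) + (stages - 1) + stalls


ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),  # uses $t2 right after its load: stall
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),  # uses $t4 right after its load: stall
    ("sw", None, ("$t5", "$t0")),
]

REORDERED = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("lw", "$t4", ("$t0",)),         # hoisted load fills the delay slot
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

original_cycles = count_cycles(ORIGINAL)
reordered_cycles = count_cycles(REORDERED)
```

Seven instructions need 7 + 4 = 11 cycles with no stalls; the original order adds two load-use stalls for a total of 13.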
Data Hazards: Code Scheduling
[Figure: pipeline timing diagrams over clock cycles 1 to 13; each instruction passes through IF, ID, EX, MEM, WB]

  Original order:
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

  Reordered:
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)