Election Data is now available
Purple America!
inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 31: Pipelined Execution, part II
2004-11-10
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia
The Incredibles!
www.princeton.edu/~rvdb/java/election2004/
www.usatoday.com/news/politicselections/vote2004/countymap.htm
CS61C L31 Pipelined Execution, part II (1)

Review: Pipeline (1/2)
- Optimal pipeline
  - Each stage is executing part of an instruction each clock cycle.
  - One instruction finishes during each clock cycle.
  - On average, execute far more quickly.
- What makes this work?
  - Similarities between instructions allow us to use the same stages for all instructions (generally).
  - Each stage takes about the same amount of time as all others: little wasted time.

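A small sketch of the "one instruction finishes during each clock cycle" claim, counting cycles for an ideal pipeline with no hazards. The function names and the 1000-instruction, 5-stage figures are illustrative assumptions, and the unpipelined baseline assumes every instruction takes one cycle per stage on the same clock.

```python
def ideal_pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for a hazard-free pipeline: fill it once, then finish one per cycle."""
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Baseline (assumption): each instruction passes through every stage alone."""
    return n_instructions * n_stages


if __name__ == "__main__":
    n, k = 1000, 5  # hypothetical workload and pipeline depth
    fast = ideal_pipelined_cycles(n, k)
    slow = unpipelined_cycles(n, k)
    print(f"pipelined: {fast} cycles, unpipelined: {slow} cycles, "
          f"speedup: {slow / fast:.2f}x")
```

For long instruction streams the speedup approaches the number of stages, which is the best case that the hazards discussed next eat into.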
Review: Pipeline (2/2)
- Pipelining is a BIG IDEA: a widely used concept
- What makes it less than perfect?
  - Structural hazards: suppose we had only one cache? Need more HW resources
  - Control hazards: need to worry about branch instructions? Delayed branch
  - Data hazards: an instruction depends on a previous instruction?

Control Hazard: Branching (1/7)
[Pipeline diagram: instruction order (beq, Inst 1, Inst 2, Inst 3, Inst 4) against time (clock cycles); stage labels I$, Reg, D$, Reg]
- Where do we do the compare for the branch?

Control Hazard: Branching (2/7)
- We put branch decision-making hardware in the ALU stage
  - therefore two more instructions after the branch will always be fetched, whether or not the branch is taken
- Desired functionality of a branch
  - if we do not take the branch, don't waste any time and continue executing normally
  - if we take the branch, don't execute any instructions after the branch, just go to the desired label

Control Hazard: Branching (3/7)
- Initial solution: stall until the decision is made
  - insert no-op instructions: those that accomplish nothing, just take time
  - Drawback: branches take 3 clock cycles each (assuming the comparator is put in the ALU stage)

Control Hazard: Branching (4/7)
- Optimization #1: move an asynchronous comparator up to Stage 2
  - as soon as the instruction is decoded (the opcode identifies it as a branch), immediately make a decision and set the value of the PC (if necessary)
  - Benefit: since the branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed
  - Side note: this means that branches are idle in Stages 3, 4 and 5.

Control Hazard: Branching (5/7)
- Insert a single no-op (bubble)
[Pipeline diagram: instruction order (add, beq, lw) against time (clock cycles), with one bubble inserted after beq; stage labels I$, Reg, D$, Reg]
- Impact: 2 clock cycles per branch instruction: slow

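The branch costs quoted on these slides (3 cycles with the comparator in the third stage, 2 cycles with it in Stage 2) follow from a simple count, sketched below. The function names are hypothetical, and `average_cpi` rests on two assumptions that are not in the slides: only branches ever stall, and the 20% branch fraction used in the demo is purely illustrative.

```python
def wasted_fetches(resolve_stage: int) -> int:
    """Instructions fetched behind a branch before its outcome is known.

    The branch is in stage 1 during its first cycle; every further stage it
    needs before deciding lets one more instruction into the pipeline.
    """
    if resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    return resolve_stage - 1


def cycles_per_branch(resolve_stage: int) -> int:
    """Cost of a branch when each wasted fetch is replaced by a no-op."""
    return 1 + wasted_fetches(resolve_stage)


def average_cpi(branch_fraction: float, resolve_stage: int) -> float:
    """Average cycles per instruction, assuming only branches stall."""
    return 1.0 + branch_fraction * wasted_fetches(resolve_stage)


if __name__ == "__main__":
    for stage in (3, 2):
        print(f"decide in stage {stage}: {cycles_per_branch(stage)} cycles per branch, "
              f"CPI at 20% branches = {average_cpi(0.2, stage):.2f}")
```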
Control Hazard: Branching (6/7)
- Optimization #2: redefine branches
  - Old definition: if we take the branch, none of the instructions after the branch get executed by accident
  - New definition: whether or not we take the branch, the single instruction immediately following the branch gets executed (called the branch-delay slot)
- The term "Delayed Branch" means we always execute the instruction after the branch

Control Hazard: Branching (7/7)
- Notes on the branch-delay slot
  - Worst-case scenario: can always put a no-op in the branch-delay slot
  - Better case: can find an instruction preceding the branch which can be placed in the branch-delay slot without affecting the flow of the program
    - re-ordering instructions is a common method of speeding up programs
    - compiler must be very smart in order to find instructions to do this
    - usually can find such an instruction at least 50% of the time
    - Jumps also have a delay slot

Example: Nondelayed vs. Delayed Branch

Nondelayed Branch            Delayed Branch
  or   $8, $9, $10             add  $1, $2, $3
  add  $1, $2, $3              sub  $4, $5, $6
  sub  $4, $5, $6              beq  $1, $4, Exit
  beq  $1, $4, Exit            or   $8, $9, $10
  xor  $10, $1, $11            xor  $10, $1, $11
Exit:                        Exit:

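To check that moving the `or` into the branch-delay slot leaves the program's result unchanged, here is a minimal sketch of an interpreter for just the five operations in this example. Everything about it is an assumption made for illustration: the tuple encoding, the `run` function, the use of a list index in place of the `Exit` label, and the initial register values.

```python
import operator

# Register-to-register operations used by the example.
ALU_OPS = {"or": operator.or_, "add": operator.add,
           "sub": operator.sub, "xor": operator.xor}


def run(program, initial_regs, delayed):
    """Execute a tiny MIPS-like program and return the final registers.

    program: list of (op, a, b, c). For ALU ops, a is the destination and
    b, c are the sources. For "beq", a and b are compared and c is the
    index to jump to. If delayed is True, the instruction right after a
    taken branch still executes before the jump happens.
    """
    regs = dict(initial_regs)
    pc = 0
    redirect = None  # target of a taken delayed branch, applied after the slot
    while pc < len(program):
        op, a, b, c = program[pc]
        next_pc = pc + 1
        if redirect is not None:  # this instruction sits in the delay slot
            next_pc, redirect = redirect, None
        if op == "beq":
            if regs[a] == regs[b]:
                if delayed:
                    redirect = c
                else:
                    next_pc = c
        else:
            regs[a] = ALU_OPS[op](regs[b], regs[c]) & 0xFFFFFFFF
        pc = next_pc
    return regs


EXIT = 5  # index just past the last instruction, standing in for "Exit:"

NONDELAYED = [("or", 8, 9, 10), ("add", 1, 2, 3), ("sub", 4, 5, 6),
              ("beq", 1, 4, EXIT), ("xor", 10, 1, 11)]
DELAYED = [("add", 1, 2, 3), ("sub", 4, 5, 6), ("beq", 1, 4, EXIT),
           ("or", 8, 9, 10), ("xor", 10, 1, 11)]


def regs_with(values):
    """All 32 registers zeroed, then overridden with the given values."""
    regs = {r: 0 for r in range(32)}
    regs.update(values)
    return regs


# Hypothetical inputs: one makes $1 == $4 (branch taken), one does not.
TAKEN = regs_with({2: 1, 3: 2, 5: 5, 6: 2, 9: 12, 10: 10, 11: 7})
NOT_TAKEN = regs_with({2: 1, 3: 2, 5: 5, 6: 1, 9: 12, 10: 10, 11: 7})

if __name__ == "__main__":
    for name, regs in (("taken", TAKEN), ("not taken", NOT_TAKEN)):
        same = run(NONDELAYED, regs, False) == run(DELAYED, regs, True)
        print(f"branch {name}: results match = {same}")
```

Running the unmodified left-hand sequence under delayed-branch semantics would execute the `xor` even when the branch is taken, which is why the reordering (or a no-op in the slot) is needed.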
Data Hazards (1/2)
- Consider the following sequence of instructions
    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $t10

Data Hazards (2/2)
- Dependencies backwards in time are hazards
[Pipeline diagram: instruction order against time (clock cycles); stages IF, ID/RF, EX, MEM, WB, drawn as I$, Reg, D$, Reg]
    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $t10

Data Hazard Solution: Forwarding
- Forward result from one stage to another
[Pipeline diagram: stages IF, ID/RF, EX, MEM, WB, drawn as I$, Reg, D$, Reg]
    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $t10
- "or" hazard solved by register hardware

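The read-after-write dependencies in the `add`/`sub`/`and`/`or`/`xor` sequence can be found mechanically. The sketch below is one way to do it; the tuple encoding and the `raw_hazards` name are hypothetical. The default `window=2` encodes an assumption about the register file (a write in the first half of a cycle is visible to a read in the second half), which is how the "solved by register hardware" note is read here.

```python
def raw_hazards(program, window=2):
    """List (producer_index, consumer_index, register) read-after-write pairs.

    program: list of (op, dest, sources). A consumer is flagged when it reads
    a register written by an instruction at most `window` slots earlier,
    i.e. before the producer's write-back is visible without forwarding.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if later_dest == dest:  # value overwritten; later readers depend on j
                break
    return hazards


SEQUENCE = [
    ("add", "$t0", ("$t1", "$t2")),
    ("sub", "$t4", ("$t0", "$t3")),
    ("and", "$t5", ("$t0", "$t6")),
    ("or",  "$t7", ("$t0", "$t8")),
    ("xor", "$t9", ("$t0", "$t10")),
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(SEQUENCE):
        print(f"{SEQUENCE[consumer][0]} needs {reg} from {SEQUENCE[producer][0]}")
```

Under that assumption only `sub` and `and` need forwarding; widening the window to 3 also flags `or`, the case the register file handles on its own.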
Data Hazard: Loads (1/4)
- Dependencies backwards in time are hazards
    lw  $t0, 0($t1)
    sub $t3, $t0, $t2
[Pipeline diagram: stages IF, ID/RF, EX, MEM, WB]
- Can't solve with forwarding
- Must stall the instruction dependent on the load, then forward (more hardware)

Data Hazard: Loads (2/4)
- Hardware must stall the pipeline
- Called "interlock"
    lw  $t0, 0($t1)
    sub $t3, $t0, $t2
    and $t5, $t0, $t4
    or  $t7, $t0, $t6
[Pipeline diagram: stages IF, ID/RF, EX, MEM, WB, drawn as I$, Reg, D$, Reg; one bubble delays each of sub, and, and or]

Data Hazard: Loads (3/4)
- The instruction slot after a load is called the "load delay slot"
- If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.
- If the compiler puts an unrelated instruction in that slot, then no stall
- Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)

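The three cases for the load delay slot (dependent instruction, unrelated instruction, explicit nop) can be compared with a small counting sketch. The encoding and the `load_use_stalls` name are hypothetical, and the `addu $t6, $t7, $t8` used as the unrelated instruction is invented for illustration; it is not from the slides.

```python
def load_use_stalls(program):
    """Count one-cycle interlock stalls: a load immediately followed by a reader.

    program: list of (op, dest, sources).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


def issue_slots(program):
    """Cycles spent issuing the sequence: one per instruction plus the stalls."""
    return len(program) + load_use_stalls(program)


LW = ("lw", "$t0", ("$t1",))
SUB = ("sub", "$t3", ("$t0", "$t2"))
AND = ("and", "$t5", ("$t0", "$t4"))

DEPENDENT = [LW, SUB, AND]                      # hardware interlock stalls sub
WITH_NOP = [LW, ("nop", None, ()), SUB, AND]    # nop fills the slot instead
SCHEDULED = [LW, ("addu", "$t6", ("$t7", "$t8")), SUB, AND]  # hypothetical useful work

if __name__ == "__main__":
    for name, prog in (("dependent", DEPENDENT), ("with nop", WITH_NOP),
                       ("scheduled", SCHEDULED)):
        print(f"{name}: {len(prog)} instructions, "
              f"{load_use_stalls(prog)} stall(s), {issue_slots(prog)} issue slots")
```

The stalled and nop versions take the same number of issue slots, while the scheduled version spends that slot on an extra instruction of real work.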
Data Hazard: Loads (4/4)
- Stall is equivalent to nop
    lw  $t0, 0($t1)
    nop
    sub $t3, $t0, $t2
    and $t5, $t0, $t4
    or  $t7, $t0, $t6
[Pipeline diagram: the nop passes through all five stages as bubbles; stage labels I$, Reg, D$]

Historical Trivia
- First MIPS design did not interlock and stall on load-use data hazard
- Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
  - Word play on acronym for Millions of Instructions Per Second, also called MIPS

Administrivia
- No lab this week (Wed, Thu or Fri)
  - Due to Veterans Day holiday on Thursday.
  - The lab is posted as a take-home lab; show TA your results in the following lab.
- Grade freezing update: through HW4
  - You have until next Wed to request regrades on HW3, HW4 & P1
- Back to 61C: Advanced Pipelining!
  - Out-of-order Execution
  - Superscalar Execution

Review Pipeline Hazard: Stall is dependency
[Laundry timeline diagram: tasks A-F in task order against time, 6 PM to 2 AM in 30-minute steps, with one bubble]
- A depends on D; stall since folder tied up

Out-of-Order Laundry: Don't Wait
[Laundry timeline diagram: tasks A-F in task order against time, 6 PM to 2 AM in 30-minute steps, with one bubble]
- A depends on D; rest continue; need more resources to allow out-of-order

Superscalar Laundry: Parallel per stage
[Laundry timeline diagram: tasks A-F in task order against time, 6 PM to 2 AM in 30-minute steps; loads labeled (light clothing), (dark clothing), (very dirty clothing), each appearing twice]
- More resources, HW to match mix of parallel tasks?

Superscalar Laundry: Mismatch Mix
[Laundry timeline diagram: tasks A-D in task order against time, 6 PM to 2 AM in 30-minute steps; loads labeled (light clothing), (light clothing), (dark clothing), (light clothing)]
- Task mix underutilizes extra resources

Peer Instruction
Assume 1 instr/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards (after 10^3 loops, so pipeline full)

Loop: lw    $t0, 0($s1)
      addu  $t0, $t0, $s2
      sw    $t0, 0($s1)
      addiu $s1, $s1, -4
      bne   $s1, $zero, Loop
      nop

How many pipeline stages (clock cycles) per loop iteration to execute this code?
1 2 3 4 5 6 7 8 9 10

Peer Instruction Answer
Assume 1 instr/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards. 10^3 iterations, so pipeline full.

Loop: 1. lw    $t0, 0($s1)
      2. (data hazard so stall)
      3. addu  $t0, $t0, $s2
      4. sw    $t0, 0($s1)
      5. addiu $s1, $s1, -4
      6. bne   $s1, $zero, Loop
      7. nop   (delayed branch so exec. nop)

How many pipeline stages (clock cycles) per loop iteration to execute this code?
1 2 3 4 5 6 7 8 9 10
Answer: 7 clock cycles per iteration.

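The numbered listing for this loop can be reproduced with a short sketch that follows the stated model: one instruction issues per clock, forwarding covers every dependency except a load followed immediately by a reader, and the nop in the branch-delay slot always executes. The encoding and the `issue_schedule` name are hypothetical, and no hazards beyond those the question names are modeled.

```python
LOOP = [
    ("lw",    "$t0", ("$s1",)),
    ("addu",  "$t0", ("$t0", "$s2")),
    ("sw",    None,  ("$t0", "$s1")),
    ("addiu", "$s1", ("$s1",)),
    ("bne",   None,  ("$s1", "$zero")),
    ("nop",   None,  ()),   # branch-delay slot, executed every iteration
]


def issue_schedule(body):
    """Return one entry per clock cycle for a single steady-state iteration.

    body: list of (op, dest, sources). A "stall" entry is added whenever an
    instruction reads the register loaded by the lw directly before it.
    """
    schedule = []
    prev = None
    for ins in body:
        op, _, sources = ins
        if prev is not None and prev[0] == "lw" and prev[1] in sources:
            schedule.append("stall")
        schedule.append(op)
        prev = ins
    return schedule


if __name__ == "__main__":
    for cycle, what in enumerate(issue_schedule(LOOP), start=1):
        print(f"{cycle}. {what}")
```

The last instruction of the body (`nop`) is not a load, so wrapping around to the next iteration's `lw` adds no further stall in this model.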
And in Conclusion..
- Pipeline challenge is hazards
  - Forwarding helps w/many data hazards
  - Delayed branch helps with control hazard in 5-stage pipeline
- More aggressive performance:
  - Superscalar
  - Out-of-order execution