inst.eecs.berkeley.edu/~cs61c
CS61C: Machine Structures
Lecture 29: Introduction to Pipelined Execution
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia

Bionic Eyes let blind see! Johns Hopkins researchers have announced they have invented a bionic eye with a computer chip on the back of the eye and a small wireless video camera in a pair of glasses. Face recognition? Not yet, but soon!
news.bbc.co.uk/1/hi/health/4411591.stm

CS 61C L30 Introduction to Pipelined Execution (1)

Review: Single cycle datapath
5 steps to design a processor
  1. Analyze instruction set => datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  5. Assemble the control logic
Control is the hard part
MIPS makes that easier
  Instructions same size
  Source registers always in same place
  Immediates same size, location
  Operations always on registers/immediates
[Figure: computer organization. Processor (Control, Datapath), Memory, Input, Output]

Review Datapath (1/3)
Datapath is the hardware that performs operations necessary to execute programs.
Control instructs datapath on what to do next.
Datapath needs:
  access to storage (general purpose registers and memory)
  computational ability (ALU)
  helper hardware (local registers and PC)

Review Datapath (2/3)
Five stages of datapath (executing an instruction):
  1. Instruction Fetch (Increment PC)
  2. Instruction Decode (Read Registers)
  3. ALU (Computation)
  4. Memory Access
  5. Write to Registers
ALL instructions must go through ALL five stages.

Review Datapath (3/3)
[Figure: datapath. PC and +4 adder, instruction memory, registers (rd, rs, rt), imm, ALU, data memory]
Stages: 1. Instruction Fetch  2. Decode/Register Read  3. Execute  4. Memory  5. Write Back

Gotta Do Laundry
Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
  Washer takes 30 minutes
  Dryer takes 30 minutes
  Folder takes 30 minutes
  Stasher takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, each taking four consecutive 30-minute steps]
Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, each starting 30 minutes after the previous one]
Pipelined laundry takes 3.5 hours for 4 loads!

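The 8-hour and 3.5-hour figures can be checked with a small timing model. This is only a sketch: the function names are made up for illustration, and it assumes every stage takes exactly 30 minutes and that a new load enters the pipeline as soon as the washer is free.

```python
# Timing model for the laundry example: 4 loads, 4 stages of 30 minutes each.
STAGE_MINUTES = 30
NUM_STAGES = 4   # wash, dry, fold, stash
NUM_LOADS = 4    # Ann, Brian, Cathy, Dave


def sequential_minutes(loads, stages, stage_minutes):
    # Each load runs through every stage before the next load starts.
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    # The first load needs `stages` slots; every later load finishes one slot after the previous one.
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(seq / 60, "hours sequential")  # 8.0 hours sequential
print(pipe / 60, "hours pipelined")  # 3.5 hours pipelined
```
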
General Definitions
Latency: time to completely execute a certain task
  for example, time to read a sector from disk is disk access time or disk latency
Throughput: amount of work that can be done over a period of time

Pipelining Lessons (1/2)
[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order]
Pipelining doesn't help latency of single task, it helps throughput of entire workload
Multiple tasks operating simultaneously using different resources
Potential speedup = Number pipe stages
Time to fill pipeline and time to drain it reduces speedup: 2.3X v. 4X in this example

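The gap between 2.3X and 4X comes from the fill and drain slots. A rough sketch, assuming equal-length stages and a hypothetical helper named pipeline_speedup, shows how the speedup approaches the number of stages as the number of tasks grows:

```python
def pipeline_speedup(num_tasks, num_stages):
    # Sequential: every task occupies all stages, one task at a time.
    sequential_slots = num_tasks * num_stages
    # Pipelined: num_stages slots for the first task, then one more slot per extra task.
    pipelined_slots = num_stages + num_tasks - 1
    return sequential_slots / pipelined_slots


print(round(pipeline_speedup(4, 4), 1))     # 2.3 (the four-load laundry example)
print(round(pipeline_speedup(1000, 4), 2))  # 3.99 (fill/drain cost becomes negligible)
```

With only four loads, three of the seven slots are spent filling or draining, which is why the example falls well short of the potential 4X.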
Pipelining Lessons (2/2)
[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order]
Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
Pipeline rate limited by slowest pipeline stage
Unbalanced lengths of pipe stages also reduces speedup

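One way to explore the question is to model the pipeline as running in lockstep, so that every slot lasts as long as the slowest stage. That lockstep model is an assumption made here for illustration (it mirrors a clocked processor pipeline), and the function names are hypothetical.

```python
def sequential_minutes(stage_minutes, num_loads):
    # Without pipelining, each load simply takes the sum of its stage times.
    return num_loads * sum(stage_minutes)


def lockstep_pipelined_minutes(stage_minutes, num_loads):
    # Assumption: all stages advance together, so a slot is as long as the slowest stage.
    slot = max(stage_minutes)
    return (len(stage_minutes) + num_loads - 1) * slot


original = [30, 30, 30, 30]      # washer, dryer, folder, stasher
faster_ends = [20, 30, 30, 20]   # new 20-minute washer and stasher

for name, stages in [("original", original), ("new washer/stasher", faster_ends)]:
    seq = sequential_minutes(stages, 4)
    pipe = lockstep_pipelined_minutes(stages, 4)
    print(name, seq, pipe, round(seq / pipe, 2))
# original 480 210 2.29
# new washer/stasher 400 210 1.9
```

Under this model the pipelined time stays at 210 minutes because the 30-minute dryer and folder still set the pace, while the sequential time drops, so the measured speedup actually falls.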
Steps in Executing MIPS
  1) IFetch: Fetch Instruction, Increment PC
  2) Decode Instruction, Read Registers
  3) Execute:
       Mem-ref: Calculate Address
       Arith-log: Perform Operation
  4) Memory:
       Load: Read Data from Memory
       Store: Write Data to Memory
  5) Write Back: Write Data to Register

Pipelined Execution Representation
[Figure: six instructions over time, each passing through IFtch, Dcd, Exec, Mem, WB and each starting one stage after the previous one]
Every instruction must take same number of steps, also called pipeline stages, so some will go idle sometimes

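The staircase picture can be regenerated as text. The sketch below assumes an ideal pipeline with no stalls, where instruction i enters the pipeline in cycle i; pipeline_diagram is a made-up helper name.

```python
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def pipeline_diagram(num_instructions):
    rows = []
    for i in range(num_instructions):
        # Instruction i starts i cycles late, so pad its row with i empty columns.
        cells = [""] * i + STAGES
        rows.append(" ".join(f"{cell:<5}" for cell in cells).rstrip())
    return rows


for row in pipeline_diagram(4):
    print(row)
```

Each printed row is one instruction and each column is one clock cycle, so reading down a column shows which stages are busy at the same time.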
Review: Datapath for MIPS
[Figure: datapath. PC and +4 adder, instruction memory, registers (rd, rs, rt), imm, ALU, data memory]
Stages: 1. Instruction Fetch  2. Decode/Register Read  3. Execute  4. Memory  5. Write Back
Use datapath figure to represent pipeline
  IFtch  Dcd  Exec  Mem  WB
  I$     Reg  ALU   D$   Reg

Graphical Pipeline Representation
(In Reg, right half highlight read, left half write)
[Figure: instruction order (Load, Add, Store, Sub, Or) vs. time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg, starting one cycle after the previous one]

Example
Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; compute instruction rate
Nonpipelined Execution:
  lw:  IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
  add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns
Pipelined Execution:
  Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns

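The arithmetic in this example can be reproduced in a few lines. The stage names and dictionary layout are illustrative choices, and the pipelined latency figure assumes every instruction passes through all five stages at the 2 ns clock.

```python
# Stage delays in nanoseconds, taken from the example.
STAGE_NS = {"IF": 2, "RegRead": 1, "ALU": 2, "Mem": 2, "RegWrite": 1}

LW_STAGES = ["IF", "RegRead", "ALU", "Mem", "RegWrite"]
ADD_STAGES = ["IF", "RegRead", "ALU", "RegWrite"]


def nonpipelined_ns(stages):
    # Without pipelining, an instruction takes the sum of the stages it uses.
    return sum(STAGE_NS[s] for s in stages)


# With pipelining, the clock period must fit the slowest stage.
cycle_ns = max(STAGE_NS.values())

print(nonpipelined_ns(LW_STAGES))   # 8
print(nonpipelined_ns(ADD_STAGES))  # 6
print(cycle_ns)                     # 2

# Assumption: all five stages run at the 2 ns clock, so one instruction takes longer...
pipelined_latency_ns = len(STAGE_NS) * cycle_ns
print(pipelined_latency_ns)         # 10
# ...but in steady state one instruction completes every cycle.
print(1 / cycle_ns)                 # 0.5 instructions per ns
```

The latency of a single lw rises from 8 ns to 10 ns, yet the steady-state rate improves to one instruction every 2 ns.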
Pipeline Hazard: Matching socks in later load
[Figure: pipelined laundry timeline starting at 6 PM; loads A through F in task order, with a bubble in the pipeline]
A depends on D; stall since folder tied up

Administrivia
Any administration?

Problems for Computers
Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
  Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard is resolved; bubbles in the pipeline
  Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

Structural Hazard #1: Single Memory (1/2)
[Figure: instruction order (Load, Inst 1, Inst 2, Inst 3, Inst 4) vs. time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg]
Read same memory twice in same clock cycle

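The conflict can be located by checking which instruction is being fetched while an earlier one is in its memory stage. This is a sketch under stated assumptions: a single shared memory, one instruction issued per cycle, and Inst 1 through Inst 4 not touching data memory (the slide does not say what they are). The helper name memory_conflicts is hypothetical.

```python
IF_STAGE = 0   # instruction fetch is the first stage
MEM_STAGE = 3  # data memory access is the fourth stage


def memory_conflicts(program):
    """program is a list of (name, uses_data_memory) in issue order."""
    conflicts = []
    for i, (name, uses_data_memory) in enumerate(program):
        if not uses_data_memory:
            continue
        # Instruction i reaches the memory stage in cycle i + MEM_STAGE (0-indexed).
        mem_cycle = i + MEM_STAGE
        # The instruction fetched in that same cycle also needs the single memory.
        fetch_index = mem_cycle - IF_STAGE
        if fetch_index < len(program):
            conflicts.append((mem_cycle + 1, name, program[fetch_index][0]))
    return conflicts


program = [
    ("Load", True),
    ("Inst 1", False),  # assumed not to access data memory
    ("Inst 2", False),
    ("Inst 3", False),
    ("Inst 4", False),
]
print(memory_conflicts(program))  # [(4, 'Load', 'Inst 3')]
```

The result says that in clock cycle 4 the Load reads its data while Inst 3 is being fetched, which is the double read of the same memory described above.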
Structural Hazard #1: Single Memory (2/2)
Solution:
  infeasible and inefficient to create second memory
  (We'll learn about this more next week)
  so simulate this by having two Level 1 Caches (a temporary smaller [of usually most recently used] copy of memory)
  have both an L1 Instruction Cache and an L1 Data Cache
  need more complex hardware to control when both caches miss

Structural Hazard #2: Registers (1/2)
[Figure: instruction order (sw, Inst 1, Inst 2, Inst 3, Inst 4) vs. time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg]
Can't read and write to registers simultaneously

Structural Hazard #2: Registers (2/2)
Fact: Register access is VERY fast: takes less than half the time of ALU stage
Solution: introduce convention
  always Write to Registers during first half of each clock cycle
  always Read from Registers during second half of each clock cycle
Result: can perform Read and Write during same clock cycle

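A toy model of this convention is sketched below. The RegisterFile class and its clock_cycle method are invented for illustration and do not describe any real hardware interface; the point is only the ordering of the two halves of the cycle.

```python
class RegisterFile:
    def __init__(self, num_registers=32):
        self.regs = [0] * num_registers

    def clock_cycle(self, write=None, reads=()):
        # First half of the cycle: perform the write, if any.
        if write is not None:
            index, value = write
            if index != 0:  # MIPS register $0 always reads as zero
                self.regs[index] = value
        # Second half of the cycle: perform the reads, which now see the new value.
        return [self.regs[r] for r in reads]


rf = RegisterFile()
# One instruction writes register 8 while a later one reads registers 8 and 9 in the same cycle.
values = rf.clock_cycle(write=(8, 42), reads=(8, 9))
print(values)  # [42, 0]
```

Because the write lands before the read within the cycle, the reading instruction picks up the freshly written value without a stall.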
Peer Instruction
A. Thanks to pipelining, I have reduced the time it took me to wash my shirt.
B. Longer pipelines are always a win (since less work per stage & a faster clock).
C. We can rely on compilers to help us avoid data hazards by reordering insts.
Answer choices (ABC): 1: FFF  2: FFT  3: FTF  4: FTT  5: TFF  6: TFT  7: TTF  8: TTT

Peer Instruction Answer
A. Thanks to pipelining, I have reduced the time it took me to wash my shirt.
   FALSE: Throughput better, not execution time
B. Longer pipelines are always a win (since less work per stage & a faster clock).
   FALSE: longer pipelines do usually mean faster clock, but branches cause problems!
C. We can rely on compilers to help us avoid data hazards by reordering insts.
   FALSE: they happen too often & delay too long. Forwarding! (e.g., Mem => ALU)
Answer (ABC): 1: FFF

Things to Remember
Optimal Pipeline
  Each stage is executing part of an instruction each clock cycle.
  One instruction finishes during each clock cycle.
  On average, execute far more quickly.
What makes this work?
  Similarities between instructions allow us to use same stages for all instructions (generally).
  Each stage takes about the same amount of time as all others: little wasted time.