RISC Design: Pipelining
Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
CP-226: Computer Architecture
Lecture 10 (20 Feb 2013)

Combined Datapaths
[Figure: single-cycle datapath combining the PC, instruction memory, register file, sign extension, shift-left-2 units, adder, multiplexers, and data memory. CONTROL is driven by the opcode (bits 26-31) and generates signals including Jump, RegDst, Branch, MemtoReg, MemRead, and MemWrite.]
20 Feb 2013 Computer Architecture@MNIT

Pipelining in a Computer
- Divide the datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.
- Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.
- Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task.
- Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion.

Single-Cycle Datapath
IF = Instr. fetch; ID = Instr. decode (also reg. file read); EX = Execution (ALU operation); MEM = Data access; WB = Write back (reg. file write)

Instruction class                   IF    ID    EX    MEM   WB    Total time
lw                                  2ns   1ns   2ns   2ns   1ns   8ns
sw                                  2ns   1ns   2ns   2ns   idle  8ns
R-format: add, sub, and, or, slt    2ns   1ns   2ns   idle  1ns   8ns
B-format: beq                       2ns   1ns   2ns   idle  idle  8ns

idle: no operation on data; idle time equalizes instruction length to a fixed clock period.

Execution Time: Single-Cycle
Time (ns)         0 to 8            8 to 16           16 to 24
lw $1, 100($0)    IF ID EX MEM WB
lw $2, 200($0)                      IF ID EX MEM WB
lw $3, 300($0)                                        IF ID EX MEM WB
Clock cycle time = 8 ns
Total time for executing three lw instructions = 24 ns

Pipelined Datapath
IF = Instr. fetch; ID = Instr. decode (also reg. file read); EX = Execution (ALU operation); MEM = Data access; WB = Write back (reg. file write)

Instruction class                   IF    ID    EX    MEM   WB    Total time
lw                                  2ns   1ns   2ns   2ns   1ns   10ns
sw                                  2ns   1ns   2ns   2ns   1ns   10ns
R-format: add, sub, and, or, slt    2ns   1ns   2ns   2ns   1ns   10ns
B-format: beq                       2ns   1ns   2ns   2ns   1ns   10ns

Some slots perform no operation on data; idle time is inserted to equalize instruction lengths, so every instruction occupies five 2 ns clock cycles (10 ns).
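
The relation between stage times and the two clock periods can be sketched in a few lines of Python. This is only an illustration: the stage latencies (2, 1, 2, 2, 1 ns) are an assumption chosen to be consistent with the 8 ns single-cycle total and the 2 ns pipeline clock quoted in these slides, the function names are hypothetical, and pipeline-register overhead is ignored.

```python
# Assumed per-stage latencies in ns; consistent with the 8 ns single-cycle
# period and the 2 ns pipelined clock quoted in the slides.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_period(stage_ns):
    """A single-cycle clock must fit all stages back to back."""
    return sum(stage_ns.values())


def pipelined_period(stage_ns):
    """A pipelined clock is set by the slowest stage (register overhead ignored)."""
    return max(stage_ns.values())


def pipelined_instruction_latency(stage_ns):
    """Each instruction spends one full clock cycle in every stage."""
    return len(stage_ns) * pipelined_period(stage_ns)


if __name__ == "__main__":
    print("single-cycle period (ns):", single_cycle_period(STAGE_NS))
    print("pipelined period (ns):", pipelined_period(STAGE_NS))
    print("pipelined latency (ns):", pipelined_instruction_latency(STAGE_NS))
```

Under these assumptions a single instruction is slower in the pipeline (10 ns versus 8 ns); the gain comes from overlapping instructions, not from shortening any one of them.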

Execution Time: Pipeline
Time (ns)         0-2    2-4    4-6    6-8    8-10   10-12  12-14
lw $1, 100($0)    IF     ID     EX     MEM    WB
lw $2, 200($0)           IF     ID     EX     MEM    WB
lw $3, 300($0)                  IF     ID     EX     MEM    WB
Clock cycle time = 2 ns, four times faster than the single-cycle clock
Total time for executing three lw instructions = 14 ns
Performance ratio = Single-cycle time / Pipeline time = 24 / 14 = 1.7

Pipeline Performance
Clock cycle time = 2 ns
1,003 lw instructions:
  Total time for executing 1,003 lw instructions = 2,014 ns
  Performance ratio = Single-cycle time / Pipeline time = 8,024 / 2,014 = 3.98
10,003 lw instructions:
  Performance ratio = 80,024 / 20,014 = 3.998
Clock cycle ratio: 4
Pipeline performance approaches the clock-cycle ratio for long programs.
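
The figures above follow from a simple count: with 5 stages and no stalls, n instructions need 5 + n - 1 pipeline cycles. A minimal Python sketch of that arithmetic is given here; the function names are illustrative, and the 8 ns and 2 ns periods are the ones used in these slides.

```python
def single_cycle_time(n, period_ns=8):
    """Total time for n instructions on the single-cycle datapath."""
    return n * period_ns


def pipeline_time(n, stages=5, period_ns=2):
    """Total time for n instructions on an ideal pipeline (no stalls).

    The first instruction takes `stages` cycles; each later one
    completes one cycle after its predecessor.
    """
    return (stages + n - 1) * period_ns


def performance_ratio(n):
    """Single-cycle time divided by pipeline time."""
    return single_cycle_time(n) / pipeline_time(n)


if __name__ == "__main__":
    for n in (3, 1_003, 10_003):
        print(n, single_cycle_time(n), pipeline_time(n),
              round(performance_ratio(n), 3))
```

As n grows, the 4 extra fill cycles become negligible and the ratio tends to 8 / 2 = 4.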

Single-Cycle Datapath
Stage boundaries marked on the datapath:
IF: Instr. fetch
ID: Instr. decode, reg. file read
EX: Execute, address calc.
MEM: mem. access
WB: write back
[Figure: single-cycle datapath (PC, instruction memory, register file, sign extension, data memory, CONTROL) divided into the five stages above.]

Pipelining of RISC Instructions
Fetch Instruction -> Examine Opcode -> Fetch Operands -> Perform Operation -> Store Result
IF:  Instruction Fetch
ID:  Instruction Decode and Fetch operands
EX:  Execute
MEM: Memory Operation
WB:  Write Back to Reg file
Although an instruction takes five clock cycles, one instruction is completed every cycle.

Pipeline Registers
[Figure: the single-cycle datapath with four pipeline registers, IF/ID, ID/EX, EX/MEM, and MEM/WB, inserted at the stage boundaries.]
This requires a CONTROL not too different from single-cycle.

Pipeline Register Functions
Four pipeline registers are added:

Register name   Data held
IF/ID           PC+4, Instruction word (IW)
ID/EX           PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
EX/MEM          PC+4, zero, Result, R2, IW(11-15) or IW(16-20)
MEM/WB          M[Result], Result, IW(11-15) or IW(16-20)
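
One way to picture the table is as four record types, one per pipeline register. The Python sketch below is illustrative only: the class names, field names, and the `decode` helper are hypothetical, and the fields simply mirror the "Data held" column (R1 and R2 are taken to be the two register-file read values).

```python
from dataclasses import dataclass


def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


@dataclass
class IF_ID:
    pc_plus_4: int = 0
    iw: int = 0              # instruction word


@dataclass
class ID_EX:
    pc_plus_4: int = 0
    r1: int = 0              # register file read data 1
    r2: int = 0              # register file read data 2
    imm_sext: int = 0        # IW(0-15), sign-extended
    iw_11_15: int = 0        # IW(11-15)


@dataclass
class EX_MEM:
    pc_plus_4: int = 0
    zero: bool = False
    result: int = 0
    r2: int = 0
    write_reg: int = 0       # IW(11-15) or IW(16-20)


@dataclass
class MEM_WB:
    mem_data: int = 0        # M[Result]
    result: int = 0
    write_reg: int = 0       # IW(11-15) or IW(16-20)


def decode(if_id, regs):
    """ID stage sketch: fill ID/EX from IF/ID and the register file."""
    iw = if_id.iw
    rs = (iw >> 21) & 0x1F   # IW(21-25)
    rt = (iw >> 16) & 0x1F   # IW(16-20)
    return ID_EX(
        pc_plus_4=if_id.pc_plus_4,
        r1=regs[rs],
        r2=regs[rt],
        imm_sext=sign_extend16(iw),
        iw_11_15=(iw >> 11) & 0x1F,
    )


if __name__ == "__main__":
    regs = [0] * 32
    regs[17], regs[18] = 5, 7                        # $s1, $s2
    print(decode(IF_ID(pc_plus_4=4, iw=0x02324020), regs))  # add $t0, $s1, $s2
```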

Pipelined Datapath
[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers. The write-register number is taken from IW bits 11-15 for R-type and bits 16-20 for I-type lw.]

Five-Cycle Pipeline
CC1: IM -> IF/ID
CC2: ID, REG. READ -> ID/EX
CC3: execute -> EX/MEM
CC4: DM -> MEM/WB
CC5: REG. WRITE

Add Instruction
add $t0, $s1, $s2
Machine instruction word:
000000  10001  10010  01000  00000  100000
opcode  $s1    $s2    $t0           function

CC1  IF   IM -> IF/ID
CC2  ID   read $s1, read $s2
CC3  EX   add $s1+$s2
CC4  MEM  (no operation)
CC5  WB   write $t0
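
The instruction word can be checked by packing the R-format fields by hand. The following Python sketch assumes the standard MIPS R-format layout (opcode, rs, rt, rd, shamt, funct) and the usual register numbers ($t0 = 8, $s1 = 17, $s2 = 18); the helper names are illustrative.

```python
# Subset of MIPS register numbers used in this lecture.
REG = {"$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18}


def encode_r(funct, rd, rs, rt, shamt=0, opcode=0):
    """Pack R-format fields into a 32-bit instruction word."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def r_fields(word):
    """Show the word as opcode, rs, rt, rd, shamt, funct bit groups."""
    b = format(word, "032b")
    return " ".join([b[0:6], b[6:11], b[11:16], b[16:21], b[21:26], b[26:32]])


# add $t0, $s1, $s2: rd = $t0, rs = $s1, rt = $s2, funct = 100000
ADD_WORD = encode_r(funct=0b100000, rd=REG["$t0"], rs=REG["$s1"], rt=REG["$s2"])

if __name__ == "__main__":
    print(r_fields(ADD_WORD))
    print(hex(ADD_WORD))
```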

Pipelined Datapath Executing add
[Figure: pipelined datapath carrying add. Bits 21-25 (s1) and 16-20 (s2) select the registers read as $s1 and $s2; the destination t0 comes from bits 11-15 (R-type).]

Load Instruction
lw $t0, 1200($t1)
100011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200

CC1  IF   IM -> IF/ID
CC2  ID   read $t1, sign ext. 1200
CC3  EX   add $t1+1200
CC4  MEM  read M[addr]
CC5  WB   write $t0
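
The I-format word for lw (and for the matching sw) can be verified the same way. This Python sketch assumes the standard MIPS I-format layout (opcode, rs, rt, 16-bit immediate); the helper names are illustrative. Note that 1200 in 16 bits is 0000 0100 1011 0000 (0x04B0).

```python
LW_OPCODE = 0b100011
SW_OPCODE = 0b101011
T0, T1 = 8, 9  # MIPS register numbers for $t0 and $t1


def encode_i(opcode, rs, rt, imm):
    """Pack I-format fields into a 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def i_fields(word):
    """Show the word as opcode, rs, rt, and the immediate in 4-bit groups."""
    b = format(word, "032b")
    imm = b[16:]
    groups = [imm[i:i + 4] for i in range(0, 16, 4)]
    return " ".join([b[0:6], b[6:11], b[11:16]] + groups)


# lw $t0, 1200($t1) and sw $t0, 1200($t1): rs = $t1 (base), rt = $t0
LW_WORD = encode_i(LW_OPCODE, rs=T1, rt=T0, imm=1200)
SW_WORD = encode_i(SW_OPCODE, rs=T1, rt=T0, imm=1200)

if __name__ == "__main__":
    print(i_fields(LW_WORD))
    print(i_fields(SW_WORD))
```

The two words differ only in the opcode field, which is why lw and sw share the ID and EX behaviour and diverge in MEM and WB.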

Pipelined Datapath Executing lw
[Figure: pipelined datapath carrying lw. Bits 21-25 (t1) select the base register $t1; the sign-extended offset 1200 is added to $t1 to form addr; the data read from data memory is written to t0, selected by bits 16-20 (I-type lw).]

Store Instruction
sw $t0, 1200($t1)
101011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200

CC1  IF   IM -> IF/ID
CC2  ID   read $t1, sign ext. 1200
CC3  EX   add $t1+1200 (addr)
CC4  MEM  write $t0 to M[addr]
CC5  WB   (no operation)

Pipelined Datapath Executing sw
[Figure: pipelined datapath carrying sw. Bits 21-25 (t1) and 16-20 (t0) select the registers read as $t1 and $t0; the sign-extended offset 1200 is added to $t1 to form addr, and $t0 is the data written to data memory.]

Executing a Program
Consider a five-instruction segment:
lw  $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw  $13, 24($1)
add $14, $5, $6

Program Execution
Program instructions   CC1   CC2   CC3   CC4   CC5   CC6   CC7   CC8   CC9
lw  $10, 20($1)        IF    ID    EX    MEM   WB
sub $11, $2, $3              IF    ID    EX    MEM   WB
add $12, $3, $4                    IF    ID    EX    MEM   WB
lw  $13, 24($1)                          IF    ID    EX    MEM   WB
add $14, $5, $6                                IF    ID    EX    MEM   WB
IF = IM, IF/ID; ID = ID, REG. READ, ID/EX; EX = execute, EX/MEM; MEM = DM, MEM/WB; WB = REG. WRITE. Time runs left to right.

CC5
IF:  add $14, $5, $6
ID:  lw  $13, 24($1)
EX:  add $12, $3, $4
MEM: sub $11, $2, $3
WB:  lw  $10, 20($1)
[Figure: pipelined datapath with one instruction of the segment in each stage during CC5.]
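
The stage occupancy for this segment can be reproduced with a small scheduling sketch in Python. It assumes an ideal five-stage pipeline in which instruction i (counting from 0) enters IF in cycle i + 1 and never stalls, which holds for this segment because no instruction reads a register written by an earlier one. The function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_of(instr_index, cycle):
    """Stage held by instruction `instr_index` (0-based) in clock cycle
    `cycle` (1-based), or None if it is not in the pipeline then."""
    k = cycle - 1 - instr_index
    return STAGES[k] if 0 <= k < len(STAGES) else None


def snapshot(program, cycle):
    """Map each occupied stage to the instruction it holds in `cycle`."""
    contents = {}
    for i, instr in enumerate(program):
        stage = stage_of(i, cycle)
        if stage is not None:
            contents[stage] = instr
    return contents


def total_cycles(program):
    """Cycles needed to drain the whole program through the pipeline."""
    return len(STAGES) + len(program) - 1


if __name__ == "__main__":
    cc5 = snapshot(PROGRAM, 5)
    for stage in STAGES:
        print(f"{stage}: {cc5.get(stage)}")
    print("total cycles:", total_cycles(PROGRAM))
```

CC5 is the first cycle in which all five stages are busy; the segment as a whole takes 5 + 5 - 1 = 9 cycles.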

Advantages of Pipeline
- After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
  Pipeline latency is defined as the number of stages in the pipeline, or the number of clock cycles after which the first instruction is completed.
- The clock cycle time is about four times shorter than that of the single-cycle datapath and about the same as that of the multicycle datapath.
- For the multicycle datapath, CPI = 3..
- So, pipelined execution is faster, but...
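
How quickly the CPI approaches 1 can be seen by including the pipeline latency in the count. The Python sketch below assumes an ideal five-stage pipeline with no stalls; `pipelined_cpi` is an illustrative name.

```python
def pipelined_cpi(n, stages=5):
    """Average cycles per instruction for n instructions, no stalls.

    Total cycles = pipeline latency (`stages`) for the first instruction
    plus one cycle for each of the remaining n - 1 instructions.
    """
    return (stages + n - 1) / n


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, pipelined_cpi(n))
```

For a single instruction the CPI equals the latency of 5; for the five-instruction segment it is 9 / 5 = 1.8, and it falls toward 1 as the program gets longer.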

Thank You