ECE 2300 Digital Logic & Computer Organization. More Pipelined Microprocessor

Spring 2018, Lecture 18: More Pipelined Microprocessor

Announcements
No instructor office hour today
  Rescheduled to Monday April 16, 4:00-5:30pm
Prelim 2 review sessions
  Friday April 13, 4:30-6:00pm, PHL 219
  Monday April 16, 7:00-8:30pm, PHL 203

Pipelined Microprocessor
[Figure: pipelined datapath with control signals, PC, instruction RAM, decoder, adder, register file (RF), sign extension (SE), ALU with V/C/Z/N flags, data RAM, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]
Key idea: Keep all resources fully utilized

Data Hazard Problem
[Pipeline timing diagram, CC1-CC9]
ADD  R1,R2,R3
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3
The OR, SUB, and AND instructions are data dependent on the ADD instruction
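
The dependences on this slide can be checked mechanically. The Python sketch below is illustrative and not part of the lecture: the tuple encoding, the name `find_data_hazards`, and the three-instruction hazard window are assumptions. The window assumes a pipeline without forwarding in which a result becomes readable only by the fourth instruction after its producer.

```python
# Each instruction: (mnemonic, destination register, list of source registers).
# Hypothetical encoding chosen for this sketch only.
PROGRAM = [
    ("ADD",  "R1", ["R2", "R3"]),
    ("OR",   "R4", ["R1", "R3"]),
    ("SUB",  "R5", ["R2", "R1"]),
    ("AND",  "R6", ["R1", "R2"]),
    ("ADDI", "R7", ["R7"]),
]

HAZARD_WINDOW = 3  # assumption: no forwarding, so the next 3 instructions read stale values


def find_data_hazards(program, window=HAZARD_WINDOW):
    """Return (consumer_index, register, producer_index) for each read-after-write hazard."""
    hazards = []
    for i, (_, _, sources) in enumerate(program):
        for src in sources:
            # Scan backwards over at most `window` earlier instructions,
            # nearest first, and stop at the most recent writer of src.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j][1] == src:
                    hazards.append((i, src, j))
                    break
    return hazards


hazards = find_data_hazards(PROGRAM)
```

For this sequence the sketch reports the R1 reads of the OR, SUB, and AND as hazards against the ADD, and nothing for the ADDI, which agrees with the slide.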

Example: Data Hazards
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
SUB  R4, R2, R3
ADDI R1, R1, 1
ADDI R2, R4, 1
ADDI R3, R4, 1
SUB  R5, R3, R4

Review: Compiler Inserts NOPs (Solution 1)
[Pipeline timing diagram, CC1-CC9]
ADD  R1,R2,R3
NOP
NOP
NOP
OR   R4,R1,R3
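
A compiler pass that pads dependent instructions with NOPs might look roughly like the Python sketch below. It is only an illustration: the tuple encoding, the `insert_nops` name, and the three-instruction window (no forwarding) are assumptions made for the example.

```python
# Each instruction: (mnemonic, destination register, list of source registers).
NOP = ("NOP", None, [])


def insert_nops(program, window=3):
    """Pad the program so no instruction reads a register written
    by any of the `window` instructions immediately before it."""
    padded = []
    for instr in program:
        needed = 0
        for src in instr[2]:
            # Distance 1 is the instruction immediately before this one.
            for dist in range(1, window + 1):
                if dist <= len(padded) and padded[-dist][1] == src:
                    # Push the consumer just outside the hazard window.
                    needed = max(needed, window + 1 - dist)
                    break
        padded.extend([NOP] * needed)
        padded.append(instr)
    return padded


example = [
    ("ADD", "R1", ["R2", "R3"]),
    ("OR",  "R4", ["R1", "R3"]),
]
padded_mnemonics = [op for op, _, _ in insert_nops(example)]
```

With the ADD/OR pair from the slide, the sketch places three NOPs between the two instructions.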

Review: HW Stalls the Pipeline (Solution 2)
[Pipeline timing diagram, CC1-CC9]
ADD  R1,R2,R3
OR   R4,R1,R3    (bubble, bubble, bubble)
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3
The pipeline is stalled for three cycles
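
The three-cycle stall can be reproduced with a small timing model. The Python sketch below is a simplification under stated assumptions, not the lecture's hardware: five stages (IF, ID, EX, MEM, WB), registers read in ID, no forwarding, and a written value readable only in the cycle after WB. The function name and tuple encoding are hypothetical.

```python
# Each instruction: (mnemonic, destination register, list of source registers).
PROGRAM = [
    ("ADD",  "R1", ["R2", "R3"]),
    ("OR",   "R4", ["R1", "R3"]),
    ("SUB",  "R5", ["R2", "R1"]),
    ("AND",  "R6", ["R1", "R2"]),
    ("ADDI", "R7", ["R7"]),
]


def run_with_stalls(program):
    """Return (total_cycles, stall_cycles) for the assumed 5-stage pipeline."""
    if not program:
        return 0, 0
    ready = {}     # register -> first cycle in which ID may read its new value
    prev_id = 1    # the first instruction is fetched in CC1 and decoded in CC2
    stalls = 0
    for _, dest, sources in program:
        earliest = prev_id + 1
        # Wait in ID until every source register holds its updated value.
        id_cycle = max([earliest] + [ready.get(s, 0) for s in sources])
        stalls += id_cycle - earliest
        if dest is not None:
            # ID -> EX -> MEM -> WB takes 3 more cycles; readable one cycle later.
            ready[dest] = id_cycle + 4
        prev_id = id_cycle
    # The last instruction still has to pass through EX, MEM, and WB.
    return prev_id + 3, stalls


total_cycles, stall_cycles = run_with_stalls(PROGRAM)
```

Under these assumptions the five-instruction sequence takes 12 cycles instead of 9, with all three stall cycles caused by the OR waiting for R1.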

Solution 3: HW Forwarding (Bypassing)
[Pipeline timing diagram, CC1-CC9]
ADD  R1,R2,R3
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3

Pipeline Modifications for Forwarding?
[Figure: pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Pipelined Microprocessor w/o Forwarding
[Figure: pipelined datapath without forwarding paths; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Pipelined Processor with Forwarding
[Figure: pipelined datapath with forwarding paths; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Forwarding in Action
[Figure: datapath with forwarding paths; instructions in flight, youngest first]
OR   R4,R1,R3
ADD  R1,R2,R3

Forwarding in Action
[Figure: datapath with forwarding paths; instructions in flight, youngest first]
SUB  R5,R2,R1
OR   R4,R1,R3
ADD  R1,R2,R3

Forwarding in Action
[Figure: datapath with forwarding paths; instructions in flight, youngest first]
AND  R6,R1,R2
SUB  R5,R2,R1
OR   R4,R1,R3
ADD  R1,R2,R3
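
The selection rule behind forwarding can be sketched as follows. This is a generic illustration, not the control logic of the datapath in the figures: which stages can actually supply a forwarded value is not recoverable from the slide text, so the sketch only shows the priority rule (take the value from the youngest older instruction that writes the register, otherwise read the register file). The function name and the `"RF"` label are hypothetical.

```python
def select_operand_source(src_reg, older_in_flight):
    """Pick where an operand should come from.

    older_in_flight: (label, destination register) pairs for the older
    instructions still in the pipeline, ordered youngest first.
    Returns the label of the producer to forward from, or "RF" when
    the register file already holds the correct value.
    """
    for label, dest in older_in_flight:
        if dest is not None and dest == src_reg:
            return label  # the youngest matching producer wins
    return "RF"


# The AND R6,R1,R2 with SUB, OR, and ADD still in flight ahead of it.
older = [("SUB", "R5"), ("OR", "R4"), ("ADD", "R1")]
r1_source = select_operand_source("R1", older)
r2_source = select_operand_source("R2", older)
```

Here R1 is taken from the ADD still in the pipeline, while R2 is read from the register file.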

HW Forwarding
[Figure: pipelined datapath with forwarding paths; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
Trade-off between performance and cost

Example: Data Hazards w/o Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
ADD  R1, R2, R3
OR   R4, R1, R3
SUB  R5, R2, R1
SUB  R6, R1, R2

Example: Data Hazards w/ Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
ADD  R1, R2, R3
OR   R4, R1, R3
SUB  R5, R2, R1
SUB  R6, R1, R2
Data hazards resolved by R-type to R-type forwarding

Another Example: Data Hazards w/ Forwarding
Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back
LW   R1, 0(R2)
OR   R4, R1, R3
SUB  R5, R2, R1
Data hazard not resolved by R-type to R-type forwarding

Data Hazards Caused by Load
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3

Load Instructions and Forwarding
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3

Solution 1: Compiler Inserts NOP Instruction
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
NOP
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2

Solution 2: HW Stalls the Pipeline
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
OR   R4,R1,R3    (bubble)
SUB  R5,R2,R1    (bubble)
AND  R6,R1,R2
ADDI R7,R7,3
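
The condition that triggers this stall can be written down compactly. The Python sketch below is an assumption-laden illustration: it supposes forwarding handles every other dependence, so the only remaining hazard is an instruction that reads a load's destination register immediately after the load. The function names and tuple encoding are hypothetical.

```python
# Each instruction: (mnemonic, destination register, list of source registers).
PROGRAM = [
    ("LW",   "R1", ["R2"]),
    ("OR",   "R4", ["R1", "R3"]),
    ("SUB",  "R5", ["R2", "R1"]),
    ("AND",  "R6", ["R1", "R2"]),
    ("ADDI", "R7", ["R7"]),
]


def needs_load_stall(previous, current):
    """True when `current` reads the register that the load `previous` is fetching."""
    prev_op, prev_dest, _ = previous
    _, _, cur_sources = current
    return prev_op == "LW" and prev_dest in cur_sources


def count_load_stalls(program):
    """Count one stall cycle per load that is immediately followed by a dependent instruction."""
    return sum(needs_load_stall(a, b) for a, b in zip(program, program[1:]))


load_stalls = count_load_stalls(PROGRAM)
```

For the slide's sequence only the OR triggers a stall; the SUB and AND follow far enough behind the load for forwarding to supply R1.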

Solution 3: Delay Slots
A delay slot is a location in the program where the compiler is required to insert an instruction between dependent instructions
The ISA defines the delay slots
The compiler can fill delay slots with NOPs
Even better: Move a non-dependent instruction from elsewhere in the program into the delay slot
  Doing so must not change the function of the program

Filling the Load Delay Slot
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
ADDI R7,R7,3    (moved into the delay slot, replacing the NOP)
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R7,3    (original position)

Filling the Load Delay Slot?
[Pipeline timing diagram, CC1-CC9]
LW   R1,0(R2)
NOP
OR   R4,R1,R3
SUB  R5,R2,R1
AND  R6,R1,R2
ADDI R7,R6,3
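
Whether an instruction may be moved up into the load delay slot comes down to a dependence check. The Python sketch below is one conservative way to express it and is not taken from the lecture: it considers register dependences only (no memory accesses or branches), and the function name and tuple encoding are hypothetical.

```python
# Each instruction: (mnemonic, destination register, list of source registers).
def can_move_into_delay_slot(candidate, hoisted_over, load_dest):
    """Return True if `candidate` can be moved up past `hoisted_over`
    into the slot right after a load that writes `load_dest`."""
    _, c_dest, c_srcs = candidate
    # The loaded value is not yet available in the delay slot,
    # and the candidate must not overwrite it either.
    if load_dest in c_srcs or c_dest == load_dest:
        return False
    for _, dest, srcs in hoisted_over:
        # The candidate needs a value that is produced later in the program.
        if dest is not None and dest in c_srcs:
            return False
        # The candidate would overwrite a register a skipped instruction
        # still reads or writes.
        if c_dest is not None and (c_dest in srcs or c_dest == dest):
            return False
    return True


skipped = [
    ("OR",  "R4", ["R1", "R3"]),
    ("SUB", "R5", ["R2", "R1"]),
    ("AND", "R6", ["R1", "R2"]),
]
movable = can_move_into_delay_slot(("ADDI", "R7", ["R7"]), skipped, "R1")
blocked = can_move_into_delay_slot(("ADDI", "R7", ["R6"]), skipped, "R1")
```

Under this check ADDI R7,R7,3 may fill the slot, while ADDI R7,R6,3 may not, because it reads R6, which the AND writes later in the program.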

The Problem with Branches
[Figure: pipelined datapath; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
If the condition is met, the PC is updated at the end of EX, after we've already fetched the next two instructions

The Problem with Branches
[Pipeline timing diagram, CC1-CC9]
BEQ  R2,R3,[label]
OR   R4,R1,R3
SUB  R5,R2,R1
[label]: AND  R6,R1,R2
ADDI R7,R7,3
...
OR and SUB are fetched before the branch condition is evaluated

Control Hazard
Occurs when instructions following a branch are fetched before the branch outcome is known
BEQ  R2, R3, [label]     IF ID EX MEM WB
OR   R4, R1, R3             IF ID EX MEM WB
SUB  R5, R2, R1                IF ID EX MEM WB
[label]: AND R6,R1,R2
What should happen
  If branch is not taken, next fetched instruction should be at address PC+2 (OR)
  If branch is taken, next fetched instruction should be at address [label] (AND)
What actually happens
  Instructions at PC+2 and PC+4 are fetched before branch outcome is known

Branch Delay Slot
If the ISA defines a branch delay slot, the instruction immediately following a branch is always executed after the branch
The compiler finds an instruction to put there, or puts in a NOP
The hardware must execute the instruction immediately following the branch, regardless of whether the branch is taken or not

Reducing the Branch Delay
We already calculate the branch target in ID
Put dedicated hardware to also evaluate the condition in ID
Hence only 1 branch delay slot needed
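
The link between the stage that resolves a branch and the number of instructions already fetched behind it can be made explicit. The Python sketch below assumes a five-stage pipeline (IF, ID, EX, MEM, WB) that fetches one instruction per cycle and redirects the PC at the end of the resolving stage; the stage list and function name are assumptions made for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed stage order


def fetched_before_resolution(resolve_stage):
    """Number of instructions fetched after the branch
    before its outcome can redirect the PC."""
    # While the branch occupies stage k (0-based), one younger
    # instruction sits in each of the k stages behind it.
    return STAGES.index(resolve_stage)


resolved_in_ex = fetched_before_resolution("EX")
resolved_in_id = fetched_before_resolution("ID")
```

Resolving in EX leaves two instructions fetched behind the branch, while resolving in ID leaves one, which is why a single branch delay slot suffices.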

Evaluating Branch Condition in ID
[Figure: pipelined datapath with an equality comparator (=?) and sign-bit checks in ID; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Filling the Branch Delay Slot with a NOP
[Pipeline timing diagram, CC1-CC9]
BEQ  R2,R3,[label]
NOP
OR   R4,R1,R3
SUB  R5,R2,R1
[label]: AND  R6,R1,R2
ADDI R7,R7,3
...

Filling the Branch Delay Slot
[Pipeline timing diagram, CC1-CC9]
BEQ  R2,R3,[label]
ADDI R7,R7,3    (moved into the delay slot, replacing the NOP)
OR   R4,R1,R3
SUB  R5,R2,R1
[label]: AND  R6,R1,R2
ADDI R7,R7,3    (original position)
...
The ADDI is always executed after the BEQ. If the BEQ is taken, executing the ADDI must not cause incorrect behavior

Branch Target Address (PC+2+OFF)
[Figure: pipelined datapath; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
Branch delay slot is accounted for in the branch target calculation
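
The PC+2+OFF calculation is small enough to write out. In the Python sketch below, the 16-bit address width and the treatment of OFF as an already sign-extended byte offset are assumptions, and `branch_target` is a hypothetical helper name.

```python
def branch_target(pc, offset, address_bits=16):
    """Compute PC + 2 + OFF, wrapping to the assumed address width.

    pc:     address of the branch instruction
    offset: sign-extended branch offset (may be negative)
    """
    mask = (1 << address_bits) - 1
    # PC + 2 is the address of the delay-slot instruction, so the
    # offset is applied relative to the instruction after the branch.
    return (pc + 2 + offset) & mask


forward_target = branch_target(0x0010, 6)
backward_target = branch_target(0x0010, -4)
```

With the branch at address 0x0010, an offset of 6 lands on 0x0018 and an offset of -4 lands on 0x000E.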

Before Next Class
Next Time: More Pipelined Microprocessor