DAT105: Computer Architecture


1 Department of Computer Science & Engineering, Chalmers University of Technology
DAT105: Computer Architecture
Exercise 6 (Old exam questions)
By Minh Quang Do

2 Question 4a [2006/2/22] (1)
Loop: LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      BNEZ R1, Loop
Unroll the loop once (branch-taken assumed):
Loop: LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      BNEZ R1, Loop
/8/2007

3 Question 4a [2006/2/22] (2)
Unrolled loop:
Loop: LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      BNEZ R1, Loop
After re-ordering (re-scheduling), with the two SUBI instructions merged:
Loop: LD   F0, 0(R1)
      LD   F0, 8(R1)
      SD   F4, 0(R1)
      SD   F4, 8(R1)
      SUBI R1, R1, #16
      BNEZ R1, Loop

4 Question 4a [2006/2/22] (3)
Registers available for renaming: T9, T10, T11, T12, ...
Re-scheduled loop:
Loop: LD   F0, 0(R1)
      LD   F0, 8(R1)
      SD   F4, 0(R1)
      SD   F4, 8(R1)
      SUBI R1, R1, #16
      BNEZ R1, Loop
After register renaming:
Loop: LD   F0, 0(R1)
      LD   T9, 8(R1)
      ADDD T10, T9, F2
      SD   F4, 0(R1)
      SD   T10, 8(R1)
      SUBI R1, R1, #16
      BNEZ R1, Loop

5 Question 4b [2006/2/22]
Execution time (original): 8 cycles per iteration
Execution time (modified): 9 cycles for 2 iterations, i.e. 9/2 = 4.5 cycles per iteration
Speedup = 8 / 4.5 = 1.78x
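The arithmetic on this slide can be re-checked with a few lines of Python. This is only a sketch: the helper name `cycles_per_iteration` is made up for illustration, and the cycle counts (8 for the original loop body, 9 for the unrolled body covering two iterations) are taken from the slide as given.

```python
def cycles_per_iteration(total_cycles: float, iterations_covered: int) -> float:
    """Cycles spent per original loop iteration (hypothetical helper)."""
    return total_cycles / iterations_covered

# Figures from the slide: 8 cycles for one iteration of the original loop,
# 9 cycles for one pass of the loop unrolled once (two original iterations).
original = cycles_per_iteration(8, 1)
unrolled = cycles_per_iteration(9, 2)
speedup = original / unrolled

print(f"{unrolled} cycles/iteration, speedup = {speedup:.2f}x")
# prints "4.5 cycles/iteration, speedup = 1.78x"
```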

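6 Question 4c [2006/2/22]
Given: a VLIW architecture with 2 memory ports (LD/ST), 1 integer functional unit and 2 addition functional units.
Issue slots per cycle: Integer FU, Add FU 1, Add FU 2, LD/ST 1, LD/ST 2, Branch
Schedule:
  Cycle 1: (empty)
  Cycle 2: LD/ST 1: LD F0, 0(R1); LD/ST 2: LD T9, 8(R1)
  Cycle 3: (empty)
  Cycle 4: Add FU: ADDD T10, T9, F2
  Cycle 5: LD/ST 1: SD F4, 0(R1); LD/ST 2: SD T10, 8(R1)
  Cycle 6: Integer FU: SUBI R1, R1, #16
  Cycle 7: Branch: BNEZ R1, Loop
  Cycle 8: (empty)
How many cycles to execute the code? 8 / 2 = 4 cycles per iteration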

7 Question 4c [2006/2/22] Extra Discussion (1)
Given: the loop is unrolled 4 times; the same VLIW architecture.
How many cycles to execute the code?
Registers available for renaming: T9, T10, T11, T12, T13, T14, T15
Original loop:
Loop: LD   F0, 0(R1)
      SD   F4, 0(R1)
      SUBI R1, R1, #8
      BNEZ R1, Loop
Unrolled and renamed loop:
Loop: LD   F0, 0(R1)
      LD   T9, 8(R1)
      LD   T11, 16(R1)
      LD   T13, 24(R1)
      ADDD T10, T9, F2
      ADDD T12, T11, F2
      ADDD T14, T13, F2
      SD   F4, 0(R1)
      SD   T10, 8(R1)
      SD   T12, 16(R1)
      SD   T14, 24(R1)
      SUBI R1, R1, #32
      BNEZ R1, Loop

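8 Question 4c [2006/2/22] Extra Discussion (2)
Issue slots per cycle: Integer FU, Add FU 1, Add FU 2, LD/ST 1, LD/ST 2, Branch
Schedule:
  Cycle 1: LD/ST 1: LD F0, 0(R1); LD/ST 2: LD T9, 8(R1)
  Cycle 2: LD/ST 1: LD T11, 16(R1); LD/ST 2: LD T13, 24(R1)
  Cycle 3: Add FU: ADDD T10, T9, F2
  Cycle 4: Add FU 1: ADDD T12, T11, F2; Add FU 2: ADDD T14, T13, F2; LD/ST 1: SD F4, 0(R1); LD/ST 2: SD T10, 8(R1)
  Cycle 5: LD/ST 1: SD T12, 16(R1); LD/ST 2: SD T14, 24(R1)
  Cycle 6: Integer FU: SUBI R1, R1, #32
  Cycle 7: Branch: BNEZ R1, Loop
  Cycle 8: (empty)
How many cycles to execute the code? 8 / 4 = 2 cycles per iteration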
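As a rough cross-check of the VLIW schedule, the sketch below stores one dictionary per cycle (unit name mapped to the instruction issued there) and derives the cycles per original iteration. The unit keys (`int`, `add1`, `add2`, `mem1`, `mem2`, `branch`) are names invented for this example, and the register numbers (R1, T10, T11, ...) assume that the "1" digits missing from the slide text are restored. The slot-utilisation figure is an extra observation, not something stated on the slide.

```python
UNITS = ("int", "add1", "add2", "mem1", "mem2", "branch")  # hypothetical unit names
UNROLL_FACTOR = 4

# One dict per cycle, transcribed from the 4x-unrolled schedule on the slide.
schedule_4x = [
    {"mem1": "LD F0,0(R1)", "mem2": "LD T9,8(R1)"},
    {"mem1": "LD T11,16(R1)", "mem2": "LD T13,24(R1)"},
    {"add1": "ADDD T10,T9,F2"},
    {"add1": "ADDD T12,T11,F2", "add2": "ADDD T14,T13,F2",
     "mem1": "SD F4,0(R1)", "mem2": "SD T10,8(R1)"},
    {"mem1": "SD T12,16(R1)", "mem2": "SD T14,24(R1)"},
    {"int": "SUBI R1,R1,#32"},
    {"branch": "BNEZ R1,Loop"},
    {},  # last cycle of the schedule, nothing issued
]

# Every slot used must belong to a unit that exists on the machine.
assert all(unit in UNITS for cycle in schedule_4x for unit in cycle)

total_cycles = len(schedule_4x)
cycles_per_iteration = total_cycles / UNROLL_FACTOR
issued = sum(len(cycle) for cycle in schedule_4x)
utilisation = issued / (total_cycles * len(UNITS))

print(f"{total_cycles} cycles / {UNROLL_FACTOR} iterations = {cycles_per_iteration} cycles per iteration")
print(f"{issued} of {total_cycles * len(UNITS)} issue slots used ({utilisation:.0%})")
```

Most issue slots stay empty in this schedule, which is typical for a short loop body with long dependence chains.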

9 Question 2a [2005/2/2]
Given: instruction mix: conditional branches (20%); others (80%).
Prediction accuracy: 50% for "less than" branches (misprediction penalty = 10 cycles); 100% for other branches (CPI = 1 cycle).
What is the CPI for the integer applications?

10 Question 2a [2005/2/2]
Relative occurrence of the "less than" branches among all instructions: 0.2 x 0.35 = 0.07
Relative occurrence of the mispredicted "less than" branches among all instructions: 0.2 x 0.35 x 0.5 = 0.035
CPI for integer applications: 0.035 x 10 + 1 = 1.35 cycles (misprediction stall cycles added to the base CPI of 1)
For checking: CPI = 0.2 x [(0.5 x 11 + 0.5 x 1) x 0.35 + 0.65 x 1] + 0.8 x 1 = 1.35 cycles
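The CPI calculation can be reproduced numerically. The sketch below assumes a misprediction penalty of 10 cycles and a base CPI of 1, which are the values consistent with the slide's result of 1.35 cycles; the 35% share of "less than" branches is the factor 0.35 used on the slide. All variable names are illustrative.

```python
import math

branch_fraction = 0.20   # conditional branches among all instructions
less_than_share = 0.35   # "less than" branches among conditional branches
mispredict_rate = 0.50   # 50% prediction accuracy for "less than" branches
penalty_cycles = 10      # assumed misprediction penalty
base_cpi = 1.0           # assumed CPI of every correctly handled instruction

# Fraction of all instructions that are mispredicted "less than" branches.
mispredicted = branch_fraction * less_than_share * mispredict_rate

# Base CPI plus the average stall cycles per instruction.
cpi = base_cpi + mispredicted * penalty_cycles

# Cross-check by weighting every instruction class with its own CPI.
cpi_check = (
    branch_fraction * (
        less_than_share * (mispredict_rate * (base_cpi + penalty_cycles)
                           + (1 - mispredict_rate) * base_cpi)
        + (1 - less_than_share) * base_cpi
    )
    + (1 - branch_fraction) * base_cpi
)

assert math.isclose(cpi, cpi_check)
print(f"mispredicted fraction = {mispredicted:.3f}, CPI = {cpi:.2f}")
```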


13 Question 3 [2006/2/22]
A pipeline with Tomasulo's Algorithm:
  SUB F0, F1, F2 (8 cycles)
  DIV F3, F0, F4 (10 cycles)
  ADD F4, F5, F6 (6 cycles)

14 Question 3a [2006/2/22]
  SUB F0, F1, F2 (8 cycles)
  DIV F3, F0, F4 (10 cycles)
  ADD F4, F5, F6 (6 cycles)
RAW (true data dependence): DIV reads F0, which SUB writes.
WAR (name dependence): ADD writes F4, which DIV reads.
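The two dependences can also be found mechanically by comparing the destination and source registers of each instruction pair. The following is a minimal sketch, not a general dependence analyser: it checks every ordered pair directly and ignores intervening redefinitions, which is sufficient for three instructions. The tuple encoding and the function name `find_dependences` are made up for this example, and F1 assumes that the digit missing from the slide text is restored.

```python
from itertools import combinations

# (opcode, destination, sources) in program order.
program = [
    ("SUB", "F0", ("F1", "F2")),
    ("DIV", "F3", ("F0", "F4")),
    ("ADD", "F4", ("F5", "F6")),
]

def find_dependences(instrs):
    """Return (kind, earlier_op, later_op, register) for each pairwise dependence."""
    found = []
    # combinations() keeps program order, so `a` always precedes `b`.
    for (op_a, dst_a, src_a), (op_b, dst_b, src_b) in combinations(instrs, 2):
        if dst_a in src_b:
            found.append(("RAW", op_a, op_b, dst_a))  # later reads what earlier writes
        if dst_b in src_a:
            found.append(("WAR", op_a, op_b, dst_b))  # later writes what earlier reads
        if dst_a == dst_b:
            found.append(("WAW", op_a, op_b, dst_a))  # both write the same register
    return found

for kind, first, second, reg in find_dependences(program):
    print(f"{kind}: {first} -> {second} on {reg}")
# prints:
# RAW: SUB -> DIV on F0
# WAR: DIV -> ADD on F4
```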

15 Question 3b [2006/2/22] (cycle 1)
Instruction status (columns: Instruction, From Iteration, Issue, Execute, Write Result):
  SUB F0, F1, F2
  DIV F3, F0, F4
  ADD F4, F5, F6
Reservation stations (columns: Name, Busy, Op, Vj, Vk, Qj, Qk, A):
  Name = Add, Op = SUB, Vj = Regs[F1], Vk = Regs[F2]
  Op = ADD, Vj = Regs[F5], Vk = Regs[F6]
  Op = DIV, Vk = Regs[F4], Qj = Add
Register status (fields: F0, F1, F2, F3, F4, F5, F6):
  Qi = Add

16 Question 3b [2006/2/22] (cycle 2): instruction status, reservation station and register status tables for SUB F0, F1, F2; DIV F3, F0, F4; ADD F4, F5, F6.

17 Question 3b [2006/2/22] (cycles 3 to 6): same three tables.

18 Question 3b [2006/2/22] (cycle 7): same three tables.

19 Question 3b [2006/2/22] (cycle 8): same three tables.

20 Question 3b [2006/2/22] (cycle 9): same three tables.

21 Question 3b [2006/2/22] (cycles 10 to 18): same three tables.

22 Question 3b [2006/2/22] (cycle 19): same three tables.

23 Question 3b [2006/2/22] (cycle 20): same three tables.

24 Question 3c,d [2006/2/22]
c) Assume that, in the instruction DIV F3, F0, F4, the data stored in F4 are provided by earlier instructions that are also held in the reservation station. Vk stores that result in the DIV entry of the reservation station, so the DIV instruction refers to its own copy rather than to the register. This makes DIV and ADD independent of each other and eliminates the WAR hazard involving F4.
d) The value of F0 is provided by the first entry (Add) of the reservation station (see the previous slides for more details).
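The effect described in c) and d) can be illustrated with a toy model of the issue step. This is a simplified sketch rather than a full Tomasulo simulator: it only models operand capture at issue time, the register values are made up, and the station tags `Add1`, `Add2` and `Mult1` are hypothetical names. It shows the case where F4 is already available in the register file when DIV issues, so its value is copied into Vk.

```python
# Made-up register contents, for illustration only.
regs = {"F0": 0.0, "F1": 9.0, "F2": 4.0, "F3": 0.0, "F4": 2.0, "F5": 1.0, "F6": 3.0}
reg_status = {}   # register -> tag of the station that will produce it (Qi)
stations = {}     # tag -> reservation station entry

def issue(tag, op, dst, src1, src2):
    """Issue step: copy ready operands into Vj/Vk, record producer tags in Qj/Qk."""
    entry = {"op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for v_field, q_field, src in (("Vj", "Qj", src1), ("Vk", "Qk", src2)):
        if src in reg_status:
            entry[q_field] = reg_status[src]   # value still pending: wait on this tag
        else:
            entry[v_field] = regs[src]         # value ready: capture a private copy
    stations[tag] = entry
    reg_status[dst] = tag                      # dst will be written by this station

issue("Add1", "SUB", "F0", "F1", "F2")
issue("Mult1", "DIV", "F3", "F0", "F4")
issue("Add2", "ADD", "F4", "F5", "F6")

# ADD (6 cycles) finishes long before DIV starts and overwrites F4.
regs["F4"] = stations["Add2"]["Vj"] + stations["Add2"]["Vk"]
del reg_status["F4"]

div = stations["Mult1"]
print(f"DIV waits on Qj = {div['Qj']} for F0; its Vk still holds the old F4 = {div['Vk']}")
print(f"register file now has F4 = {regs['F4']}")
```

Because DIV holds its own copy of F4 in Vk, the early write by ADD cannot corrupt DIV's operand, and the Qj tag tells DIV which station will deliver F0.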
