Computer Elements and Datapath. Microarchitecture Implementation of an ISA



6.823, L5--1
Computer Elements and Datapath
Laboratory for Computer Science
M.I.T.

6.823, L5--2
Microarchitecture: Implementation of an ISA
[Diagram: a controller drives control points on the data path and receives status lines back from it.]
  Structure: how components are connected (static)
  Sequencing: how data moves between components (dynamic)

6.823, L5--3
An ISA can be implemented by many different microarchitectures
Components:
  - MUXs and wires
  - ALUs
  - flip-flops & register files
  - memories
Datapath: interconnection of components
  - control and status lines
  - number of busses
  - Harvard vs Princeton style memory organization
  - pipelined vs non-pipelined
Controller: clocks & timing, sequencer, ROMs, PLAs, FSMs
  - hardwired vs microprogrammed
  - pipelined vs non-pipelined

6.823, L5--4
Combinational Circuits: Multiplexers
[Diagram: a Mux with inputs I0, I1, ..., In, select input Sel and output O; a De-Mux with input I, select input Sel and outputs O0, O1, ..., On.]
Combinational circuits have
  - a fixed mapping from input to output (no state)
  - propagation delay
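The "fixed mapping, no state" property of a multiplexer and demultiplexer can be sketched as two pure functions. This is a minimal illustration only: the function names `mux` and `demux`, and the choice to drive unselected demux outputs to 0, are assumptions and do not come from the slides.

```python
def mux(sel, inputs):
    """n-to-1 multiplexer: the output is the input chosen by sel."""
    return inputs[sel]


def demux(sel, value, n):
    """1-to-n demultiplexer: route value to output sel.

    Unselected outputs are driven to 0 here (an assumption of this sketch).
    """
    return [value if i == sel else 0 for i in range(n)]
```

Because neither function keeps state, calling it twice with the same inputs always gives the same output, which is what distinguishes combinational logic from the flip-flops and registers that follow.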

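6.823, L5--5
Combinational Circuits: ALU
[Diagram: ALU with inputs OperandA, OperandB and Select; outputs Result and the condition flags NCVZ.]
Select selects the function:
  - Signed, Unsigned Arithmetic: ADD, SUB, ADDU, ...
  - Logical: AND, OR, NOT, ...
  - Shift: SL, SRA, SRL, ...
  - Signed, Unsigned Compare: GT, LT, EQ

6.823, L5--6
Edge-Triggered Flip-flops
[Diagram: D flip-flop with data input D, clock C and output Q; timing waveform illustrating metastability.]
Data is sampled at the rising edge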
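A small behavioural sketch of an ALU that returns a result together with the N, C, V and Z flags may help make the flag definitions concrete. It assumes 32-bit operands and covers only a few of the listed functions; the function name `alu`, the string function codes, and the convention that C after a subtraction means "no borrow" are assumptions of this sketch, not details from the slides.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def alu(select, a, b=0):
    """Return (result, flags) for 32-bit operands; flags has keys N, C, V, Z."""
    a &= MASK32
    b &= MASK32
    carry = False
    overflow = False
    if select == "ADD":
        full = a + b
        result = full & MASK32
        carry = full > MASK32
        # Signed overflow: operands share a sign that the result does not.
        overflow = bool(~(a ^ b) & (a ^ result) & SIGN32)
    elif select == "SUB":
        full = a + (~b & MASK32) + 1  # a - b as a + two's complement of b
        result = full & MASK32
        carry = full > MASK32  # assumed convention: carry set means no borrow
        overflow = bool((a ^ b) & (a ^ result) & SIGN32)
    elif select == "AND":
        result = a & b
    elif select == "OR":
        result = a | b
    elif select == "NOT":
        result = ~a & MASK32
    else:
        raise ValueError("unsupported function: " + select)
    flags = {"N": bool(result & SIGN32), "C": carry,
             "V": overflow, "Z": result == 0}
    return result, flags
```

For example, `alu("ADD", 0x7FFFFFFF, 1)` yields the result `0x80000000` with N and V set, since the largest positive signed value overflows into the sign bit.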

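6.823, L5--7
Flip-flops with Write Enables
[Diagram: two ways to add a write enable EN to a flip-flop clocked by C: gating the clock C with EN (marked "dangerous!"), or placing a 0/1 multiplexer controlled by EN in front of the data input.]

6.823, L5--8
Registers
[Diagram: a row of flip-flops sharing a common clock C and enable En.]
Register: a group of flip-flops with a common clock and enable
Register file: a group of registers with a common clock, input and output port(s)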
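The multiplexer-based write enable can be sketched behaviourally: on every rising clock edge the flip-flop loads either the new data or its own current output, so the clock itself is never gated. The class and method names below are hypothetical and chosen only for this illustration.

```python
class EnabledFlipFlop:
    """Edge-triggered flip-flop whose input mux recirculates Q when EN is 0."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, d, en):
        # 2-to-1 mux in front of D: select new data if enabled, else hold Q.
        self.q = d if en else self.q
        return self.q


class Register:
    """A group of flip-flops sharing one clock and one enable."""

    def __init__(self, width):
        self.bits = [EnabledFlipFlop() for _ in range(width)]

    def rising_edge(self, data_bits, en):
        return [ff.rising_edge(d, en) for ff, d in zip(self.bits, data_bits)]
```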

6.823, L5--9
Register Files
[Diagram: register file with 2 read ports and 1 write port (2R + 1W). Inputs: Clock, WE, ReadSel1, ReadSel2, WriteSel, WriteData. Outputs: ReadData1, ReadData2. Internal view: WSel, WE, C and WData drive register 0, register 1, ...; RSel1 and RSel2 select RData1 and RData2.]
No timing issues in reading a selected register

6.823, L5--10
Register Files and Ports
[Diagram: a 2R + 1W register file (WE, ReadSel1, ReadSel2, WriteSel, WriteData; ReadData1, ReadData2) beside a 1R + 1R/W register file (WE, ReadSel, R/WSel; ReadData, R/WData).]
Ports are expensive
  - multiplex a port for read & write
In 1996, GPR File has up to 64 registers and 12 ports!!!
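A behavioural sketch of the 2R + 1W register file is given here: reads are combinational, and a write takes effect only on a rising clock edge when WE is asserted. The class name `RegisterFile`, the method names, and the default of 32 registers of 32 bits are assumptions made for illustration.

```python
class RegisterFile:
    """Register file with two read ports and one write port (2R + 1W)."""

    def __init__(self, num_regs=32, width=32):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1

    def read(self, read_sel1, read_sel2):
        """Combinational read: returns (ReadData1, ReadData2)."""
        return self.regs[read_sel1], self.regs[read_sel2]

    def rising_edge(self, we, write_sel, write_data):
        """Clocked write: the register changes only if WE is asserted."""
        if we:
            self.regs[write_sel] = write_data & self.mask
```

Sharing one port for read and write (the 1R + 1R/W organisation) would amount to letting `read_sel2` and `write_sel` be the same input, at the cost of not being able to use both in the same cycle.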

6.823, L5--11
A Simple Memory Model
[Diagram: MAGIC RAM with inputs Clock, WriteEnable, Address, WriteData and output ReadData.]
Reads and writes are always completed in one cycle
  - a Read can be done any time (i.e. combinational)
  - a Write is performed at the rising clock edge if it is enabled
    => the write address and data must be stable at the clock edge

6.823, L5--12
Memory Hierarchy
  Level                            Access time        Cycles    Capacity
  On-chip cache (I$, D$)           5~10ns             1         32~64KB
  Off-chip cache (SRAM)            < 20ns             5~10      128K~1MB
  Interleaved banks of DRAM        ~60ns              20~50     32M~1GB
  Hard disk                        ~10ms seek time    ~         K~16+GB
Our memory model is a good approximation of the hierarchical memory system when we hit in the on-chip cache

6.823, L5--13
A Datapath to Implement DLX

6.823, L5--14
The DLX ISA
Processor State
  - 32 32-bit GPRs, R0 always contains a 0
  - 32 single precision FPRs, may also be viewed as 16 double precision FPRs
  - FP status register, used for FP compares & exceptions
  - PC, the program counter
  - some other special registers
Data types
  - 8-bit byte, 2-byte half word
  - 32-bit word for integers
  - 32-bit word for single precision floating point
  - 64-bit word for double precision floating point
Load/Store style instruction set
  - data addressing modes: immediate & indexed
  - branch addressing modes: PC relative & register indirect
  - byte addressable memory: big endian mode
All instructions are 32 bits

6.823, L5--15
Instruction Execution
Execution of an instruction involves
  1. instruction fetch
  2. decode and register fetch
  3. ALU operation
  4. memory operation (optional)
  5. write back
and the computation of the address of the next instruction

6.823, L5--16
Datapath: Reg-Reg ALU Instructions (worksheet)
  Format:    0 | rf1 | rf2 | rf3 | func
  Operation: rf3 <- (rf1) func (rf2)

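6.823, L5--17
Datapath: Reg-Reg ALU Instructions
[Datapath diagram; instruction fields <25:21>, <20:16>, <15:11>, <5:0>; control input OpCode.]
Timing?
  Format:    0 | rf1 | rf2 | rf3 | func
  Operation: rf3 <- (rf1) func (rf2)

6.823, L5--18
Datapath: Reg-Imm ALU Instructions (worksheet)
[Datapath diagram; control input OpCode.]
  Format:    opcode | rf1 | rf2 | immediate
  Operation: rf2 <- (rf1) op immediate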

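6.823, L5--19
Datapath: Reg-Imm ALU Instructions
[Datapath diagram; instruction fields <25:21>, <20:16>, <15:0>, <31:26>; control signals OpCode, Sel.]
  Format:    opcode | rf1 | rf2 | immediate
  Operation: rf2 <- (rf1) op immediate

6.823, L5--20
Conflicts in Merging Datapaths
[Datapath diagram; instruction fields <25:21>, <20:16>, <20:16>, <15:11>, <15:0>, <31:26>, <5:0>; control signals OpCode, Sel.]
  Reg-Reg: 0 | rf1 | rf2 | rf3 | func        rf3 <- (rf1) func (rf2)
  Reg-Imm: opcode | rf1 | rf2 | immediate    rf2 <- (rf1) op immediate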
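The bit ranges on these two slides can be turned into a small field extractor, which also makes the merging conflict visible: the destination register is rf3 (bits 15:11) for a register-register instruction but rf2 (bits 20:16) for a register-immediate one. The helper names `bits`, `decode` and `dest_register` are hypothetical, and the sketch returns the raw immediate without extending it.

```python
def bits(word, hi, lo):
    """Extract the bit field word<hi:lo> (inclusive) from a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(inst):
    """Split a 32-bit instruction into the fields named on the slides."""
    return {
        "opcode": bits(inst, 31, 26),
        "rf1": bits(inst, 25, 21),
        "rf2": bits(inst, 20, 16),
        "rf3": bits(inst, 15, 11),    # meaningful for reg-reg only
        "func": bits(inst, 5, 0),     # meaningful for reg-reg only
        "immediate": bits(inst, 15, 0),  # meaningful for reg-imm only
    }


def dest_register(fields, is_reg_reg):
    """Pick the write-back register: rf3 for reg-reg, rf2 for reg-imm."""
    return fields["rf3"] if is_reg_reg else fields["rf2"]
```

Every field is extracted unconditionally, as the wires in a datapath would be; which of them are actually used is decided later by control signals derived from the opcode.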

6.823, L5--21
Datapath for ALU Instructions
[Datapath diagram; instruction fields <25:21>, <20:16>, <15:11>, <15:0>, <31:26>, <5:0>; control signals OpCode, RegDst (rf2 / rf3), Sel, BSrc (Reg / Imm).]
  Reg-Reg: 0 | rf1 | rf2 | rf3 | func        rf3 <- (rf1) func (rf2)
  Reg-Imm: opcode | rf1 | rf2 | immediate    rf2 <- (rf1) op immediate

6.823, L5--22
DLX Load/Store Instructions
  Format: opcode | rf1 (base) | rf2 | displacement
Load/store byte, halfword, word to/from GPR:
  LB, LBU, SB, LH, LHU, SH, LW, SW
  - byte and half-word can be sign or zero extended
Load/store single and double FP to/from FPR:
  LF, LD, SF, SD
Byte addressable machine
Memory access must be data aligned
A single addressing mode: (base) + displacement
Big endian byte ordering
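The load/store rules listed on this slide (base plus displacement addressing, aligned accesses, big-endian byte order, sign or zero extension of bytes and half-words) can be sketched as follows. The names `DataMemory`, `effective_address` and `sign_extend` are hypothetical, and treating the 16-bit displacement as a signed value is an assumption of this sketch.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, nbits):
    """Interpret the low nbits of value as a two's-complement number."""
    sign = 1 << (nbits - 1)
    value &= (1 << nbits) - 1
    return (value ^ sign) - sign


def effective_address(base, displacement16):
    """(base) + displacement, with the displacement assumed signed."""
    return (base + sign_extend(displacement16, 16)) & MASK32


class DataMemory:
    """Byte-addressable, big-endian memory that rejects misaligned accesses."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def _check_aligned(self, addr, nbytes):
        if addr % nbytes != 0:
            raise ValueError("misaligned access")

    def store(self, addr, value, nbytes):
        self._check_aligned(addr, nbytes)
        value &= (1 << (8 * nbytes)) - 1
        # Big endian: the most significant byte goes to the lowest address.
        self.cells[addr:addr + nbytes] = value.to_bytes(nbytes, "big")

    def load(self, addr, nbytes, signed):
        self._check_aligned(addr, nbytes)
        raw = int.from_bytes(self.cells[addr:addr + nbytes], "big")
        if signed:
            raw = sign_extend(raw, 8 * nbytes)
        return raw & MASK32  # value as it would appear in a 32-bit GPR
```

With this model, a byte load with `signed=True` plays the role of LB and one with `signed=False` the role of LBU; the same pairing applies to LH and LHU with `nbytes=2`.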

6.823, L5--23
Datapath for Memory Instructions
Should program and data memory be separate?
Harvard style: separate (Aiken and Mark 1 influence)
  - read-only program memory
  - read/write data memory
  at some level the two memories have to be the same
Princeton style: the same (von Neumann's influence)
  - a Load or Store instruction requires accessing the memory more than once during its execution

6.823, L5--24
Load/Store Instructions: Harvard Datapath
[Datapath diagram; base and disp feed the address calculation; data memory with rdata and wdata ports; control signals MemWrite, WBSrc (ALU / Mem), OpCode, RegDst, Sel, BSrc.]
  Format:          opcode | rf1 | rf2 | displacement
  Addressing mode: (rf1) + displacement
rf1 is the base register
rf2 is the destination of a Load or the source for a Store

6.823, L5--25
DLX Control Instructions
Conditional branch on GPR
  Format: opcode | rf1 | offset from (PC)+4
  BEQZ, BNEZ
Unconditional register-indirect jumps
  Format: opcode | rf1
  JR, JALR
Unconditional PC-relative jumps
  Format: opcode (6 bits) | offset from (PC)+4 (26 bits)
  J, JAL
PC offsets are specified in bytes
All control transfers are delayed by 1 instruction
jump-&-link stores (PC)+8 into the link register (R31)

6.823, L5--26
Conditional Branches (no delay slots)
[Datapath diagram; control signals PCSrc (~j / j), MemWrite, WBSrc, OpCode, RegDst, Sel, BSrc, zero?; data memory with rdata and wdata ports.]
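The three kinds of control transfer differ only in how the next program counter is formed, which the following sketch spells out for the no-delay-slot case. The function name `next_pc`, the string tags for the instruction kinds, and the `taken` argument standing in for the zero test on the GPR are all assumptions made for illustration; offsets are treated as signed byte offsets relative to the address of the following instruction.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, nbits):
    """Interpret the low nbits of value as a two's-complement number."""
    sign = 1 << (nbits - 1)
    value &= (1 << nbits) - 1
    return (value ^ sign) - sign


def next_pc(pc, kind, offset=0, reg_value=0, taken=False):
    """Next PC without delay slots.

    kind is "branch" (16-bit offset), "jump" (26-bit offset),
    "jump_reg" (register indirect) or anything else for sequential flow.
    """
    sequential = (pc + 4) & MASK32
    if kind == "branch":
        if taken:
            return (sequential + sign_extend(offset, 16)) & MASK32
        return sequential
    if kind == "jump":
        return (sequential + sign_extend(offset, 26)) & MASK32
    if kind == "jump_reg":
        return reg_value & MASK32
    return sequential
```

In hardware terms, the `if` chain corresponds to the multiplexer in front of the program counter, and `kind` together with `taken` corresponds to its select input.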

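6.823, L5--27
Register-Indirect Jumps (no delay slots)
[Datapath diagram; control signals PCSrc (~j / j RInd / j PCR), MemWrite, WBSrc, OpCode, RegDst, Sel, BSrc, zero?; data memory with rdata and wdata ports.]
Jump & Link?

6.823, L5--28
Jump & Link (no delay slot)
[Datapath diagram; control signals PCSrc, MemWrite, WBSrc (ALU / Mem / PC), OpCode, RegDst (rf3 / rf2 / R31), Sel, BSrc, zero?; constant 31; data memory with rdata and wdata ports.]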

6.823, L5--29
PC-Relative Jumps (no delay slot)
[Datapath diagram; control signals PCSrc, MemWrite, WBSrc, OpCode, RegDst, Sel, BSrc, zero?; constant 31; immediate field width 16 / 26; data memory with rdata and wdata ports.]
No new datapath required

6.823, L5--30
Delay Slot Complications
The control signal and the next-PC computed by the current instruction affect the instruction after next
  => registers to hold temporary values
[Diagram labels: delay, dly next-PC.]
Invisible processor state?
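The effect of a one-instruction delay slot on the fetch sequence can be sketched with a single hidden register that holds the target computed by the previous instruction. This is an illustrative model only: the function `fetch_sequence`, the `program` dictionary mapping an instruction address to the target it transfers to, and the variable `pending` are hypothetical, and the sketch does not model a control transfer placed inside a delay slot.

```python
def fetch_sequence(program, start_pc, steps):
    """Return the list of fetched PCs under one-instruction delayed transfers.

    program maps the address of each control-transfer instruction to its
    target; addresses not in the dictionary are ordinary instructions.
    """
    pc = start_pc
    pending = None  # hidden state: target computed by the previous instruction
    trace = []
    for _ in range(steps):
        trace.append(pc)
        target = program.get(pc)  # transfer decided by the current instruction
        # The previous instruction's transfer takes effect only now.
        following = pending if pending is not None else pc + 4
        pending = target
        pc = following
    return trace
```

For a jump at address 0 targeting 0x20, `fetch_sequence({0x0: 0x20}, 0x0, 4)` returns `[0, 4, 32, 36]`: the instruction at address 4, in the delay slot, is fetched before the jump takes effect. The `pending` variable is exactly the kind of extra register the slide refers to as invisible processor state.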

6.823, L5--31
Control Instructions (delay slot)
[Datapath diagram; control signals PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), MemWrite, WBSrc, OpCode, RegDst, Sel, BSrc, zero?; delay registers labelled delay and dly; constant 31; data memory with rdata and wdata ports.]

6.823, L5--32
The next few lectures
We will study the following microarchitectures for DLX:
  - A simple single-cycle Harvard-style implementation
  - A simple two-cycle Princeton-style implementation
  - A simple five-stage pipelined implementation
  - Complex pipelined implementations
    - out-of-order instruction completion
    - out-of-order instruction issue and register renaming
    - speculative execution and branch prediction
    - superscalar execution
