Computer Hardware Pipeline
Conventional Datapath
2.4 ns is required to perform a single operation (i.e., a maximum clock frequency of 416.7 MHz).
Delays along the path: register file read 0.6 ns, MUX B 0.2 ns, function unit 0.8 ns, MUX D 0.2 ns, register file write 0.6 ns.
Production Line Analogy
Automated car wash: cars are pulled through a series of stations, at each of which a particular step is performed:
1. Wash
2. Rinse
3. Dry
Latency = time needed to wash, rinse, and dry one car.
Throughput = rate at which washed cars are delivered.
Based on this analogy → a pipelined datapath with n stages has, ideally, a processing rate (throughput) for instructions of up to n times that of a non-pipelined datapath.
Pipelined Datapath
A pipelined datapath is built by breaking a conventional datapath into parts and inserting registers, called pipeline platforms, between these parts.
These registers provide temporary storage for data passing through the pipeline.
Data moves synchronously with the clock.
Delay of operand fetch (OF) is 0.8 ns, delay of execution (EX) is 1.0 ns, delay of write-back (WB) is 1.0 ns → min clock period = 1.0 ns.
Operating frequency = 1.0 GHz (2.4 times that of the non-pipelined datapath).
Even though there are 3 stages, the improvement factor is not quite 3. Why?
[Figure (b) Pipelined: stage 1 OF (register file, MUX B), stage 2 EX (function unit), stage 3 WB (MUX D, register file), separated by pipeline platforms, with component delays labeled.]
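The clock-period argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the split of each stage into component delays, and in particular the 0.2 ns attributed to each pipeline platform (`PLATFORM`), is an assumption chosen so that the stage totals match the 0.8 ns, 1.0 ns, and 1.0 ns quoted on the slide. Delays are kept in picoseconds so the sums are exact.

```python
# Component delays in picoseconds (integers avoid floating-point rounding).
REG_READ = 600    # register file read
MUX_B = 200
FUNC_UNIT = 800
MUX_D = 200
REG_WRITE = 600   # register file write
PLATFORM = 200    # assumption: delay contributed by each pipeline platform

# Conventional datapath: one long combinational path per operation.
conventional_ps = REG_READ + MUX_B + FUNC_UNIT + MUX_D + REG_WRITE

# Pipelined datapath: the clock must accommodate the slowest stage.
stage_ps = {
    "OF": REG_READ + MUX_B,
    "EX": PLATFORM + FUNC_UNIT,
    "WB": PLATFORM + MUX_D + REG_WRITE,
}
clock_ps = max(stage_ps.values())
speedup = conventional_ps / clock_ps


def to_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


print(f"conventional: {conventional_ps} ps, {to_mhz(conventional_ps):.1f} MHz")
print(f"pipelined clock: {clock_ps} ps, {to_mhz(clock_ps):.1f} MHz")
print(f"improvement factor: {speedup:.1f}")
```

Under these assumptions the factor falls short of 3 for two reasons: the stages are not equally long (OF is faster than the clock allows), and the platforms add delay that the conventional datapath does not have.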
Pipelined Datapath
OF consists of reading the register values (A and B) or selecting a constant value (via MB). The first pipeline platform stores the operand(s) to be used in EX during the next clock cycle.
In EX, a function unit operation occurs, and the result is captured by the second pipeline platform.
WB is the write-back stage: the result from the EX stage, or the value on Data in (selected by MUX D), is written to the register file.
[Figure: pipelined datapath with stages 1 Operand Fetch (OF), 2 Execute (EX), 3 Write-back (WB); control inputs AA, BA, MB, FS, MD, RW, DA; status outputs V, C, N, Z; ports Constant in, Address out, Data out, Data in.]
Pipelined Execution Pattern
[Figure: execution pattern of seven microoperations over clock cycles 1 to 9]
Microoperations:
1. R1 ← R2 − R3
2. R4 ← sl R6
3. R7 ← R7 + 1
4. R1 ← R0 + 2
5. Data out ← R3
6. R4 ← Data in
7. R5 ← 0
What is the total time required by the conventional datapath for execution? → 7 microoperations × 2.4 ns = 16.8 ns
What is the total time required by the pipelined datapath for execution? → 9 cycles × 1 ns = 9 ns
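The two totals above follow a general pattern: the conventional datapath needs one full operation time per microoperation, while a pipeline with k stages needs k − 1 extra cycles to fill before it finishes one microoperation per cycle. A minimal sketch, assuming no stalls; the function names are made up for illustration:

```python
def conventional_time_ns(n_ops, op_time_ns):
    """Total time when each microoperation takes one full datapath delay."""
    return n_ops * op_time_ns


def pipelined_cycles(n_ops, n_stages):
    """Clock cycles to complete n_ops in an n_stages pipeline with no stalls."""
    return n_ops + n_stages - 1


def pipelined_time_ns(n_ops, n_stages, clock_ns):
    """Total time for the pipelined datapath, again assuming no stalls."""
    return pipelined_cycles(n_ops, n_stages) * clock_ns


# Numbers from the slide: 7 microoperations, 3 stages, 2.4 ns vs 1.0 ns.
print(f"conventional: {conventional_time_ns(7, 2.4):.1f} ns")
print(f"pipelined: {pipelined_cycles(7, 3)} cycles, "
      f"{pipelined_time_ns(7, 3, 1.0):.1f} ns")
```

For this short sequence the ratio is 16.8 ÷ 9 ≈ 1.87, lower than the 2.4 ratio of the clock periods, because two of the nine cycles are spent filling the pipeline.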
Pipelined Execution Pattern
Maximum improvement of the pipelined over the conventional datapath is obtained when the pipeline is fully utilized (all stages are active).
E.g., over the 5 clock cycles 3 to 7 (the pipeline is full), 5 microoperations are completed in 5 ns.
In the same time the conventional datapath can execute 5 ns ÷ 2.4 ns/microoperation = 2.083 microoperations
→ the pipelined datapath executes 5 ÷ 2.083 = 2.4 times as many microoperations as the conventional one.
Pipelined Computer
Registers are added to the pipeline platforms to pass the instruction information through the pipeline.
[Figure: four-stage pipelined computer, control and datapath. Stage 1: PC and instruction memory. Stage 2 (DOF): IR, instruction decoder, register file, zero fill, MUX B. Stage 3 (EX): function unit and data memory. Stage 4 (WB): MUX D and register file write. Control signals shown: AA, BA, MB, FS, MW, MD, RW, DA; status outputs C, V, N, Z.]
Performance of Pipelined Computer
[Figure: execution pattern of instructions 1 to 7 over clock cycles 1 to 10]
Compare the performance of the single-cycle computer with that of the pipelined computer, for the situation in which the pipeline is fully utilized.
In 20 ns the pipelined computer completes 4 instructions, versus 20 ns ÷ 17 ns/instruction ≈ 1.18 instructions for the single-cycle computer.
Throughput: pipelined ≈ 3.4 × single-cycle.
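The comparison above can be reproduced numerically. In the sketch below, the 17 ns single-cycle time comes from the slide; the 5 ns pipeline clock is an assumption inferred from "4 instructions in 20 ns", and the 7-instruction, 10-cycle program corresponds to the execution pattern in the figure. The function names are illustrative only.

```python
SINGLE_CYCLE_NS = 17.0  # from the slide
PIPE_CLOCK_NS = 5.0     # assumption: implied by 4 instructions in 20 ns
N_STAGES = 4


def steady_state_speedup(single_cycle_ns, clock_ns):
    """Throughput ratio when the pipeline is full (one instruction per clock)."""
    return single_cycle_ns / clock_ns


def program_speedup(n_instr, n_stages, single_cycle_ns, clock_ns):
    """Ratio of total run times for a short program, including fill time."""
    pipelined_ns = (n_instr + n_stages - 1) * clock_ns
    return n_instr * single_cycle_ns / pipelined_ns


print(f"pipeline full: {steady_state_speedup(SINGLE_CYCLE_NS, PIPE_CLOCK_NS):.2f}x")
print(f"7 instructions: "
      f"{program_speedup(7, N_STAGES, SINGLE_CYCLE_NS, PIPE_CLOCK_NS):.2f}x")
```

The second figure is lower than the first because the 10 cycles include the cycles needed to fill the pipeline, during which fewer than four stages are doing useful work.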
Performance Issues
If a pipeline has 4 stages, is performance improved 4 times? Not quite. Why?
Pipelining hazards cause the pipe to stall because of some conflict in the pipe (a hazard prevents the next instruction in the pipe from executing in its turn).
Types of hazards:
Structural: contention for the same hardware resource
Data: dependency on an earlier instruction for the correct sequencing of register reads and writes
Control: branch/jump instructions stall the pipe until the correct target address is loaded into the PC
Reduction in Throughput
Filling and flushing of the pipeline reduce the throughput below the maximum level.
Data and control hazards are timing problems: they arise because the execution of an operation in a pipeline is delayed by one or more clock cycles from the time at which the instruction containing the operation was fetched.
Data Hazard Problem
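A hardware-based solution
Program:
1. MOVA R1, R5 (R1 ← R5)
2. ADD R2, R1, R6 (R2 ← R1 + R6)
3. ADD R3, R1, R2 (R3 ← R1 + R2)
[Figure: execution over clock cycles 1 to 8. When ADD R2, R1, R6 is in DOF, the R1 data hazard is detected, the pipeline is stalled, and a bubble is launched; R1 is then written and read in the same clock cycle. When ADD R3, R1, R2 is in DOF, the R2 data hazard is detected, the pipeline is stalled, and a bubble is launched; R2 is then written and read in the same clock cycle.]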
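The stall behavior in this example can be sketched as a small scheduling calculation. This is a simplified model, not the actual hazard-detection hardware: it assumes four stages (fetch, DOF, EX, WB), in-order execution, and that a register written in WB can be read by DOF in the same clock cycle, as the figure indicates. The function name and the tuple encoding of instructions are made up for illustration.

```python
def schedule(program):
    """Return (text, DOF cycle, WB cycle, stall cycles) for each instruction.

    Simplified model: the first instruction is fetched in cycle 1, each
    instruction enters DOF no earlier than one cycle after its predecessor,
    and DOF waits until every source register with a pending write has
    reached WB (same-cycle write and read is allowed).
    """
    pending_wb = {}  # register -> cycle in which it is written back
    prev_dof = 1     # so that the first instruction's DOF is cycle 2
    rows = []
    for text, dest, sources in program:
        earliest = prev_dof + 1
        dof = earliest
        for reg in sources:
            dof = max(dof, pending_wb.get(reg, 0))
        wb = dof + 2  # EX follows DOF, WB follows EX
        pending_wb[dest] = wb
        rows.append((text, dof, wb, dof - earliest))
        prev_dof = dof
    return rows


program = [
    ("MOVA R1, R5", "R1", ["R5"]),
    ("ADD R2, R1, R6", "R2", ["R1", "R6"]),
    ("ADD R3, R1, R2", "R3", ["R1", "R2"]),
]

rows = schedule(program)
for text, dof, wb, stalls in rows:
    print(f"{text:16s} DOF={dof} WB={wb} stalls={stalls}")
print("total cycles:", rows[-1][2])
```

In this model each ADD waits one cycle in DOF, and the last write-back lands in cycle 8, which agrees with the eight clock cycles shown in the figure.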
Control Hazards
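Control Hazards
Program (address, instruction):
1 BZ R1, 18
2 MOVA R2, R3
3 MOVA R1, R2
20 MOVA R5, R6
[Figure: execution over clock cycles 1 to 7. The branch is detected and bubbles are launched; R1 = 0 is evaluated and the PC is set to 20. The two instructions fetched after the branch pass through the pipeline as bubbles (no change). The instruction MOVA R5, R6 is fetched from the target address.]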
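The fetch sequence in this example can be traced with a short simulation. It is a sketch under stated assumptions rather than a model of the real control logic: the branch is resolved in EX, so the PC is redirected three cycles after the branch is fetched, and the two instructions fetched in between are turned into bubbles. The constant `REDIRECT_DELAY`, the function name, and the dictionary encoding of the program are hypothetical.

```python
REDIRECT_DELAY = 3  # assumption: target is fetched 3 cycles after the branch


def fetch_trace(program, start, branch_taken, n_fetches):
    """Return a list of (cycle, address, text, is_bubble) for each fetch.

    program maps address -> (text, branch_target or None).
    """
    trace = []
    pc = start
    redirect = None  # (cycle in which the target is fetched, target address)
    for cycle in range(1, n_fetches + 1):
        if redirect is not None and redirect[0] == cycle:
            pc = redirect[1]
            redirect = None
        text, target = program.get(pc, ("(no instruction)", None))
        # Anything fetched while a taken branch is unresolved becomes a bubble.
        is_bubble = redirect is not None
        if target is not None and branch_taken and not is_bubble:
            redirect = (cycle + REDIRECT_DELAY, target)
        trace.append((cycle, pc, text, is_bubble))
        pc += 1
    return trace


program = {
    1: ("BZ R1, 18", 20),  # target 20, as stated on the slide
    2: ("MOVA R2, R3", None),
    3: ("MOVA R1, R2", None),
    20: ("MOVA R5, R6", None),
}

trace = fetch_trace(program, start=1, branch_taken=True, n_fetches=4)
for cycle, addr, text, is_bubble in trace:
    print(cycle, addr, text, "bubble" if is_bubble else "")
```

With the branch taken, the target instruction is fetched in cycle 4 and, in a four-stage pipeline, writes back in cycle 7, matching the seven clock cycles in the figure. Calling the function with `branch_taken=False` shows the same instructions at addresses 2 and 3 proceeding normally.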