Digital Integrated Circuit Design
Lecture 13: Building Blocks (Multipliers)
Adib Abrishamifar, EE Department, IUST
Acknowledgement
These lecture notes have been summarized and organized from lecture notes on Introduction to VLSI Design and VLSI Circuit Design from courses all over the world. I can't remember where each slide came from. However, I'd like to thank all the professors who created such good work in those lecture notes. Without those lectures, these slides could not have been finished.
Contents
- Outline
- Introduction
- Signed Integers
- Review: Multiplication
- Multipliers
  - Sequential (Serial) Multiplier
  - Array Multipliers
  - Combinational Multiplier
  - Booth Multiplier
  - Pipelined Multiplier
  - Wallace Tree Multiplier
  - Carry Save Multiplier
  - Serial-Parallel Multiplier
- Summary
Outline: Building Blocks for Digital Architectures
- Arithmetic unit: bit-sliced datapath (adders, multipliers, shifters, comparators, etc.)
- Memory: RAM, ROM, buffers, shift registers
- Control: finite state machine (PLA, random logic), counters
- Interconnect: switches, arbiters, bus
Introduction
- Multipliers are used in many DSP applications:
  - Vector product, matrix multiplication
  - Convolution
  - Filtering (tap filters, FIR, ...)
- At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations
Signed Integers: What Not to Do
- Use a fixed-length binary representation
- Use the left-most bit (the most significant bit, or MSB) for the sign: 0 for positive, 1 for negative
- Example: +18_ten = 00010010_two, -18_ten = 10010010_two
Signed Integers: Why Not to Use a Sign Bit
- Sign and magnitude bits must be treated differently in arithmetic operations
- Addition and subtraction require different logic circuits
- Overflow is difficult to detect
- Zero has two representations: +0_ten = 00000000_two, -0_ten = 10000000_two
- Sign-magnitude integers are not used in modern computers
Signed Integers: Integers With Sign, Other Ways
- Use a fixed-length representation, but no separate sign bit
- 1's complement: to form a negative number, complement each bit of the given number
- 2's complement: to form a negative number, start with the given number, subtract one, and then complement each bit; or first complement each bit, and then add 1
- 2's complement is the preferred representation
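The two negation recipes above can be checked against each other with a short sketch. The function names and the fixed width `n` are illustrative assumptions, not part of the original slides.

```python
def ones_complement_negate(x: int, n: int) -> int:
    """1's complement negation: complement each of the n bits."""
    mask = (1 << n) - 1
    return x ^ mask


def negate_complement_then_add(x: int, n: int) -> int:
    """2's complement negation: complement each bit, then add 1."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def negate_subtract_then_complement(x: int, n: int) -> int:
    """2's complement negation: subtract 1, then complement each bit."""
    mask = (1 << n) - 1
    return ((x - 1) & mask) ^ mask


if __name__ == "__main__":
    # +18 is 00010010 in 8 bits
    print(format(ones_complement_negate(0b00010010, 8), "08b"))
    print(format(negate_complement_then_add(0b00010010, 8), "08b"))
```

Both 2's complement recipes produce the same bit pattern for every 8-bit input, which is why the slide can offer either one.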
Signed Integers: 2's Complement
- Why not 1's complement? We don't like having two zeros
- Solution: add 1 to the 1's complement representation
- Some properties:
  - Only one representation for 0
  - Exactly as many non-negative numbers as negative numbers
  - Slight asymmetry: there is one negative number with no positive counterpart
Signed Integers: Three Systems
[Figure: three 4-bit number wheels, one each for sign-magnitude, 1's complement, and 2's complement integers. The pattern 1010 reads as -2 in sign-magnitude, -5 in 1's complement, and -6 in 2's complement.]
Signed Integers: Three Representations

Bits   Sign-magnitude   1's complement   2's complement (preferred)
000    +0               +0               +0
001    +1               +1               +1
010    +2               +2               +2
011    +3               +3               +3
100    -0               -3               -4
101    -1               -2               -3
110    -2               -1               -2
111    -3               -0               -1
Signed Integers: 2's Complement n-bit Numbers
- Range: $-2^{n-1}$ through $2^{n-1} - 1$
- Unique zero: 00000000...0
- Expansion of bit length: stretch the left-most bit all the way to the left (sign extension), e.g., 11111101 is still -3
- Overflow rule: if two numbers with the same sign bit (both positive or both negative) are added, overflow occurs if and only if the result has the opposite sign
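The sign-extension and overflow rules above can be sketched as follows. This is a minimal illustration operating on raw bit patterns; the function names are hypothetical.

```python
def sign_extend(x: int, n: int, m: int) -> int:
    """Extend the n-bit pattern x to m bits by replicating its left-most bit."""
    if (x >> (n - 1)) & 1:
        x |= ((1 << m) - 1) & ~((1 << n) - 1)  # fill bits n..m-1 with ones
    return x


def add_with_overflow(a: int, b: int, n: int):
    """Add two n-bit 2's complement patterns; return (result pattern, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    result = (a + b) & mask
    same_sign_inputs = (a & sign) == (b & sign)
    overflow = same_sign_inputs and (result & sign) != (a & sign)
    return result, overflow


if __name__ == "__main__":
    print(format(sign_extend(0b101, 3, 8), "08b"))   # 3-bit -3 stretched to 8 bits
    print(add_with_overflow(0x64, 0x64, 8))           # 100 + 100 does not fit in 8 bits
```

Adding numbers of opposite sign can never overflow, so the check only needs the case where both sign bits agree.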
Signed Integers: 2's Complement to Decimal Conversion

$$a_{n-1} a_{n-2} \ldots a_1 a_0 = -2^{n-1} a_{n-1} + \sum_{i=0}^{n-2} 2^i a_i$$

8-bit conversion box (bit weights):

Weight   -128   64   32   16   8   4   2   1
Example     1    1    1    1   1   1   0   1

-128 + 64 + 32 + 16 + 8 + 4 + 1 = -128 + 125 = -3
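The conversion formula can be written directly as code. This is a small sketch assuming the number is given as a bit string with the MSB first; the function name is illustrative.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate -2^(n-1)*a_(n-1) + sum over i of 2^i*a_i for an MSB-first bit string."""
    n = len(bits)
    value = -int(bits[0]) * (1 << (n - 1))      # the MSB carries a negative weight
    for i in range(n - 1):
        value += int(bits[n - 1 - i]) << i      # bit a_i sits at string index n-1-i
    return value


if __name__ == "__main__":
    print(twos_complement_value("11111101"))
```

The same function also confirms that sign extension preserves the value, since a longer string of leading ones evaluates to the same number.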
Review: Multiplication
- Basic algorithm is analogous to decimal multiplication
- Break the multiplier into digits
- Multiply one digit at a time; shift the multiplicand to form partial products
- Create the product as the sum of the partial products

    Multiplicand        0110   (6)
    Multiplier        x 0011   (3)
                        0110
                       0110      partial products
                      0000
                     0000
    Product         00010010   (18)

- n-bit multiplicand x m-bit multiplier = (n+m)-bit product
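Review: Multiplication, 2's Complement
- Multiplier positive, multiplicand +/-: sign extend the partial products when adding them up
- Example (+5 x +3), partial products shown at full 7-bit width:

         0101   (+5)
       x 0011   (+3)
      0000101
      0001010
      0000000
      0000000
      0001111   (+15)

- Example (-5 x +3), partial products sign extended to 7 bits:

         1011   (-5)
       x 0011   (+3)
      1111011
      1110110
      0000000
      0000000
      1110001   (-15)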
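Review: Multiplication, 2's Complement (cont.)
- Multiplier negative, multiplicand +/-: convert the negative multiplier to positive, do the multiplication, then negate the result
- Example: 1011 (-5) x 1101 (-3)
  - Replace the multiplier 1101 (-3) by 0011 (+3)
  - 1011 (-5) x 0011 (+3) = 1110001 (-15), using sign-extended partial products 1111011 and 1110110
  - Negate the result: 0001111 (+15)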
Review: Multiplication Example
0010_two x 0011_two = 0110_two, i.e., 2_ten x 3_ten = 6_ten

Iteration  Step                        Multiplicand  Product
0          Initial values              0010          0000 0011
1          LSB=1 => Prod=Prod+Mcand    0010          0010 0011
           Right shift product         0010          0001 0001
2          LSB=1 => Prod=Prod+Mcand    0010          0011 0001
           Right shift product         0010          0001 1000
3          LSB=0 => no operation       0010          0001 1000
           Right shift product         0010          0000 1100
4          LSB=0 => no operation       0010          0000 1100
           Right shift product         0010          0000 0110
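The iteration table can be reproduced with a few lines of code. The sketch below assumes unsigned n-bit operands and models the product register as an upper half `hi` and a lower half `lo` that initially holds the multiplier; these names are illustrative.

```python
def shift_add_multiply(mcand: int, mplier: int, n: int) -> int:
    """Unsigned add-and-right-shift multiplication of two n-bit numbers."""
    mask = (1 << n) - 1
    hi = 0                 # upper half of the product register (accumulator)
    lo = mplier & mask     # lower half, initially the multiplier
    for _ in range(n):
        if lo & 1:                         # LSB = 1 => Prod = Prod + Mcand
            hi += mcand                    # may temporarily need n+1 bits
        # right shift the combined register by one position
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi >>= 1
    return (hi << n) | lo


if __name__ == "__main__":
    print(shift_add_multiply(0b0010, 0b0011, 4))
```

Because the multiplier bits are consumed from the low half as the product grows into it, a single 2n-bit register is enough for both.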
Review: Multiplication Example
1010_two x 0011_two = 101110_two, i.e., -6_ten x 3_ten = -18_ten
(The multiplicand is sign extended to 5 bits, and right shifts replicate the sign bit.)

Iteration  Step                        Multiplicand  Product
0          Initial values              11010         00000 0011
1          LSB=1 => Prod=Prod+Mcand    11010         11010 0011
           Right shift product         11010         11101 0001
2          LSB=1 => Prod=Prod+Mcand    11010         10111 0001
           Right shift product         11010         11011 1000
3          LSB=0 => no operation       11010         11011 1000
           Right shift product         11010         11101 1100
4          LSB=0 => no operation       11010         11101 1100
           Right shift product         11010         11110 1110
Review: Multiplication Example
1010_two x 1011_two = 011110_two, i.e., -6_ten x (-5_ten) = 30_ten

Iteration  Step                        Multiplicand  Product
0          Initial values              11010         00000 1011
1          LSB=1 => Prod=Prod+Mcand    11010         11010 1011
           Right shift product         11010         11101 0101
2          LSB=1 => Prod=Prod+Mcand    11010         10111 0101
           Right shift product         11010         11011 1010
3          LSB=0 => no operation       11010         11011 1010
           Right shift product         11010         11101 1101
4          LSB=1 => Prod=Prod-Mcand*   00110         00011 1101
           Right shift product         11010         00001 1110

*Last iteration with a negative multiplier in 2's complement: the multiplicand is subtracted (its negation, 00110, is added).
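The signed procedure in the two tables above can be sketched as below. This is an illustration under stated assumptions: operands are ordinary Python integers in the n-bit 2's complement range, the accumulator is an unbounded Python integer (the slides use an extra sign bit instead), and Python's `>>` on negative integers acts as an arithmetic right shift. The function name is hypothetical.

```python
def signed_shift_add_multiply(mcand: int, mplier: int, n: int) -> int:
    """Add-and-right-shift multiplication for n-bit 2's complement operands."""
    mask = (1 << n) - 1
    acc = 0                    # upper part of the product, kept signed
    lo = mplier & mask         # lower part holds the multiplier bit pattern
    for i in range(n):
        if lo & 1:
            if i == n - 1:
                acc -= mcand   # the multiplier MSB has weight -2^(n-1): subtract
            else:
                acc += mcand
        # arithmetic right shift of the combined (acc, lo) register
        lo = (lo >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | lo


if __name__ == "__main__":
    print(signed_shift_add_multiply(-6, 3, 4))
    print(signed_shift_add_multiply(-6, -5, 4))
```

The only difference from the unsigned loop is the subtraction in the last iteration and the sign-preserving shift.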
Multipliers
- There are many different circuits for multiplication
- Each one strikes a different balance between speed (performance) and amount of logic (cost)
Sequential Multiplier
- Compute the sums as a sequence of separate steps
- Main benefit: requires only one two-input (2:1) adder
- Much lower hardware cost for large n
- Use a register to store the accumulated partial products
- Use two registers to store the multiplier and multiplicand
- Requires a state machine (with the corresponding control logic) to control the sequence of additions
Sequential Multiplier
- Shift register:
  - Originally holds the multiplicand
  - Shifts it left for each partial product
- One bit of the multiplier at a time is presented to the AND gates
[Figure: a 2N-bit shift register, initialized with the multiplicand and shifted left, feeds AND gates gated by one multiplier bit per cycle; an adder accumulates into a result register initialized to 0.]
Sequential Multiplier: Resource Requirements
- Adder: 2N-bit
- Registers: 2N-bit wide
- A state machine
[Figure: shift register, adder, and result register.]
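A behavioral sketch of this first sequential design, assuming unsigned N-bit operands: the multiplicand sits in a 2N-bit shift register that moves left each cycle, and a 2N-bit adder accumulates into the result register. The names are illustrative, and the state machine is reduced to a simple loop.

```python
def sequential_multiply(mcand: int, mplier: int, n: int) -> int:
    """Left-shifting sequential multiplier: one add per cycle, n cycles."""
    wide_mask = (1 << (2 * n)) - 1
    shift_reg = mcand & ((1 << n) - 1)   # 2N-bit shift register, starts as mcand
    result = 0                           # 2N-bit result register, starts at 0
    for cycle in range(n):
        bit = (mplier >> cycle) & 1      # one multiplier bit applied each cycle
        gated = shift_reg if bit else 0  # row of AND gates
        result = (result + gated) & wide_mask   # 2N-bit adder
        shift_reg = (shift_reg << 1) & wide_mask
    return result


if __name__ == "__main__":
    print(sequential_multiply(6, 3, 4))
```

Every addition here is 2N bits wide, which is the cost that the right-shifting variant on the next slide avoids.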
Sequential Multiplier: Better Design
- Shift the result register to the right instead of shifting the multiplicand left
- Uses N AND gates
- Uses an N-bit adder
[Figure: shift register, adder, and result register.]
Array Multipliers: Adding Partial Products

                    y3   y2   y1   y0     multiplicand
                    x3   x2   x1   x0     multiplier
                    x0y3 x0y2 x0y1 x0y0
               x1y3 x1y2 x1y1 x1y0        four partial products
          x2y3 x2y2 x2y1 x2y0             to be summed
     x3y3 x3y2 x3y1 x3y0
p7   p6   p5   p4   p3   p2   p1   p0

- Carries ripple along each row as it is added to the running sum
- Requires three 4-bit additions. Slow.
Array Multipliers: Carry Forward
- Same four partial products x_i y_j (i, j = 0..3) summed into p7 ... p0, but the carries are handled differently
- Note: the carry is added to the next partial product (carry-save addition)
- Adding the carry from the final stage needs an extra (ripple-carry) stage
- These additions are faster, but we need four stages
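Array Multipliers: Structure
[Figure: 4x4 array multiplier built from cells, each containing an AND gate (inputs x_i, y_j) and a full adder (FA) with inputs pp_k and c_i and outputs pp_k+1 and c_o; rows for x0..x3, zero inputs on the array boundary, a final row of FAs, outputs p7..p0, and the critical path marked.]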
Combinational Multiplier
- Also referred to as a parallel multiplier
- Implements multiplication using a 2-dimensional array of 1-bit full adders
- Basic structure is the same as the addition array
- Can also be implemented using a linear array of carry-save adders (CSA)
  - Each partial product is $P_i = 2^i a_i B$
  - At the first CSA level, add three partial products
  - At all subsequent CSA levels, add one more partial product
  - A CLA (or other type of 2:1, i.e., 2-input/1-output, adder) is required after the final CSA level
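As a minimal sketch of the partial-product definition, the code below forms each $P_i = 2^i a_i B$ with the equivalent of a row of AND gates and then sums the rows. It assumes unsigned operands; `a` plays the role of the multiplier and `b` the multiplicand B, and the function names are illustrative.

```python
def partial_products(a: int, b: int, n: int) -> list:
    """Return [P_0, ..., P_(n-1)] with P_i = 2^i * a_i * B for an n-bit multiplier a."""
    rows = []
    for i in range(n):
        a_i = (a >> i) & 1
        gated = b if a_i else 0      # each bit of B ANDed with a_i
        rows.append(gated << i)      # weight 2^i
    return rows


def multiply_from_partial_products(a: int, b: int, n: int) -> int:
    """Sum all partial products; the hardware does this with the adder array."""
    return sum(partial_products(a, b, n))


if __name__ == "__main__":
    print([format(p, "08b") for p in partial_products(0b0011, 0b0110, 4)])
```

All partial products are available at once, so the speed of the multiplier is set entirely by how the rows are summed.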
Combinational Multiplier: Idea
- Use an array of AND gates to generate the partial products in parallel
[Figure: AND-gate array with multiplier and multiplicand bits (LSB marked) producing the rows of partial products.]
Combinational Multiplier: Adding Partial Products
[Figure: 4x4 array. Rows of partial products X3..X0 ANDed with Y0, Y1, Y2, Y3 are summed by three rows of adders (HA FA FA HA, then FA FA FA HA, then FA FA FA HA), producing outputs Z0..Z7.]
Combinational Multiplier: Critical Path
- There are many critical paths with the same delay (AND gates not shown)
- For an MxN multiplier:

$$\text{Delay} = (M+N-2)\,t_{carry} + (N-1)\,t_{sum} + t_{AND}$$

[Figure: MxN array of HA/FA cells with Critical Path 1 and Critical Path 2 marked.]
Combinational Multiplier: MxN Critical Paths
[Figure: array of HA/FA cells with Critical Path 1, Critical Path 2, and their shared segment (Critical Path 1 & 2) marked.]

$$t_{mult} = [(M-1) + (N-2)]\,t_{carry} + (N-1)\,t_{sum} + (N-1)\,t_{and}$$
Combinational Multiplier: Better Floorplan for Compact Layout
- Send partial products diagonally
- Results in better area
- AND gates, and hence the first row, are not shown
[Figure: skewed array of HA/FA cells.]
Booth Multiplier
- Originally proposed to reduce addition steps
- Bonus: works for two's complement numbers
- Encoding scheme to reduce the number of stages in multiplication
- Performs two bits of multiplication at once, so it requires half the stages
- Each stage is slightly more complex than in a simple multiplier, but an adder/subtracter is almost as small and fast as an adder
Booth Multiplier
- There are multiple ways to create a product
- Example: multiply 2_ten by 6_ten (0010_two x 0110_two)
  - Product = (2 x 2) + (2 x 4), or
  - Product = (2 x -2) + (2 x 8)
- Idea:
  - Recode each 1 in the multiplier as +2 - 1
  - Converts a sequence of 1's (1 1 ... 1) into 1 0 ... 0 (-1)
  - Might reduce the number of nonzero digits
Booth Multiplier: Example

     0  0  1  1  1  1  1  1  0  0     multiplier
       +1 -1
          +1 -1
             +1 -1                    each 1 recoded as +2 - 1
                +1 -1
                   +1 -1
                      +1 -1
     0 +1  0  0  0  0  0 -1  0  0     recoded multiplier
Booth Multiplier: Example (6 x 14 = 84)

            0 0 1 1 0       (6)
          x 0 1 1 1 0       (14)
  Booth-recoded multiplier: +1 0 0 -1 0

            0 0 0 0 0       ( 0 x 6)
    1 1 1 1 1 0 1 0         (-1 x 6 = -6, with sign extension)
        0 0 0 0 0           ( 0 x 6)
      0 0 0 0 0             ( 0 x 6)
    0 0 1 1 0               (+1 x 6)
    0 0 1 0 1 0 1 0 0       (84)
Booth Multiplier: Booth Encoding
- Two's complement form of the multiplier:

$$y = -2^{n} y_{n} + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \ldots$$

- Rewrite using $2^{a} = 2^{a+1} - 2^{a}$:

$$y = -2^{n}(y_{n} - y_{n-1}) + 2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) + \ldots$$

- Consider the first two terms: they combine to $2^{n-1}(-2y_{n} + y_{n-1} + y_{n-2})$, so by looking at three bits of y we can determine whether to add or subtract x or 2x to form the partial product
Booth Multiplier: Booth Actions

y_i   y_(i-1)   y_(i-2)   increment
0     0         0          0
0     0         1          x
0     1         0          x
0     1         1          2x
1     0         0         -2x
1     0         1         -x
1     1         0         -x
1     1         1          0
Booth Multiplier: Booth Example
x = 011001 (25_10), y = 101110 (-18_10)
- y1 y0 y-1 = 100: P1 = P0 - (10 x 011001) = 11111001110
- y3 y2 y1 = 111: P2 = P1 + 0 = 11111001110
- y5 y4 y3 = 101: P3 = P2 - 0110010000 = 11000111110 (= -450_10)
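The action table and the worked example can be reproduced with the sketch below. It assumes an even bit width `n`, a multiplier supplied as an n-bit 2's complement pattern, and an implicit y_(-1) = 0; the table constant and function names are illustrative.

```python
# Booth action table: (y_i, y_(i-1), y_(i-2)) -> multiple of x to add
BOOTH_RADIX4 = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_digits(y_bits: int, n: int) -> list:
    """Scan overlapping bit triples of the n-bit pattern y_bits, least significant first."""
    if n % 2:
        raise ValueError("this sketch assumes an even bit width")
    extended = (y_bits & ((1 << n) - 1)) << 1     # append y_(-1) = 0 on the right
    return [BOOTH_RADIX4[(extended >> i) & 0b111] for i in range(0, n, 2)]


def booth_radix4_multiply(x: int, y_bits: int, n: int) -> int:
    """Accumulate digit * x * 4^k; x is a signed Python integer."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(y_bits, n)):
        product += digit * x * 4 ** k
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(0b101110, 6))
    print(booth_radix4_multiply(25, 0b101110, 6))
```

Only n/2 partial products are generated, which is where the halving of the number of stages comes from.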
Booth Multiplier
- Question: how do we know when to subtract, and when to add?
- Answer: look for runs of 1's in the multiplier
- Working from right to left, any run of 1's is equal to:
  - minus the value of the first digit that is one (where the run begins)
  - plus the value of the first digit that is zero (just after the run ends)
- Example: 001110011
  - First run: -1 + 4 = 3
  - Second run: -16 + 128 = 112
  - Total: 112 + 3 = 115
Booth Multiplier
- Scan the multiplier bits from right to left
- Recognize the beginning and end of a run by looking at only 2 bits at a time: the current bit a_i and the bit to its right, a_(i-1)
[Figure: a multiplier bit string with the end, middle, and beginning of a run of 1's marked.]

Bit a_i   Bit a_(i-1)   Explanation
1         0             Beginning of a run of 1's
1         1             Middle of a run of 1's
0         1             End of a run of 1's
0         0             Middle of a run of 0's
Booth Multiplier
- Key idea: test 2 bits of the multiplier at once
  - 10: subtract (beginning of a run of 1's)
  - 01: add (end of a run of 1's)
  - 00, 11: do nothing (middle of a run of 0's or 1's)
[Figure: datapath with a 32-bit multiplicand register, a 32-bit ALU (add/subtract), and a 64-bit product register split into LHPROD (32 bits) and MP/RHPROD (32 bits), with shift and write controls; the control unit examines bits 1:0.]
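The two-bit test can be sketched as a recoding step followed by a sum of shifted multiples. The sketch assumes the multiplier is an n-bit 2's complement pattern scanned from right to left with an implicit 0 to the right of the LSB; the function names are illustrative.

```python
def booth_radix2_digits(y_bits: int, n: int) -> list:
    """Recode an n-bit pattern into digits in {-1, 0, +1}, least significant first."""
    digits = []
    right = 0                          # implicit bit to the right of the LSB
    for i in range(n):
        current = (y_bits >> i) & 1
        # 10 -> -1 (begin run), 01 -> +1 (end run), 00 and 11 -> 0
        digits.append(right - current)
        right = current
    return digits


def booth_radix2_multiply(x: int, y_bits: int, n: int) -> int:
    """Add or subtract shifted copies of x according to the recoded digits."""
    return sum(d * x * (1 << i) for i, d in enumerate(booth_radix2_digits(y_bits, n)))


if __name__ == "__main__":
    print(booth_radix2_digits(0b0011111100, 10))
    print(booth_radix2_multiply(6, 0b01110, 5))
```

A run that reaches the MSB never sees its closing +1, which leaves the net weight negative and is why the scheme handles negative multipliers without a special case.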
Booth Multiplier: Booth Structure
[Figure: Booth multiplier structure.]
Booth Multiplier: Advantages and Disadvantages
- Depends on the architecture
- Potential advantage: might reduce the number of 1's in the multiplier
- In the multipliers that we have seen so far:
  - Doesn't save in speed (still have to wait for the critical path, e.g., the shift-add delay in the sequential multiplier)
  - Increases area: recoding circuitry and subtraction
Pipelined Multipliers
- Insert registers (latches) between rows
- Insert registers for the bits of the multiplier
- Schedule the MSB bits to arrive later
[Figure: array of HA/FA cells with pipeline registers between rows.]
Pipelined Multipliers: Example
[Figure: pipelined array with inputs a4..a0 and x0..x4 and outputs p9..p0. Each cell is an FA with an AND gate and latches (for a_i, the intermediate sum, and the carry); the sum/carry paths and latches are marked.]
Wallace Tree Multiplier
- Idea: divide and conquer
- Why add the k numbers one by one?
- A tree structure gives logarithmic depth
Wallace Tree Multiplier: Example
[Figure: Wallace tree example.]
- Delay = 4 CSA + 1 CLA
Wallace Tree Multiplier: Adding Seven k-bit Numbers
[Figure: Wallace tree of k-bit CSAs followed by a k-bit CPA. Each operand is labeled with its bit range: the inputs are [0,k-1]; intermediate sums and carries occupy ranges such as [1,k] and [2,k+1]; the final output is assembled from bits [0], [1], [2,k+1], and [k+2].]
Wallace Tree Multiplier
- At each step, the number of operands reduces to 2/3 of its previous value
- Starting from n k-bit numbers: n -> (2/3) n -> (2/3)^2 n -> ... -> (2/3)^h n = 2
- The tree has h levels of CSAs
[Figure: rows of CSAs, with fewer CSAs at each successive level.]
Wallace Tree Multiplier
- Delay depends on the height h
- h = O(log n): logarithmic delay
- Maximum number N of k-bit numbers that can be added using a Wallace tree of height h:

h   N      h    N      h    N
0   2      7    28     14   474
1   3      8    42     15   711
2   4      9    63     16   1066
3   6      10   94     17   1599
4   9      11   141    18   2398
5   13     12   211    19   3597
6   19     13   316    20   5395
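The table can be regenerated from the 3-to-2 reduction rule. The sketch below assumes every complete group of three operands goes through one CSA per level and leftover operands pass through unchanged; the function names are illustrative.

```python
def max_operands(h: int) -> int:
    """Largest number of operands a Wallace tree of height h can reduce to 2."""
    n = 2
    for _ in range(h):
        n = n * 3 // 2      # inverse of one 3-to-2 reduction level
    return n


def wallace_height(n: int) -> int:
    """Number of CSA levels needed to reduce n operands to 2."""
    h = 0
    while n > 2:
        n -= n // 3         # every full group of three becomes two
        h += 1
    return h


if __name__ == "__main__":
    for h in range(21):
        print(h, max_operands(h))
```

For example, `wallace_height(7)` gives 4, which agrees with the 4 CSA levels quoted for the seven-operand example.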
Wallace Tree Multiplier
- Reduces the depth of the adder chain
- Built from carry-save adders:
  - Three inputs a, b, c
  - Produces two outputs y, z such that y + z = a + b + c (z is the carry word, weighted one bit position to the left)
- Carry-save equations:

$$y_i = \text{parity}(a_i, b_i, c_i)$$
$$z_i = \text{majority}(a_i, b_i, c_i)$$
Wallace Tree Multiplier
[Figure: dot diagrams over bit positions 6..0 showing (a) the partial products, (b) the first stage, (c) the second stage, and (d) the final adder; FA and HA groupings are marked.]
Wallace Tree Multiplier
[Figure: 4x4 Wallace tree. The sixteen partial products x_i y_j (i, j = 0..3) feed a first stage of two HAs, a second stage of four FAs, and a final adder producing z7..z0.]
Wallace Tree Multiplier
- At each stage, i numbers are combined to form ceil(2i/3) sums
- The final adder completes the summation
- Wiring is more complex
- Can build a Booth-encoded Wallace tree multiplier
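A word-level sketch of the reduction is given below. It is a simplification under stated assumptions: operands are unsigned, each CSA processes whole operands rather than individual bit columns as in the dot diagrams, and the final carry-propagate adder is modeled by `sum`. The function names are illustrative.

```python
def carry_save_add(a: int, b: int, c: int):
    """3-to-2 compressor: bitwise parity gives the sum, majority gives the carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position left
    return s, carry


def wallace_multiply(x: int, y: int, n: int):
    """Multiply unsigned n-bit x and y; return (product, number of CSA levels)."""
    operands = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3   # operands in complete triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*operands[i:i + 3]))
        reduced.extend(operands[full:])            # leftovers pass to the next level
        operands = reduced
        levels += 1
    return sum(operands), levels                   # final carry-propagate addition


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))
```

With four partial products the loop runs for two levels (4 to 3 to 2 operands), and with eight it runs for four, following the ceil(2i/3) rule.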
Carry Save Multiplier
- Speeding up multiplication is a matter of speeding up the summing of the partial products
- Carry-save addition can help
- Carry-save addition passes (saves) the carries to the output, rather than propagating them
- In general, carry-save addition takes in 3 numbers and produces 2, whereas carry-propagate addition takes 2 and produces 1
- With this technique, we can avoid carry propagation until the final addition
Carry Save Multiplier: Example
Sum three numbers: 3_10 = 0011, 2_10 = 0010, 3_10 = 0011
- Carry-save add 3_10 (0011) + 2_10 (0010): c = 0100 = 4_10, s = 0001 = 1_10
- Carry-save add c, s, and 3_10 (0011): c = 0010 = 2_10, s = 0110 = 6_10
- Carry-propagate add c + s: 1000 = 8_10
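The three steps of this example can be traced with a small sketch. The function name is illustrative; the first step feeds 0 as the third input, since only two numbers are available at that point.

```python
def carry_save_add(a: int, b: int, c: int):
    """Return (carry, sum): sum is the bitwise parity, carry the majority shifted left."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return carry, s


if __name__ == "__main__":
    c1, s1 = carry_save_add(0b0011, 0b0010, 0)       # 3 + 2 kept in carry-save form
    c2, s2 = carry_save_add(c1, s1, 0b0011)          # fold in the third operand
    total = c2 + s2                                  # single carry-propagate add
    print(format(c1, "04b"), format(s1, "04b"))
    print(format(c2, "04b"), format(s2, "04b"))
    print(format(total, "04b"))
```

No carry ripples through more than one bit position until the last line, which is the only carry-propagate addition.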
Carry Save Multiplier: Carry Save Adder
[Figure: 4x4 carry-save array of HA/FA cells (rows of HAs and FAs) followed by a vector merging adder.]

$$t_{mult} = (N-1)\,t_{carry} + (N-1)\,t_{and} + t_{merge}$$
Carry Save Multiplier: Floorplan
[Figure: floorplan with inputs X3..X0 across the top and Y0..Y3 down the side. Rows of HA multiplier cells and FA multiplier cells pass carry (C) and sum (S) signals downward to a final row of vector merging cells; outputs Z0..Z7.]
- X and Y signals are broadcast through the complete array
Serial-Parallel Multiplier
- Used in serial-arithmetic operations
- The multiplicand can be held in place by a register
- The multiplier is shifted into the array
Serial-Parallel Multiplier: Structure
[Figure: serial-parallel multiplier structure.]
Summary
- Goals are different than for addition
  - In some structures, sum and carry delays are equally important
  - Analysis is more difficult: multiple critical paths
- Different levels of optimization
  - Data encoding (Booth)
  - Architecture-level: Wallace tree
  - Gate-level: pipelining
  - Transistor-level: equal sum and carry delays
- More to cover
  - Constant multiplication
  - Floating point, precision