Chapter 2 Literature Review


2.1 ADDER TOPOLOGIES

Many different adder architectures have been proposed for binary addition since the 1950s to improve speed, area and power. The Ripple Carry Adder has the simplest architecture, but it is slow because it has the longest carry propagation delay (R. Uma et al. (2012)). The Carry Save Adder improves the speed of addition by using N additional half adders in the Ripple Carry Adder to shorten the long carry propagation delay, but it consumes more area and power than the Ripple Carry Adder (Chakib Alaoui (2011)). A Carry Look-Ahead Adder performs fast addition by reducing the time required to determine the carry bits (Yu-Ting Pai and Yu-Kumg Chen (2004)). It determines in advance, for each bit position, whether that position will propagate a carry if a 1 arrives from the nearest LSB.

The Carry Skip Adder and the Carry Select Adder, on the other hand, speed up addition by splitting the adder into blocks of N bits. In the Carry Skip Adder, each block calculates the carry bit to propagate to the next block from its MSB carry-out, the sum output of each bit and its LSB carry-in (Yu Pang et al. (2012)), so that the next block towards the MSB need not wait until the previous block completes its addition. The Carry Select Adder performs the addition in parallel for a carry-in of 0 and a carry-in of 1 (Sudhanshu Shekhar et al. (2013)). Each block of adders then generates its final sum after only a multiplexer delay, so the Carry Select Adder is the fastest of the adders discussed here.

2.1.1 RIPPLE CARRY ADDER (RCA)

The RCA is constructed by cascading a series of full adders as shown in Figure 2.1. The carry-out of each full adder is fed directly to the carry-in of the next full adder. Each full adder adds three bits and generates a carry bit that allows the next full adder to start its computation. Until the carry bit is received from the previous adder, the next adder cannot start its computation. This gives the RCA the longest delay, and the delay increases linearly with the bit size.
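The carry chain described above can be illustrated with a short bit-level model. This is only a sketch: the function names `full_adder` and `ripple_carry_add` are illustrative and do not come from the cited works, and the model shows the logic of the chain, not its timing.

```python
def full_adder(a, b, cin):
    """One-bit full adder: adds three bits and returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n=16, cin=0):
    """Add two n-bit unsigned integers by chaining n full adders.

    Stage i cannot produce its result before stage i-1 has delivered
    its carry, which is why the delay grows linearly with n.
    """
    carry = cin
    total = 0
    for i in range(n):  # LSB first
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(1234, 4321))   # sum and carry-out of a 16-bit addition
```

In the worst case (for example 0xFFFF + 1) the carry generated at bit 0 must pass through every stage before the MSB settles.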

The delay of the RCA is defined as

$t = O(N)$    (2.1)

where N is the operand size in bits. Although the RCA has a long delay, its regular structure takes less area and consumes less power. This makes the RCA a good choice for low-power applications. An Equally Shared Block Scheme (ESBS) based 16-bit RCA is shown in Figure 2.1.

Figure 2.1: Schematic of 16-bit RCA (C = carry bit)

2.1.2 CARRY SAVE ADDER (CSA)

A 16-bit CSA structure is shown in Figure 2.2. It consists of N+1 half adders in the first stage and N-1 full adders in the second stage. Unlike the sequential 3-bit additions in the RCA, the first stage performs the two N-bit additions in parallel to generate the partial sum. The partial sum values are held in the second-stage full adders. The final sum is then computed by shifting the carry sequence from LSB to MSB through the partial sum values.

The delay of the CSA is defined as

$t = O(\log N)$    (2.2)

Although the CSA is faster than the RCA, it increases area and power because of its N additional half adders. Since the CSA has regular connectivity for propagating sum and carry to the next stage, it is mostly used in multiplier designs to pass the partial sum and partial carry on from each stage.

Figure 2.2: Schematic of 16-bit CSA (H = half adder, F = full adder)

2.1.3 CARRY LOOK-AHEAD ADDER (CLA)

A 4-bit CLA structure is shown in Figure 2.3. It speeds up addition by reducing the time required to determine the carry bits. It uses two blocks, the carry generator (Gi) and the carry propagator (Pi), which determine the carry bit in advance for each bit position: if a carry of 1 arrives from the nearest LSB, the propagate signal indicates whether that position will pass the carry on to the next adder.

The generate block can be realized using the expression

$G_i = A_i \cdot B_i$, for i = 0, 1, 2, 3    (2.3)

Similarly, the propagate block can be realized using the expression

$P_i = A_i \oplus B_i$, for i = 0, 1, 2, 3    (2.4)

The carry output of the i-th stage is obtained from

$C_i = G_i + P_i \cdot C_{i-1}$, for i = 0, 1, 2, 3    (2.5)

where $C_{i-1}$ is the carry from the preceding stage. The sum output can be obtained using

$S_i = A_i \oplus B_i \oplus C_{i-1}$, for i = 0, 1, 2, 3    (2.6)

Figure 2.3: Schematic of 4-bit CLA

Although the CLA is faster than the RCA, it increases area and power because of its carry generator and propagator logic.
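As a hedged illustration of equations (2.3) to (2.6), the sketch below expands the carry recurrence so that every carry depends only on the G and P signals and the external carry-in. The function name `cla4` is hypothetical, and treating the carry into bit 0 as an external input `cin` is an assumption made for this example.

```python
def cla4(a, b, cin=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry_out)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]   # generate, eq. (2.3)
    P = [A[i] ^ B[i] for i in range(4)]   # propagate, eq. (2.4)

    # Eq. (2.5) unrolled: no carry waits for another carry to ripple.
    c0 = G[0] | P[0] & cin
    c1 = G[1] | P[1] & G[0] | P[1] & P[0] & cin
    c2 = G[2] | P[2] & G[1] | P[2] & P[1] & G[0] | P[2] & P[1] & P[0] & cin
    c3 = (G[3] | P[3] & G[2] | P[3] & P[2] & G[1]
          | P[3] & P[2] & P[1] & G[0] | P[3] & P[2] & P[1] & P[0] & cin)

    carry_in = [cin, c0, c1, c2]          # carry entering each bit position
    total = 0
    for i in range(4):
        total |= (P[i] ^ carry_in[i]) << i   # sum, eq. (2.6)
    return total, c3


print(cla4(0b1011, 0b0110))   # 4-bit sum and carry-out
```

The widening AND-OR terms make visible why the look-ahead logic costs extra area as the word length grows.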

2.1.4 CARRY SKIP ADDER (CSKA)

A CSKA performs fast addition because the adder is split into blocks of N bits. It greatly reduces the delay through the critical path of the adder, since the carry bit can bypass (skip over) a block. It consists of a simple RCA with AND-OR skip logic as shown in Figure 2.4. The carry-out of each block depends on the carry-out of its MSB full adder, the carry-in of its LSB full adder and the sum bit of each full adder. If the output of the AND-OR skip logic is 1, the current block is bypassed and the next block starts its computation. The delay of the CSKA is defined as

$t = O(\sqrt{N})$    (2.7)

The additional skip logic adds a slight area overhead to the CSKA, but less than that of the CSA and CLA. The schematic of a 16-bit CSKA is shown in Figure 2.4.

Figure 2.4: Schematic of 16-bit CSKA
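The block structure can be sketched functionally as follows. This is a minimal model under stated assumptions: the name `carry_skip_add` is hypothetical, each 4-bit ripple block is stood in for by integer addition, and the group propagate signal is taken as the AND of the per-bit XORs. A software model cannot show the timing benefit; it only shows where the AND-OR skip logic sits.

```python
def carry_skip_add(x, y, n=16, block=4, cin=0):
    """Add two n-bit unsigned integers using blocks with AND-OR skip logic."""
    mask = (1 << block) - 1
    carry, total = cin, 0
    for lo in range(0, n, block):
        a = (x >> lo) & mask
        b = (y >> lo) & mask
        s = a + b + carry                  # stands in for the block's ripple adder
        ripple_cout = s >> block           # carry-out of the block's MSB full adder
        group_p = int((a ^ b) == mask)     # 1 if every bit in the block propagates
        # AND-OR skip logic: when the whole block propagates, the incoming
        # carry is forwarded without waiting for the ripple chain.
        carry_next = ripple_cout | (group_p & carry)
        total |= (s & mask) << lo
        carry = carry_next
    return total, carry


print(carry_skip_add(0x0FFF, 0x0001))   # the carry skips the two middle blocks
```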

2.1.5 CARRY SELECT ADDER (CSLA)

A CSLA generally consists of two RCAs and a multiplexer (Mux). It performs two additions in parallel, one assuming a carry-in of 0 and one assuming a carry-in of 1. A CSLA performs fast addition because the adder is split into blocks of N bits. A K-bit CSLA is shown in Figure 2.5. It contains two groups of adders, one for the lower N/2 bits and another for the higher N/2 bits. The higher N/2-bit group computes its partial sum and partial carry for both carry-in 1 and carry-in 0, in parallel with the lower N/2 bits. The final sum and carry are then chosen by the Mux selection input. Hence the delay of the CSLA can be defined as

$T_{\text{select-add}}(N) = T_{\text{add}}(N/2) + 1$    (2.8)

where the added unit accounts for the multiplexer. The CSLA is widely used in high-performance applications, but it consumes large area and power because of its increased hardware resources. Many research articles have proposed techniques to reduce area and power in the CSLA structure.

Figure 2.5: Schematic of K-bit CSLA
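A minimal functional sketch of the two-group organization described above is given below. The name `carry_select_add` is hypothetical, the word length `n` is assumed to be even, and integer addition stands in for the N/2-bit RCAs.

```python
def carry_select_add(x, y, n=16, cin=0):
    """Carry-select addition with a lower and a higher n/2-bit group."""
    half = n // 2
    mask = (1 << half) - 1

    # Lower group: an ordinary n/2-bit addition with the real carry-in.
    lo = (x & mask) + (y & mask) + cin
    lo_sum, lo_cout = lo & mask, lo >> half

    # Higher group: both candidate results are computed in parallel.
    xh, yh = (x >> half) & mask, (y >> half) & mask
    hi_if_0 = xh + yh        # assumes carry-in 0
    hi_if_1 = xh + yh + 1    # assumes carry-in 1

    # Mux: the carry-out of the lower group selects the valid candidate.
    hi = hi_if_1 if lo_cout else hi_if_0
    return ((hi & mask) << half) | lo_sum, hi >> half


print(carry_select_add(0x00FF, 0x0001))
```

Duplicating the higher-group adder is what produces the area and power overhead mentioned in the text.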

2.1.6 OTHER ADDERS

The Kogge-Stone adder is a parallel prefix form of the carry look-ahead adder (Kogge P and Stone H (1973)). It has a regular layout and minimum logic depth (fan-out), which makes it a fast adder, but it has a large area. The delay of the Kogge-Stone adder is equal to $\log_2 N$ and its area is $(N \log_2 N) - N + 1$, where N is the number of input bits (N. Zamhari et al. (2012)).

Another parallel prefix adder is the Brent-Kung adder (R. P. Brent and H. T. Kung (1982)). It has more logic depth (fan-out) with minimum area characteristics, so its addition is slower but it is power efficient (N. Zamhari et al. (2012)). The delay of the Brent-Kung adder is equal to $(2 \log_2 N) - 2$ and its area is $2N - 2 - \log_2 N$.

The Ladner-Fischer adder is another parallel prefix adder (R. E. Ladner and M. J. Fischer (1980)). Its delay and area are asymptotically optimal (i.e., logarithmic delay and linear area). It has an additional type of recursive step for constructing a parallel prefix circuit. This additional recursive step reduces the delay but increases the area. It has a delay of O(log N) and an area of O(N).

An improvement that can be made to the CLA design is the use of a pseudo-carry, as proposed by Ling; the result is called the Ling adder (H. Ling (1981)). This method allows a single local propagate signal to be removed from the critical path.

The Han-Carlson adder is a hybrid design that mixes the Kogge-Stone and Brent-Kung adders (T. Han and D. Carlson (1987)). It has $\log_2 N + 1$ stages. The logic performs Kogge-Stone on the odd-numbered bits and then uses one more stage to ripple into the even positions.

The Sklansky or divide-and-conquer adder reduces the delay to $\log_2 N$ stages by computing intermediate prefixes along with the large group prefixes (J. Sklansky (1960)). This comes at the expense of fanouts that double at each level (8, 4, 2, 1). These high fanouts cause poor performance on wide adders unless the high-fanout gates are appropriately sized or the critical signals are buffered before being used for the intermediate prefixes. Transistor sizing can cut into the regularity of the layout because multiple sizes of each cell are required, although the larger gates can spread into adjacent columns.
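To make the parallel prefix idea concrete, the following sketch models a Kogge-Stone carry network in software. It is an illustration under assumptions: the name `kogge_stone_add` is hypothetical, the carry-in is fixed at 0, and the returned level count is the number of prefix levels only, not a gate-level delay.

```python
def kogge_stone_add(x, y, n=16):
    """Kogge-Stone prefix addition; returns (sum, carry_out, prefix_levels)."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]   # bit generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]   # bit propagate
    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < n:
        G_next, P_next = G[:], P[:]
        for i in range(dist, n):
            # Combine the group ending at bit i with the group ending at i - dist.
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
        levels += 1
    # After the last level, G[i] is the carry out of bit i.
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[n - 1], levels


print(kogge_stone_add(0xFFFF, 0x0001))   # 16 bits need 4 prefix levels
```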

2.2 MULTIPLIER TOPOLOGIES

Two classes of parallel multipliers were defined in the 1960s. The first class uses a rectangular array of identical cells, each containing an AND gate and addition logic, to generate and sum the partial product bits (J. C. Majithia and R. Kitai (1964)). These multipliers are called array multipliers, and their delay is proportional to the multiplier input word size, i.e. O(N). Since array multipliers have a regular structure and regular wiring, they are easier to implement at the layout level (R. P. Pal Singh et al. (2009)).

The second class, termed column compression multipliers, uses counters or compressors to reduce the partial product matrix to two words. A carry-propagate adder then sums these two words to obtain the final product. The delay of a column compression multiplier is proportional to the logarithm of the multiplier word length, i.e. O(log N), so it is faster than an array multiplier, but its irregular structure and interconnections make it difficult to lay out.

2.2.1 ARRAY MULTIPLIERS

A 4 by 4 array multiplier structure is shown in Figure 2.6. Each cell performs the two basic functions of partial product generation and summation. Half adders and full adders perform the addition. An unsigned N by N array multiplier requires N^2 + N cells, where N^2 contain an AND gate for partial product generation, together with 2N full adders and N half adders to produce the product. The worst-case delay is (2N - 2)·c (Bickerstaff K. C. (2007)), where c is the worst-case adder delay. All the partial products are generated in parallel and collected through an array of full adders and half adders; finally they are summed using a CPA. Because of its regular structure, the array multiplier takes a small area, but it is the slowest in terms of latency.

In the 1950s the Booth algorithm was used in array multipliers to perform two's complement multiplication (Andrew D. Booth (1951)). It computes the partial products by examining two multiplier bits at a time. Later, the higher-radix modified Booth algorithm was introduced to improve the latency of the regular Booth array multiplier.

Figure 2.6: Schematic of 4 by 4 Array Multiplier

The Booth Radix-4 algorithm (O. L. MacSorley (1961)) halves the number of partial products while keeping the circuit complexity to a minimum. This results in faster, lower-power multiplication. Booth recoding makes these advantages possible by skipping clock cycles that add nothing new in the way of product terms. The hardware implementation of the Radix-4 Booth recoding technique uses a simple mux that selects the correct shift-and-add operation based on the groupings of bits found in the product register. The product register holds the multiplier. The multiplicand and the two's complement of the multiplicand are added according to the recoding value. The rules for the radix-4 modified Booth recoding technique are shown in Table 2.1.
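Radix-4 recoding can be sketched as a lookup over overlapping three-bit groups of the multiplier. The sketch below uses the standard radix-4 mapping (the same eight cases as Table 2.1); the function names are hypothetical and the word length `n` is assumed to be even.

```python
# Three-bit group (b[i+1], b[i], b[i-1]) -> multiple of the multiplicand
BOOTH_RADIX4 = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(multiplier, n=8):
    """Recode an n-bit two's complement multiplier into n/2 signed digits."""
    m = multiplier & ((1 << n) - 1)
    digits, prev = [], 0          # prev is the bit to the right of the group
    for i in range(0, n, 2):
        group = (((m >> i) & 0b11) << 1) | prev
        digits.append(BOOTH_RADIX4[group])
        prev = (m >> (i + 1)) & 1
    return digits


def booth_multiply(multiplicand, multiplier, n=8):
    """Sum the n/2 recoded partial products, each weighted by 4**k."""
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_radix4_digits(multiplier, n)))


print(booth_radix4_digits(-3))   # four digits instead of eight partial products
print(booth_multiply(25, -3))
```

Every digit is 0, ±1 or ±2 times the multiplicand, so each partial product needs only a shift and an optional negation.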

Each three-bit group decodes to one of five possible operations: add 2 × multiplicand, add multiplicand, add 0, subtract multiplicand, or subtract 2 × multiplicand. This increases the hardware complexity, but takes only half the delay of the regular Booth multiplier. It is possible to use higher radices, such as radix-8 or radix-16, but the additional complexity caused by non-power-of-two multiples of the multiplicand compromises the delay and area improvements.

Table 2.1: Radix-4 Modified Booth Recoding

    i   i-1   i-2   Operation
    0    0     0    Add 0
    0    0     1    Add multiplicand
    0    1     0    Add multiplicand
    0    1     1    Add 2 × multiplicand
    1    0     0    Subtract 2 × multiplicand
    1    0     1    Subtract multiplicand
    1    1     0    Subtract multiplicand
    1    1     1    Subtract 0

Another method was proposed by Baugh and Wooley (Charles R. Baugh and Bruce A. Wooley (1973)) to handle signed bits. This technique was developed in order to design regular multipliers suited to two's complement numbers. Because of its two additional rows, it increases the maximum column height by two, and the resulting two additional stages of partial product reduction increase the overall delay. A modified form of the Baugh and Wooley method (Shiann-Rong Kuang et al. (2009)) is more commonly used because it does not increase the maximum column height. The partial product organization of the modified Baugh-Wooley method is shown in Figure 2.7. The organization strategy is as follows:

1) Invert the MSB of each row except the bottom row.
2) Invert all the bits in the bottom row except the MSB.
3) Add a single one to the (N+1)th and 2Nth columns.

The negative partial product bits can be generated using a NAND gate instead of an AND gate, which may reduce the area slightly in CMOS.
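The three organization steps of the modified Baugh-Wooley method can be checked with a small bit-level model. This is a sketch under assumptions: the name `baugh_wooley_multiply` is hypothetical, columns are counted from 1 at the LSB (so the (N+1)th column has weight 2^N), and the partial products are simply accumulated as integers instead of being reduced by an adder tree.

```python
def baugh_wooley_multiply(x, y, n=8):
    """Two's complement n x n multiplication via modified Baugh-Wooley."""
    xb = [(x >> i) & 1 for i in range(n)]   # Python's >> sign-extends negatives
    yb = [(y >> j) & 1 for j in range(n)]
    acc = 0
    for j in range(n):            # row j is generated by multiplier bit y[j]
        for i in range(n):
            bit = xb[i] & yb[j]
            # Steps 1 and 2: invert the terms that involve exactly one sign bit
            # (MSB of every row but the last, all but the MSB of the last row).
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            acc += bit << (i + j)
    # Step 3: add a single one in the (N+1)th and 2Nth columns.
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1     # keep the 2N-bit product
    if acc >> (2 * n - 1):        # reinterpret as a signed value
        acc -= 1 << (2 * n)
    return acc


print(baugh_wooley_multiply(-7, 5, n=4))
```

The inverted terms are exactly the ones a NAND gate would produce in hardware.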

Figure 2.7: Two's Complement Multiplication by Modified Baugh-Wooley Method (# denotes inverted bit positions)

2.2.2 COLUMN COMPRESSION MULTIPLIERS

In 1964, Wallace (C. S. Wallace (1964)) introduced a scheme for fast multiplication based on an array of full adders and half adders. He used full adders for every group of three bits and half adders for every group of two bits in the partial product array of the multiplier to speed up the multiplication. Wallace's approach was later modified by Dadda (Luigi Dadda (1965)) using a counter placement strategy in the partial product array. Here the placement of counters starts from the critical path in the partial product array. The placement repeats until only two rows remain, which are summed using a carry-propagate adder to obtain the final product. In both the Wallace and Dadda methods, the delay of the multiplier is proportional to the logarithm of the operand word length.

The Reduced Area approach is another partial product reduction method, proposed for area optimization in column compression multipliers (K. Andrea et al. (1993)). Another area reduction approach was proposed by Wang (Z. Wang et al. (1995)). These methods are based on the strategic use of full adders and half adders to improve area and layout, while maintaining the speed of the Wallace and Dadda designs.

Another partial product reduction algorithm, based on unequal delay paths, was proposed by Oklobdzija (V. G. Oklobdzija (1995)). He defined a strategy for connecting slow inputs/outputs and fast inputs/outputs in the critical delay paths that can tolerate an increase in delay. A new organization of the reduction tree, based on partial product compression similar to the Dadda approach, was proposed by Eriksson (H. Eriksson (2006)). The connectivity of the adding cells in the triangle-shaped High-Performance Multiplier (HPM) reduction tree is completely regular.

2.2.2.1 PARTIAL PRODUCTS REDUCTION SCHEMES

As shown in Figure 2.8, the multiplier starts by generating the partial products using an AND gate array and reduces them to two rows using counters or compressors. It is useful to understand the difference between counters and compressors (V. G. Oklobdzija and D. Villeger (1995)). A counter counts the number of active inputs, whereas a (q:r) compressor reduces q inputs to r outputs according to its compression ratio. In this research, counters rather than compressors are used to design the multipliers. The column compression tree consists of an array of counters or compressors. Finally, the two rows are added using a carry-propagate adder to obtain the final product. A hybrid adder structure can be used as the carry-propagate adder to perform this final addition quickly.

Figure 2.8: Basic N by N unsigned parallel multiplier

A dot diagram is a notation for describing column compression algorithms for multiplication. The symbols used in dot diagrams are:

Each dot - a partial product bit
Plain diagonal line - a full adder output
Crossed diagonal line - a half adder output

The dot diagram for an 8 by 8 Wallace multiplier is shown in Figure 2.9. It was constructed using the following Wallace algorithm:

1) Take every three bits in each column and add them using a full adder.
2) If two bits are left in any column, add them using a half adder.
3) If just one bit is left in any column, connect it to the next level.
4) Repeat steps 1 to 3 until only two rows remain.
5) Add the final two numbers using a carry-propagate adder to obtain the final product.

In each stage of the reduction, Wallace performs a preliminary grouping of the partial product rows into sets of three. Full adders and half adders are then employed within each three-row set. In the 8 by 8 example, the counters shown in Stage 1 of the reduction are placed in four sections as determined by the preliminary grouping of the partial product bits out of the AND array into sets of three. If the preliminary grouping leaves only one partial product bit, that bit is moved directly down to the next stage. The reduction of the partial product bits in Stage 1 by the counters shown in Stage 2 demonstrates that rows which are not part of a three-row set are moved down into the next stage without modification.
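The effect of the three-row grouping on the matrix height can be sketched in a few lines. The helper name `wallace_stage_heights` is hypothetical; it tracks only the number of rows per stage (each set of three rows becomes a sum row and a carry row, leftover rows pass through unchanged) and does not place individual adders.

```python
def wallace_stage_heights(n):
    """Row count after each Wallace reduction stage for an n x n multiplier."""
    heights = []
    h = n
    while h > 2:
        # Each complete set of three rows yields two rows;
        # the h % 3 leftover rows move down unmodified.
        h = 2 * (h // 3) + h % 3
        heights.append(h)
    return heights


print(wallace_stage_heights(8))   # → [6, 4, 3, 2]
```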

Figure 2.9: Dot Diagram of 8 by 8 Wallace Multiplier

The complete partial product reduction of an 8 by 8 Wallace multiplier requires four stages (intermediate matrix heights of 6, 4, 3 and 2) and uses 38 full adders and 15 half adders. To complete the multiplication, an 11-bit carry-propagate adder forms the final product by adding the final two rows of partial product bits shown in Stage 4.

As mentioned earlier, Dadda later modified Wallace's approach using a counter placement strategy. Table 2.2 gives the number of reduction stages as a function of the number of bits in the Dadda multiplier. The reduction stages are determined from the bottom (the final two rows) to the top. In each reduction stage the height of the matrix is no more than 1.5 times the height of its subsequent matrix. For example, a 12 by 12 Dadda multiplier requires five reduction stages with intermediate heights of 9, 6, 4, 3 and 2.

Table 2.2: Reduction Strategy for Dadda Multiplier

    Multiplier (N)      Reduction stages
    3                   1
    4                   2
    5 <= N <= 6         3
    7 <= N <= 9         4
    10 <= N <= 13       5
    14 <= N <= 19       6
    20 <= N <= 28       7
    29 <= N <= 42       8
    43 <= N <= 63       9
    64 <= N <= 94       10

The algorithm used for the Dadda multiplier is as follows:

1) Let $d_1 = 2$ and $d_{j+1} = \lfloor 1.5\, d_j \rfloor$, where $d_j$ is the matrix height for the j-th stage counted from the final two rows. This generates the sequence $d_1 = 2$, $d_2 = 3$, $d_3 = 4$, etc.
2) For every column, use half adders and full adders to ensure that the number of elements in each column is <= $d_j$.
3) Let j = j - 1 and repeat step 2 until the maximum column height is 2 bits.

The dot diagram for an 8 by 8 Dadda multiplier is shown in Figure 2.10. The first five matrix heights calculated using the recursive algorithm are 2, 3, 4, 6 and 9. Since this is an 8 by 8 multiplier, the matrix height of 9 is unnecessary, and the first matrix height to target is 6. Stage 1 of the partial product reduction applies full adders and half adders only to the columns whose total height is greater than 6. In Stage 2, full adders and half adders are used only in columns whose total height is greater than 4. Note that when evaluating the height of a column it is important to account for the carries from the previous column. The 8 by 8 Dadda multiplier requires four reduction stages (matrix heights of 6, 4, 3 and 2) and uses 35 full adders, 7 half adders and a 14-bit carry-propagate adder.
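The Dadda steps above can be sketched as a column-height simulation. The function names are hypothetical, and one detail is an assumption of this sketch: within a column, a full adder is used whenever the height exceeds the target by two or more and a half adder only when it exceeds it by one. Only adder counts are tracked, not the wiring.

```python
def dadda_heights(n):
    """Target heights d_j below n, largest first (d_1 = 2, d_{j+1} = floor(1.5 d_j))."""
    seq = [2]
    while seq[-1] * 3 // 2 < n:
        seq.append(seq[-1] * 3 // 2)
    return seq[::-1]


def dadda_reduce(n):
    """Return (full_adders, half_adders, cpa_width) for an unsigned n x n multiplier."""
    # Initial column heights of the AND array, LSB column first.
    cols = [min(c + 1, 2 * n - 1 - c) for c in range(2 * n - 1)]
    fa = ha = 0
    for target in dadda_heights(n):
        carries = 0                        # carries arriving from the previous column
        for c in range(len(cols)):
            height = cols[c] + carries
            carries = 0
            while height > target:
                if height - target >= 2:
                    fa += 1                # 3 bits in, 1 sum here, 1 carry to the left
                    height -= 2
                else:
                    ha += 1                # 2 bits in, 1 sum here, 1 carry to the left
                    height -= 1
                carries += 1
            cols[c] = height
        if carries:                        # defensive: carry out of the top column
            cols.append(carries)
    cpa_width = sum(1 for h in cols if h == 2)
    return fa, ha, cpa_width


print(dadda_heights(12))   # → [9, 6, 4, 3, 2]
print(dadda_reduce(8))     # → (35, 7, 14)
```

The stage counts returned by `dadda_heights` agree with Table 2.2, and the 8 by 8 result matches the adder counts quoted in the text.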

Figure 2.10: Dot Diagram of 8 by 8 Dadda Multiplier

The Reduced Area multiplier (K. Andrea et al. (1993), K. Andrea et al. (1995), K. Andrea et al. (2001)) is another reduction scheme, intended to use less area than the Wallace and Dadda schemes. The dot diagram for an 8 by 8 Reduced Area multiplier is shown in Figure 2.11. This multiplier requires four stages (matrix heights of 6, 4, 3 and 2) and uses 35 full adders, 7 half adders and a 10-bit carry-propagate adder. The reduction method for the Reduced Area multiplier is:

1) For each reduction stage, the number of full adders used in each column is $\lfloor b_i / 3 \rfloor$, where $b_i$ is the number of bits in column i. This provides the maximum reduction in the number of bits entering the next stage.
2) Half adders are used only under the following two conditions:
   (i) when required to reduce the number of bits in a column to the number of bits specified by the Dadda sequence, or
   (ii) to reduce the rightmost column containing only two bits.

The Reduced Area reduction scheme is especially useful for pipelined multipliers, because it reduces the number of latches required in the partial product reduction stages. The scheme can be applied to both signed and unsigned numbers.

Figure 2.11: Dot Diagram of 8 by 8 Reduced Area Multiplier

A fourth type of reduction scheme, which also uses full adders and half adders, is the High Performance Multiplier (HPM) (H. Eriksson et al. (2006)). The dot diagram for an 8 by 8 High Performance Multiplier is shown in Figure 2.12. This multiplier requires six stages (matrix heights of 7, 6, 5, 4, 3 and 2) and uses 35 full adders, 7 half adders and a 14-bit carry-propagate adder. Each stage of the High Performance Multiplier reduces the matrix height to N-1, where N is the matrix height of the previous stage.

Figure 2.12: Dot Diagram of 8 by 8 HPM Multiplier

A fifth type of partial product reduction scheme was proposed by Wang et al. (Z. Wang et al. (1995)) to make column compression multipliers more area efficient, with shorter interconnections. First, they determine lower bounds on the number of adders required by a column compression multiplier. Then they analyze the constraints on the distribution of adders over the different stages. Finally, they propose a technique that attempts to maximize area efficiency while reducing the number of cross-stage interconnections. The constraints on half adder and full adder allocation in the column compression were analyzed; under these constraints there is considerable flexibility in implementing the column compression multiplier and in choosing the length of the final fast adder, which yields higher area efficiency. In Wang's research, the area efficiency of the column compression part of the multiplier is defined as

$\text{Area efficiency} = \dfrac{N}{K \cdot \max_k N(k)} \times 100\%$    (2.9)

where N is the total number of half adders and full adders used in the reduction stages, K is the required number of stages, and N(k) is the number of half adders and full adders in stage k.

The performance of any of these five multipliers (Wallace, Dadda, Reduced Area, HPM and Wang) can be improved by the design techniques proposed in this research.
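Equation (2.9) can be evaluated directly from the number of adders in each stage. The helper below is a sketch with a hypothetical name, and the per-stage counts in the usage line are illustrative values chosen for the example, not figures taken from Wang's paper.

```python
def area_efficiency(adders_per_stage):
    """Area efficiency in percent, following equation (2.9).

    adders_per_stage[k] is N(k), the number of half and full adders
    in reduction stage k.
    """
    total = sum(adders_per_stage)            # N
    stages = len(adders_per_stage)           # K
    return 100.0 * total / (stages * max(adders_per_stage))


# Illustrative four-stage tree with 6, 14, 10 and 12 adders per stage.
print(area_efficiency([6, 14, 10, 12]))      # → 75.0
```

The measure reaches 100% only when every stage holds the same number of adders, that is, when no stage leaves layout area unused relative to the widest stage.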

2.2.2.2 THE FINAL CARRY-PROPAGATE ADDER

All the fast adder structures were developed under the assumption that the input signals arrive at the same time. This assumption is not realistic in many situations, such as the input arrival profile from the partial product summation tree of a multiplier to the carry-propagate adder. This research is therefore concerned with which addition scheme is most suitable as the carry-propagate adder of the multiplier. The literature covers several types of carry-propagate adders, including the CSA, CLA, CSLA and CSKA (R. Uma et al. (2012)). Adder structures have been evaluated and rated on delay, area and number of logic transitions (Thomas K. Callaway and Earl E. Swartzlander, Jr. (1992)), and further work has evaluated the power consumption of adders (Thomas K. Callaway and Earl E. Swartzlander, Jr. (1993)).

It is well known that the signals from the column compression tree arrive first at the two ends of the carry-propagate adder and last in the middle of the carry-propagate adder. Determining the exact arrival times at the carry-propagate adder is therefore of prime importance in the design of an optimal final adder. To better select and design adders for column compression multipliers, Oklobdzija analyzed the input arrival times at the final adder (V. G. Oklobdzija and D. Villeger (1995), V. G. Oklobdzija (1995)). He suggests using either a variable block adder or an RCA to sum the early LSB values, a CLA to sum the middle region of bits, and either a conditional sum adder or a CSLA to sum the early MSB values.

Since the RCA has a simple and regular structure, it consumes less power and is more area efficient than the other existing adders. However, each stage of the RCA generates its sum only after receiving the carry bit from the preceding bit pair, which leads to a large carry propagation delay. The arrival profile shown by Oklobdzija (V. G. Oklobdzija and D. Villeger (1995)) and Balasubrahmanyam (Balasubrahmanyam et al. (2012)) has a positive slope from the LSB region to the middle region of the partial products. Even if the carry bit from the preceding stage of the final addition arrives quickly, the true values from the partial products arrive more slowly in the positive-slope region. A fast adder is therefore not the best choice in this region, and the RCA is the best choice where the slope is positive.

However, the slope is not positive over the entire multiplier. It is constant in the middle region and negative on the MSB side of the partial products. Determining the most suitable adder for each region therefore leads to optimal multiplier performance. Based on the different arrival profile regions of the partial products, this research proposes a hybrid carry-propagate adder structure for parallel multipliers that consumes less power and is more area efficient than the regular CSLA, and is faster than the CSA. This enables optimal performance in the final addition for the multipliers proposed in this research.