Automatic Synthesis of Compressor Trees: Reevaluating Large Counters

Ajay K. Verma (AjayKumar.Verma@epfl.ch), Paolo Ienne (Paolo.Ienne@epfl.ch)
Ecole Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences, CH-1015 Lausanne, Switzerland

ABSTRACT
Despite the progress of the last decades in electronic design automation, arithmetic circuits have always received way less attention than other classes of digital circuits. Logic synthesisers, which play a fundamental role in design today, play a minor role on most arithmetic circuits, performing some local optimisations but hardly improving the overall structure of arithmetic components. Architectural optimisations have often been studied manually, and ad-hoc techniques have been developed only for very common building blocks such as fast adders and multi-input adders. A notable case is multi-input addition, which is the core of many circuits such as multipliers. The most common technique to implement multi-input addition is using compressor trees, which are often composed of carry-save adders (based on (3 : 2) counters, i.e., full adders). A large body of literature exists on implementing compressor trees using large counters. However, all the large counters were built by using full and half adders recursively. In this paper we give some definite answers to issues related to the use of large counters. We present a general technique to implement large counters whose performance is much better than the ones composed of full and half adders. We also show that it is not always useful to use larger optimised counters and that sometimes a combination of counters of various sizes gives the best performance. Our results show 5% improvement in the critical path delay. In some cases even hardware area is reduced by using our counters.

1. INTRODUCTION
Compressor trees are one of the key components in arithmetic circuits, as these are the main constituents of parallel multipliers and multi-input adders. Hence, improving the speed of a compressor tree results in a significant speedup of the circuit.
Unfortunately, logic synthesis tools do a lousy job in optimising XOR-intensive circuits due to the shortcomings of algebraic factoring. Hence, the direct synthesis of compressor trees (which are heavily XOR-dominated) results in poor quality circuits. In fact, finding the optimum implementation of a compressor tree still remains a challenging task. Several attempts have been made to generate optimal compressor trees. The most common strategy to implement a compressor tree is using carry save adders. A carry save adder takes three integers and returns two integers such that the sum of the inputs equals the sum of the outputs. The carry save adder uses full adders (i.e., (3 : 2) counters) in parallel to reduce the three bits in the i-th bit position into two bits at positions i and i + 1. Many algorithms have been proposed to use (3 : 2) counters in an effective way: some of them are layout constrained, such as the compressor trees by Wallace [4] and Dadda [], which exploit the regularity of the structure; other methods, such as the Three Greedy Approach by Oklobdzija et al. [5, 7], are greedy algorithms to find the optimal interconnections among the various (3 : 2) counters.

It is also possible to use large counters instead of (3 : 2) counters, e.g., using a (7 : 3) counter one can reduce 7 bits at bit position i into 3 bits at positions i, i + 1 and i + 2. It has been observed that using (7 : 3) counters is advantageous compared to using only (3 : 2) counters. In fact, as we increase the counter size, the speed of the compressor tree increases. However, it was later noticed that all the large counters were implemented using (3 : 2) counters, but with proper interconnections among the (3 : 2) blocks. Other components used for multi-input additions are (p : q) compressors. In contrast to counters, compressors also use a horizontal path for carry propagation. However, compressors also use full adders (FA) and half adders (HA) as their building blocks, and hence the special advantage of compressors can also be achieved by implementing proper interconnections among FA and HA blocks.
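The carry-save step and the (7 : 3) counter described above can be illustrated with a minimal behavioural sketch in Python. It assumes non-negative integers and says nothing about gate-level structure or delay; the names `carry_save_add` and `counter_7_3` are hypothetical and do not come from the paper.

```python
def carry_save_add(x, y, z):
    """Reduce three integers to two with the same sum (one (3 : 2) counter per bit position)."""
    partial_sum = x ^ y ^ z                      # sum bit stays at position i
    carry = ((x & y) | (y & z) | (x & z)) << 1   # majority bit moves to position i + 1
    return partial_sum, carry


def counter_7_3(bits):
    """Behavioural (7 : 3) counter: seven bits of weight i give bits of weight i, i+1, i+2."""
    assert len(bits) == 7 and all(b in (0, 1) for b in bits)
    total = sum(bits)
    return total & 1, (total >> 1) & 1, (total >> 2) & 1


# The two output words always add up to the sum of the three input words.
s, c = carry_save_add(13, 7, 9)
assert s + c == 13 + 7 + 9
```

The sketch only captures the arithmetic invariant (sum of inputs equals sum of outputs); the interconnection and timing questions discussed in the text are not modelled here.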
As we have already mentioned, the most effective algorithm to find the best interconnections among the (3 : 2) counters was given by Oklobdzija et al. [5] and is known as the Three Greedy Approach (TGA). In TGA, each bit position is considered individually from right to left and the bits in a column are reduced to three or fewer using (3 : 2) counters. When choosing the inputs of a (3 : 2) counter, one sums those bits whose input arrival times are the smallest among all the bits. Once each bit position has three or fewer bits, a final sequence of (3 : 2) counters is used to reduce the three integers into two integers, which are finally added using an appropriate hybrid adder. A similar approach at word level was suggested by Kim et al. in their work [] for the optimal allocation of the inputs to cascaded Carry-Save Adders.

In this paper we address the following questions:
- Is the implementation of a large counter using (3 : 2) counters (full adders) and (2 : 2) counters (half adders) the best implementation of it? If not, how can the best implementation of a counter be obtained?
- If building optimal counters of arbitrary size is not feasible, is it sufficient to use only primal counters (a primal counter reduces $2^n - 1$ input bits into an $n$-bit word)?
- Does it always pay off to use the largest available counters instead of smaller counters (e.g., should we ever use a smaller counter when the number of bits is 7 and a (7 : 3) counter can be used)?

In the rest of the paper we answer these questions and show that it is not always possible to obtain the best implementation of a counter using smaller counters. We also propose a new method to get the optimal implementation of counters and compressor trees.

978--988--4/DATE7 7 EDAA
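The column-wise greedy reduction that the Three Greedy Approach performs, as summarised at the start of this passage, can be sketched as follows. This is a simplified model under stated assumptions, not the published algorithm: each bit is represented only by its arrival time, every (3 : 2) counter has one fixed sum delay and one fixed carry delay (`sum_delay` and `carry_delay` are made-up parameters), and the function name `tga_reduce` is hypothetical.

```python
import heapq


def tga_reduce(columns, sum_delay=2.0, carry_delay=1.0):
    """Greedy column reduction; columns[i] lists the arrival times of the bits of weight i."""
    cols = [list(col) for col in columns]
    counters_used = 0
    i = 0
    while i < len(cols):
        heap = cols[i]
        heapq.heapify(heap)
        # Keep combining the three earliest-arriving bits until at most three remain.
        while len(heap) > 3:
            earliest = [heapq.heappop(heap) for _ in range(3)]
            ready = max(earliest)                    # counter starts when its last input arrives
            heapq.heappush(heap, ready + sum_delay)  # sum bit stays in column i
            if i + 1 == len(cols):
                cols.append([])
            cols[i + 1].append(ready + carry_delay)  # carry bit moves to column i + 1
            counters_used += 1
        i += 1
    return [sorted(col) for col in cols], counters_used


# Eight bits of equal weight, all arriving at time 0 (an eight-bit counter).
reduced, used = tga_reduce([[0.0] * 8])
```

Picking the earliest bits first keeps late-arriving signals out of long counter chains, which is the intuition behind the greedy choice; the final (3 : 2) stage and the hybrid adder mentioned in the text are outside the scope of this sketch.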

The rest of the paper is organised as follows. In the next section we discuss some earlier work done on this topic. In Section 3 we explain why the large counters and compressors built of full and half adders are not optimal. Section 4 discusses a novel approach to optimise counters and its limitations. In Section 5 we formulate the problem of optimising counters and compressor trees, and present a method to solve it using Integer Linear Programming. Finally, in Section 6 we present the results of our experiments, followed by conclusions in Section 7.

2. STATE OF THE ART
The problem of multi-input addition is not a new problem in the arithmetic community, as it appears often in many arithmetic circuits such as multipliers. Not only in multipliers, but also in some other applications which do not seem to contain multi-input additions, some appearances of multi-input additions can be found by clustering adders separated by logic operations, as shown in [] and []. The first breakthrough in this direction was by Wallace [4], when he introduced the notion of the carry save adder (constructed using (3 : 2) counters). Using a chain of carry save adders, the inputs of a multi-input addition can be reduced significantly faster than by doing serial addition of the inputs. The notion of the Wallace tree was generalised by Dadda [], who proposed algorithms to minimise the number of counters. Since the compressor trees proposed by Wallace and Dadda were very regular and layout constrained, the interconnections between the (3 : 2) counters were fixed, irrespective of the arrival times of the inputs. It was later realised that, by implementing proper interconnections between various (3 : 2) counter blocks, large counters and compressors (such as the (7 : 3) counter, the (4 : 2) compressor, etc.) can be generated. Since the large counters consider input arrival times to interconnect the (3 : 2) blocks up to some extent (at least among their constituent (3 : 2) counters), the compressor trees built on large counters usually have smaller delays. The notion of large counters and compressors was initiated by Weinberger [5], who introduced (4 : 2) compressors.
The use of larger compressors and counters was explored by Song and De Micheli [6]. Almost all large counters in the literature are made of full adders; however, discussion of faster quasi-digital counters also exists in the literature, as in the work of Swartzlander [9]. Unfortunately these quasi-digital counters are extremely complex and prone to problems due to drift, etc. Oklobdzija was the first to present algorithmic methods (TGA) in his fundamental work [5, 7] to find the optimal interconnections among the (3 : 2) counter blocks, taking into account the different arrival times of the inputs. Since all the large counters used previously were built using (3 : 2) counters as the basic block, after the introduction of the Three Greedy Approach the use of larger counters appeared unnecessary. The present work advances with respect to TGA and shows that it is advantageous to use large optimised counters (not necessarily built using the (3 : 2) counter as the basic block).

Other than optimising the compressor tree, some work has been done to optimise the final adder that adds the two output words of a compressor tree. The choice of an optimal adder depends on the delay profiles of the bits of the two output words. An example of such optimisation is presented in the work of Fadavi-Ardekani [], which optimises the final adder of the compressor tree used to reduce the partial product bit array. The adder generated by [] uses various stages of carry-select adders. However, the proposed method assumes that all inputs of the compressor tree are available at the same time, which is not true in general. The optimisation of the final adder considering a hybrid adder was also explored by Oklobdzija [8]. In this work, we use the latter approach to optimise the final adder.

Figure 1: Best implementation of the second most significant bit of (a) a four-bit counter, and (b) a five-bit counter (the one which typically belongs to the critical path). Neither implementation can be obtained by any combination of full and half adders.

Note that all the works mentioned above assume that the inputs of a (3 : 2) counter are independent of each other, which is not the case in general, as we have shown previously [].
This work is based on exploiting the dependencies among the inputs of an XOR gate, which is the core element of a full adder, to improve the speed of multipliers. However, the method is computationally expensive and remains practical only for smaller circuits. We will discuss the application of this technique to counters in Section 4.

3. NON-OPTIMALITY OF COUNTERS BUILT ON FULL AND HALF ADDERS
As we have mentioned in the previous section, sometimes the inputs of an XOR gate are correlated expressions, and using this fact the XOR gate can be simplified into simpler gates such as NAND or OR. As an example, the expression for the second most significant bit of a four-bit counter (4 : 3) with input bits $a$, $b$, $c$, and $d$ can be written as follows:

$out = ab \oplus ac \oplus ad \oplus bc \oplus bd \oplus cd = ((a \oplus b)(c \oplus d)) \oplus (ab \oplus cd)$.

Note that the expressions $((a \oplus b)(c \oplus d))$ and $(ab \oplus cd)$ cannot be true simultaneously for any values of $a$, $b$, $c$, and $d$, and hence the XOR of the two expressions is the same as the OR of the two expressions. Using this fact the expression of $out$ can be simplified:

$out = ((a \oplus b)(c \oplus d)) + (ab \oplus cd)$.

This implementation of the four-bit counter is shown in Fig. 1(a). In a similar way, in the expression of the second bit of a five-bit counter, one XOR can be converted into one OR, resulting in the implementation shown in Fig. 1(b).

3.1 Explicit Expression for Counter
To understand the nature of the correlation among the various inputs of the XOR gates used in a counter circuit, we should look at the exact expression of a counter circuit. An (n : k) counter takes $n$ bits $a_0, a_1, \ldots, a_{n-1}$ as inputs, and returns $k$ bits $c_0, c_1, \ldots, c_{k-1}$ such that the vector $(c_{k-1}, c_{k-2}, \ldots, c_0)$ corresponds to the binary representation of the sum of the $n$ input bits. The following theorem gives the exact expression for the $c_i$'s in terms of $a_0, a_1, \ldots, a_{n-1}$.

THEOREM 1. If $(c_{k-1}, c_{k-2}, \ldots, c_0)$ is the binary representation of the sum of the bits $a_0, a_1, \ldots, a_{n-1}$, then the expression for $c_i$ can be given as follows:

$c_i = \bigoplus a_{k_1} a_{k_2} \cdots a_{k_{2^i}}$,

where $(k_1, k_2, \ldots, k_{2^i})$ runs over all $\binom{n}{2^i}$ combinations of $2^i$ integers from the set $\{0, 1, \ldots, n-1\}$.

In order to illustrate the simplification of some counter circuits using correlation we define two terms:

$c(n, r) = \bigoplus a_{k_1} a_{k_2} \cdots a_{k_r}$, $\quad d(n, r) = \sum a_{k_1} a_{k_2} \cdots a_{k_r}$,

where $\sum$ denotes logical OR and $(k_1, \ldots, k_r)$ runs over all combinations of $r$ input indices. Note that $c_i = c(n, 2^i)$. It is easy to see that $d(n, r)$ corresponds to the same expression as $c(n, r)$ with each XOR gate replaced by an OR gate. Hence, the delay of a circuit corresponding to $d(n, r)$ can often be significantly less than that of the circuit corresponding to $c(n, r)$. Also note that in some cases $c(n, r)$ can be expressed very easily in terms of $d(n, r)$. Some such cases are stated in the next theorem.

THEOREM 2. The following holds true for all $n$ and $i$:
(1) $(2^i < n < 2^{i+1}) \Rightarrow (c(n, 2^i) = d(n, 2^i))$.
(2) $(2^{i+1} \le n < 3 \cdot 2^i) \Rightarrow (c(n, 2^i) = d(n, 2^i)\,\overline{d(n, 2^{i+1})})$.
(3) $(n \bmod 2 = 1) \Rightarrow (c(n, n-1) = d(n, n-1))$.
(4) $(n \bmod 2 = 0) \Rightarrow (c(n, n-1) = d(n, n-1)\,\overline{d(n, n)})$.

Note that the first statement says that for the calculation of the most significant bit of a counter we do not need any XOR gate. Also note that a full adder is better than a (3 : 2) counter implemented using half adders because it can use property (1) mentioned above (that $c(3, 2) = d(3, 2)$; in other words $ab \oplus bc \oplus ca = ab + bc + ca$). Similarly, the expression for the second most significant bit of a four-bit counter can be simplified using property (2) in the theorem above, i.e., $c(4, 2) = d(4, 2)\,\overline{d(4, 4)}$, which has the same performance as the one shown in Fig. 1(a). Note that these are not the only relations between $c(n, r)$ and $d(n, r)$, and it is extremely difficult to figure out which relation between $c(n, r)$ and $d(n, r)$ we should use to get the best advantage. Sometimes we might not even want to rewrite $c(n, r)$ in terms of $d(n, r)$.
This is because rewriting $c(n, r)$ in this way improves the delay of the circuit locally, but introduces OR gates, which results in the loss of ring properties such as commutativity, associativity, and distributivity (i.e., OR and XOR operations do not follow associativity rules with each other); hence it hinders the factorisation of the expression.

4. EXPLOITING CORRELATION TO BUILD OPTIMAL COUNTERS
The previous section has answered our first question negatively: large counters cannot always be implemented optimally out of smaller counters. It has also shown that the reason for this impossibility is the difficulty of accounting for correlations. We have mentioned in Section 2 that the algorithm presented in [], known as Selective Expansion, improves the performance of a circuit by exploiting the correlation between the operands of an XOR operation. Hence, one possibility is to write the circuit for a counter in terms of full and half adders, and then use the Selective Expansion algorithm to improve the performance of the circuit by replacing full adders and half adders with correlated inputs by simpler operators.

Table 1: Comparison between the delays of different counters before and after using the Selective Expansion algorithm [].

Counter size    Delay with FA and HA (ns)    Selective Expansion (ns)
2               (.7,.)                       (.7,.)
3               (.,.7)                       (.,.7)
4               (.9,.,.)                     (.9,.,.)
5               (.8,.4,.6)                   (.8,.,.6)
6               (.,.6,.8)                    (.4,.,.8)
7               (.7,.4,.8)                   (.4,.8,.8)
8               (.,.48,.48,.)                (.,.5,.4,.)

In the Selective Expansion algorithm two kinds of correlation are measured, called local correlation and global correlation. The local correlation is the correlation between the operands of an XOR operation, while the global correlation is the correlation between the rest of the expression and the operands of an XOR operation. If some empirical correlation index is above a threshold value, then the XOR gate is replaced by its equivalent expression in terms of AND and OR gates as shown below:

$A \oplus B = \overline{AB}\,(A + B)$, and
$(A \oplus B) + C = (\overline{AB} \vee C)(A + B + C)$,

where the expression $(x \vee y)$ is the same as $(x + y)$.
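The two replacement identities above, together with Theorems 1 and 2 and the four-bit example of Section 3, can be checked exhaustively for small counters. The sketch below is only a brute-force sanity check over all input combinations, not part of the Selective Expansion algorithm; the helper names `xor_sym`, `or_sym`, `check_small_counters` and `check_xor_identities` are hypothetical, and statement (1) is tested with the strict bounds as written in Theorem 2.

```python
from itertools import combinations, product


def xor_sym(bits, r):
    """c(n, r): XOR of the products of every r-subset of the inputs."""
    acc = 0
    for subset in combinations(bits, r):
        acc ^= int(all(subset))
    return acc


def or_sym(bits, r):
    """d(n, r): OR of the same products (true iff at least r inputs are 1)."""
    return int(any(all(subset) for subset in combinations(bits, r)))


def check_small_counters(max_n=6):
    for n in range(2, max_n + 1):
        for bits in product((0, 1), repeat=n):
            total = sum(bits)
            i = 0
            while 2 ** i <= n:
                r = 2 ** i
                assert xor_sym(bits, r) == (total >> i) & 1           # Theorem 1
                if r < n < 2 * r:                                      # Theorem 2 (1)
                    assert xor_sym(bits, r) == or_sym(bits, r)
                if 2 * r <= n < 3 * r:                                 # Theorem 2 (2)
                    assert xor_sym(bits, r) == or_sym(bits, r) & (1 - or_sym(bits, 2 * r))
                i += 1
            if n % 2 == 1:                                             # Theorem 2 (3)
                assert xor_sym(bits, n - 1) == or_sym(bits, n - 1)
            else:                                                      # Theorem 2 (4)
                assert xor_sym(bits, n - 1) == or_sym(bits, n - 1) & (1 - or_sym(bits, n))
    return True


def check_xor_identities():
    for a, b, c, d in product((0, 1), repeat=4):
        nand_ab = 1 - (a & b)
        assert (a ^ b) == nand_ab & (a | b)                   # A xor B = not(AB)(A + B)
        assert ((a ^ b) | c) == (nand_ab | c) & (a | b | c)   # (A xor B) + C
        # Second most significant bit of the four-bit counter, in its OR form.
        out = ((a ^ b) & (c ^ d)) | ((a & b) ^ (c & d))
        assert out == ((a + b + c + d) >> 1) & 1
    return True
```

Calling `check_small_counters()` and `check_xor_identities()` raises an `AssertionError` if any identity fails; exhaustive checking is only feasible for small n because the number of input combinations grows as 2^n.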
We used this algorithm to optimise counters, and the performance of some of the counters optimised by this algorithm is shown in Table 1. The first column shows the counter size; the second column shows the delay vector (delays of the individual output bits from most significant to least significant) of the underlying counter when implemented using only full and half adders interconnected using the Three Greedy Approach. The third column shows the delay vector of the same circuit after optimising it using the Selective Expansion algorithm.

In this section we have exploited an existing technique to generate better large counters than previously possible with algorithmic approaches. The only problem with the Selective Expansion algorithm is its computational complexity: although it produces faster counters and compressor trees, it remains practical only for small counters and compressors. We therefore turn our attention to the second set of questions mentioned in Section 1: how can we build a large counter optimally using a combination of optimised counters up to a certain size?

5. BUILDING COMPRESSORS FROM LARGE SET OF COUNTERS
Since we cannot implement counters and compressors of all sizes using the Selective Expansion algorithm, one possibility is to find small and frequent blocks in a compressor tree which can be optimised using Selective Expansion, and then replace those blocks by the corresponding optimised circuits. One way is to consider counters as building blocks and implement the compressor tree using these optimised counters, thanks to the fact that counters up to some significant sizes (e.g., -bit) can be optimised using Selective Expansion. Now the question is how large the set of building block counters should be. As we have seen, large counters have more possibilities to use the correlation between the operands of XORs, so we should extend the set of building block counters as much as we can. At this point one might also think that we should consider only primal counters, i.e., the counters which reduce $(2^n - 1)$ bits into $n$ bits (e.g., (3 : 2), (7 : 3), etc.), and half adders. However, using only primal counters might lead to a non-optimal solution, because it might make the circuit unbalanced, while using smaller counters which are not primal may make the circuit more balanced, and hence faster.

Figure 2: Implementation of an eight-bit counter using (a) only full and half adders, (b) only primal (i.e., $(2^n - 1)$ to $n$) and (2 : 2) counters, and (c) counters smaller than eight bits. Note that the use of large counters is not always advantageous as it might leave the circuit lopsided. The first part of the figure lists the delays of the building block counters in units of $D_{XOR}$. Paths from different inputs to outputs have different delays, e.g., if the output of a (3 : 2) counter is implemented as $((a \oplus b) \oplus c)$, then the delay from $c$ to the output is one XOR delay while the delay from $a$ to the output is two XOR delays; the values listed are the maximum of all these delays.

To understand this more clearly, let us consider the example shown in Fig. 2. The first part of the figure shows the delays of various counters in terms of the delay of a two-input XOR gate (which is an approximation of the real delay values). Next, Fig. 2(a) shows the implementation of an (8 : 4) counter using only full and half adders according to the Three Greedy Approach. Fig. 2(b) shows the best implementation of an (8 : 4) counter using only primal counters and half adders. Finally, Fig. 2(c) shows the best implementation of the same (8 : 4) counter using any smaller counters. Note that in all three circuits, the computation of the second most significant bit lies on the critical path. However, the critical path delay differs among the three circuits. In the first and second circuits the delay is $5.5\,D_{XOR}$, and in the last circuit it is $4.5\,D_{XOR}$.
The reason is that when we implement the (8 : 4) counter using (4 : 3) counters, we exploit the correlation among the operands of XOR more than in the implementation using full and half adders, and the circuit is also more balanced than the one built on (7 : 3) counters. Although here we have used an approximate model to measure the counter delays, the same conclusion can be reached using the actual delays of optimised counters shown in Table 1.

5.1 Approximate Delay Model and Problem Formulation
For simplicity and without loss of generality we use only counters up to (8 : 4) as building block counters, and to estimate the delay of a circuit built on these counters we consider the critical path delay of the circuit, where the values corresponding to the delays of the building block counters are taken from Table 1, i.e., the delays of counters optimised using Selective Expansion. Now we can formulate the problem as follows.

PROBLEM 1. Given a set of input integers with not necessarily identical arrival times, find the best implementation of the compressor tree built on counters up to (8 : 4) to add the input integers.

5.2 ILP Formulation
Integer Linear Programming (ILP) has proved to be a powerful method to solve combinatorial optimisation problems like the one mentioned above. Although solving an ILP is theoretically an NP-hard problem, many tools like CPLEX [4] solve sufficiently large instances of ILP within reasonable time. Any problem which can be formulated as an Integer Linear Program has two elements: constraints and an objective function. The constraints should be in the form of linear inequalities and equalities, and the objective function must be a linear function of the input variables which has to be minimised or maximised. Restriction of variables to integers, Booleans, and piecewise continuous variables is also allowed. Next we show how we formulate our problem as an Integer Linear Programming problem. First we define a couple of terms which will help in understanding the formulation.

DEFINITION 1.
Rank of a counter: If the inputs of a counter are the inputs at bit-position $i$, or the carries propagated from previous bit positions to this bit position, then the rank of this counter is $i$.

DEFINITION 2. Weight of a signal: All the input signals at bit-position $i$ have the weight $i$. Also the $j$-th output (starting from 0) of a counter with rank $i$ will have weight $(i + j)$.

It is easy to see that all the input signals of a rank-$i$ counter will have weight $i$. Also the output at bit position $i$ will have weight $i$. Note that, since in all counters except (2 : 2) counters the number of inputs is at least one more than the number of outputs, the number of counters (excluding the (2 : 2) counters) in a compressor tree with $N$ input bits must be less than $N$. If we consider (2 : 2) counters also, then the upper bound can be proved to be O(N ). However this is an extreme bound and we allow only $cN$ counters in our compressor for some constant $c$.
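The two definitions above can be made concrete with a small behavioural sketch: the outputs of a counter of rank i receive the weights i, i + 1, ..., and the weighted value of the outputs equals the weighted value of the inputs. The function names `counter_outputs` and `weighted_value` are hypothetical, and the counter is modelled purely arithmetically.

```python
def counter_outputs(bits, rank):
    """Outputs of an (n : k) counter of the given rank, as a {weight: bit} mapping."""
    total = sum(bits)
    k = len(bits).bit_length()   # output bits needed to represent the values 0..n
    return {rank + j: (total >> j) & 1 for j in range(k)}


def weighted_value(signals):
    """Numeric value represented by a {weight: bit} mapping."""
    return sum(bit << weight for weight, bit in signals.items())


# Seven input bits at bit-position 2 feed a (7 : 3) counter of rank 2.
inputs = (1, 1, 0, 1, 1, 1, 1)
outputs = counter_outputs(inputs, rank=2)

# The outputs carry weights 2, 3 and 4 and preserve the weighted sum of the inputs.
assert sorted(outputs) == [2, 3, 4]
assert weighted_value(outputs) == sum(inputs) << 2
```

Because every counter other than a (2 : 2) counter emits fewer signals than it consumes, each such counter lowers the number of live signals by at least one, which is the counting argument behind the bound of fewer than N counters stated above.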

For the sake of brevity we demonstrate our formulation using and, or, if-else, max, min, etc. Such operators can be easily written in ILP with additional variables and constraints. The list of variables used in the formulation and their interpretation is given below.

$size_i$: This denotes the size of the $i$-th counter. Note that $size_i$ can vary from 2 to 9 and must be an integer (i.e., $2 \le size_i \le 9$). If $size_i$ is 9 for a counter, the counter is a null counter (i.e., an unused counter).

$e_{ijk}$: A Boolean variable which is true if there is a connection between the $k$-th output signal of the $i$-th counter and the $j$-th counter. The value of $k$ varies from 0 to 3, and if the $i$-th counter has fewer than $(k + 1)$ outputs, then $e_{ijk}$ is set to false.

$p_{ij}$ and $q_{ijk}$: Both are Boolean variables. $p_{ij}$ is true if there is an edge from the $i$-th input bit to the $j$-th counter, while $q_{ijk}$ is true if the $k$-th output signal of the $i$-th counter corresponds to the $j$-th output. Note that $e_{ijk}$ and $q_{ij'k}$ cannot be true simultaneously (i.e., $e_{ijk} + q_{ij'k} \le 1$).

$t_{ij}$: A real variable which denotes the delay of the $j$-th output of the $i$-th counter.

$r_i$: This denotes the rank of the $i$-th counter.

$h_{ijk}$: Special variables which manage the counters with inputs of different arrival times.

Next we present the list of the constraints. The constraints can be divided into four categories: I/O based constraints, constraints based on the rank of counters, delay based constraints, and special constraints.

I/O Constraints: The number of inputs and outputs of a counter should be consistent with its size, e.g., an (8 : 4) counter must have 8 incoming connections and 4 outgoing connections. This constraint can be written as follows:

if $(size_i = 8)$, then $\sum_{j<i} \sum_{k} e_{jik} + \sum_{k} p_{ki} = 8$, and
if $(size_i = 8)$, then $\sum_{j>i} \sum_{k} e_{ijk} + \sum_{j'} \sum_{k} q_{ij'k} = 4$.

Also note that some of the edge variables can be assigned zero directly, as mentioned above. The following examples illustrate this kind of constraint:

if $(size_i < 8)$, then $\forall j\,(e_{ij3} = 0)$.
if $(size_i = 9)$, then $\forall (j, k)\,(e_{ijk} = 0)$.
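As an illustration of how such if-then and max operators are commonly turned into linear constraints, the block below applies the standard big-M construction to the I/O constraint of an (8 : 4) counter. This is a sketch of a textbook technique and not necessarily the encoding used by the authors: the indicator variables $s_{i,v}$, the constant $M$, the shorthand $S_i$, and the generic variables $x$, $y$, $z$ are assumptions introduced only for this example.

```latex
% Assumed helpers (not from the paper):
%   s_{i,v} \in \{0,1\} is 1 exactly when size_i = v,
%   M is a constant larger than any feasible number of connections,
%   S_i abbreviates the number of incoming connections of counter i.
\begin{align*}
\sum_{v=2}^{9} s_{i,v} &= 1, &
size_i &= \sum_{v=2}^{9} v \, s_{i,v}, \\
S_i &= \sum_{j<i} \sum_{k} e_{jik} + \sum_{k} p_{ki}, \\
S_i &\le 8 + M\,(1 - s_{i,8}), &
S_i &\ge 8 - M\,(1 - s_{i,8}), \\
z &\ge x, &
z &\ge y \quad \text{(models } z = \max(x, y) \text{ whenever } z \text{ is minimised).}
\end{align*}
```

When $s_{i,8} = 1$ the two inequalities on $S_i$ collapse to $S_i = 8$; when $s_{i,8} = 0$ they become vacuous for a sufficiently large $M$, which is what the if-then form requires.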
Rank Based Constraints: The rank of a counter must be well defined, i.e., the weight of all its input signals must be equal to the rank of this counter. As an example, suppose that the $m$-th input bit is at the $n$-th bit position; then the rank of a counter which uses this input must be $n$, and all other input signals of this counter must also have weight $n$. More formally:

if $(p_{mi} = 1)$, then $r_i = n$.
if $(e_{ijk} = 1)$, then $r_j = r_i + k$.

Similar constraints for the outputs can also be applied.

Delay Based Constraints: Delay based constraints put lower bounds on the delays of the output signals of a counter. A typical delay constraint looks like:

if $(size_j = 5$ and $e_{ijk} = 1)$, then $t_{ik} + d_{5,0} \le t_{j0}$,

where $d_{5,0}$ is the delay to compute the 0-th bit of a (5 : 3) counter. Note that there is no upper bound on $t_{ij}$. This is because our objective function is to minimise the delay, hence the values of the $t_{ij}$'s will automatically be set to their lower bounds. Also, the above constraint assumes uniform arrival times of the inputs of a counter, which is not true. For example, if the 0-th output of a (3 : 2) counter with inputs $a$, $b$, $c$ is implemented like $((a \oplus b) \oplus c)$, then the delay from $c$ to the output is $D_{XOR}$, while the delay from $a$ to the output is $2\,D_{XOR}$. For the delay to be minimised, $c$ should be the input with the largest arrival time. To handle these kinds of cases we define the new Boolean variable $h_{ijk}$, which is true only if, among all the inputs to the $j$-th counter, the one coming from the $k$-th output of the $i$-th counter has the largest arrival time. After introducing this variable the delay constraint for a (3 : 2) counter looks as follows:

if $(size_j = 3)$, then $\sum_{i<j} \sum_{k} h_{ijk} = 1$,
if $(size_j = 3$ and $e_{ijk} = 1)$, then $t_{ik} + d_{3,0} - d'_{3,0}\,h_{ijk} \le t_{j0}$,

where $d'_{3,0}$ is the amount by which the delay is reduced for the input routed to the fastest path. Once again, note that we have specified that the sum of all $h_{ijk}$'s is one; since the objective function is to minimise the delay, this will automatically set to true the $h_{ijk}$ whose corresponding input has the largest arrival time among all the inputs.

Special Constraints: Other than the above constraints, we also need to enforce that each input must be used by exactly one counter and each output should correspond to exactly one output bit of a counter.
Formally, every input bit must feed exactly one counter and every output bit must be produced by exactly one counter output:

$$\forall i:\ \sum_{j} p_{ij} = 1, \quad \text{and} \quad \forall j:\ \sum_{i} \sum_{k} q_{ijk} = 1.$$

Objective Function: One possibility is to have exactly one output bit per bit-position; in that case the objective function is to minimise the maximum of the delays of these output bits. However, this method enforces that the final adder must be a ripple carry adder, because that is the only adder which can be built using only full adders and other counters but no other logic functions. Instead, we allow two temporary outputs per bit-position so that we can use an appropriate final adder. In this case our objective function corresponds to the following expression:

$$\text{minimise } \max_{i} \{ \max(\mathit{tmpout}^{1}_{i}, \mathit{tmpout}^{2}_{i}) + d_i \}.$$

The $d_i$ are constants which denote the estimated delay from the $i$-th temporary output bits to the slowest output bit of the final adder. A reasonable estimate of these constants can be found by implementing the compressor tree using the Three Greedy Approach. Using the above constraints and the mentioned objective function, the problem can be fed to any standard ILP solver, which can find an optimal solution or an approximation after some reasonable time.

6. EXPERIMENTS

We have written a C++ program which takes the bit-width and arrival times of the input integers and writes an ILP instance corresponding to the optimisation of the compressor tree used to add the integers. This ILP instance is solved by the ILP solver CPLEX, and then we write the VHDL code corresponding to the resulting circuit with an appropriate choice of final adder (the final adder is chosen using the algorithm mentioned in [8]). The circuits are synthesised using a common standard-cell library for UMC 0.13µm CMOS technology. The results of our algorithm are shown in the table below.

| Benchmark | Implementation | Area (µm²) | Delay (ns) |
|---|---|---|---|
| -bit Multiplier | DesignWare | 8. | .4 |
| | Three Greedy Approach | . | .65 |
| | Optimised Counter Approach | 464. | .4 |
| 6 6-bit Multiplier | DesignWare | 5498.4 | . |
| | Three Greedy Approach | 7.5 | .8 |
| | Optimised Counter Approach | 4. | .64 |
| 4 4-bit Multiplier | DesignWare | 569.8 | 4.5 |
| | Three Greedy Approach | 5.4 | . |
| | Optimised Counter Approach | 49557. | .95 |
| 6-bit Counter | Three Greedy Approach | 4.9 | .77 |
| | Optimised Counter Approach | 9. | .6 |
| 4-bit Counter | Three Greedy Approach | 6.6 | .9 |
| | Optimised Counter Approach | 54.9 | .78 |
| -bit Counter | Three Greedy Approach | 69. | . |
| | Optimised Counter Approach | 886. | .94 |
| 48-bit Counter | Three Greedy Approach | 89. | .5 |
| | Optimised Counter Approach | 47.6 | . |

Table: Optimisation results for all our benchmarks.

Figure: Comparison of arrival times of the 6 outputs of the 48-bit counter generated by the Three Greedy Approach and the Optimised Counter Approach (delay in ns versus bit position).

There are two qualitatively different kinds of arithmetic circuits: multipliers and counters. In the case of multipliers we compare our results with the DesignWare implementation and also with the multiplier generated using the Three Greedy Approach. We have implemented three multipliers of different sizes. As we can see, the multipliers generated by our approach (Optimised Counter Approach) are the fastest ones. The multipliers generated by the Optimised Counter Approach are 5% faster than the ones generated by the Three Greedy Approach, and the area penalty is almost negligible. In some cases, such as the last multiplier in the table, the area of the Optimised Counter Approach multiplier is less than that of the Three Greedy Approach multiplier.

The second set of benchmarks consists of counters. We have implemented four counters of different widths, including a 48-bit counter. Here too we compare our results with the counters generated by the Three Greedy Approach using only full and half adders.
Once again, the counters generated by our approach are almost 5% faster than the ones produced by the Three Greedy Approach, at the cost of negligible or no area overhead. The comparison of the delay vectors of the 48-bit counter generated by the two approaches is shown in the accompanying figure.

7. CONCLUSIONS

In this paper we have shown that there is still room to improve compressor trees, one of the most studied components in arithmetic circuits. We have shown that compressor trees built only on full and half adders do not utilise the correlation among the various operands and hence produce suboptimal results. Also, a compressor tree built on large counters may be lopsided and hence slow compared to a compressor tree built on smaller counters. We have presented an approach based on Integer Linear Programming which exploits the correlation among the various operands and also tries to make the circuit as balanced as possible in order to improve the speed of the resulting circuit. The results show that our approach improves the speed of compressor trees by almost 5% compared to state-of-the-art techniques.

8. REFERENCES

[1] L. Dadda. Some schemes for parallel multipliers. Alta Frequenza, XXXIV:349-356, 1965.

[2] L. Dadda and D. Ferrari. Digital multipliers: A unified approach. Alta Frequenza, XXXVII(11):1079-1086, Nov. 1968.

[3] J. Fadavi-Ardekani. M×N Booth encoded multiplier generator using optimized Wallace trees. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, VLSI-1(2):120-125, June 1993.

[4] ILOG. CPLEX Optimization Engine, 2006.

[5] V. G. Oklobdzija, D. Villeger, and S. S. Liu. A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. IEEE Transactions on Computers, C-45(3):294-306, Mar. 1996.

[6] P. Song and G. De Micheli. Circuit and architecture trade-offs for high speed multiplication. IEEE Journal of Solid-State Circuits, 26(9), Sept. 1991.

[7] P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi. Optimal circuits for parallel multipliers. IEEE Transactions on Computers, C-47(3):273-285, Mar. 1998.

[8] P. F. Stelling and V. G. Oklobdzija. Design strategies for optimal hybrid final adders in parallel multiplier. Journal of VLSI Signal Processing, 14:321-331, Dec. 1996.

[9] E. E. Swartzlander, Jr. Parallel counters. IEEE Transactions on Computers, C-22(11):1021-1024, Nov. 1973.

[10] Synopsys. Creating High-Speed Data-Path Components Application Note, Aug. 2001. Version 2001.08.

[11] J. Um and T. Kim. An optimal allocation of carry-save-adders in arithmetic circuits. IEEE Transactions on Computers, C-50(3):215-233, Mar. 2001.

[12] A. K. Verma and P. Ienne. Improved use of the carry-save representation for the synthesis of complex arithmetic circuits. In Proceedings of the International Conference on Computer Aided Design, pages 791-98, San Jose, Calif., Nov. 2004.

[13] A. K. Verma and P. Ienne. Improving XOR-dominated arithmetic circuits by exploiting dependencies between operands. In Proceedings of the Asia and South Pacific Design Automation Conference, Yokohama, Japan, Jan. 2007.

[14] C. S. Wallace. A suggestion for a fast multiplier. IEEE Transactions on Electronic Computers, C-13(1):14-17, Feb. 1964.

[15] A. Weinberger. 4:2 carry-save adder module. IBM Technical Disclosure Bulletin, 23, Jan. 1981.