Faster and Low Power Twin Precision Multiplier


V. Sreedeep, B. Ramkumar and Harish M Kittur

Abstract- In this work, faster unsigned multiplication has been achieved by combining the High Performance Multiplication [HPM] column reduction technique, the implementation of an N-bit multiplier using four N/2-bit multipliers (recursive multiplication), and the acceleration of the final addition using a hybrid adder. Low power has been achieved by using the clock gating technique. Based on the proposed technique, 16 and 32-bit multipliers are developed. The performance of the proposed multiplier is analyzed by evaluating the delay, area and power with TCBNPHP 90 nm process technology on interconnect and layout using Cadence NC launch, RTL compiler and ENCOUNTER tools. The results show that the 32-bit proposed multiplier is as much as 22% faster, occupies only 3% more area and consumes 30% less power with respect to the recently reported twin precision multiplier.

Index Terms- Column compression, HPM multiplier, Hybrid final adder, Clock gating.

I. INTRODUCTION

In high performance digital systems such as microprocessors, FIR filters and digital signal processors, the multiplier is one of the key hardware blocks. So the design of multipliers remains challenging with advancement in technology. Many researchers have tried and are trying to design multipliers which offer either of the following: high speed, low power consumption, regularity of layout and hence less area, or even a combination of them, thereby making them suitable for various compact, low power and high speed VLSI implementations. However, area and speed are two conflicting constraints: improving speed results in larger area and vice versa. Hence we try to find the best trade-off between them. In recent trends the column compression multipliers are popular for faster computations due to their higher speeds [1-2]. The first column compression multiplier was introduced by Wallace in 1964 [3].
In 1965, Dadda altered the approach of Wallace by starting with the exact placement of the (3,2) counters and (2,2) counters in the maximum critical path delay of the multiplier [4]. In 2006, H. Eriksson along with his research team presented the HPM reduction tree structure, which is easier to lay out than Dadda's approach [5]. Compared to Dadda, HPM is slightly faster and consumes less power while the area is the same. So we implemented the multiplier design using HPM.

V. Sreedeep completed this work while pursuing M. Tech in VLSI Design at the School of Electronics Engineering, VIT University, Vellore (email: v.sreedeep@gmail.com). B. Ramkumar is with the School of Electronics Engineering, VIT University, Vellore (email: ramkumar.b@vit.ac.in). Harish M Kittur is with the School of Electronics Engineering, VIT University, Vellore (email: kittur@vit.ac.in).

The total delay of the multiplier can be split up into three parts [6]:
1. The Partial Product Generation (PPG)
2. The Partial Product Summation Tree (PPST)
3. The Final Adder
Of these, the dominant components of the multiplier delay are due to the PPST and the final adder. The relative delay due to the PPG is small. Therefore a significant improvement in the speed of the multiplier can be achieved by reducing the delay in the PPST and the final adder stage of the multiplier. Here we reduce the PPST delay using a faster multiplication technique that performs the N-bit multiplication as four N/2-bit multiplications running in parallel, and we reduce the final adder delay with the hybrid adder.

The bit width of the multiplier is the same as the bit width of the largest operand of the application that the processor executes. But most of the time the operands do not occupy the maximum width and use the resources unnecessarily, which results in power loss. In the year 2005 Magnus Sjalander explored this idea to reduce this type of power consumption by using the operand guarding technique and named it the Twin Precision Technique [6].
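The three-stage split described above (PPG, PPST, final adder) can be illustrated with a small bit-level model. The sketch below is only an illustration under stated assumptions: it uses a plain greedy placement of (3,2) counters with pass-through of leftover bits, which is neither the Dadda nor the HPM placement, and the function names are hypothetical. Python's `+` stands in for the final adder.

```python
def partial_products(x, y, n):
    """PPG: AND each bit pair and bin the result by column weight i + j."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return cols


def full_adder(a, b, c):
    """(3,2) counter: three bits of one weight -> (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compress(cols):
    """PPST: apply (3,2) counters stage by stage until no column exceeds two bits."""
    stages = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                nxt[w].append(s)        # sum bit stays in the same column
                nxt[w + 1].append(c)    # carry bit moves one column up
                k += 3
            nxt[w].extend(col[k:])      # leftover bits pass through unchanged
        cols, stages = nxt, stages + 1
    return cols, stages


def final_add(cols):
    """Final adder: add the two remaining rows (Python '+' models the adder)."""
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) > 1)
    return row0 + row1


def multiply(x, y, n):
    """Unsigned n-bit multiplication through the three stages."""
    cols, _ = compress(partial_products(x, y, n))
    return final_add(cols)
```

Every counter preserves the weighted sum of its inputs, so the two rows handed to the final adder always add up to the product, whatever placement strategy is used. Only the number of stages, and therefore the delay, depends on the placement.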
In this paper we utilize the same property (operands that do not occupy the full width) for reducing the power, with operator isolation performed using the clock gating technique. The remainder of the paper is organized as follows: Section II describes the design of the faster multiplier structure. Section III describes the design of the hybrid adder and clock gating. Section IV presents the result analysis. Section V is the conclusion. Section VI includes the bibliography.

Assumption- The multiplier bit width and the multiplicand bit width are the same.

II. DESIGN OF FASTER MULTIPLIER STRUCTURE

The first step of multiplication is PPG. PPG can be done by using an AND gate array or a series of multiplexers. The next step is PPST. The PPG and PPST are described in the following subsections.

A. Partial Product Generation [PPG]

Here we consider an N-bit multiplier, so let us assume

Multiplicand $Y = y_{n-1}\, y_{n-2}\, y_{n-3} \dots y_3\, y_2\, y_1\, y_0$
Multiplier $X = x_{n-1}\, x_{n-2}\, x_{n-3} \dots x_3\, x_2\, x_1\, x_0$

So the partial products are $(y_j x_i) \in \{0, 1\}$ where $i, j = 0, 1, \dots, n-1$. For an N x N multiplication we therefore have a total of $N^2$ partial products, as shown in Figure 1(a). The value of $y_j x_i$ is 1 when both operand bits are high and 0 when either operand bit is zero. Thus an AND gate can be used for the generation of partial products. For convenience in representing the architecture we consider N = 8. In Figure 1(b) there are four different partial product arrays; of them, the partial products that are marked in black are interdependent and cannot be used for parallel operation. But the partial products that are not in black can be operated in parallel as these are independent. This technique was used in [7] for twin precision multiplication.

Figure 1. Partitioning of partial products: (a) Partial product array for N = 8. (b) Partial product array showing four partial product arrays of N = 4. (c) Rearranged partial products assigned to four different multipliers.

B. The Partial Product Summation Tree [RPPST] for Recursive Multiplier

In our design we have segregated the partial products as shown in Figure 1(c), and each partial product array is given to an N/2-bit multiplier. Each N/2-bit multiplier uses HPM as the column reduction technique [5] and uses a ripple carry adder (RCA) as the final adder for computing the product. The four products thus obtained are used for the computation of the final product. The proposed architecture with RCA as the final adder and the flow of data are shown in Figure 2. The architecture of each N/2-bit multiplier is shown in Figure 6 of [7]. As mentioned earlier, the partial products that are dependent (marked in black in Figures 1(b) and 1(c)) are given to M2 and M3 respectively. The products obtained from M2 and M3 are given to an N-bit RCA, and the obtained result is given to an N+1 bit RCA along with the MSB N/2 bits of the product from M1 and the LSB N/2 bits of the product from M4. The MSB N/2 bits of the M4 product are given to an N/2-1 bit RCA with 1 as carry input, which calculates its result before the actual carry arrives. We have used a multiplexer for selecting the product based on the actual carry generated by the N+1 bit RCA. This dependency and flow can be clearly observed in Figure 3.

Figure 2. Multiplier architecture: two N-bit input registers (Inp1, Inp2) feed four N/2-bit multipliers, M1 (a[N/2-1:0], b[N/2-1:0]), M2 (a[N/2-1:0], b[N-1:N/2]), M3 (a[N-1:N/2], b[N/2-1:0]) and M4 (a[N-1:N/2], b[N-1:N/2]), whose products P1 to P4 pass through an N-bit RCA, an N+1 bit RCA, an N/2-1 bit RCA and a multiplexer into a 2N-bit output register.

Figure 3. Products of the four multipliers [M1, M2, M3, M4].

III. THE HYBRID FINAL ADDER DESIGN AND CLOCK GATING

A. MBEC ADDER DESIGN

In previous works the hybrid final adder designs used to achieve faster performance in parallel multipliers were made up of a Carry Look-ahead Adder (CLA) and a Carry Select Adder (CSLA) [8-10]. But the CSLA occupies a much larger chip area than other adders (2x compared to the RCA). In this paper we use MBEC (Multiplexers with Binary to Excess-1 Converters) to achieve the optimal performance. Compared to the Carry Save Adder (CSA) and the CLA, MBEC is much faster, and compared to the CSLA it occupies less area and consumes less power [11]. In the proposed architecture we have used an N/2-1 bit RCA and a multiplexer for adding the MSB N/2 bits of the M4 product before the carry bit arrives, by giving 1 as carry input as shown in Figure 2. This makes the operation slower, occupies more area and consumes more power.
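The recombination of the four N/2-bit products described in Section II.B can be sketched as follows. This is a minimal behavioural sketch, not the authors' RTL: the bit slices follow one reading of the signal labels in Figure 2 (the N+1 bit RCA takes {P4[N/2:0], P1[N-1:N/2]} and the speculative path handles the remaining N/2-1 bits of P4), which is an assumption, and the function names are hypothetical. Python's `*` and `+` stand in for the N/2-bit multipliers and the RCAs, while the speculative "+1" path is written gate by gate in the style of a Binary to Excess-1 Converter.

```python
def bec(value, width):
    """Binary to Excess-1 Converter: returns (value + 1) mod 2**width.

    x0 = NOT b0 and x_i = b_i XOR (b0 AND b1 AND ... AND b_(i-1)).
    """
    bits = [(value >> i) & 1 for i in range(width)]
    out = 0
    lower_all_ones = 1                      # AND of all lower bits (empty AND is 1)
    for i, b in enumerate(bits):
        out |= (b ^ lower_all_ones) << i
        lower_all_ones &= b
    return out


def recursive_multiply(a, b, n):
    """N-bit unsigned multiply built from four N/2-bit products (n must be even)."""
    h = n // 2
    low_mask = (1 << h) - 1
    a_lo, a_hi = a & low_mask, a >> h
    b_lo, b_hi = b & low_mask, b >> h

    p1 = a_lo * b_lo                        # M1
    p2 = a_lo * b_hi                        # M2
    p3 = a_hi * b_lo                        # M3
    p4 = a_hi * b_hi                        # M4

    ps = p2 + p3                            # N-bit RCA, result PS[N:0]
    # Assumed operand of the N+1 bit RCA: {P4[N/2:0], P1[N-1:N/2]}
    mid = ((p4 & ((1 << (h + 1)) - 1)) << h) | (p1 >> h)
    s = ps + mid                            # N+1 bit RCA
    carry = s >> (n + 1)                    # carry-out, always 0 or 1
    s &= (1 << (n + 1)) - 1                 # product bits [3N/2 : N/2]

    top = p4 >> (h + 1)                     # P4[N-1:N/2+1], N/2-1 bits
    top_plus_one = bec(top, h - 1)          # computed before the carry arrives
    selected = top_plus_one if carry else top   # multiplexer

    return (selected << (h + n + 1)) | (s << h) | (p1 & low_mask)
```

For example, with N = 8 and both operands equal to 223, the N+1 bit RCA produces a carry-out, so the multiplexer selects the incremented upper bits of P4 and the function returns 49729.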
Now the N/2-1 bit RCA is replaced with a Binary to Excess-1 Converter (BEC). The logic diagram of a 5-bit BEC is shown in Figure 4. The BEC is used for further improving the speed.

Figure 4. 5-bit Binary to Excess-1 Converter (inputs b4 to b0, outputs x4 to x0).

B. Clock Gating

As mentioned earlier, we use operator isolation for the reduction of power by means of the clock gating technique. Clock gating is nothing but controlling the clock with a control signal, which can be done using a simple AND gate. In our design the two N-bit input registers used previously are now replaced by 8 N/2-bit registers, i.e., 2 for each multiplier, which are driven by 3 different clocks generated from the original clock and a control circuit as shown in Figure 5. The control circuit used here is a 2 to 3 decoder whose input is the operation mode. Its 3 outputs are in turn used for generating 3 other clocks that control the flow of data into the multipliers through the registers. The decoder truth table is shown in Table I.

TABLE I DECODER TRUTH TABLE
Mode  Operation
00    Both M1 and M4 in operation for Twin Precision
01    Only M1 in operation
10    Only M4 in operation
11    Full Mode operation
T[1] T[2] T[3] 1 1 1 1 1 1 1

Figure 5. Architecture with BEC Adder and Clock Gating: the datapath of Figure 2 with N/2-bit input registers in front of each multiplier and the N/2-1 bit RCA replaced by an N/2-1 bit BEC.

IV. RESULT ANALYSIS

The comparison between Table II (regular twin precision multiplier in [7]) and Table III (proposed multiplier with BEC adder and clock gating) summarizes the enhanced performance of the proposed multiplier in terms of percentages, which are listed in Table IV.
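The percentage comparison described above is a relative change of each proposed figure with respect to the regular twin precision figure. The short sketch below shows the arithmetic. The numbers are values that appear in Tables II and III. Treating them as the 32-bit delay, area and power entries is an assumption made for illustration, and the function names are hypothetical.

```python
def percent_change(reference, proposed):
    """Relative change of the proposed value with respect to the reference, in percent."""
    return (proposed - reference) / reference * 100.0


def power_delay_product(power, delay):
    """Power delay product in the product of the units used for power and delay."""
    return power * delay


# Assumed pairing of table entries (reference design first, proposed design second).
delay_change = percent_change(5.5, 4.25)         # negative means faster
area_change = percent_change(41.656, 42.985)     # positive means larger
pdp_reference = power_delay_product(6.217, 5.5)
pdp_proposed = power_delay_product(4.362, 4.25)

print(round(delay_change, 3))    # -22.727
print(round(area_change, 2))     # 3.19
print(round(percent_change(pdp_reference, pdp_proposed), 1))
```

Under this pairing the delay and area changes agree with the 22% speed gain and 3% area overhead quoted in the abstract.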
The power results are calculated dynamically with 1 inputs for the 16-bit multiplier and with 15 inputs for the 32-bit multiplier. The power comparisons summarized in Tables II and III for 16 and 32 bits are plotted in Figures 6 and 7 respectively. The area and timing comparison plots are shown in Figures 8 and 9 respectively. The power delay products are shown in Table V.

TABLE II REGULAR TWIN PRECISION MULTIPLIER (SJALANDER ET AL.)
One Two
Area (kµm2) 12.34 41.656
Delay (ps) 3 5.5
Power (mw) 1.285 .6 .325 6.217 2.852 1.57

The advantage of this design compared to the regular twin precision multiplier in [7] is that we isolate the operator instead of guarding the operands. In this design we can use one multiplier at a time for one N/2-bit multiplication, whereas in the regular twin precision multiplier we have to give all zeros for the MSB N/2 bits of the multiplier and multiplicand in order to perform the same operation. That places a restriction on the inputs which is not always feasible, and the control circuit used here overcomes it. The architecture shown in Figure 5 not only increases speed but also provides the N/4 bit multiplication with less power consumption. This can be clearly observed from the result analysis.
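The operator isolation described above can be sketched behaviourally: a 2 to 3 decoder turns the mode into three enables, and each enable is ANDed with the clock that drives one group of input registers. The assignment of the three decoder outputs to M1, M2/M3 and M4 below is an assumption derived from the mode descriptions in Table I, and all names are hypothetical.

```python
def decode_mode(mode):
    """2 to 3 decoder: mode -> (enable M1, enable M2 and M3, enable M4).

    Assumed behaviour: 00 = M1 and M4, 01 = only M1, 10 = only M4, 11 = all four.
    """
    hi, lo = (mode >> 1) & 1, mode & 1
    en_m1 = (hi ^ 1) | lo
    en_m23 = hi & lo
    en_m4 = (lo ^ 1) | hi
    return en_m1, en_m23, en_m4


def gated_clock(clk, enable):
    """Clock gating with a simple AND gate."""
    return clk & enable


def clock_edge(mode, registers, new_inputs):
    """Model one rising clock edge: only enabled register groups load new operands."""
    groups = ("M1", "M2/M3", "M4")
    for group, enable in zip(groups, decode_mode(mode)):
        if gated_clock(1, enable):
            registers[group] = new_inputs[group]
    return registers


# Usage: in mode 01 only the M1 input registers change, so M2, M3 and M4 see
# no switching activity at their inputs.
regs = {"M1": (0, 0), "M2/M3": (0, 0), "M4": (0, 0)}
regs = clock_edge(0b01, regs, {"M1": (9, 5), "M2/M3": (9, 7), "M4": (3, 7)})
```

Because a disabled group keeps its previous register contents, an N/2-bit multiplication on M1 or M4 does not require zeros to be forced onto the unused operand halves.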

TABLE III PROPOSED MULTIPLIER
One Two
Area (kµm2) 12.471 42.985
Delay (ps) 2.6 4.25
Power (mw) 1.331 .568 .26 4.362 1.846 .985

TABLE IV PERFORMANCE OF THE PROPOSED MULTIPLIER WITH RESPECT TO REGULAR TWIN PRECISION MULTIPLIER
One Two
Area +1.357 +3.19
Delay -13.334 -22.727
Power 3.476 -5.261 -19.927 -29.835 -35.278 -34.618

TABLE V POWER DELAY PRODUCT COMPARISON OF THE PROPOSED MULTIPLIER WITH RESPECT TO REGULAR TWIN PRECISION MULTIPLIER
Energy (mj)
Sjalander 3.8484 3.4593 -1.116
Sjalander 34.1971 18.5397 -45.7825

Figure 6. Power comparison plot for the 16 x 16 multiplier.

Figure 7. Power comparison plot for the 32 x 32 multiplier.

Figure 8. Area comparison plot for both 16 and 32 bit multipliers.

Figure 9. Delay comparison plot for both 16 and 32 bit multipliers.

Figure 10 represents all the percentage results shown in Table IV.

V. CONCLUSION

We have successfully achieved faster and low power multiplication by combining the High Performance Multiplication [HPM] column reduction technique, the implementation of an N-bit multiplier using four N/2-bit multipliers with rearranged partial products, and the acceleration of the final addition using a hybrid adder. Low power has been achieved by using the clock gating technique. The result analysis shows that the area overheads are not significant when compared to the increase in speed and the reduction in power consumption. The proposed multiplier design technique can be implemented with any type of parallel multiplier to achieve faster and low power performance. This work can be easily extended to signed multiplication.

Figure 10. Percentage Comparison Plot for Table I and Table II.

VI. REFERENCES

[1] B. Parhami, "Computer Arithmetic", Oxford University Press, 2000.
[2] E. E. Swartzlander, Jr. and G. Goto, "Computer arithmetic," The Computer Engineering Handbook, V. G. Oklobdzija, ed., Boca Raton, FL: CRC Press, 2002.
[3] C. S. Wallace, "A Suggestion for a Fast Multiplier", IEEE Transactions on Electronic Computers, Vol. EC-13, pp. 14-17, 1964.
[4] Luigi Dadda, "Some Schemes for Parallel Multipliers", Alta Frequenza, Vol. 34, pp. 349-356, August 1965.
[5] H. Eriksson, P. Larsson-Edefors, M. Sheeran, M. Själander, D. Johansson, and M. Schölin, "Multiplier reduction tree with logarithmic logic depth and regular connectivity", in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2006, pp. 4-8.
[6] V. G. Oklobdzija and D. Villeger, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, no. 2, June 1995.
[7] Magnus Själander and Per Larsson-Edefors, "Multiplication Acceleration Through Twin Precision", IEEE Trans. on VLSI Systems, vol. 17, no. 9, pp. 1233-1245, Sep. 2009.
[8] V. G. Oklobdzija and D. Villeger, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, no. 2, June 1995.
[9] Paul F. Stelling, "Design strategies for optimal hybrid final adders in parallel multiplier", Journal of VLSI Signal Processing, vol. 14, pp. 321-331, 1996.
[10] Sabyasachi Das and Sunil P. Khatri, "Generation of the Optimal Bit-Width Topology of the Fast Hybrid Adder in a Parallel Multiplier", International Conference on Integrated Circuit Design and Technology (ICICDT), May 2007.
[11] B. Ramkumar, Harish M Kittur and P. Mahesh Kannan, "ASIC Implementation of Modified Faster Carry Save Adder", European Journal of Scientific Research, Vol. 42, Issue 1, 2010.
[12] B. Ramkumar and Harish M Kittur, "Low Power, Low Area CSLA", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, accepted for publication, DOI: 10.1109/TVLSI.2010.2101621.
[13] K. C. Bickerstaff, E. E. Swartzlander and M. J. Schulte, "Analysis of column compression multipliers", Proceedings of the 15th IEEE Symposium on Computer Arithmetic, 2001.
[14] W. J. Townsend, Earl E. Swartzlander and J. A. Abraham, "A comparison of Dadda and Wallace multiplier delays", Advanced Signal Processing Algorithms, Architectures and Implementations XIII, Proceedings of the SPIE, vol. 5205, 2003, pp. 552-560.
[15] Danysh and Swartzlander Jr., "A recursive fast multiplier", Asilomar Conf. on Signals, Systems & Computers, vol. 1, pp. 197-201, 1998.