Novel Architectures for High-Speed and Low-Power 3-2, 4-2 and 5-2 Compressors

Sreehari Veeramachaneni, Kirthi Krishna M, Lingamneni Avinash, Sreekanth Reddy Puppala, M. B. Srinivas
Centre for VLSI and Embedded System Technologies, International Institute of Information Technology, Gachibowli, Hyderabad-500032, India.
srihari@research.iiit.ac.in, {kirthikrishna, avinashl, sreekanthp}@students.iiit.ac.in, srinivas@iiit.ac.in

Abstract. The 3-2, 4-2 and 5-2 compressors are the basic components in many applications, in particular partial product summation in multipliers. In this paper novel architectures and designs of high speed, low power 3-2, 4-2 and 5-2 compressors capable of operating at ultra-low voltages are presented. The power consumption, delay and area of these new compressor architectures are compared with existing and recently proposed compressor architectures and are shown to perform better. The proposed architecture lays emphasis on the use of multiplexers in arithmetic circuits, which results in a high speed and efficient design. Also, in all existing implementations of XOR gates and multiplexers, both the output and its complement are available, but current designs of compressors do not use these outputs efficiently. In the proposed architecture these outputs are efficiently utilized to improve the performance of the compressors. The combination of low power, low transistor count and lesser delay makes the new compressors a viable option for efficient design.

1. Introduction. Multiplication is a basic arithmetic operation important in applications like digital signal processing, which rely on efficient implementation of generic arithmetic logic units (ALU) and floating point units to execute dedicated operations like convolution and filtering. In the implementation of multipliers, the main phases are generation of partial products, reduction of partial products using CSA (carry-save architecture) [7-10] and a carry propagation adder for the computation of the final result.
The second phase, the reduction of the partial products, contributes most to the overall delay, area and power. In most of these implementations the compressor lies directly on the critical path and dictates the performance of the overall circuit, due to which the demand for high-speed and low-power compressors is continuously increasing [7-9]. This paper presents new compressor architectures that lay emphasis on the use of multiplexers in place of XOR gates to efficiently use the outputs from the previous stages and improve the overall performance. This is because the use of multiplexers improves the speed when they are placed in the critical path []. The rest of the paper is organized as follows: in Section 2 the efficiency of MUX and XOR-XNOR are compared and the possibility of replacing XOR-XNOR with MUX is discussed. In Sections 3, 4, 5 & 6 the proposed architectures of 3-2, 4-2 and 5-2 compressors are presented and compared with the existing architectures. Implementations have been carried out in 0.18µm CMOS technology.

20th International Conference on VLSI Design (VLSID'07) 0-7695-2762-0/07 $20.00 © 2007

2. MUX Vs XOR-XNOR. CMOS designs of the 2x1 multiplexer and the 2-input XOR gate are shown in Fig. 1 [].

Fig. 1. CMOS implementations of MUX and XOR-XNOR

In Fig. 1, it can be seen that if both the select bit and its complement arrive before the inputs arrive, then the output is generated with very little delay because the switching of the transistors is already completed. Also, if both the select bit and its complement are generated in the previous stage, then the additional inverter stage is eliminated, which reduces the overall delay in the critical path []. By using the output and its complement in every stage, the total number of garbage outputs is reduced. By decreasing the number of transistors, the overall power consumption and the area occupied are reduced considerably [1]. An alternative design of the multiplexer is shown in Fig. 2.

Fig. 2. Transmission gate implementation of a multiplexer

This design of the multiplexer is faster than the CMOS design when buffers are not used at the output [1]. But it can only be used in the intermediate stages because of its limited driving capability. This design also consumes less power than the CMOS design []. In the proposed architectures the blocks where this design can be used are shown as MUX*.

3. 3-2 Compressor. A 3-2 compressor takes 3 inputs X1, X2, X3 and generates 2 outputs, the sum bit S and the carry bit C, as shown in Fig. 3a. The compressor is governed by the basic equation

$$X_1 + X_2 + X_3 = \mathrm{Sum} + 2\cdot\mathrm{Carry} \tag{1}$$

Fig. 3. (a) 3-2 compressor (b) Conventional implementation of the 3-2 compressor

The 3-2 compressor can also be employed as a full adder cell when the third input is considered as the carry input from the previous compressor block, or X3 = Cin. The architectures shown in Fig. 3 employ two XOR gates in the critical path [3-6]. The equations governing the existing 3-2 compressor outputs are shown below.

$$\mathrm{Sum} = x_1 \oplus x_2 \oplus x_3 \tag{2}$$

$$\mathrm{Carry} = (x_1 \oplus x_2)\cdot x_3 + \overline{(x_1 \oplus x_2)}\cdot x_1 \tag{3}$$

In the proposed architecture shown in Fig. 4, the fact that both the XOR and XNOR values are computed is efficiently used to reduce the delay by replacing the second XOR with a MUX. This is possible because the select bit is available at the MUX block before the inputs arrive. Thus the time taken for the switching of the transistors in the critical path is reduced.
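The replacement of the second XOR by a multiplexer can be checked at the logic level. The following is a minimal Python sketch, not the authors' circuit: the function names are illustrative, and the choice of x3 as the select bit of the Sum multiplexer is an assumption about one possible wiring. It models an XOR gate that also delivers its complement, feeds both outputs to a 2:1 MUX, and confirms that the result agrees with equations (1) to (3) for all eight input combinations.

```python
from itertools import product

def xor_xnor(a, b):
    """Model a gate that delivers both a XOR b and its complement."""
    x = a ^ b
    return x, 1 - x

def mux(sel, d0, d1):
    """2:1 multiplexer: passes d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def compressor_3_2_two_xors(x1, x2, x3):
    """Existing form: two XOR gates in series on the Sum path (eqs. 2 and 3)."""
    x, _ = xor_xnor(x1, x2)
    total = x ^ x3
    carry = mux(x, x1, x3)
    return total, carry

def compressor_3_2_xor_mux(x1, x2, x3):
    """Assumed MUX-based form: x3 selects the XOR or the XNOR of (x1, x2)."""
    x, xn = xor_xnor(x1, x2)
    total = mux(x3, x, xn)   # x3 = 0 passes XOR, x3 = 1 passes XNOR
    carry = mux(x, x1, x3)
    return total, carry

# Exhaustive check over all 8 input combinations.
for x1, x2, x3 in product((0, 1), repeat=3):
    s, c = compressor_3_2_xor_mux(x1, x2, x3)
    assert (s, c) == compressor_3_2_two_xors(x1, x2, x3)
    assert x1 + x2 + x3 == s + 2 * c   # basic equation (1)
```

The sketch only establishes functional equivalence; the delay advantage discussed in the text comes from transistor-level switching behaviour, which a Boolean model does not capture.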
Fig. 4. Proposed architecture of the 3-2 compressor

The equations governing the proposed 3-2 compressor outputs are shown below.

$$\mathrm{Sum} = (x_1 \oplus x_2)\cdot \overline{x_3} + \overline{(x_1 \oplus x_2)}\cdot x_3 \tag{4}$$

$$\mathrm{Carry} = (x_1 \oplus x_2)\cdot x_3 + \overline{(x_1 \oplus x_2)}\cdot x_1 \tag{5}$$

It can be seen that in this implementation the overall delay is Δ-XOR + Δ-MUX (where Δ refers to delay).

4. 4-2 Compressor. The 4-2 compressor has 4 inputs X1, X2, X3 and X4 and 2 outputs Sum and Carry, along with a carry-in (Cin) and a carry-out (Cout), as shown in Fig. 5. The input Cin is the output from the previous lower significant compressor. Cout is the output to the compressor in the next significant stage.

Fig. 5. 4-2 compressor block

Similar to the 3-2 compressor, the 4-2 compressor is governed by the basic equation

$$x_1 + x_2 + x_3 + x_4 + C_{in} = \mathrm{Sum} + 2\cdot(\mathrm{Carry} + C_{out}) \tag{6}$$

The standard implementation [3-6] of the 4-2 compressor is done using 2 full adder cells as shown in Fig. 6.

Fig. 6. (a) 4-2 compressor implemented with full adders (b) Existing implementation of the 4-2 compressor

When the individual full adders are broken into their constituent XOR blocks, it can be observed that the overall delay is equal to 4*Δ-XOR. The block diagram in Fig. 6 shows the existing architecture for the implementation of the 4-2 compressor with a delay of 3*Δ-XOR [3-6]. The equations governing the outputs in the existing architecture are shown below.

$$\mathrm{Sum} = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus C_{in} \tag{7}$$

$$C_{out} = (x_1 \oplus x_2)\cdot x_3 + \overline{(x_1 \oplus x_2)}\cdot x_1 \tag{8}$$

$$\mathrm{Carry} = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\cdot C_{in} + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\cdot x_4 \tag{9}$$

However, as in the case of the 3-2 compressor, the fact that both the output and its complement are available at every stage is neglected []. Thus replacing some XOR blocks with multiplexers results in a significant improvement in delay.

Fig. 7. Proposed 4-2 compressor architecture

Also, the MUX block at the Sum output gets the select bit before the inputs arrive, and thus the transistors are already switched by the time they arrive. This minimizes the delay to a considerable extent. This is shown in Fig. 7. The equations governing the outputs in the proposed architecture are shown below.

$$\mathrm{Sum} = \big[(x_1 \oplus x_2)\cdot\overline{(x_3 \oplus x_4)} + \overline{(x_1 \oplus x_2)}\cdot(x_3 \oplus x_4)\big]\cdot\overline{C_{in}} + \overline{\big[(x_1 \oplus x_2)\cdot\overline{(x_3 \oplus x_4)} + \overline{(x_1 \oplus x_2)}\cdot(x_3 \oplus x_4)\big]}\cdot C_{in} \tag{10}$$

$$C_{out} = (x_1 \oplus x_2)\cdot x_3 + \overline{(x_1 \oplus x_2)}\cdot x_1 \tag{11}$$

$$\mathrm{Carry} = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\cdot C_{in} + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\cdot x_4 \tag{12}$$

The critical path delay of the proposed implementation is Δ-XOR + 2*Δ-MUX.

5. 5-2 Compressor. The 5-2 compressor block has 5 inputs X1, X2, X3, X4, X5 and 2 outputs, Sum and Carry, along with 2 input carry bits (Cin1, Cin2) and 2 output carry bits (Cout1, Cout2), as shown in Fig. 8a. The input carry bits are the outputs from the previous lesser significant compressor block, and the output carry bits are passed on to the next higher significant compressor block.
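Looking back at the 4-2 compressor of Section 4, its equations can be exercised in the same way. The sketch below is an illustration under stated assumptions rather than the circuit of Fig. 7: `compressor_4_2` and `add_four` are hypothetical names, every XOR after the first stage is written as a MUX acting on complementary signals, and the helper chains Cout of one bit position into Cin of the next, as the text describes, to add four small integers.

```python
from itertools import product

def mux(sel, d0, d1):
    """2:1 multiplexer: passes d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def compressor_4_2(x1, x2, x3, x4, cin):
    """Return (Sum, Carry, Cout) for one bit position of a 4-2 compressor."""
    a = x1 ^ x2                    # first-stage XOR (complement is 1 - a)
    b = x3 ^ x4                    # first-stage XOR (complement is 1 - b)
    cout = mux(a, x1, x3)          # independent of cin, so no carry ripple
    s4 = mux(a, b, 1 - b)          # a XOR b built from a MUX
    total = mux(cin, s4, 1 - s4)   # s4 XOR cin, with cin as the select bit
    carry = mux(s4, x4, cin)
    return total, carry, cout

# Exhaustive check of the basic equation over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)

def add_four(a, b, c, d, width=8):
    """Add four non-negative integers below 2**width with a compressor row."""
    sum_word, carry_word, cin = 0, 0, 0
    for i in range(width):
        x = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(x[0], x[1], x[2], x[3], cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)   # Carry has twice the weight of Sum
        cin = cout                       # Cout feeds the next bit position
    carry_word += cin << width           # last Cout has weight 2**width
    return sum_word + carry_word         # final carry-propagate addition

print(add_four(13, 7, 9, 15, width=4))   # 44
```

Because Cout does not depend on Cin, a row of such cells reduces four operands to two words without a ripple through the row; only the final addition propagates carries.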
Fig. 8. (a) 5-2 compressor block (b) Conventional implementation of a 5-2 compressor block

The basic equation that governs the function of the 5-2 compressor block is given below.

$$X_1 + X_2 + X_3 + X_4 + X_5 + C_{in1} + C_{in2} = \mathrm{Sum} + 2\cdot(\mathrm{Carry} + C_{out1} + C_{out2}) \tag{13}$$

The conventional implementation [3-6] of the compressor block is shown in Fig. 8, where 3 cascaded full adder cells are used. When these full adders are replaced with their constituent blocks of XOR gates, it can be observed that the overall delay is equal to 6*Δ-XOR for the sum or carry output. Many architectures have been proposed where the delay has been reduced to 5*Δ-XOR (Fig. 9a) and then further reduced to 4*Δ-XOR (Fig. 9 b&c) [3-6].
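The conventional structure described above can be modelled directly. The following minimal Python sketch assumes one plausible wiring of the three cascaded full adders (the exact connections of Fig. 8 are not recoverable from the text, so the wiring and the function names are assumptions) and checks the basic equation of the 5-2 compressor for all 128 input combinations.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) of three bits; the sum path holds two XOR levels."""
    s = (a ^ b) ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry

def compressor_5_2_conventional(x1, x2, x3, x4, x5, cin1, cin2):
    """Three cascaded full adders; returns (Sum, Carry, Cout1, Cout2)."""
    s1, cout1 = full_adder(x1, x2, x3)       # first cell
    s2, cout2 = full_adder(s1, x4, cin1)     # second cell takes Cin1
    total, carry = full_adder(s2, x5, cin2)  # third cell takes Cin2
    return total, carry, cout1, cout2

# Exhaustive check of the basic equation of the 5-2 compressor.
for bits in product((0, 1), repeat=7):
    total, carry, cout1, cout2 = compressor_5_2_conventional(*bits)
    assert sum(bits) == total + 2 * (carry + cout1 + cout2)
```

In this wiring the Sum output passes through two XOR levels in each of the three cells, which is where the 6*Δ-XOR figure quoted in the text comes from.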

Fig. 9. Existing architectures of 5-2 compressors

Fig. 10. Proposed architecture of the 5-2 compressor

In the proposed architecture changes have been made to efficiently use the outputs generated at every stage, by replacing a few XOR blocks with MUX blocks. Also, the select bits to the multiplexers in the critical path are made available well ahead of the inputs so that the critical path delay is minimized. For example, the carry output from the previous lesser significant compressor block is utilized as the select bit one stage after it is produced, so that the MUX block is already switched and the output is produced as soon as the inputs arrive. Also, if the output of one multiplexer is used as the select bit for another multiplexer, it can be used efficiently in a similar manner: the negation of the select bit is also required in the MUX design, as shown in Figure 1, so an extra stage to compute the negation can be saved. Similarly, replacing the XOR block in the second stage with a MUX block reduces the delay because the select bit X3 is already available and the transistor switching takes place in parallel with the computation of the inputs of the block.

As mentioned before, in all the general implementations of the XOR or MUX block, in particular the CMOS implementation, the output and its complement are generated. But in the existing architectures this advantage is not utilized at all [3-6]. In the proposed architecture these outputs are utilized efficiently by using multiplexers at select stages in the circuit. Also, additional inverter stages are eliminated. This in turn contributes to the reduction of delay, power consumption and transistor count (area). The equations governing the outputs are shown below:

$$\mathrm{Sum} = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5 \oplus C_{in1} \oplus C_{in2} \tag{14}$$

$$C_{out1} = (x_1 + x_2)\cdot x_3 + x_1\cdot x_2 \tag{15}$$

$$C_{out2} = (x_4 \oplus x_5)\cdot C_{in1} + \overline{(x_4 \oplus x_5)}\cdot x_4 \tag{16}$$

$$\mathrm{Carry} = \big((x_1 \oplus x_2 \oplus x_3) \oplus (x_4 \oplus x_5 \oplus C_{in1})\big)\cdot C_{in2} + \overline{\big((x_1 \oplus x_2 \oplus x_3) \oplus (x_4 \oplus x_5 \oplus C_{in1})\big)}\cdot (x_1 \oplus x_2 \oplus x_3) \tag{17}$$

The critical path delay of the proposed implementation is Δ-XOR + 3*Δ-MUX. In the carry generation module (CGEN1) shown in Fig. 10, we use equation (15) to design a CMOS implementation of Cout1, as shown in Fig. 11.

Fig. 11. Carry generation module (CGEN1)

6. Simulation and results

a. Simulation environment. All the simulations have been done using Cadence Tools. The calculation of power (including glitch power) and delay is carried out using the Virtual Analog Simulation tool already integrated into Cadence Tools. All the schematics and layouts (Fig. 13, 15 & 17) are done using the CMOS 0.18-µm technology. Hence the circuits are optimized for this process technology. The simulations are performed under various voltages ranging from 0.9V to 3.3V. All the inputs are fed at a frequency of 1MHz.

b. Simulation results. The proposed and the existing architectures [3-6] have been compared by implementing both of them in 0.18-µm CMOS technology.

Figure 12. (a) Power consumption (nW) (b) Delay (ns) (c) Power-delay product (nW-ns) versus voltage (V) for 5-2 compressors

Fig. 13. Layout of the proposed 5-2 compressor architecture

Figure 14. (a) Power consumption (nW) (b) Delay (ns) (c) Power-delay product (nW-ns) versus voltage (V) for 4-2 compressors

Fig. 15. Layout of the proposed 4-2 compressor architecture

Figure 16. (a) Power consumption (nW) (b) Delay (ns) (c) Power-delay product (nW-ns) versus voltage (V) for 3-2 compressors
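The power-delay product plotted in these figures is the power multiplied by the delay, so fractional savings in power and in delay combine multiplicatively rather than additively. The following minimal Python sketch illustrates the arithmetic; the function names are hypothetical and the numbers are made-up illustrations, not measurements from the paper.

```python
def pdp(power_nw, delay_ns):
    """Power-delay product in nW*ns (1 nW*ns equals 1e-18 joule)."""
    return power_nw * delay_ns

def pdp_improvement(power_saving, delay_saving):
    """Fractional PDP reduction from fractional power and delay reductions."""
    return 1.0 - (1.0 - power_saving) * (1.0 - delay_saving)

# Hypothetical example: 10% less power and 20% less delay than a reference.
reference = pdp(100.0, 2.0)
improved = pdp(100.0 * 0.90, 2.0 * 0.80)
print(reference, improved)                      # 200.0 144.0
print(round(pdp_improvement(0.10, 0.20), 2))    # 0.28
```

With these illustrative figures the product falls by 28%, slightly less than the 30% a simple sum of the two savings would suggest.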

Fig. 17. Layout of the proposed 3-2 compressor architecture

Figures 12, 14 & 16 show that the proposed architecture for the 5-2 compressor consumes 13.% less power and is 6% faster than the existing architectures when operating at 1.8V. Because of the decrease in the number of transistors, the overall area decreases by about 11.1% in the proposed 5-2 compressor. The 4-2 compressor architecture is 33.3% faster and consumes 1% less power than the existing architectures. Also, the proposed 3-2 compressor is 7% faster and consumes 1.% less power than the existing architectures. The improvement in the power-delay product is 36.4%, 7.8% and 4% in the proposed 5-2 compressor, 4-2 compressor and 3-2 compressor respectively. As mentioned in Section 2, the MUX* blocks in the proposed architecture can be implemented using transmission gate (CMOS+) logic. This new implementation is compared with the CMOS implementation and the results are shown below.

Figure 18. (a) Power consumption (nW) (b) Delay (ns) (c) Power-delay product for proposed 5-2 compressors with MUX* in CMOS and CMOS+ designs

Figure 18 shows that the implementation of the intermediate stages using the CMOS+ design in the proposed 5-2 compressor results in a delay efficiency of 14.6%, power efficiency of .1% and efficiency of 18.% in power-delay product when compared to the CMOS implementation of the same design. Similar results have been obtained with the 3-2 and 4-2 compressors.

7. Conclusions. The architectures of the 3-2, 4-2 and 5-2 compressors are analyzed using CMOS and CMOS+ implementations of the XOR and MUX blocks. New 3-2, 4-2 and 5-2 compressor architectures have been proposed and compared with the existing architectures. Simulations have been performed over a range of voltages, from 0.9V to 3.3V. The proposed architectures perform better than the existing ones in every aspect, i.e., area, power, delay and power-delay product, over the complete voltage range simulated.

8. References.
[1] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell, MA: Kluwer, 1995.
[2] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE J. Solid-State Circuits, vol. 32, pp. 1079-1090, July 1997.
[3] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, "Design of high-speed low-power 3-2 counter and 4-2 compressor for fast multipliers," Electron. Lett., vol. 34, no. 4, pp. 341-343, 1998.
[4] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors," in Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129-133.
[5] C. H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 10, pp. 1985-1997, Oct. 2004.
[6] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, "Design of high-speed low-power 3-2 counter and 4-2 compressor for fast multipliers," Electron. Lett., pp. 341-343, 1998.
[7] Z. Wang, G. A. Jullien, and W. C. Miller, "A new design technique for column compression multipliers," IEEE Trans. Comput., vol. 44, pp. 962-970, Aug. 1995.
[8] M. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann, 2004.
[9] I. Koren, Computer Arithmetic Algorithms. Englewood Cliffs, NJ: Prentice Hall, 1993.
[10] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits (A Design Perspective). Prentice Hall, 2003.