A High-Speed Multiplication Algorithm Using Modified Partial Product Reduction Tree

World Academy of Science, Engineering and Technology
International Journal of Electrical and Computer Engineering Vol:4, No:, 200
International Science Index, Electrical and Computer Engineering Vol:4, No:, 200 waset.org/publication/0409
International Scholarly and Scientific Research & Innovation 4() 200 58

P. Asadee

P. Asadee is with the Islamic Azad University, Varamin branch (e-mail: asadee2@gmail.com).

Abstract: Multiplication algorithms have a considerable effect on processor performance. A new high-speed, low-power multiplication algorithm is presented using a modified Dadda tree structure. Three important modifications have been implemented: in the inner product generation step, the inner product reduction step and the final addition step. Optimized algorithms have to be used in basic computation components, such as multiplication algorithms. In this paper, we propose a new algorithm to reduce the power, delay, and transistor count of a multiplication algorithm implemented using a low-power modified counter. This work presents a novel design for Dadda multiplication algorithms. The proposed multiplication algorithm includes structured parts, which have an important effect on the inner product reduction tree. In this paper, a V, 64-bit carry hybrid adder is presented for fast, low-voltage applications. The new 64-bit adder uses a new circuit to implement the proposed carry hybrid adder. The new adder has been implemented in 180 nm CMOS technology at a 700 MHz clock frequency. The proposed multiplication algorithm has achieved 4 percent improvement in transistor count, percent reduction in delay and 2 percent modification in power consumption compared with conventional designs.

Keywords: adder, CMOS, counter, Dadda tree, encoder.

I. INTRODUCTION

There are two kinds of multiplication algorithms: serial multiplication algorithms and parallel multiplication algorithms [1,2]. Serial multiplication algorithms use sequential circuits with feedback. In serial multiplication algorithms, inner products are produced and computed sequentially. Parallel multiplication algorithms often use combinational circuits and do not contain feedback structures [3,4]. There are two well-known kinds of parallel multiplication algorithms: array multiplication algorithms and Dadda multiplication algorithms. In array multiplication algorithms, cells, each consisting of an AND gate computing an inner product and a counter, are arranged in a network pattern like an array. Dadda multiplication algorithms use AND gates, carry save counters and a carry propagate adder (CPA) [5,6]. Dadda multiplication algorithms have a tree structure. While Dadda multiplication algorithms work at higher speed than array multiplication algorithms, they have a more complicated structure [4].

Multiplication is a fundamental arithmetic operation, important in several processors and digital signal processing systems. Digital signal processing systems need multiplication algorithms to implement DSP algorithms such as filtering, where the multiplication algorithm is directly within the critical path [4]. Therefore, the demand for high-speed multiplication algorithms has become more important. Higher speed results in increased power consumption; thus, low-power architectures will be the choice of the future. This has given way to the growth of new circuit algorithms that aim to reduce the power consumption of multiplication algorithms while keeping high-speed structures and appropriate performance [7,8,9].

Fig. 1 Four to two counter using full adders

The proposed structured Smith multiplication algorithm uses a recognition component, a radix-4 Smith encoder, a structured inner product producer component, 16-bit high-speed Dadda-trees for inner product reduction, a 4-bit counter tree and a final carry-hybrid adder. The recognition component determines the effective range of the input data, appropriately selects the multiplier and multiplicand operands, and then produces control signals to disable the units of the Dadda-trees and the counter-tree to match the desired data precision. Based on the number of multiplier bits, the Smith encoder and inner product generator determine the number of inner products produced, while maintaining the unused inner product generator segments in the static state. The control and enable signals produced by the recognition component choose either the Dadda-tree or the 4-bit counter-tree for a given single multiplication operation. Therefore, the units of the unused circuits are able to remain in their static condition. In the static condition, the previous values are held, to prevent any switching from happening in the unused part of the structure. To take advantage of short precision, signal gating can selectively turn
off those parts of the Dadda-trees and 4-bit counter-tree that are not currently in use, making the proposed Smith multiplication algorithm perform like a variable-size multiplication algorithm. In the final step, the product is generated from the outputs of the active Dadda-tree and carry-hybrid adder. Thus, the Smith encoder, inner product generator and carry-hybrid adder can be shared and reused to calculate 16 or 8-bit multiplications. For 4-bit multiplications, the Smith encoder and inner product generator components are unused and turned off. The proposed Smith multiplication algorithm is implemented with five pipeline steps, with an input enable signal being used as a power-reduction part for each step.

II. FOUR TO TWO COUNTER AND FULL-ADDER

The four to two counter organization in fact adds five inner product bits into three. The structure is connected in such a way that four of the inputs come from the same bit place of weight $j$ while one bit comes from the adjacent place $j-1$ (known as carry-in). The outputs of the four to two counter use one bit in place $j$ and two bits in place $j+1$. This architecture is named a counter since it adds four inner products into two (while using one bit connected directly between adjacent four to two counters). The four to two counter can also be built using 3-2 counters. It consists of two 3-2 counters (full adders) in series and involves a critical path of 4 XOR delays, as shown in Fig. 1. Counters are the fundamental building blocks in all the multiplication algorithm components. Hence, using fast and efficient full counters plays an important role in the performance of the total system. In the following, we describe the counter units used in our design.

The 28-transistor counter is the traditional CMOS adder circuit. This counter unit is built using equal numbers of NMOS and PMOS transistors. The complementary MOS logic was realized using

$$C_{out} = AB + BC_{in} + AC_{in} \qquad (1)$$

$$Sum = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}} \qquad (2)$$

The first 12 transistors of the circuit generate $C_{out}$ and the remaining transistors generate the Sum output. Thus, the delay for computing $C_{out}$ is added to the total propagation delay of the Sum output. The structure of this counter circuit is very large and therefore uses a large on-chip area.

The stationary counter circuit was implemented using new logic and a reduced number of transistors. The basic idea in the stationary counter is the reuse of the charge stored in the load capacitance during the high output to drive the control logic. In regular counter designs, the input charge applied at logic high is drained off during logic low. This is achieved by using only one voltage source ($V_{DD}$) in the circuit. As an added advantage, there is no path from one voltage level ($V_{DD}$) to the other (GND). The removal of the direct path to ground eliminates the short-circuit power component for the counter structure. This reduces the total power consumed in the circuit, making it an energy-efficient implementation. The stationary counter is not only power optimized but also area efficient due to its transistor count. The main disadvantage of the stationary counter is the threshold voltage drop at the output for certain input combinations. A detailed comparative study of this counter with other low-power counters can be found in [9,10,11].

Fig. 2 A novel full adder

In the counter unit, the XOR and XNOR of A and B are implemented using pass transistor logic, and an inverter is used to invert the input signal. This design results in faster XOR and XNOR outputs and ensures that the delays at the outputs of these gates are balanced. This leads to fewer false SUM and $C_{out}$ signals. The capacitance at the outputs of the XOR and XNOR gates is also decreased, as they are not loaded with an inverter. Since the signal degradation at SUM and $C_{out}$ is significant for sub-micron circuits, drivers can be used to reduce the degradation. The driver helps in producing outputs with equal rise and fall times. This results in better performance in terms of speed, power consumption and driving capability. The output voltage swing will be equal to $V_{DD}$ if a driver is used at the output. Fig. 2 gives the circuit-level diagram of the counter. A detailed comparative study of the presented counter with other low-power counters can be found in [12].

III. SMITH ENCODER

The recognition component uses the effective input data and then produces the control signals that are used to turn off the appropriate units of the Dadda-tree and counter-tree, to match the data precision. In the presented multiplication algorithm, the control signals not only select the data flows but also control the pipeline register components, in order to maintain the non-effective bits in their previous states and thus ensure that the functional components addressed by these data do not consume switching power. In addition, these control signals are used to control the Dadda-trees and counter-tree. Fig. 3 shows the functional blocks of the active-range recognition unit, which includes a one-detect circuit (OR gate), a comparator, multiplexers and logic gates. The OR gate drives its output high if any of its inputs is high. Both the 16-bit wide multiplier and multiplicand input operands are partitioned into three parts, where the detection is done for the 15-bit, 8-bit and 4-
bit ranges. The presented multiplication algorithm supports three multiplication forms: the 15-bit, 8-bit and 4-bit multiplication forms. Both the 16-bit and 8-bit multiplication forms are implemented using the Smith multiplication algorithm; values greater than 8F hexadecimal utilize the 16-bit multiplication form. This problem is solved in the 4-bit multiplication form by using a 4-bit unsigned counter-tree. The outputs of the three circuits of both the multiplier and the multiplicand are grouped into a 3-bit line, where the most significant bit indicates the 16-bit multiplication form, the second bit indicates the 8-bit multiplication form and the least significant bit indicates the 4-bit multiplication form. The two 3-bit buses are compared using a 3-bit comparator. This comparison detects whether the input multiplicand operand is greater than the input multiplier operand. The output of the comparator is used to swap the input multiplicand and multiplier operands.

Fig. 3 Recognition component

The radix-4 Smith encoder can generate five possible values of -2, -1, 0, 1 and 2 times the input [5]. Three control signals $COMP_i$, $SHIFT_i$ and $ZERO_i$ ($i = 0, 1, \ldots, 7$) are produced depending on the 3-bit coding method shown in the Smith coding table [4]. The Smith encoder is used to produce these control signals, which are used in the inner product generation component to direct the appropriate operation ($OP_i$, $i = 0, 1, \ldots, 7$) on the input multiplicand operand. Fig. 4 shows the inner product generator component, which is designed to be shared between the 16-bit and 8-bit multiplication forms. The ZERO signal forces zeros as the output of that inner product step, the COMP signal complements the input multiplicand operand, and the SHIFT signal shifts the input multiplicand operand left by one. The total number of inner products (PP) produced is $N/2$ ($N$ = maximum number of multiplier bits), where $PP_i = OP_i M_j$ ($M_j$ = $j$-th bit of the multiplicand, $j = 0, 1, \ldots, 15$). The 8-bit multiplication form only needs the first four of the inner products. Thus, in the pipeline step between the Smith encoder and the inner product generator component, all the necessary control signals are produced to turn on (turn off) the circuit in use (not in use).

There are three types of arrangement forms for the inner product, which are consistent with the operation forms of the low-power multiplication algorithm. In the 16-bit multiplication form, all eight of the inner product ($PP_j$, where $j = 0, 1, \ldots, 15$) producer rows are active. In the 8-bit multiplication form, only the first four of the inner product ($PP_j$, where $j = 0, 1, \ldots, 7$) producer rows are active, and the rest of the circuit is in the static state. Finally, in the 4-bit multiplication form, all eight of the inner product producer rows are put in the static state.

IV. MODIFIED DADDA TREE ARCHITECTURE

In this section, we present the proposed design for Dadda multiplication algorithms. The proposed multiplication algorithm contains multiple parts. If no part or only one part is faulty, the proposed multiplication algorithm works correctly. We divide an $n \times n$ Dadda multiplication algorithm into $2n$ parts. Dadda multiplication algorithms use a carry chain counter. As is well known, there are several kinds of carry chain counters. We use the simplest kind, ripple-carry counters (RCCs). Dadda multiplication algorithms using other kinds of carry chain counters are discussed. Some improvements of the proposed design are also presented [5]. Here, we divide an $n \times n$ Dadda multiplication algorithm into
$2n$ parts $SL_i$ ($0 \le i < 2n$). Which gates and counters belong to which parts is decided recursively as described below:
1) A counter whose sum output is the $i$-th bit of the product belongs to $SL_i$.
2) A counter whose sum output is connected to a counter that belongs to $SL_i$ also belongs to $SL_i$.
3) An AND gate whose output is connected to a counter that belongs to $SL_i$ also belongs to $SL_i$.
Note that the carry output of a counter that belongs to $SL_i$ is always connected to $SL_{i+1}$.

Fig. 4 Inner product generator

Fig. 5 shows the $4 \times 4$ Dadda multiplication algorithm divided into eight parts. In this figure, the dotted arrows denote carry outputs of carry save counters. Each part consists of AND gates and counters. The counters and wires in every part form the same tree structure as the Dadda multiplication algorithm shown in Fig. 5, except for two differences discussed in the next paragraphs. For example, in $SL_3$, the counters $FA_{1,3}$, $FA_{2,3}$ and $HA_{3,3}$ correlate to the carry save counters $CSA_1$, $CSA_2$ and the carry propagate counter CPA, respectively. The inner products A[0]B[3], A[1]B[2], A[2]B[1] and A[3]B[0] correlate to AB[3], AB[2], AB[1] and AB[0], respectively. The third bit of the product, Y[3], correlates to the product Y.

One of the differences is the following. For most parts $SL_i$, the Dadda multiplication algorithm uses some counters and wires that do not correlate to any counters and wires in the part $SL_i$. For example, the fifth part $SL_5$ does not use wires for the inner products correlating to AB[1] and AB[0], nor counters correlating to $CSA_1$. The other difference is that the carry outputs of the counters in a part $SL_i$ are connected not to a part of $SL_i$ but to a part of its neighbor $SL_{i+1}$. For example, the carry outputs of $FA_{1,3}$ and $FA_{2,3}$ in $SL_3$ are connected to $FA_{2,4}$ and $FA_{3,4}$ of $SL_4$ and not to counters in $SL_3$.

The proposed $n \times n$ Dadda multiplication algorithm consists of $2n$ parts $SL_i$ ($0 \le i \le 2n-1$) and a part $SL_R$. The $i$-th part $SL_i$ ($0 \le i \le 2n-2$) can play the role of both $SL_i$ and $SL_{i+1}$. The $(2n-1)$-th part $SL_{2n-1}$ plays the role of only $SL_{2n-1}$, and the part $SL_R$ plays
the role of only $SL_0$. Fig. 5 shows an example of the presented $4 \times 4$ Dadda multiplication algorithm. The multiplication algorithm contains eight arrangement parts and a redundant part. The arrangement parts use AND gates, counters and switches. The control signals for the switches are implemented by high-speed circuits. In the proposed multiplication algorithm, arrangement is made in the chain method; that is, if $SL_k$ is faulty, $SL_R$ plays the role of $SL_0$, and the arrangement part $SL_i$ ($i < k$) plays the role of $SL_{i+1}$. The arrangement part $SL_i$ ($i > k$) plays the role of $SL_i$.

Fig. 5 Dadda multiplier divided into eight parts

The AND gates in $SL_i$ ($0 \le i < 2n-1$) must be capable of calculating not only the inner products A[x]B[y] ($x + y = i$, $0 \le x, y < n$) but also the inner products A[x+1]B[y] ($x + y + 1 = i + 1$, $0 \le x + 1, y < n$). For those calculations, switches are inserted between the input A and the AND gates. Similarly to the AND gates, the counters and wires in the parts of the Dadda multiplication algorithms also form the same tree structure as the non-Dadda multiplication algorithms, but the presented circuit makes a high-speed structure.

As shown in Fig. 5, some counters are full adders and others are half adders in different parts. The kind of the counter $A_{j,i}$ correlating to a carry save/propagate counter $CA_j$ is decided from the correlating counters in $SL_i$ and $SL_{i+1}$; no counter in the part correlates to $CA_j$ if no counter in $SL_i$ and $SL_{i+1}$ correlates to $CA_j$. In the example shown in Fig. 5, $FA_{1,1}$, correlating to $CSA_1$ in $SL_1$, is a full adder because the correlating counter $FA_{1,2}$ in $SL_2$ is a full adder. The counter $HA_{2,1}$ is a half adder because the correlating counter $HA_{2,2}$ in $SL_2$ is a half adder and no other counters in $SL_1$ and $SL_2$ correlate to $CSA_2$. No counter in $SL_1$ correlates to CPA because no counter in $SL_1$ and $SL_2$ correlates to CPA. No wire for inner products and sum outputs in $SL_i$ correlates to a data path in the Dadda multiplication algorithm if and only if no wires in $SL_i$ and $SL_{i+1}$ relate to the data path.

Fig. 6 CHA array architecture

The carry inputs of $SL_i$ ($0 \le i \le 2n-1$) are connected back not only to $SL_{i-1}$ but also to $SL_{i-2}$, through switches selecting those carry signals. This is because $SL_i$ uses carry signals output from $SL_{i-2}$ if $SL_{i-1}$ is faulty, while it usually uses carry signals output from $SL_{i-1}$. In the example shown in Fig. 5, the
counter $FA_{2,3}$ in $SL_3$ is linked back to the carry outputs of both $FA_{1,1}$ and $FA_{1,2}$ through a switch. The $i$-th bit of the product, Y[i], is calculated in either $SL_i$ or $SL_{i-1}$. So, switches selecting the output signals are attached to those outputs. As $SL_R$ plays the role of only $SL_0$, the construction of $SL_R$ is the same as that of $SL_0$. In a similar fashion, the construction of $SL_{2n-1}$ has switches selecting the carry inputs. Note that switches selecting the multiplicand A and the carry signals are repairable, because faulty switches in $SL_i$ can be repaired by deleting $SL_i$. Switches selecting output signals are not repairable.

V. CARRY HYBRID STRUCTURE

Minimizing the critical path is the most usual way to decrease the propagation time [5]. Therefore, we use a method built with an extremely compact array of cells, each implementing the "*" operator. Its critical path is reduced to $\log_2 n$ logic levels while keeping the fan-out to two for each unit. A 64-bit CHA is presented for high-speed and low-power multiplication algorithms. The speed of a CHA adder mainly depends on the speed of the carry propagation chain. In order to speed up the generation of the carry chain and obtain low power, low-voltage logic was used to implement the high-speed, low-power CHA adder. As shown in Fig. 6, the 64-bit carry hybrid adder has a parallel structure. The key element of the 64-bit CHA is the "*" operator, defined as follows:

$$(g, p) * (g', p') = (g + p\,g',\; p\,p') \qquad (3)$$

where $g = a \cdot b$ and $p = a \oplus b$ if $a$ and $b$ are the input signals. The "o" cells are similar to the "*" ones, as shown in Fig. 6, but are only used as buffers in order to make the signal propagation consistent across the adder. Each black unit is the "*" operator, and the various circuits of the black unit were designed. By using the N-P logic components, the 64-bit adder can be designed as a pipelined structure. As mentioned earlier, the CHA units were used to implement the "*" operator in the 64-bit CHA. In order to make the carry propagation chain exercise the critical delay path, we input the pipeline signals $(A_{63} \ldots A_0) + (B_{63} \ldots B_0)$ as follows: (000...00) + (111...11) and (111...11) + (000...01).

The 64-bit CHA adder was implemented using 180 nm CMOS technology with V power supply. Owing to the CHA units, the operation speed comparison shows that the new circuit has a speed advantage over the conventional circuit. The new 64-bit CHA adder can be operated at a 700 MHz clock frequency with V power supply, whereas the conventional adder could not operate at 700 MHz; it reached just about 500 MHz. The maximum operation frequency and power consumption are compared, and the power consumption is calculated at the maximum operation frequency. Note that the normalized Power/freq is also given. It shows that the new circuit has less power consumption under the same operation frequency.

TABLE I
COMPARISON BETWEEN 64 × 64 BIT TREE MULTIPLIERS

Multipliers                 Present study   [6,7,8]   [9,10,11,12]
Technology (nm)             180             180       180
Transistor counts           69              6245      8654
Multiplication time (ns)    45              5         68
Chip area (mm^2)            069             094       085
Power diss. (mW/MHz)        067             085       9

nm: nanometer; ns: nanosecond; MHz: megahertz

VI. CONCLUSION

In this paper, we presented a novel high-speed, low-power multiplication algorithm. Three important modifications have been implemented: in the inner product generation step, the inner product reduction step and the final addition step. The new functional units, together with the optimized Dadda-tree, were used to design the proposed Smith multiplication algorithm. The presented multiplication algorithm has better power-consumption characteristics and is thus more power efficient than other multiplication algorithms. In this work, the low-voltage and high-speed 64-bit CHA was designed and implemented. The reduced voltage swing scheme addresses the internal delay and power considerations. Based upon the SPICE simulation results, the 64-bit CHA has speed and power dissipation advantages over the conventional adder. This paper has proposed a novel design for Dadda multiplication algorithms. A high-performance algorithm for the proposed design has also been shown. The presented design has been evaluated in terms of area, power and delay time. The new multiplication algorithm has achieved 4 percent improvement in transistor count, percent reduction in delay and 2 percent modification in power consumption compared with conventional designs. Table I shows a comparison between the present study and other designs.

REFERENCES
[1] R. P. Brent and H. T. Kung, "A regular layout for parallel adders", IEEE Trans. Comput., vol. C-31, pp. 260-264, Mar. 1982.
[2] A. Goldovsky, B. Patel and B. Schulte, "Design and implementation of a 16 by 16 low-power two's complement multiplier", ISCAS 2000 Geneva, vol. 5, pp. 345-348.
[3] G. Goto, et al., "A 4.1-ns compact 54*54-b multiplier utilizing sign-select Booth encoders", IEEE J. Solid-State Circuits, vol. 32, pp. 1676-1682, Nov. 1997.
[4] X. Huang, et al., "High-performance VLSI multiplier with a new redundant binary coding", Journal of VLSI Signal Processing, vol. 3, pp. 283-291, Oct. 1991.
[5] P. Gopeen, "Novel and Efficient four to two counters", ISCAS 2000 Geneva, vol. 5, pp. 2-9.
[6] D. Kudeeth, "Implementation of low-power multipliers", Journal of low-power electronics, vol. 2, 5-, 2006.
[7] K. H. Chng, "A novel carry-lookahead adder", Proc. International symposium low-power electronics and design, 998.
[8] H. Lee, "A High-Speed Booth Multiplier", ICCS 2002.
[9] T. Y. Tang and C. S. Choy, "Design of self-timed asynchronous Booth's multiplier", in Proc. Asia South Pacific Design Automation Conf., Jan. 2000, pp. 5-9.
[10] K. Numba and H. Itu, "High-speed design for Wallace multiplier", ISSCC 200.
[11] Y. N. Chng, "Low-power high-speed multipliers", IEEE Transactions on Computers, vol. 54, no, pp. 55-6, 2005.
[12] M. Sheple, "High performance array multiplier", IEEE Transactions on Very Large Scale Integration Systems, vol. 2, no, pp. 20-25, 2004.
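
As a supplementary illustration of Sections II and V, the following is a minimal behavioral sketch in Python, not the transistor-level circuits of the paper. It models the full adder with the carry and sum expressions of equations (1) and (2), builds a four to two counter from two full adders in series, and adds two words using the "*" operator of equation (3). The function names (`full_adder`, `four_to_two`, `star`, `prefix_add`) are hypothetical, and the "*" operator is applied serially here for readability rather than in the log-depth array of Fig. 6.

```python
def full_adder(a, b, cin):
    """3-2 counter on single bits, using the carry and sum expressions of (1) and (2)."""
    cout = (a & b) | (b & cin) | (a & cin)
    s = (a & b & cin) | ((a | b | cin) & (1 - cout))
    return s, cout


def four_to_two(x1, x2, x3, x4, cin):
    """Four to two counter as two full adders in series.

    Returns (sum, carry, cout): sum has weight j, while carry and cout
    both have weight j+1; cout feeds the neighbouring counter's cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def star(left, right):
    """The '*' operator: (g, p) * (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def prefix_add(a, b, width=64):
    """Add two unsigned words, deriving every carry with the '*' operator.

    Assumption: the operator is folded bit by bit (serially); a hardware
    prefix adder would combine the same pairs in a tree.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    total, carry, acc = 0, 0, None
    for i in range(width):
        total |= (p[i] ^ carry) << i                       # sum bit i
        acc = (g[i], p[i]) if acc is None else star((g[i], p[i]), acc)
        carry = acc[0]                                     # carry into bit i+1
    return total | (carry << width)
```

For instance, `prefix_add` applied to an all-ones operand and the value 1 exercises the longest carry chain, which is the kind of input pattern described for the critical-path measurement in Section V.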