CSE 370 Winter 2008 Homework 5 Solutions

1) Carry Look-Ahead Adder (CLA)

a) add1

b) add4

c) cla4

d) cla16

e) Gate count: 118 gates

add1: 3 gates
add4: 4 * add1 = 12 gates
cla4: 14 gates
cla16: (4 * add4) + (5 * cla4) = 48 + 70 = 118 gates

Delay: Look at the schematic for the 16-bit adder (cla16) above.

The A and B inputs first go through the add4 blocks, which produce the P and G signals. This takes 1 gate delay. The add4 blocks also produce a sum, but the carry input to the gate producing the sum has not arrived yet, so we do not consider the sum now.

The P and G signals produced by any add4 block, say U1, go into a cla4, U2. In the cla4, it takes 2 gate delays to produce a blockP and a blockG. The cla4 also produces carry output bits, but the carry-in required to compute them has not arrived yet, so we do not consider the carry outputs now.

The blockP and blockG signals from the cla4s U2, U4, U6 and U8 are available at the same time. They are fed to the cla4 U9. In U9 the carry-in is available, so it computes its carry output bits. In the worst case this takes 2 gate delays. The blockP and blockG outputs of U9 are also available at this time (1 + 2 + 2 = 5 gate delays for the blockP and blockG signals).

The carry bits produced by U9 are fed back into the cla4s U2, U4, U6 and U8. Only now are the carry-in bits available to these four blocks, so only now do we consider their carry outputs. It takes 2 gate delays to produce them in the worst case.

These carry outputs act as the carry-in bits for the add4 blocks U1, U3, U5 and U7. Only now have their carry-in bits arrived, so now we can consider the sum. From the time the carry-in bit arrives, it takes only 1 gate delay to compute the sum output. The other input to the XOR gate was computed long before, so we do not count the gate it passes through.

Total: 1 + 2 + 2 + 2 + 1 = 8 gate delays.

f) Yes. A cla64 would simply have 4 cla16s and use 1 cla4 that takes the blockG and blockP signals from the cla16s to generate the carry inputs for each of the cla16s.

Gate count: (4 * cla16) + cla4 = (4 * 118) + 14 = 486 gates

Delay: 5 gate delays for the blockP and blockG of each cla16 (see above), then 2 gate delays in the cla4 to produce the carry-in for the cla16s. Once a cla16 has its carry input, it takes only 5 more gate delays (2 + 2 + 1) to produce the sum, because its bit-wise P and G have already been computed.

Total: 5 + 2 + 5 = 12 gate delays.
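The gate-count and delay arithmetic in e) and f) can be checked with a short Python sketch. The per-block figures (3 gates per add1, 14 per cla4, and the 1/2/2/1 gate-delay stages) are taken from the text above; the function and constant names are made up for this illustration.

```python
# Per-block figures quoted in the solution text.
ADD1_GATES = 3
CLA4_GATES = 14

PG_DELAY = 1        # add1 produces bit-wise P and G
BLOCK_PG_DELAY = 2  # a cla4 produces blockP / blockG
CARRY_DELAY = 2     # a cla4 produces carries once its carry-in is valid
SUM_DELAY = 1       # final XOR once the carry-in bit has arrived


def add4_gates():
    return 4 * ADD1_GATES


def cla16_gates():
    # Four add4 blocks plus five cla4 blocks (U2, U4, U6, U8 and U9).
    return 4 * add4_gates() + 5 * CLA4_GATES


def cla64_gates():
    # Four cla16 blocks plus one extra cla4 on top.
    return 4 * cla16_gates() + CLA4_GATES


def cla16_block_pg_delay():
    # P/G, then the first-level cla4, then U9.
    return PG_DELAY + BLOCK_PG_DELAY + BLOCK_PG_DELAY


def cla16_sum_after_carry_in():
    # U9 carries, then first-level cla4 carries, then the sum XOR.
    return CARRY_DELAY + CARRY_DELAY + SUM_DELAY


def cla16_delay():
    return PG_DELAY + BLOCK_PG_DELAY + cla16_sum_after_carry_in()


def cla64_delay():
    return cla16_block_pg_delay() + CARRY_DELAY + cla16_sum_after_carry_in()


if __name__ == "__main__":
    print("cla16:", cla16_gates(), "gates,", cla16_delay(), "gate delays")
    print("cla64:", cla64_gates(), "gates,", cla64_delay(), "gate delays")
```

This only reproduces the tallies; it does not model the circuit itself.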

2) a) The first stage of the carry-save adder (CSA_1st) consists of 16 full adders (FA). Call the inputs of the i-th FA A_i, B_i, C_in_i and the outputs Sum_i and C_out_i. Then the CSA_1st component makes the following (very straightforward) connections to the 16 FAs:

A[i] <-> A_i
B[i] <-> B_i
Sum[i] <-> Sum_i
Carry[i] <-> C_out_i

The following answers make use of a component [<<], which denotes a left shift by one bit. Normally one would accomplish this by setting O[15:1] = I[14:0], O[0] = GND.

b)

c)
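The carry-save stage and the [<<] component described in 2a can be modelled behaviourally. The following is a minimal Python sketch, not the schematic from the solution: the third operand `c` feeding each full adder's carry-in, and all function names, are assumptions made for illustration.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def csa_stage(a, b, c):
    """16 independent full adders; no carry ripples between bit positions."""
    sum_word = 0
    carry_word = 0
    for i in range(WIDTH):
        s, c_out = full_adder((a >> i) & 1, (b >> i) & 1, (c >> i) & 1)
        sum_word |= s << i
        carry_word |= c_out << i
    return sum_word, carry_word


def shift_left_1(word):
    """The [<<] component: O[15:1] = I[14:0], O[0] = GND."""
    return (word << 1) & MASK


if __name__ == "__main__":
    a, b, c = 1234, 567, 89
    s, cy = csa_stage(a, b, c)
    # Sum word plus the shifted carry word equals a + b + c (mod 2**16).
    print((s + shift_left_1(cy)) & MASK, (a + b + c) & MASK)
```

The point of the stage is that each carry is weighted by two, so adding the sum word to the left-shifted carry word in a conventional adder gives the three-operand total.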

3) a) makepg: an add1 block with just the P and G outputs. Alternatively, reuse add1 and leave the Sum output unconnected.

b) make4pg: the same concept as above, applied to add4.

c) S = P XOR C_in; C_out = G + P*C_in

d) 16-bit adder: This architecture is called a carry-skip or carry-bypass adder. Simply chain together four of the blocks given in the question.

e) Why this architecture is faster: Suppose you did not use the multiplexers. Then you get a standard ripple-carry adder. It would look like this:

The worst case for the above circuit is when all the full adders propagate. In that case we would have the following delay:

To get all P and G signals: 1 gate delay (see the add1 block).

Once you have P, G and C_in, it takes 1 gate delay to get the sum output and 2 gate delays to get C_out. Remember, C_out = G + P*C_in.

The first full adder hence takes 3 gate delays to provide its C_out. Every subsequent full adder takes 2 gate delays. Hence the final full adder will produce its sum bit 3 + 2*(14) + 1 = 32 gate delays later. Compare this to the 8 gate delays that the CLA takes.

Now let us consider the carry-skip architecture.
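As a baseline for that comparison, the ripple-carry behaviour and its 32-gate-delay figure can be reproduced with a small Python sketch. It uses the equations from c) and assumes P = A XOR B and G = A AND B, which is what S = P XOR C_in implies; the function names are illustrative only.

```python
def make_pg(a, b):
    """Bit-wise propagate and generate (assumed P = A XOR B, G = A AND B)."""
    return a ^ b, a & b


def ripple_add(a, b, c_in, width=16):
    """Ripple-carry addition using S = P XOR C_in and C_out = G + P*C_in."""
    total = 0
    carry = c_in
    for i in range(width):
        p, g = make_pg((a >> i) & 1, (b >> i) & 1)
        total |= (p ^ carry) << i
        carry = g | (p & carry)
    return total, carry


def ripple_worst_case_delay(width=16):
    """Tally from the text: 3 for the first C_out, 2 per middle bit, 1 for the last sum."""
    first_carry = 3            # 1 for P/G plus 2 for C_out
    middle = 2 * (width - 2)   # bits 1 .. width-2
    final_sum = 1              # XOR in the last full adder
    return first_carry + middle + final_sum


if __name__ == "__main__":
    print(ripple_add(0x0001, 0xFFFF, 0))
    print(ripple_worst_case_delay())
```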

Carry skip: Clearly, the all-propagate case, which is the worst case for the ripple-carry adder, is not the worst case for the carry-skip adder. If all propagates were true, you would simply have 3 mux delays plus (3*2 + 1) = 7 gate delays to get the final sum bit. Assuming that the mux delay is comparable to a gate delay, that gives a total of 10 gate delays for the all-propagate case.

f) The worst case is different. In the worst case, bit 0 generates and bits 1-15 propagate. In this case the first stage does not skip the carry. However, stages 2 and 3 skip it, leading into the final stage, where the carry ripples to produce a sum bit. Hence you have:

1 gate delay for the C_out from bit 0,
3*2 = 6 gate delays for the FAs of bits 1-3,
3 gate delays to go through the 3 multiplexers,
(3*2 + 1) = 7 gate delays to get the final sum bit.

This gives a total of 17 gate delays. If more than one stage had a generate that was true, the sum bits would be computed in parallel in more than one stage, which would definitely be faster than the above case.

Compared to 32 gate delays for the ripple carry, the carry-skip adder is roughly twice as fast (17 gate delays), and it needs very little additional logic. The CLA, at 8 gate delays, is roughly twice as fast again as the carry-skip adder, but it needs much more logic. Hence there is a tradeoff between logic complexity and the speed you can achieve.

g) Input sequence to trigger this worst case: As suggested above, you want bit 0 to generate and bits 1-15 to propagate. This happens, for example, in the following configuration:

A: 0000000000000001
B: 1111111111111111
C_in = 1 or 0

Several other input combinations show the same behavior.
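The tallies in f) and the input pattern in g) can be checked with a short Python sketch. It assumes, as the text does, four 4-bit stages and a mux delay equal to one gate delay; the helper names are hypothetical.

```python
MUX_DELAY = 1    # assumed comparable to one gate delay, as in the text
CARRY_DELAY = 2  # C_out = G + P*C_in
SUM_DELAY = 1    # S = P XOR C_in


def classify_bits(a, b, width=16):
    """Label each bit position, LSB first: 'G' generate, 'P' propagate, 'K' kill."""
    kinds = []
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        kinds.append("G" if x & y else "P" if x ^ y else "K")
    return kinds


def is_worst_case(a, b):
    """True when bit 0 generates and bits 1-15 all propagate."""
    kinds = classify_bits(a, b)
    return kinds[0] == "G" and all(k == "P" for k in kinds[1:])


def all_propagate_delay():
    # 3 muxes, then a ripple through the final 4-bit stage.
    return 3 * MUX_DELAY + (3 * CARRY_DELAY + SUM_DELAY)


def worst_case_delay():
    bit0_carry = 1                 # C_out from bit 0, as counted in the text
    first_stage = 3 * CARRY_DELAY  # full adders of bits 1-3
    muxes = 3 * MUX_DELAY
    final_stage = 3 * CARRY_DELAY + SUM_DELAY
    return bit0_carry + first_stage + muxes + final_stage


if __name__ == "__main__":
    a, b = 0b0000000000000001, 0b1111111111111111
    print(is_worst_case(a, b), all_propagate_delay(), worst_case_delay())
```

Any other operand pair for which `is_worst_case` returns True exercises the same path, which is what the closing remark in g) refers to.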