Computer Arithmetic (2)


Computer Arithmetic (2)
ELEC816/ELEC61 Spring 1
Hayden Kwok-Hay So
H. So, Sp1 Lecture 7 - ELEC816/61

Arithmetic Units
- How do we carry out +, -, ×, ÷ in FPGA?
- How do we perform sin, cos, e, etc.?

Addition
- Two positive integers can be added in much the same way as decimal numbers are added in long addition.
- [Worked example: long addition of two numbers in decimal and in binary, with carries]
- The same addition can be implemented in hardware, both in ASICs and in FPGAs.

Ripple Carry Adder
- Mimics the working of a long addition
- Each bit of the addition is handled by one full adder

Full Adder
- Adds two 1-bit numbers AND a carry-in, i.e. adds THREE 1-bit numbers
- Produces 1 sum bit and 1 carry bit

Half Adder
- Adds two 1-bit numbers
- Produces 1 sum bit and 1 carry bit

  A B | C S
  0 0 | 0 0
  0 1 | 0 1
  1 0 | 0 1
  1 1 | 1 0

Full Adder
- A full adder handles a carry input as well as the two input data bits
- Altogether there are 3 inputs and 2 outputs
- $S = A \oplus B \oplus C_{in}$
- $C_{out} = AB + C_{in}(A \oplus B)$

  A B C_in | C_out S
  0 0 0    | 0     0
  0 0 1    | 0     1
  0 1 0    | 0     1
  0 1 1    | 1     0
  1 0 0    | 0     1
  1 0 1    | 1     0
  1 1 0    | 1     0
  1 1 1    | 1     1
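The half adder, full adder, and ripple-carry chain described above can be modelled in a few lines of Python. This is only a behavioural sketch, not hardware: the function names and the `width` parameter are illustrative choices, and the operands 23 and 19 in the usage note are picked here purely as an example.

```python
def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Return (sum, carry_out) for two 1-bit inputs plus a carry-in."""
    s = a ^ b ^ c_in                      # S = A xor B xor Cin
    c_out = (a & b) | (c_in & (a ^ b))    # Cout = AB + Cin(A xor B)
    return s, c_out


def ripple_carry_add(x, y, width):
    """Add two unsigned integers using one full adder per bit.

    The carry-out of bit i feeds the carry-in of bit i+1, so the loop
    runs `width` times, mirroring the O(n) delay of the hardware chain.
    Returns (sum truncated to `width` bits, final carry-out).
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For instance, `ripple_carry_add(23, 19, 8)` yields a sum of 42 with no carry-out, while `ripple_carry_add(255, 1, 8)` wraps to 0 and reports a carry-out of 1.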

Ripple Carry Adder (1)
- A ripple-carry adder is formed by chaining a series of full adders (FAs), 1 for each input bit
- The carry-out from bit i is connected as the carry-in for bit (i+1)
- [Figure: 4-bit ripple-carry adder with inputs A_3..A_0 and B_3..B_0 and outputs S_3..S_0]

Ripple Carry Adder (2)
- Delay through a ripple-carry adder is proportional to the width of the data input
- O(n) delay, where n is the width of the input
- [Figure: 4-bit ripple-carry adder with the carry chain highlighted]

Carry Look Ahead Adder
- In a ripple-carry adder, each bit must wait for the carry from the previous bit before its calculation may start
- A carry look ahead (CLA) adder looks ahead in the input to figure out the carry
- Define two functions:
  - Generate: $G_i = A_i B_i$
  - Propagate: $P_i = A_i \oplus B_i$
- If $G_i = 1$, then $c_{i+1} = 1$
- If $P_i = 1$, then $c_{i+1} = c_i$
  - Bit i propagates the carry from bit (i-1) to bit (i+1)

CLA Adder
- Both generate and propagate can be calculated in constant time
  - They depend only on the input bits
- Using the definitions of P and G, the carry bits can be calculated in constant time as well:

$$
\begin{aligned}
c_{i+1} &= G_i + P_i c_i \\
        &= G_i + P_i (G_{i-1} + P_{i-1} c_{i-1}) \\
        &= G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2}) \\
        &= G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + P_i P_{i-1} P_{i-2} G_{i-3} + \cdots + P_i P_{i-1} \cdots P_0 c_0
\end{aligned}
$$

CLA Adder
- $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + P_i P_{i-1} P_{i-2} G_{i-3} + \cdots + P_i P_{i-1} \cdots P_0 c_0$
- Looking at how a carry is calculated, we can interpret it as: carry bit i+1 is set if (1) a carry is generated at bit i, OR (2) a carry is generated in any of the previous positions AND can be propagated all the way to position i.
- How long does it take to calculate the carry? Constant delay!
- Caveat?

CLA Adder
- [Figure: 4-bit CLA adder; each bit produces P_i, G_i and S_i from A_i and B_i, and a shared carry lookahead logic block produces the carries]
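To make the flattened carry expression concrete, the sketch below computes every carry directly from the generate and propagate bits and the initial carry, without using the previous carry. It assumes the XOR form of propagate, which matches the full-adder carry equation; the function names are illustrative. The loops are only a software stand-in: in hardware each carry is a two-level AND-OR network whose gates grow wider with the bit position.

```python
def cla_carries(a, b, width, c0=0):
    """Return (propagate bits, carries c_0..c_width) for a `width`-bit CLA."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # G_i = A_i B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # P_i = A_i xor B_i
    carries = [c0]
    for i in range(width):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_0 c_0
        c = 0
        prefix = 1                      # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c0                # a carry-in propagated through every bit
        carries.append(c)
    return p, carries


def cla_add(a, b, width, c0=0):
    """Add two unsigned `width`-bit integers; return (sum, carry_out)."""
    p, carries = cla_carries(a, b, width, c0)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i   # S_i = P_i xor c_i
    return total, carries[width]
```

Checking `cla_add` against ordinary integer addition for all 4-bit operand pairs is a quick way to confirm that the expanded formula agrees with the ripple-carry result.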

Adder on FPGAs
- Implement ripple-carry/CLA using the logic fabric directly
  - LUT, FF, etc.
- Built-in adder
- Other adder architectures
  - An FPGA-specific one?
  - Bit-serial?

Fast Adder on FPGA
- How do we build a fast adder using this?
- [Figure: FPGA logic cell with LUT and FF]

Fast Adder on FPGA
- $S = A \oplus B \oplus C_{in}$
- $C_{out} = AB + C_{in}(A \oplus B)$
- [Figure: fast carry logic]

Adder performance on FPGA
Which of the following is fastest on FPGA?
- 16-bit ripple-carry adder implemented using LUTs
- 16-bit carry-lookahead adder implemented using LUTs
- 16-bit adder using fast carry logic
- 32-bit ripple-carry adder implemented using LUTs
- 32-bit carry-lookahead adder implemented using LUTs
- 32-bit adder using fast carry logic

Subtractor
- Subtracting two numbers in 2's complement is relatively easy
- To calculate A - B:
  1. Find -B from B
     - Negate all bits in B
     - Add 1
  2. Add A and -B
- Can reuse the adder developed earlier
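The subtraction recipe above (invert B, add 1, then add to A) can be sketched as follows. Folding the "+1" into the adder's carry-in is a common hardware trick and an assumption of this sketch rather than something the slides spell out; `subtract` and `to_signed` are hypothetical helper names.

```python
def subtract(a, b, width):
    """Compute A - B in `width`-bit 2's complement using only an adder."""
    mask = (1 << width) - 1
    not_b = ~b & mask                  # step 1a: negate all bits in B
    # Steps 1b and 2 together: the "+1" plays the role of the carry-in.
    total = (a & mask) + not_b + 1
    return total & mask                # discard the carry out of the top bit


def to_signed(value, width):
    """Interpret a `width`-bit pattern as a 2's complement signed integer."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit
```

With 8-bit operands, `subtract(5, 7, 8)` returns the bit pattern 254, which `to_signed(254, 8)` reads back as -2.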

Subtractor
- [Figure: 4-bit subtractor built from the adder, with inputs A_3..A_0 and B_3..B_0, a Subtract control, and outputs S_3..S_0]

Multiplication
- [Worked example: binary long multiplication with partial products]

Multiplication
- Multiplication is a form of repeated addition
- Multiplying two n-bit numbers can be achieved by adding n partial results
- Produces a result of 2n bits

Multiplier - Iterative
- Start from the basic definition of multiplication: shift and conditionally add
- Requires n cycles
- [Figure: iterative multiplier datapath with inputs A and B, shifters, clock CLK, and output S]

Multiplier - Parallel
- Use n adders to perform all partial sum additions in parallel
- Requires 1 cycle, but a long cycle

Simple Parallel Multiplier
- Critical path scales with n
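The iterative shift-and-conditional-add scheme can be modelled as a loop in which each iteration stands for one clock cycle. This is a behavioural sketch for unsigned operands only; which operand is shifted in which direction is an assumption made here, since the datapath figure did not survive extraction.

```python
def shift_add_multiply(a, b, n):
    """Multiply two unsigned n-bit integers in n shift-and-add steps."""
    product = 0
    for _cycle in range(n):        # one loop iteration per clock cycle
        if b & 1:                  # conditional add: only when the current bit of B is 1
            product += a
        a <<= 1                    # next partial product is A shifted left by one more bit
        b >>= 1                    # move on to the next bit of B
    return product                 # always fits in 2n bits
```

For 4-bit inputs the largest result is 15 * 15 = 225, which fits in 8 bits and illustrates the 2n-bit product width.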

Multiplier - Carry Save Adders
- Carry save adder tree
- Critical path scales with n
- Fast adder at the end

Fast Multiplier on FPGA
- Reuse the adders' carry logic for partial result calculation
- Source: xapp15

Dedicated DSP Block in V6

Constant Multiplication
- If one of the inputs to a multiplier is constant, the circuit can be simplified
- If one of the inputs is a power of 2, then multiplication becomes a shift
  - $A \times 2^n$ is equivalent to A << n
- What if the constant is not a power of 2?
  - Number decomposition

Constant Multiplier Decomposition
- When multiplying by a constant in fixed point, recall that the value represented by the bit string $b_{n-1} \ldots b_1 b_0$ is:

$$
k = \sum_{i=0}^{n-1} b_i 2^i
$$

- Therefore, ALL representable fixed point numbers can be represented as a sum of powers of 2
- Can decompose the constant multiplier into multiple shifts

Decomposition
- B = kA
- [Figure: A is shifted left by n-1, n-2, ..., 1, 0 and the shifted copies are summed to form B]
- Compared to a standard multiplier, all 0 terms are eliminated
- Can we do better?
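As a small illustration of the decomposition idea, the sketch below lists the shift amounts for the nonzero bits of an unsigned integer constant and sums the shifted copies of A. The helper names are hypothetical, and the constant is assumed to be a non-negative integer (a fixed-point constant would simply be scaled by a power of 2 first).

```python
def shifts_for_constant(k):
    """Return the bit positions i where bit i of the constant k is 1."""
    return [i for i in range(k.bit_length()) if (k >> i) & 1]


def multiply_by_constant(a, k):
    """Compute k * a as a sum of shifted copies of a, one per nonzero bit of k."""
    return sum(a << i for i in shifts_for_constant(k))
```

For k = 10 (binary 1010) only two shifted terms remain, `a << 1` and `a << 3`; the zero bits contribute no hardware at all.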

Canonic Signed Digit
- Signed digit (SD) representation:
  - Similar to binary representation, except the set {-1, 0, 1} is used for the digits
  - Representation is not unique
  - E.g. in 4-digit SD number representation, writing $\bar{1}$ for the digit -1: $3 = 0011 = 010\bar{1} = 01\bar{1}1 = 1\bar{1}0\bar{1} = 1\bar{1}\bar{1}1$
- Canonic representation has the minimum number of nonzero digits
  - Not unique

Canonic Signed Digit
- Use CSD to minimize the number of nonzero digits
- E.g. $125 = 01111101 = 10000\bar{1}01$
- [Figure: multiplier B = kA built from shifted copies of A, binary version versus CSD version with a subtraction]

Division
- Division is substantially more complicated than multiplication
- 2 main methods:
  - Bit-by-bit calculation
    - Calculate each bit in a way similar to manual division
  - Mathematical approximation
    - Start with an approximation and iteratively refine the solution until the desired precision is reached
- Use as few dividers as possible!

Signal Flow Graph Manipulations

FIR as an Example
- [Figure: FIR filter block diagram with coefficients h_0, h_1, h_2]

Signal Flow Graph
- Simplify the block diagram with a more efficient notation
- An edge labelled k multiplies the signal by k
- A delay edge = delay of 1 sample (1 clock cycle) = FF
- [Figure: FIR filter drawn as a signal flow graph with coefficients h_0, h_1, h_2]
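Returning to the canonic signed digit slides, one standard way to obtain a signed-digit form with no two adjacent nonzero digits (the non-adjacent form) is sketched below. The slides do not say which conversion procedure was intended, so this algorithm, the function names, and the sample constant 125 are assumptions made for illustration; the constant is taken to be a non-negative integer.

```python
def to_csd(k):
    """Return the signed digits of k (LSB first), each in {-1, 0, 1}.

    No two adjacent digits are nonzero, which minimizes the number of
    add/subtract terms needed in a constant multiplier.
    """
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)    # +1 when k % 4 == 1, -1 when k % 4 == 3
            k -= d             # remainder is now divisible by 4
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def multiply_csd(a, k):
    """Compute k * a with one shifted add or subtract per nonzero CSD digit."""
    return sum(d * (a << i) for i, d in enumerate(to_csd(k)))
```

For k = 125 the plain binary form has six nonzero bits, while `to_csd(125)` has only three nonzero digits (128 - 4 + 1), so the multiplier needs two add/subtract operations instead of five additions.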

Dataflow system
- Remember: in most digital signal processing systems with a continuous stream of data input, the overall latency usually doesn't matter.
- Therefore, it is OK to put extra delay at the I/O without changing the function of the design
- [Figure: FIR signal flow graph with coefficients h_0, h_1, h_2 and extra delay such as $z^{-5}$ at the I/O]

Nodal Delay Transfer
- [Figure: nodal delay transfer primitives (a) to (e), moving delays across nodes with gains k and k_1; case (e) introduces a $z^{1}$ term]
- But why?

Nodal Delay Transfer
- Remember, $z^{1}$ is non-causal
  - Not implementable in hardware
- Must eliminate any $z^{1}$ in the final graph before going to hardware implementation
  - Pushing delay within the graph
  - Inserting delay at the I/O
  - Reorganizing the graph

Cutset
- Separates the SFG into two disjoint graphs
- [Figure: example cutsets on an FIR signal flow graph with coefficients h_0, h_1, h_2]

Cutset Retiming
- Generalization of the nodal delay transfer primitives
- Delay can be added to all incoming edges of a cutset if advances are added to all outgoing edges, and vice versa

Cutset Retiming
- [Figure: cutset retiming applied to an FIR signal flow graph with coefficients h_0, h_1, h_2]
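The remark that extra delay at the I/O does not change the function of a streaming design can be checked numerically. The sketch below models a direct-form FIR filter and a pure delay on Python lists, assuming zero initial state; the function names and sample data are illustrative only. It shows that delaying the input by d samples gives the same output stream, delayed by d samples.

```python
def fir(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def delay(x, d):
    """Model z^-d: shift the stream by d samples, padding with zeros."""
    return ([0] * d + list(x))[:len(x)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, 2, 3]
# Delay at the input and delay at the output give the same stream.
same = fir(delay(x, 2), h) == delay(fir(x, h), 2)
```

Here `same` evaluates to `True`: the filter's behaviour is unchanged apart from the added latency, which is what makes it safe to borrow delays from the I/O when retiming.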

Use of retiming
- Reduce critical path
- Pipelining
- Decrease number of registers
- Reduce power
- Reduce clock rate

In Summary
- Reviewed basic computer arithmetic
- Add/sub is the easiest to implement
  - Highly optimized in FPGAs
- Multiplier is more complex
  - VLSI has many optimized multipliers
  - FPGA designs may use the fast carry logic
  - Dedicated multiplier / DSP blocks
- Divider is very complex
  - Use IP cores
- Signal flow graphs and retiming help to lay out signal processing systems