Implementing Multipliers in FLEX 10K Devices

March 1996, ver. 1 | Application Note 53 | A-AN-053-01

Introduction

The Altera FLEX 10K embedded programmable logic device (PLD) family provides the first PLDs in the industry with an embedded array. The embedded array consists of a series of embedded array blocks (EABs) that can implement complex logic functions, such as multipliers. Each EAB can be configured as an 8-input, 8-output look-up table (LUT). Therefore, a single EAB can implement any multiplier with up to 8 inputs in total, such as a 4 × 4, 5 × 3, or 6 × 2 multiplier. Figure 1 shows the multiplier sizes that can be implemented in a single EAB.

Figure 1. Multiplier Configuration for a Single EAB (4 × 4, 5 × 3, 6 × 2)

This application note describes how to implement large multipliers using several EABs and compares parallel and time-domain-multiplexed multiplier implementations.

The design files described in this application note are available from the Altera BBS via modem at (408) 954-0104 and from the Altera FTP site at ftp.altera.com. The self-extracting files are an_53.exe and an_53.tar.

Single-EAB Multipliers

You can implement a multiplier with up to 8 inputs in a single EAB using a function from the library of parameterized modules (LPM). The LPM is a set of architecture-independent modules with scalable widths that completely describe the logical operation of a circuit. Using the LPM function lpm_mult, you can define the widths of the multiplier's operands. Then, you can use MAX+PLUS II to place the multiplier in an EAB by following these steps:

1. Select the lpm_mult function in any MAX+PLUS II application.
2. Choose the Logic Options command (Assign menu). In the Logic Options dialog box, the name of the function is displayed in the Node Name box.

3. Choose the Individual Logic Options button and turn on the Implement in EAB option. Choose OK.
4. Choose OK to implement the multiplier in the EAB.

Multiple-EAB Multipliers

A multiplier with more than 8 inputs must be implemented in two or more EABs, with each EAB computing a single partial product. To illustrate how to split the multiplier across multiple EABs, consider how a 2-digit by 2-digit multiplication is calculated in base 10. See Figure 2.

Figure 2. Base 10 Multiplication

$$12 \times 37:\quad 7 \times 1,\; 7 \times 2,\; 3 \times 1,\; 3 \times 2$$
$$12 \times 37 = 3 \times 10^2 + (7 + 6) \times 10^1 + 14 \times 10^0 = 444$$

Rather than using base 10 (as shown in Figure 2), the EABs perform the same operation in hexadecimal radix: each 8-bit operand is split into two 4-bit digits, and each partial product is calculated within a single EAB. See Figure 3.

Figure 3. Hexadecimal Multiplication (each partial product is generated by one EAB; the partial products are summed to produce the final product)

$$\begin{aligned}
X[7..4]\,X[3..0] \times Y[7..4]\,Y[3..0] ={}& \big(X[7..4] \times Y[7..4]\big) \times 16^2 \\
&+ \big((X[7..4] \times Y[3..0]) + (X[3..0] \times Y[7..4])\big) \times 16^1 \\
&+ \big(X[3..0] \times Y[3..0]\big) \times 16^0
\end{aligned}$$

To account for the relative significance in hexadecimal radix, each partial product is multiplied by $16^n$ (where n = 0, 1, 2, ...), and the results are added together to determine the final product. You can choose one of two design methods to generate the final product: a parallel multiplier or a time-domain-multiplexed multiplier.
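The following minimal Python sketch makes the decomposition concrete. It assumes unsigned 8-bit operands and models an EAB as a 256-entry table of 4 × 4 products, which is the content an 8-input, 8-output EAB would hold. It then rebuilds the 8 × 8 product from the four partial products using the $16^n$ weights of Figure 3. This is a behavioral model only, not MAX+PLUS II or LPM code, and the names build_4x4_lut, eab_mult, and mult8x8 are hypothetical.

```python
# Behavioral sketch (hypothetical names): an EAB modeled as an 8-input,
# 8-output LUT holding every 4 x 4 product, reused for an 8 x 8 multiply.

def build_4x4_lut():
    """Return a 256-entry table indexed by the 8-bit address {a[3..0], b[3..0]}."""
    lut = [0] * 256
    for a in range(16):
        for b in range(16):
            lut[(a << 4) | b] = a * b  # largest entry is 15 * 15 = 225, fits in 8 bits
    return lut

LUT = build_4x4_lut()

def eab_mult(a, b):
    """One EAB lookup: 4-bit a times 4-bit b."""
    return LUT[((a & 0xF) << 4) | (b & 0xF)]

def mult8x8(x, y):
    """Unsigned 8 x 8 multiply built from four 4 x 4 partial products."""
    x_hi, x_lo = (x >> 4) & 0xF, x & 0xF
    y_hi, y_lo = (y >> 4) & 0xF, y & 0xF
    r = eab_mult(x_lo, y_lo)  # weight 16**0
    s = eab_mult(x_lo, y_hi)  # weight 16**1
    t = eab_mult(x_hi, y_lo)  # weight 16**1
    u = eab_mult(x_hi, y_hi)  # weight 16**2
    return u * 16**2 + (s + t) * 16**1 + r * 16**0

# Exhaustive check against ordinary integer multiplication.
assert all(mult8x8(x, y) == x * y for x in range(256) for y in range(256))
print(mult8x8(12, 37))  # 444
```

The four eab_mult calls are independent of one another. Hardware can therefore compute them in four EABs at once or in one EAB over four Clock cycles, which are the two design methods compared in this note.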

Parallel Multiplier

The parallel multiplier design method uses multiple EABs to generate all of the partial products in parallel. For example, an 8 × 8 parallel multiplier uses four EABs (one for each partial product) to generate the four partial products simultaneously. Before the partial products are added together, each one is shifted to account for the $16^n$ term (i.e., each partial product is shifted over n hexadecimal digits, or 4n bits). The adder then assembles the final product from the shifted partial products. The addition is normally performed by a two-stage adder, with 8 bits for the first stage and 12 bits for the second stage (see Figure 4).

Figure 4. 2-Stage Adder

Where:
R = X[3..0] × Y[3..0]
S = X[3..0] × Y[7..4]
T = X[7..4] × Y[3..0]
U = X[7..4] × Y[7..4]

$$Q[15..0] = \big(U \times 2^8 + R\big) + \big(S + T\big) \times 2^4$$

First stage (8 bits): S[7..0] + T[7..0].
Second stage (12 bits): Q[15..4] = {U[7..0], R[7..4]} + (S + T), with Q[3..0] = R[3..0].

You can pipeline the parallel multiplier to increase design speed by using registers to process logic over multiple Clock cycles. The registers within the EAB can be used for pipelining (see Figure 5).
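The following Python sketch checks the two-stage addition described above (Figure 4). It assumes R, S, T, and U are the unsigned 8-bit partial products defined there. Bit masks stand in for the 9-bit first-stage result and the 12-bit second-stage adder. The function name two_stage_add is hypothetical, and the model ignores timing entirely.

```python
# Sketch of the 2-stage adder (hypothetical name), assuming R, S, T, U are
# the 8-bit partial products of an unsigned 8 x 8 multiply.

def two_stage_add(r, s, t, u):
    # First stage: 8-bit adder. S and T share the same weight (16**1).
    st = (s + t) & 0x1FF                   # 9-bit result including carry-out
    # U (weight 2**8) and R (weight 2**0) do not overlap, so they are simply
    # concatenated; R[3..0] passes straight through to Q[3..0].
    q_low = r & 0xF
    upper = ((u & 0xFF) << 4) | (r >> 4)   # {U[7..0], R[7..4]}, 12 bits
    # Second stage: 12-bit adder aligns (S + T) with Q[15..4].
    q_high = (upper + st) & 0xFFF
    return (q_high << 4) | q_low           # Q[15..0]

def check(x, y):
    """Compare the adder model with x * y for one pair of 8-bit operands."""
    xl, xh, yl, yh = x & 0xF, x >> 4, y & 0xF, y >> 4
    return two_stage_add(xl * yl, xl * yh, xh * yl, xh * yh) == x * y

assert all(check(x, y) for x in range(256) for y in range(256))
```

Q[3..0] bypasses both adders because no partial product other than R has bits below weight $2^4$. The 12-bit mask never discards bits for valid inputs because the full product is always below $2^{16}$.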

Figure 5. Parallel Multiplier with Pipelining (four EABs with optional pipelining registers compute X[3..0] × Y[3..0], X[3..0] × Y[7..4], X[7..4] × Y[3..0], and X[7..4] × Y[7..4]; the adder stages produce Z[15..0])

An 8 × 8 parallel multiplier is implemented in 3 stages: a multiplier stage using EABs, and 2 adder stages with 8 bits for the first stage and 12 bits for the second stage. To pipeline the multiplier, each bit must be registered after each stage, which requires 21 registers for the first stage and registers for the second stage. For the multiplier stage, each EAB has registers available at its inputs and outputs, so no additional logic elements (LEs) are required for that stage. The LEs containing the adder logic provide 21 registers; therefore, only 20 additional LEs are required for the entire circuit.

Time-Domain-Multiplexed Multiplier

The time-domain-multiplexed multiplier design method uses a single EAB to generate all partial products on different Clock cycles (see Figure 6). Therefore, the appropriate bits must be loaded into the EAB before each multiplication. After each multiplication, the accumulator shifts the data to account for the $16^n$ term and then sums the partial products to produce the final product.
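The following minimal Python sketch models the time-domain-multiplexed sequence. It assumes unsigned 8-bit operands and the R, S, T, U order used in Figure 6. A single 4 × 4 multiply, standing in for the one EAB, is reused on each of four Clock cycles. The accumulator loads the first shifted partial product and adds the others. The names SCHEDULE, nibble, and tdm_multiply are hypothetical.

```python
# Sketch of a time-domain-multiplexed 8 x 8 multiply (hypothetical names):
# one 4 x 4 "EAB" is reused on successive Clock cycles.

# (X nibble, Y nibble, left shift) for R, S, T, U in turn.
SCHEDULE = [
    ("lo", "lo", 0),   # R = X[3..0] * Y[3..0], weight 16**0
    ("lo", "hi", 4),   # S = X[3..0] * Y[7..4], weight 16**1
    ("hi", "lo", 4),   # T = X[7..4] * Y[3..0], weight 16**1
    ("hi", "hi", 8),   # U = X[7..4] * Y[7..4], weight 16**2
]

def nibble(v, which):
    """Select the high or low 4 bits of an 8-bit value."""
    return (v >> 4) & 0xF if which == "hi" else v & 0xF

def tdm_multiply(x, y):
    """Return (product, accumulator value after each Clock cycle)."""
    acc = 0
    trace = []
    for cycle, (xs, ys, shift) in enumerate(SCHEDULE):
        partial = nibble(x, xs) * nibble(y, ys)   # the single-EAB lookup
        # The first cycle loads the accumulator; later cycles add to it.
        acc = (partial << shift) if cycle == 0 else acc + (partial << shift)
        trace.append(acc)
    return acc, trace

product, trace = tdm_multiply(0x25, 0x13)
print(product, trace)
```

For X = 0x25 and Y = 0x13, the trace is [15, 95, 191, 703]. These four values correspond to the four accumulator outputs described in Figure 6, and the final value is 37 × 19 = 703. Each multiplication takes one Clock cycle per partial product.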

Figure 6. Simulation Waveform for Time-Domain-Multiplexed Multiplier (signals: Clock, EAB Output stepping through R, S, T, U, and Accumulator Output at points (1) through (4))

Where:
R = X[3..0] × Y[3..0]
S = X[3..0] × Y[7..4]
T = X[7..4] × Y[3..0]
U = X[7..4] × Y[7..4]

Notes:
(1) $(X[3..0] \times Y[3..0]) \times 16^0$
(2) $(X[3..0] \times Y[3..0]) \times 16^0 + (X[3..0] \times Y[7..4]) \times 16^1$
(3) $(X[3..0] \times Y[3..0]) \times 16^0 + \big((X[3..0] \times Y[7..4]) + (X[7..4] \times Y[3..0])\big) \times 16^1$
(4) $(X[3..0] \times Y[3..0]) \times 16^0 + \big((X[3..0] \times Y[7..4]) + (X[7..4] \times Y[3..0])\big) \times 16^1 + (X[7..4] \times Y[7..4]) \times 16^2$

To pipeline the time-domain-multiplexed multiplier, insert registers between the EAB performing the multiplication and the accumulator performing the addition and shifting. Figure 7 shows a time-domain-multiplexed multiplier.

Figure 7. Time-Domain-Multiplexed Multiplier (optional input registers on X[7..4], X[3..0], Y[7..4], and Y[3..0]; a single-EAB 4 × 4 multiplier with an 8-bit output; shifts of 8, 4, or 0 bits; a loadable accumulator with output Z[15..0]; control logic)
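The following Python sketch models the pipelined variant at the Clock-cycle level. It assumes a single register between the EAB and the accumulator. It also assumes a multiplication is finished on the Clock edge where the accumulator captures the last partial product. Under those assumptions the model takes 5 Clock cycles, one more than the unpipelined sequence, which agrees with the 5-cycle entry for the 2-stage pipeline in Table 1. The function name tdm_pipelined is hypothetical, and the control logic of Figure 7 is reduced to a fixed step list.

```python
# Cycle-level sketch (hypothetical name): EAB -> pipeline register ->
# loadable accumulator. Each loop iteration is one Clock edge.

def tdm_pipelined(x, y):
    """Return (product, Clock cycles until the accumulator holds the product)."""
    x_lo, x_hi = x & 0xF, (x >> 4) & 0xF
    y_lo, y_hi = y & 0xF, (y >> 4) & 0xF
    # (EAB operand a, EAB operand b, shift) in the R, S, T, U order of Figure 6.
    steps = [(x_lo, y_lo, 0), (x_lo, y_hi, 4), (x_hi, y_lo, 4), (x_hi, y_hi, 8)]
    pipe_reg = None          # (shifted partial product, step index) or None
    acc = cycle = issued = done = 0
    while done < len(steps):
        cycle += 1
        # Stage 2 reads pipe_reg before stage 1 overwrites it, so both
        # stages behave as if clocked on the same edge.
        if pipe_reg is not None:
            value, index = pipe_reg
            acc = value if index == 0 else acc + value   # step 0 loads
            done += 1
        # Stage 1: the EAB computes the next partial product into pipe_reg.
        if issued < len(steps):
            a, b, shift = steps[issued]
            pipe_reg = ((a * b) << shift, issued)
            issued += 1
        else:
            pipe_reg = None
    return acc, cycle


print(tdm_pipelined(0x25, 0x13))  # (703, 5)
```

The extra register adds one cycle of latency, but it shortens the longest combinational path to either the EAB lookup or the shift-and-add, not both in series. That shorter path is what allows the faster Clock.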

You can also increase the throughput of the time-domain-multiplexed multiplier design method by implementing the multiplier in two or more EABs. The multiplier then computes multiple partial products simultaneously, which reduces the number of Clock cycles.

The time-domain-multiplexed multiplier implementation is well suited for very large multiplications, such as 32 × 32, because it conserves EABs and logic cells. In contrast, large multiplications computed in parallel would consume a prohibitive number of logic cells or EABs.

Design Speed

The parallel multiplier generates all of the partial products and sums them within a single Clock cycle. In addition, data is loaded on every Clock cycle, giving the parallel multiplier high throughput and fast calculation times.

Designers can pipeline the parallel multiplier for faster Clock speeds. Pipelining requires multiple Clock cycles, and therefore more latency, to generate the product of a single multiplication. However, it decreases the Clock period while still allowing new data to be loaded on every Clock cycle. Because a pipelined multiplier still produces a new product on every Clock cycle, its faster Clock speeds give the highest throughput for consecutive operations. See Figure 8.

Figure 8. Simulation Waveforms for Non-Pipelined & Pipelined Parallel Multipliers (panels: Non-Pipelined Parallel Multiplier, Pipelined Parallel Multiplier; signals: Clock, Data 1 to 3, Product 1 to 3)
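To illustrate Figure 8, here is a Clock-cycle-level Python sketch of a 3-stage pipelined 8 × 8 parallel multiplier. It assumes one register after the EAB stage, one after the 8-bit first adder stage, and one after the 12-bit second adder stage, with one new operand pair accepted per Clock cycle. The name pipelined_parallel is hypothetical. Each loop iteration stands for one Clock edge, and the stages are updated from last to first so that every stage reads the previous cycle's register values.

```python
# Cycle-level sketch (hypothetical name) of a 3-stage pipelined parallel
# 8 x 8 multiplier: EAB stage -> 8-bit adder stage -> 12-bit adder stage.

def pipelined_parallel(operands):
    """Feed one (x, y) pair per Clock cycle; return a list of (cycle, product)."""
    stage1 = stage2 = None   # pipeline register contents; None means empty
    pending = list(operands)
    results = []
    cycle = 0
    while pending or stage1 is not None or stage2 is not None:
        cycle += 1
        # Second adder stage (12 bits): registered output is the product.
        if stage2 is not None:
            upper, r_lo, st = stage2
            product = (((upper + st) & 0xFFF) << 4) | r_lo
            results.append((cycle, product))
        # First adder stage (8 bits): S + T; U and R are carried along.
        if stage1 is not None:
            r, s, t, u = stage1
            stage2 = ((u << 4) | (r >> 4), r & 0xF, s + t)
        else:
            stage2 = None
        # Multiplier stage: four EABs compute R, S, T, U in parallel.
        if pending:
            x, y = pending.pop(0)
            xl, xh, yl, yh = x & 0xF, (x >> 4) & 0xF, y & 0xF, (y >> 4) & 0xF
            stage1 = (xl * yl, xl * yh, xh * yl, xh * yh)
        else:
            stage1 = None
    return results


print(pipelined_parallel([(12, 37), (255, 255)]))  # [(3, 444), (4, 65025)]
```

The first product appears after three Clock cycles of latency, and each later product follows one cycle after the previous one. A non-pipelined parallel multiplier would instead finish each product in a single, longer Clock cycle.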

The typical time-domain-multiplexed multiplier uses a single EAB to compute all partial products on different Clock cycles. Therefore, a multiplication requires as many Clock cycles as there are partial products. In the 8 × 8 multiplication example shown in Figure 7, the multiplication requires 4 Clock cycles. When consecutive multiplications are required, the first multiplication must be completed before the second multiplication can begin.

Designers can pipeline the time-domain-multiplexed multiplier for faster Clock speeds. Pipelining reduces the Clock period, which yields higher throughput. Table 1 summarizes the performance of parallel and time-domain-multiplexed multipliers.

Table 1. Circuit Performance (Clock cycles for an 8 × 8 multiplier)

Design                                                   | One Multiplication | Two Multiplications
Parallel Multiplier                                      | 1                  | 2
Parallel Multiplier with 3-Stage Pipeline                | 3                  | 4
Time-Domain-Multiplexed Multiplier                       | 4                  | 9
Time-Domain-Multiplexed Multiplier with 2-Stage Pipeline | 5                  | 10

Device Utilization

The 8 × 8 parallel multiplier design uses 4 EABs plus 21 additional LEs for the 12-bit and 8-bit adders. A 3-stage pipeline requires 20 additional registers to store data. A parallel multiplier with 3-stage pipelining does not require any additional LEs when the registers are implemented in the EAB.

In contrast, the time-domain-multiplexed multiplier uses only one EAB. It uses logic, rather than additional EABs, to select which bits are used for each multiplication. A time-domain-multiplexed multiplier with 2-stage pipelining does not require any additional LEs.
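The trade-off between EAB count and Clock cycles can be tallied with a short Python sketch. It assumes each EAB computes one 4 × 4 partial product per Clock cycle, and it ignores pipeline latency and the adder or accumulator logic. The helper names partial_products and tdm_cycles are hypothetical.

```python
import math

def partial_products(width_x, width_y, digit_bits=4):
    """Count of digit_bits x digit_bits partial products (one EAB lookup each)."""
    return math.ceil(width_x / digit_bits) * math.ceil(width_y / digit_bits)

def tdm_cycles(width_x, width_y, eabs, digit_bits=4):
    """Clock cycles per multiplication when `eabs` EABs share the partial
    products; pipeline latency and adder logic are ignored."""
    return math.ceil(partial_products(width_x, width_y, digit_bits) / eabs)

for w in (8, 16, 32):
    n = partial_products(w, w)
    print(f"{w} x {w}: {n} partial products; parallel: {n} EABs; "
          f"1 EAB: {tdm_cycles(w, w, 1)} cycles; 2 EABs: {tdm_cycles(w, w, 2)} cycles")
```

An 8 × 8 multiplier gives 4 partial products: 4 EABs in parallel, or 4 cycles with one EAB. A 32 × 32 multiplier gives 64, which illustrates why the time-domain-multiplexed approach is preferred for very large multiplications.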

Table 2 summarizes the number of EABs and LEs required for each type of multiplier.

Table 2. Device Utilization for an 8 × 8 Multiplier

Design                                                   | EABs Required | LEs Required
Parallel Multiplier                                      | 4             | 2
Parallel Multiplier with 3-Stage Pipeline                | 4             | 5
Time-Domain-Multiplexed Multiplier                       | 1             | 65
Time-Domain-Multiplexed Multiplier with 2-Stage Pipeline | 1             | 65

Conclusion

Large multipliers can be implemented in FLEX 10K devices with either the parallel multiplier or the time-domain-multiplexed multiplier design method. The parallel multiplier offers the fastest Clock speeds but requires more device resources. The time-domain-multiplexed multiplier conserves device resources but offers slower Clock speeds. Both design methods can be pipelined for faster Clock speeds.

Copyright © 1995, 1996 Altera Corporation, 2610 Orchard Parkway, San Jose, California 95134, USA, all rights reserved. By accessing any information on this CD-ROM, you agree to be bound by the terms of Altera's Legal Notice.