MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng.


MS Project: Trading Accuracy for Power with an Under-designed Multiplier Architecture
Parag Kulkarni
Adviser: Prof. Puneet Gupta
Electrical Eng., UCLA - http://nanocad.ee.ucla.edu/

Outline
- Introduction
- Inaccurate Multiplier Design
- Hardware vs. Software Tradeoff
- Conclusions & Future Work

Motivation
Accuracy can be used as a design metric and traded for:
- Area: nearly half the hardware in an array multiplier is there to get the last bit right
- Timing: in n-bit addition, the carry propagates log2(n) positions on average
- Power: a probabilistic switch's energy consumption can be as low as kt*log 2 *(1-p) Joules per transition
Especially useful in applications that are capable of absorbing errors

Voltage Over-scaling
- Introduce inaccuracy by intelligently over-scaling voltage(s): saves power at the expense of incomplete computations
- Use multiple voltage levels to generate acceptable SNRs
- Ignores overhead of level conversion, routing, etc.
- Not always feasible in compact arithmetic layouts

Under-design
- Focus on inaccurate adders
- Carry-out is independent of carry-in, which shortens the critical path
- Accuracy traded for speed or yield
- No easy correction methodology

Outline
- Introduction
- Inaccurate Multiplier Design
  - Error rates and power savings
  - Comparison with scaling/truncation
  - Application: image filtering
  - Correct mode of operation
- Hardware vs. Software Tradeoff
- Summary, Conclusions & Future Work

Inaccurate Multiplier
- Introduce error by manipulating the logic function, using the 2x2 multiplier as a building block
- Correct for 15/16 input combinations
- The modified K-map allows for a simpler implementation:
  - Half the area
  - Shorter critical path
  - Lower switching capacitance
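A minimal behavioral sketch of such a building block follows. It assumes that the single incorrect entry is 3 x 3 producing 7 instead of 9; the slides do not spell out which K-map entry is modified, but this choice is consistent with 15/16 correct inputs and with the 22.22% maximum error reported in the error-rate slides. The function name is hypothetical.

```python
def approx_mul2x2(a: int, x: int) -> int:
    """Approximate 2x2 multiplier: exact except for 3 * 3, which yields 7 (0b111).

    The 3 * 3 -> 7 entry is an assumption for illustration; it lets the
    result fit in 3 bits instead of the 4 bits an exact product needs.
    """
    if not (0 <= a <= 3 and 0 <= x <= 3):
        raise ValueError("operands must be 2-bit values")
    return 7 if (a == 3 and x == 3) else a * x


# Enumerate all 16 input pairs and collect the ones that differ from a * x.
wrong = [(a, x) for a in range(4) for x in range(4) if approx_mul2x2(a, x) != a * x]
print(wrong)          # the input pairs with an incorrect product
print((9 - 7) / 9)    # relative error of the 3 * 3 case
```

Under this assumption the relative error of the one wrong entry is 2/9, which is where a 22.22% worst case would come from.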

Building Larger Multipliers
- A_H and X_H are the upper two bits, A_L and X_L the lower two bits
- Use the inaccurate 2x2 block to generate partial products, then add the shifted partial products to get the final result
- Inaccuracy is introduced through the partial products alone; the adder network remains accurate
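The composition can be sketched recursively: split each operand into a high and a low half, form four half-width partial products, and combine them with exact shifts and adds. This is an illustrative model, not the authors' Verilog; the function names are hypothetical, the bit width is assumed to be a power of two, and the 2x2 block is assumed to map 3 x 3 to 7.

```python
def approx_mul2x2(a, x):
    # Assumed building block: exact except 3 * 3 -> 7.
    return 7 if (a == 3 and x == 3) else a * x


def approx_mul(a, x, n):
    """Approximate n x n multiply (n a power of two, n >= 2)."""
    if n == 2:
        return approx_mul2x2(a, x)
    h = n // 2
    mask = (1 << h) - 1
    a_h, a_l = a >> h, a & mask
    x_h, x_l = x >> h, x & mask
    # Four half-width partial products, combined with exact shifts and adds.
    return ((approx_mul(a_h, x_h, h) << n)
            + ((approx_mul(a_h, x_l, h) + approx_mul(a_l, x_h, h)) << h)
            + approx_mul(a_l, x_l, h))


print(approx_mul(15, 15, 4))   # 175, versus the exact 225
print(approx_mul(10, 6, 4))    # 60, exact because no 2-bit digit pair is (3, 3)
```

Only the 2x2 leaves are approximate here, which mirrors the point that the adder network itself stays accurate.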

Error Rates

Bit-Width  Error Probability  Mean-Error  Max-Error
2          0.06               1.39%       22.22%
4          0.19               2.60%       22.22%
8          0.46               3.25%       22.22%
12         0.67               3.31%       22.22%
16         0.81               3.32%       22.22%

- Building block error probability = 1/16
- Used C++ simulation models for higher bit-widths
- Relatively large but constant max-error of 22.22%
- Mean-error increases with bit-width, saturating around 3.3%
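The error statistics for small bit-widths can be checked by exhaustive enumeration. The sketch below is in Python rather than the C++ models mentioned above, and rests on two assumptions: the 2x2 block maps 3 x 3 to 7, and the mean error is the relative error averaged over all input pairs, with zero products counted as zero error. Function names are hypothetical.

```python
def approx_mul(a, x, n):
    """Sum of 2x2 partial products over 2-bit digits; 3 * 3 -> 7 is assumed."""
    total = 0
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            da, dx = (a >> i) & 3, (x >> j) & 3
            block = 7 if (da == 3 and dx == 3) else da * dx
            total += block << (i + j)
    return total


def error_stats(n):
    """Return (error probability, mean relative error, max relative error)."""
    wrong, rel_sum, rel_max = 0, 0.0, 0.0
    for a in range(1 << n):
        for x in range(1 << n):
            exact = a * x
            err = exact - approx_mul(a, x, n)
            if err:  # an error implies a nonzero exact product
                wrong += 1
                rel = err / exact
                rel_sum += rel
                rel_max = max(rel_max, rel)
    total = 1 << (2 * n)
    return wrong / total, rel_sum / total, rel_max


stats = {n: error_stats(n) for n in (2, 4, 8)}
for n, (p, mean, worst) in stats.items():
    print(f"{n}-bit: P(error)={p:.4f} mean={mean:.2%} max={worst:.2%}")
```

Under these assumptions the 2-bit and 4-bit error probabilities come out as 1/16 and 49/256, in line with the 0.06 and 0.19 entries in the table.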

Max Error
- Max-error remains constant at 22.22%
- Property of the building block, transferred to higher bit-widths
- Probability of hitting max-error reduces rapidly with increasing bit-width

Bit-Width  Max-Error Probability
2          0.0625
4          0.0351
8          0.0243
12         0.0226
16         0.0225

Saturating Mean-Error
- Relatively small mean error, between about 1.4% and 3.3%
- This compares well with other approaches
- As bit-width increases, the percentage of larger errors reduces, leading to saturation

Experimental Setup
- Architectures written in Verilog and synthesized using RTL-Compiler to the 45nm Nangate open cell library
- Inaccurate multiplier:
  - 2x2 building blocks to generate partial products
  - Accurate adder tree to produce the final solution
- Accurate multiplier, best of either:
  - Generate accurate partial products and add
  - Architecture selection and optimization by the tool
- Simulation in NCSIM
- VCD file back-annotated for power analysis

Power Savings
[Figure: dynamic power reduction (%), leakage power reduction (%), and area reduction (%)]
- Power and area savings between 31% and 45%
- Power benefits of the building block get transferred to larger bit-widths

Bit Width  F      1.25F  1.5F   1.75F  2F     Avg.
2          44.9%  42.1%  42.1%  48.9%  48.9%  45.4%
4          13.7%  31.6%  44.8%  44.7%  46.5%  36.3%
8          33.1%  40.4%  26.3%  48.8%  58.9%  41.5%
16         25.6%  29.6%  32.4%  33.8%  37.4%  31.8%

Design Level Savings

Design     # of Multipliers  # of Gates  Multiplier Power Reduction  Total Power Reduction
FFT        32                158K        25.16%                      13.98%
FIR        4                 1.1K        31.09%                      18.30%
Mini-RISC  1                 10K         28.04%                      1.51%

- Power savings show best in multiplier-intensive designs like the FIR filter
- Less pronounced for other designs
- Benefits of approximate arithmetic are very design specific
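One way to read the table is to back out how much of each design's power the multipliers would have to account for. The sketch below assumes that everything outside the multipliers consumes the same power in both versions, so that total reduction = multiplier share x multiplier reduction. That relation is an assumption made for illustration and is not stated in the slides; the function name is hypothetical.

```python
# (multiplier power reduction %, total power reduction %) from the table above
designs = {
    "FFT": (25.16, 13.98),
    "FIR": (31.09, 18.30),
    "Mini-RISC": (28.04, 1.51),
}


def implied_multiplier_share(mult_reduction, total_reduction):
    """Fraction of design power in multipliers, assuming all else is unchanged."""
    return total_reduction / mult_reduction


shares = {name: implied_multiplier_share(m, t) for name, (m, t) in designs.items()}
for name, share in shares.items():
    print(f"{name}: multipliers account for roughly {share:.1%} of total power")
```

Under that assumption the Mini-RISC multiplier would be only a small fraction of total power, which would explain why a 28% multiplier saving shrinks to about 1.5% overall.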

Power Savings vs. Frequency
- Savings increase as the desired frequency of operation increases
- The inaccurate multiplier is inherently faster, needing less aggressive gate sizing to meet increasing frequency constraints

Tunable Error
- Replacing individual 2x2 components with accurate versions allows a designer to exploit other points on the power-accuracy curve
- Inaccurate partial products yield a better trade-off than an inaccurate adder tree
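The effect of swapping in accurate blocks can be sketched with a behavioral model in which each 2x2 block is selectable. The block indexing, the choice of which block to make accurate, and the 3 x 3 to 7 approximation are all assumptions for illustration; the slides do not say which blocks were replaced, and this model says nothing about the power side of the trade-off.

```python
def tunable_mul(a, x, n, accurate_blocks=frozenset()):
    """n x n multiply from 2x2 blocks; blocks in accurate_blocks are exact.

    Blocks are identified by (i, j), the 2-bit digit indices of a and x.
    """
    total = 0
    for i in range(n // 2):
        for j in range(n // 2):
            da, dx = (a >> (2 * i)) & 3, (x >> (2 * j)) & 3
            if (i, j) not in accurate_blocks and da == 3 and dx == 3:
                block = 7          # assumed approximate output for 3 * 3
            else:
                block = da * dx    # exact 2x2 product
            total += block << (2 * (i + j))
    return total


def mean_error(n, accurate_blocks=frozenset()):
    """Mean relative error over all inputs (zero products count as zero error)."""
    total = 0.0
    for a in range(1, 1 << n):
        for x in range(1, 1 << n):
            total += (a * x - tunable_mul(a, x, n, accurate_blocks)) / (a * x)
    return total / (1 << (2 * n))


all_approx = mean_error(4)
msb_exact = mean_error(4, accurate_blocks={(1, 1)})  # most significant block exact
all_exact = mean_error(4, accurate_blocks={(0, 0), (0, 1), (1, 0), (1, 1)})
print(all_approx, msb_exact, all_exact)
```

Each accurate block moves the design toward lower error, and with every block accurate the model reduces to an exact multiplier.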

Comparison with Over-Scaling

Scaled Voltage  Error Probability  Mean-Error  Max-Error
0.90V           0.89               20.86%      100.0%
0.77V           0.99               38.07%      100.0%

- Table shows error rates and mean-error for an 8-bit multiplier after voltage over-scaling
- Scaled voltages correspond to power savings of between 30% and 50%
- Significantly larger mean and max error
- Mean-error for inaccurate multiplier: 3.25%
- Max-error for inaccurate multiplier: 22.22%

Leakage Optimization & Voltage Scaling
- Figure plots the effect of leakage optimization on mean-error after voltage over-scaling (nominal 1.2V to 1.1V)
- Voltage over-scaling becomes significantly less attractive with increasing leakage optimization

Comparison with Truncation
[Figure: power reduction (%) versus mean error (%) for the series Partial Products, FA, LSB, and MSB]
- LSB truncation offers a slightly better trade-off than that achieved by MSB truncation
- Both LSB and MSB truncation offer a much poorer trade-off than our proposed method as well as introducing errors through the adders

Application: Image Filtering
[Figure: filtered output images for each configuration]
- Accurate baseline
- Inaccurate: 41.5% power reduction, 20.3 dB
- Over-scaling: 30% power reduction, 9.16 dB
- Over-scaling: 50% power reduction, 2.64 dB

Error Detection/Correction
- Simple decoder logic is used to detect the presence and value of errors
- A residual adder can be used to add the correction amount
- Can produce correct behavior when needed
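A behavioral sketch of detection and correction follows, assuming the 2x2 block maps 3 x 3 to 7, so that every erroneous block falls short by exactly 2 at its bit position. Here the "decoder" is the test for both 2-bit digits being 3 and the "residual adder" is the final addition. The function names are hypothetical and this is not the authors' circuit.

```python
def approx_mul_with_correction(a, x, n):
    """Return (approximate product, correction amount) for an n x n multiply."""
    product, correction = 0, 0
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            da, dx = (a >> i) & 3, (x >> j) & 3
            if da == 3 and dx == 3:
                product += 7 << (i + j)      # assumed approximate output
                correction += 2 << (i + j)   # detected shortfall: 9 - 7
            else:
                product += (da * dx) << (i + j)
    return product, correction


def corrected_mul(a, x, n):
    product, correction = approx_mul_with_correction(a, x, n)
    return product + correction   # role of the residual adder


print(approx_mul_with_correction(15, 15, 4))   # (175, 50)
print(corrected_mul(15, 15, 4))                # 225
```

Because the errors in this model are always shortfalls of a known size, adding the detected amount restores the exact product for every input.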

Non-Adder Based Error Correction
- For our experiments we used adder-based correction
- Custom logic for correction is also possible

Accurate Mode of Operation (1/2)
- 4.6% - 10.5% area and 4.8% - 8.56% power overhead (over baseline accurate)
- Envision a system with two modes of operation:
  - Regular: inaccurate, power-saving mode with the correction unit turned off
  - Critical: accurate, with the correction unit on, leading to slower and more power-hungry operation

Accurate Mode of Operation (2/2)
We look at three different configurations for the correction unit, requiring varying degrees of architectural support:
1. Accurate mode runs slower (0.85 * original frequency)
   - Needs power gating + frequency scaling
   - Average power saving ~36%
2. Accurate and inaccurate modes run at the same frequency
   - Needs only power gating
   - Smaller average power reduction, ~12.44%
   - Only useful at lower frequencies
3. Inaccurate mode runs at the same frequency but lower voltage
   - Power gating + voltage scaling
   - Average power saving ~33.22%

Outline
- Introduction
- Inaccurate Multiplier Design
- Hardware vs. Software Tradeoff
- Conclusions & Future Work

Comparison with Software Tradeoff
- Software allows for a tradeoff between accuracy and runtime/power for applications such as JPEG
- Hardware approach still consumes less power for the same SNR if it lies on the critical path
- Savings are smaller than stand-alone inaccurate vs. baseline

Outline
- Introduction
- Inaccurate Multiplier Design
- Hardware vs. Software Tradeoff
- Conclusions & Future Work

Conclusions (1/2)
- We propose an inaccurate multiplier architecture that:
  - Leverages a 2x2 building block to trade accuracy for power
  - Displays a mean-error of 1.39% - 3.35%
  - Results in power savings between 30% and 50%
- For a simple image filtering application, the inaccurate multiplier:
  - Achieves 2X - 8X better SNR than simple voltage scaling
  - Does not suffer from the multiple voltage domain overheads of advanced over-scaling techniques
- The architecture is extended to allow for a correct mode of operation

Conclusions (2/2)
- For inaccurate multipliers, introducing errors via partial products offers a better power vs. error tradeoff than through the adders
- Designing for error avoids the overheads of multiple power domains and can be easily integrated into the ASIC design flow
- Benefits of inaccurate arithmetic are very design specific
- Software-based tradeoff can offer comparable benefits for some applications

Future Work
- Extension of the methodology to other arithmetic units: adders, dividers, etc.
- Simpler correction/decoder units, with lower overhead
- Method to find the point of maximum power benefit for a given error rate