Sloppy Addition and Multiplication

IMM-Technical Report-2011-14

Alberto Nannarelli
Dept. Informatics and Mathematical Modelling, Technical University of Denmark, Kongens Lyngby, Denmark
Email: an@imm.dtu.dk

Abstract

Sometimes reducing the precision of a numerical processor, by introducing errors, can lead to significant performance (delay, area and power dissipation) improvements without compromising the overall quality of the processing. In this work, we show how to perform the two basic operations, addition and multiplication, in an imprecise manner by simplifying the hardware implementation. With the proposed sloppy operations, we obtain a reduction in delay, area and power dissipation, and the error introduced is still acceptable for applications such as image processing.

1 Introduction

In common language the adjective arithmetical usually indicates something very precise or error-free. However, arithmetic operations also have to be put in context. There are several fields of application of computer arithmetic that can tolerate some imprecision. For example, in audio and image processing or in wireless communications, it might be desirable to get better performance (faster, smaller, less power-hungry systems) at the expense of some quality degradation. Recently, a few papers have addressed this issue of designing imprecise hardware to save power [1, 2, 3, 4]. In this work, we introduce a systematic way of making the two most common arithmetic operations, addition and multiplication, imprecise. We liked the term sloppy introduced in [5], and we will use this term in the paper to refer to imprecise arithmetic operations.

2 Sloppy Addition

The idea is very simple: do we need to propagate the carry for the whole word? Assuming that we are operating on positive integers, and defining position $k$ as the bit of weight $2^k$ in an $n$-bit word, we can ignore the carry up to position $k$ when implementing the addition. The bit-level algorithm to implement this sloppy adder is the following:

    c = 0                          // carry
    for i = 0 to n-1
      if (i < k) then
        s_i = a_i XOR b_i;
      else
        s_i = a_i XOR b_i XOR c;
        c = (a_i AND b_i) OR (a_i AND c) OR (b_i AND c);
      endif
    endfor

For example, the addition 103 + 70 ($n = 8$, $k = 4$):

             sloppy            exact
    A :    0110 0111 +       0110 0111 +
    B :    0100 0110 +       0100 0110 +
    c :    100- ---- =       1000 110- =
           ---------         ---------
    S :    1010 0001         1010 1101

That is, the sloppy adder computes 161 (the exact value is 173), introducing an error $\epsilon = 12$. By looking at the bits of weight $< 2^k$, we notice that the XOR of two ones produces a zero sum bit ($1 \oplus 1 = 0$). Because the carry is not computed (or propagated), two ones in a position $j < k$ generate an error of $2^{j+1}$. This error can be halved to $2^j$ by computing the OR of the two bits in place of the XOR. For the example above we have:

           sloppy (OR-ing)
    A :    0110 0111 +
    B :    0100 0110 +
    c :    100- ---- =
           ---------
    S :    1010 0111

and the error is reduced from $\epsilon = 12$ to $\epsilon = 6$ (halved). By simulating all possible combinations of the operands for the 8-bit addition ($k = 4$), we found that when the sum is obtained by OR-ing the $k$ least-significant bits the average error is $\epsilon_{mean} = 3.75$, while by XOR-ing it is $\epsilon_{mean} = 7.5$.

We show in Figure 1 the comparison of the hardware implementation of the sloppy adder used in the above example ($n = 8$, $k = 4$) and an error-free 8-bit carry-propagate adder (CPA). The data on delay, area and power are reported in Table 1.

Figure 1: Implementation of 8-bit error-free (top) and sloppy $k = 4$ (bottom) adders.

Table 1: Synthesis data of the adders in Figure 1.

                        CPA 8-bit   sloppy   ratio
    max. delay [ps]        999        495     2.00
    Area [µm^2]            191        112     1.70
    Power [µW]              42         20     2.10

In a rough evaluation, we considered lowering the supply voltage $V_{DD}$ in the sloppy adder to match the delay of the error-free adder (1.0 ns). In our library, when $V_{DD}$ is lowered from 1.0 V to 0.7 V the delay doubles. The power dissipation of the sloppy adder at 1.0 V is

$$P_{1.0\,\mathrm{V}} = V_{DD}^2 \, f \sum_{i=1}^{N} a_i C_i = (1.0)^2 \, K = 20\ \mu\mathrm{W}$$

where $K$ collects the frequency, switching activity and capacitance terms. We assume that the switching activity does not change when scaling $V_{DD}$. Therefore, $K = 20$ is constant:

$$P_{0.7\,\mathrm{V}} = (0.7)^2 \cdot 20 \approx 10\ \mu\mathrm{W}$$

That is, with the sloppy adder the power is reduced to about 1/4 of the 42 µW of the error-free adder (Table 1) at the same adder speed.

2.1 Example: sloppy adder in image filtering

We use the sloppy adder defined above ($k = 4$) to process two grayscale images (each pixel is an unsigned 8-bit integer) with the following bidimensional filters:

1. an averaging (low-pass) filter;
2. a sharpening filter;
3. an edge-detection unit.

The visual results are shown in Figure 2. The maximum error (absolute value) $\epsilon_{max}$ and the average error $\epsilon$ are reported in Table 2 for the different types of filtering.

Table 2: Error analysis of processed images.

             smoothing          sharpening         edge det.
             ε_max    ε         ε_max    ε         ε_max    ε
    uma       26     7.2         60     18.9        64     9.0
    hue       28     7.8         59     17.5        68     9.2

Figure 2: Visual results of sloppy addition in filtering. For each filter (1. smoothing, 2. sharpening, 3. edge detection) and each image (uma, hue) the panels show: original, error-free, sloppy-adder, error map.

The results show that the degradation is independent of the image (uma is a portrait, while hue has greater detail). Depending on the filter mask, we can change the design of the sloppy adder to obtain larger savings. For example, for edge detection, a sloppy adder with $k = 6$ has an average error $\epsilon = 28$.

3 Sloppy Multiplication

Parallel multiplication $p = x \cdot y$ can be divided into three steps:

1. generation of Partial Products (PPs);
2. carry-free reduction from $n$ PPs to 2 operands;
3. carry-propagate two-operand addition.

We use a sloppy approach for step 1 only, as step 2 is quite delay-efficient (no carry propagation) and step 3 has been addressed in the previous section. We consider radix-4 multiplication, as for $n \times n$ bit operands $n/2$ PPs are generated and the unit is smaller.

In radix-4 multiplication, the radix-4 digits of the multiplier $y$ are recoded into a signed-digit representation to avoid multiples of 3 and carry propagation, as explained in [6]. The resulting architecture (for one digit) of recoder plus PP generation (rec+ppgen) is sketched in Figure 3 (top). Similarly to what was done for the addition, we have a sloppy rec+ppgen for the least-significant digits of $y$. The recoding is performed as shown in Table 3. The resulting implementation is greatly simplified, as shown in Figure 3 (bottom).

Clearly, a competitor of the sloppy multiplier is the truncated multiplier. To compare performance and error introduced, we implemented an 8 × 8-bit multiplier (two's complement) in the five schemes listed below.
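The sloppy partial-product selection of Table 3 can be modeled behaviorally. The following is a minimal Python sketch, not the report's hardware: it assumes unsigned operands and plain radix-4 digits in place of the signed-digit recoding of [6], and the function names (`radix4_digits`, `sloppy_pp`, `exact_pp`, `sloppy_mul`) are hypothetical. The digit index is called `j` here so that `k` can denote the number of sloppy least-significant digits.

```python
def radix4_digits(y, n):
    """Split an unsigned n-bit multiplier y into n/2 radix-4 digits, least significant first."""
    return [(y >> (2 * j)) & 0b11 for j in range(n // 2)]


def exact_pp(x, digit, j):
    """Error-free partial product: digit * x * 4^j."""
    return (digit * x) << (2 * j)


def sloppy_pp(x, digit, j):
    """Sloppy partial product as in Table 3: 2x * 4^j for any nonzero digit, else 0."""
    two = 1 if digit != 0 else 0  # 'two' = y_{2j+1} OR y_{2j}
    return (2 * x * two) << (2 * j)


def sloppy_mul(x, y, n=8, k=2):
    """Multiply unsigned x by unsigned n-bit y with the k lowest radix-4 digits sloppy."""
    total = 0
    for j, digit in enumerate(radix4_digits(y, n)):
        total += sloppy_pp(x, digit, j) if j < k else exact_pp(x, digit, j)
    return total


if __name__ == "__main__":
    # Per-digit error for x = 5 at digit position 0 (compare with the rows of Table 3).
    for digit in range(4):
        print(digit, exact_pp(5, digit, 0), sloppy_pp(5, digit, 0))
    print(sloppy_mul(3, 7, k=1), 3 * 7)
```

With `k=0` the model reduces to an exact multiplication. Because the errors in Table 4 were obtained for a two's complement multiplier with signed-digit recoding, this unsigned sketch is not expected to reproduce those figures.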

Table 3: Sloppy radix-4 recoding.

    y_{2k+1}   y_{2k}  |  PP_k std.    PP_k sloppy  |  ε_k
       0         0     |    0             0         |   0
       0         1     |    x·4^k         2x·4^k    |   x·4^k
       1         0     |    2x·4^k        2x·4^k    |   0
       1         1     |    3x·4^k        2x·4^k    |   x·4^k

Figure 3: Implementation of error-free (top) and sloppy (bottom) rec+ppgen. Both take $y_{2k+1}$, $y_{2k}$, $y_{2k-1}$ and the bits of $x$ as inputs and produce the bits of $PP^k$. The error-free recoder generates the signals one, two and neg; the sloppy recoder generates only two.

The schemes compared are:

1. r2-mult: a radix-2 standard multiplier;
2. r4-mult: a radix-4 standard multiplier (with PP generation as in Figure 3, top);
3. r2-trunc: an r2-mult with $k_t$ truncated bits;
4. r4-trunc: an r4-mult with $k_t$ truncated bits;
5. sloppy: a radix-4 multiplier with PP generation as in Figure 3 (bottom) for $k$ digits.

We estimated a comparable error for $k = 2$ sloppy digits and $k_t = 8$ truncated bits. The results of the simulation on all $2^{16}$ combinations are reported in Table 4. The data do not include the contribution of the final carry-propagate adder.

Table 4: Summary of results for 8 × 8-bit multipliers.

    unit        delay [ps]   power [µW]   area [µm^2]   error ε   ε_max
    r2-mult        900           70          2612           0        0
    r4-mult        850           84          1842           0        0
    r2-trunc       870           32          1426         256      897
    r4-trunc       820           26           847         304      640
    sloppy         490           21          1195         145      657

4 Putting Everything Together

Now we combine the sloppy multiplier and adder in a multiply-add (and accumulate) unit (Figure 4) which can be used for the trivial implementation of the Inverse Discrete Cosine Transform (IDCT), which is part of the JPEG decompression algorithm.

Figure 4: Scheme of the multiply-accumulate unit used for IDCT. Block labels: X, Y, MULT, CSA 3:2, ADD, register, S.

We implemented the unit of Figure 4 with regular (R) and sloppy (S) operations as shown in Table 5. The multiplier is 12 × 12 bits, the adder is 24 bits. By C simulation, we found a sloppiness limit of $k_m = 3$ digits (6 bits) for the multiplier and $k_a = 8$ bits for the adder. The results in Table 5 are obtained by implementation in a 90 nm standard cell library (clock rate is 100 MHz). The errors are computed with respect to a floating-point software implementation. The visual results are shown in Figure 5.

Table 5: Summary of results for IDCT implementation.

      Unit       delay    area     |        uma              |        hue              | power
    MULT  ADD    [ps]    [µm^2]    | P_ave [µW]   ε   ε_max  | P_ave [µW]   ε   ε_max  | ratio
     R     R     3500     5580     |    128      3.7     9   |    185      3.8    10   | 1.00
     S     R     3400     5090     |    107      5.0    34   |    155      6.0    39   | 0.84
     R     S     3090     5440     |    125      3.8    18   |    181      5.0    21   | 0.98
     S     S     2930     4950     |    106      5.0    35   |    153      6.6    36   | 0.83

The results show that the larger reduction in power is obtained when the sloppy multiplier is used. The contribution of the sloppy adder is little with respect to the power, but it is significant in delay reduction¹ (about 40% faster) and the slack can be used for low power design. The degradation due to the sloppy adder, in addition to that of the sloppy multiplier, is marginal.

5 Conclusions and Future Work

We have presented simple ways of performing addition and multiplication in an imprecise manner with the aim of getting better performance (delay, area and power) at the expense of an increased error which can be tolerated in some applications. This is preliminary work, just the idea, which is going to be further developed.

References

[1] K. He, A. Gerstlauer, and M. Orshansky, "Controlled Timing-Error Acceptance for Low Energy IDCT Design," Proc. of 2011 Design, Automation and Test in Europe Conference (DATE), Mar. 2011.
[2] A. Lingamneni, J.-L. N. C. Enz, K. Palem, and C. Piguet, "Energy Parsimonious Circuit Design through Probabilistic Pruning," Proc. of 2011 Design, Automation and Test in Europe Conference (DATE), Mar. 2011.
[3] P. Krause and I. Polian, "Adaptive Voltage Over-Scaling for Resilient Applications," Proc. of 2011 Design, Automation and Test in Europe Conference (DATE), Mar. 2011.
[4] D. Mohapatra, V. Chippa, A. Raghunathan, and K. Roy, "Design of Voltage-Scalable Meta-Functions for Approximate Computing," Proc. of 2011 Design, Automation and Test in Europe Conference (DATE), Mar. 2011.
[5] L. Hardesty, "The surprising usefulness of sloppy arithmetic," MIT News Office. [Online]. Available: http://web.mit.edu/newsoffice/2010/fuzzy-logic-0103.html
[6] M. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann Publishers, 2004.

¹ The synthesis was done with the minimum area constraint. Therefore, the adder is synthesized as a carry-ripple adder.
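The bit-level algorithm of Section 2 amounts to a ripple loop in which the $k$ low positions skip the carry. The following Python sketch models it in software to check the 103 + 70 example and the reported mean errors. The function names (`sloppy_add`, `mean_error`), the `use_or` switch for the OR-ing variant, and the choice to keep the final carry-out as bit $n$ of the result are assumptions made for illustration and are not taken from the report.

```python
def sloppy_add(a, b, n=8, k=4, use_or=False):
    """Add two unsigned n-bit integers, ignoring carries below position k."""
    c = 0  # carry, only computed from position k upwards
    s = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if i < k:
            # Low positions: no carry; XOR as in the algorithm, or OR to halve the error.
            s_i = (a_i | b_i) if use_or else (a_i ^ b_i)
        else:
            # High positions: ordinary full adder.
            s_i = a_i ^ b_i ^ c
            c = (a_i & b_i) | (a_i & c) | (b_i & c)
        s |= s_i << i
    return s | (c << n)  # assumption: the carry-out is kept as bit n


def mean_error(n=8, k=4, use_or=False):
    """Average of (exact - sloppy) over all pairs of n-bit operands."""
    total = 0
    for a in range(1 << n):
        for b in range(1 << n):
            total += (a + b) - sloppy_add(a, b, n, k, use_or)
    return total / (1 << (2 * n))


if __name__ == "__main__":
    print(sloppy_add(103, 70))               # 161
    print(sloppy_add(103, 70, use_or=True))  # 167
    print(mean_error())                      # 7.5
    print(mean_error(use_or=True))           # 3.75
```

The two means agree with the values found by simulation in Section 2, and they also follow from the identities a + b = (a XOR b) + 2(a AND b) = (a OR b) + (a AND b): each of the $k$ low bits of a AND b is 1 with probability 1/4 for uniformly distributed operands, so the expected value of the low part of a AND b is $(2^k - 1)/4 = 3.75$ for $k = 4$.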

Figure 5: Original pictures (top) and after decoding by sloppy (S-S) IDCT (bottom), for the images uma and hue.