High-speed low-power 2D DCT Accelerator. EECS 6321 Yuxiang Chen, Xinyi Chang, Song Wang Electrical Engineering, Columbia University Prof.

Transcription:

High-speed low-power 2D DCT Accelerator
EECS 6321
Yuxiang Chen, Xinyi Chang, Song Wang
Electrical Engineering, Columbia University
Prof. Mingoo Seok

Project Goal
- Execute a full VLSI design flow, from RTL design to place and route, with a custom standard cell.
- Demonstrate a low-power design methodology using standard tool flows.
Steps
- Matlab implementation of the DCT algorithm.
- RTL implementation of the DCT and the approximate adder, tested using Matlab-generated random inputs.
- Physical layout of the approximate adder with standard-cell dimensions.
- Synthesis of the RTL to a gate-level netlist and post-synthesis simulation; delay and power measured.
- APR with the IBM 130nm standard cells and the APPROX_ADDER cell. Fixed DRC errors; LVS clean.

Methodology Overview
[Flow diagram: Matlab Model; RTL Design; Function Verification (ModelSim); Synthesis & Timing Analysis (Synopsys Design Compiler); Auto Place & Route (Encounter); PrimeTime Analysis]

DCT with Loeffler Algorithm
- With the Loeffler algorithm, the number of multiplications reaches the theoretical lower limit.
- 4 stages.
- MultAddSub blocks.
[1] M. Jridi and A. Alfalou
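The sketch below is not the Loeffler flow graph itself. It is a direct-summation 8-point DCT-II applied along rows and then columns, the kind of reference model a fast implementation can be checked against. The 8x8 block size, the orthonormal scaling, and the function names `dct_1d` and `dct_2d` are assumptions made for illustration; the slides do not specify them.

```python
import math

N = 8  # assumed block size; the slides do not state it explicitly


def dct_1d(x):
    """Orthonormal DCT-II of a length-N sequence, by direct summation."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][i] holds coefficient (i, j); transpose back to row-major order.
    return [[cols[j][i] for j in range(N)] for i in range(N)]


if __name__ == "__main__":
    flat = [[10] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # only the DC term is non-zero for a flat block
```

A fast algorithm such as Loeffler's should produce the same coefficients up to its own scaling convention, using far fewer multiplications than this direct form.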

Canonical signed digit (CSD) representation
- CSD is a signed-digit representation containing the fewest possible nonzero digits.
- It is an effective way to implement the constant multipliers of the DCT: the number of additions and subtractions is minimized.
- Common elements in the CSD constant coefficients were identified and the required resources shared.
- X = 2^a ± 2^b ± 2^c ± ...
[1] M. Jridi and A. Alfalou
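As a minimal sketch of the idea, the code below converts a non-negative integer constant to CSD digits and multiplies by that constant using only shifts, additions, and subtractions. The function names `to_csd` and `csd_multiply` are hypothetical, and the sketch does not model the sharing of common sub-expressions mentioned above.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, LSB first.

    Each digit is -1, 0, or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits using shift-add/subtract."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    print(to_csd(7))                   # 7 = 2^3 - 2^0
    print(csd_multiply(13, to_csd(7)))
```

For example, the constant 7 needs three non-zero bits in plain binary (4 + 2 + 1) but only two non-zero CSD digits (8 - 1), so one adder is saved.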

RTL Block Diagram for DCT-1
[Block diagram: Stage 1, Stage 2, Stage 3, Stage 4]

RTL Block Diagram for DCT-2
- 21 cycles from the first set of inputs to the last set of outputs.
- New data is fed every 19 cycles.

Semi-custom Approximate Adder
[Figure: conventional mirror adder (MA) and approximate adder]
- Removing some series-connected transistors speeds up charging and discharging of the node capacitances.
- The reduced complexity from removing transistors also lowers dynamic power.
- Approximate FA cells are used only in the LSBs, so that the final output quality does not degrade too much.
- Output quality is measured using MSE (mean square error).
[3] V. Gupta, D. Mohapatra, A. Raghunathan and K. Roy
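The behavior described here can be sketched in software. The approximation used below (Sum taken as the inverse of Cout, with Cout kept exact) is an illustrative assumption in the spirit of [3]; it is not necessarily the cell used in this project. The 8-bit width, the uniformly random operands, and all function names are also assumptions, and the MSE here is measured on adder outputs only.

```python
import random


def exact_fa(a, b, cin):
    """Exact full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def approx_fa(a, b, cin):
    """Illustrative approximate full adder (assumption, not the project's cell).

    Carry-out is exact; Sum is approximated as NOT carry-out, which is wrong
    only for inputs (0, 0, 0) and (1, 1, 1).
    """
    cout = (a & b) | (cin & (a ^ b))
    return 1 - cout, cout


def ripple_add(x, y, width, n_approx):
    """Ripple-carry adder using approximate cells in the n_approx lowest bits."""
    carry = 0
    result = 0
    for i in range(width):
        fa = approx_fa if i < n_approx else exact_fa
        s, carry = fa((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


def adder_mse(width, n_approx, trials=10000, seed=0):
    """Mean square error of the adder output over random operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = rng.randrange(1 << width)
        y = rng.randrange(1 << width)
        err = ripple_add(x, y, width, n_approx) - (x + y)
        total += err * err
    return total / trials


if __name__ == "__main__":
    for n in range(4):
        print(n, adder_mse(8, n))
```

Because the carry stays exact in this particular approximation, the error is confined to the approximated low-order bits, which is why restricting approximate cells to the LSBs bounds the damage.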

Design Compiler
- Combine scx3_cmos8rf_lpvt_tt_1p2v_25c.db and cmrf8sf_custom.db to synthesize the whole design.
- Apply set_dont_touch to our customized approximate adder cell.
- Use the smallest driving cell, INVXLTS, for all the inputs to get the maximum optimization.

Netlist Simulation
- Simulate the function and timing of the netlist in ModelSim.

Semi-custom Approximate Adder

Abstract View
- Detailed blockage abstract.
- Blockages were drawn manually to work around shortcomings of the APR tool.

Floorplan - Final Implementation
- Size: 710.2 (W) x 730.2 (H)
- Challenges: matching the custom cell boundary to the power grid; antenna issues; APR routing.

Approximate Adder Accuracy Comparison

Number of Approximate Adders   MSE     PSNR
1                              3.53    42.68 dB
2                              13.46   36.87 dB
3                              41.95   31.94 dB
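The two quality columns are linked by the standard PSNR definition. The sketch below recomputes PSNR from the tabulated MSE values, assuming 8-bit data with a peak value of 255; the slides do not state the peak value, and the function name `psnr_db` is hypothetical.

```python
import math


def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean square error."""
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # MSE values taken from the accuracy comparison table.
    for n_adders, mse in [(1, 3.53), (2, 13.46), (3, 41.95)]:
        print(n_adders, round(psnr_db(mse), 2))
```

Under that assumption the recomputed values agree with the table to within about 0.05 dB, which is consistent with the MSE figures having been rounded to two decimals.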

Result

With Adder   Total Power (mW)   Area (um^2)   Frequency (ns)
A            70.753             191558.88     225.7
B            66.254             182021.76     225.7
C            63.899             174057.11     225.7

State of the Art   Power (mW)   Area (mm^2)
DCT Core           29.92        0.569

- Adder A: RCA with full adders.
- Adder B: RCA with the 3 LSBs using the synthesized approximate adder.
- Adder C: RCA with the 3 LSBs using the customized approximate adder.
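To make the comparison between the three adder variants easier to read, the sketch below derives the relative power and area savings of B and C against the exact-adder baseline A. It uses only the numbers in the result table; the names `RESULTS` and `saving_pct` are hypothetical.

```python
# (total power in mW, area in um^2), copied from the result table.
RESULTS = {
    "A": (70.753, 191558.88),
    "B": (66.254, 182021.76),
    "C": (63.899, 174057.11),
}


def saving_pct(baseline, value):
    """Relative reduction of value against baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


if __name__ == "__main__":
    base_power, base_area = RESULTS["A"]
    for name in ("B", "C"):
        power, area = RESULTS[name]
        print(name,
              round(saving_pct(base_power, power), 2),
              round(saving_pct(base_area, area), 2))
```

This works out to roughly 6.4% power and 5.0% area saved for adder B, and roughly 9.7% power and 9.1% area saved for adder C, relative to adder A.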

References
[1] M. Jridi and A. Alfalou, "A low-power, high-speed DCT architecture for image compression: Principle and implementation," 18th IEEE/IFIP VLSI System on Chip Conference (VLSI-SoC), 2010, pp. 304-309.
[2] Y. H. Chen, T. Y. Chang and C. Y. Li, "High-throughput DA-based DCT with high accuracy error-compensated adder tree," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 709-714, 2011.
[3] V. Gupta, D. Mohapatra, A. Raghunathan and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124-137, 2013.
[4] P. Kulkarni, P. Gupta and M. D. Ercegovac, "Trading accuracy for power in a multiplier architecture," J. Low Power Electron., vol. 7, no. 4, pp. 490-501, 2011.
[5] Y. V. Ivanov and C. J. Bleakley, "Real-time H.264 video encoding in software with fast mode decision and dynamic complexity control," ACM Trans. Multimedia Comput. Commun. Applicat., vol. 6, pp. 5:1-5:21, Feb. 2010.