EEE 6323 Advanced VLSI Design - Spring 2015 Instructor: R. Bashirullah TA: Qiuzhong Wu


TA contact: qiuzhongwu@ufl.edu
Due: Monday, April 20, 2015 (by noon)

The goal of the project is to study one of the specified topics and design an architecture that consumes low power, is less sensitive to process variability, and occupies as little area as possible. Any of the low-power techniques taught in class (or new ones) can be used when implementing these projects.

LIST OF PROJECTS

1. MIPS

The architecture of the MIPS processor can be taken from the Computer Architecture book by John L. Hennessy and David A. Patterson. The goal of this project is to take the baseline, unoptimized implementation of the MIPS processor given in the book and optimize it for power, energy, and area. Any power-saving technique can be used, including pipelining, parallelism, and low-power RAMs. You may even go for a subthreshold design.

References:
Bo Zhai, Leyla Nazhandali, Javin Olson, Anna Reeves, Michael Minuth, Ryan Helfand, Sanjay Pant, David Blaauw, and Todd Austin, "A 2.60pJ/Inst Subthreshold Sensor Processor for Optimal Energy Efficiency."
A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, pp. 473-483, Apr. 1992.
B. Calhoun, et al., "A Leakage Reduction Methodology for Distributed MTCMOS," IEEE Journal of Solid-State Circuits, vol. 39, no. 5, May 2004.
"A shared-well dual-supply-voltage 64-bit ALU," IEEE Journal of Solid-State Circuits, Mar. 2004, pp. 494-500.

2. FFT Processor

For the FFT project, you must create a hardware implementation of an FFT. The hardware implementation may be derived from any FFT algorithm (Cooley-Tukey, Good-Thomas, or any other). It can be a radix-2, radix-4, or any specialized FFT implementation. Achieving low power is the main criterion here. Below are several articles on FFT hardware implementations:

A. Wang and A. P. Chandrakasan, "A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 310-319, January 2005.
I. R. Mactaggart and M. A. Jack, "A single chip radix-2 FFT butterfly architecture using parallel data distributed arithmetic," IEEE Journal of Solid-State Circuits, vol. 19, no. 3, pp. 368-373, Jun. 1984.
Bevan M. Baas, "A Low-Power, High-Performance, 1024-Point FFT Processor."
S. He and M. Torkelson, "Design and implementation of a 1024-point pipeline FFT processor," in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 131-134.
M. Wosnitza, M. Cavadini, M. Thaler, and G. Troster, "A high precision 1024-point FFT processor for 2D convolution," in Proc. IEEE Int. Solid-State Circuits Conf., 1998, vol. 41, pp. 118-119, 424.
E. E. Swartzlander, W. K. W. Young, and S. J. Joseph, "A radix 4 delay commutator for fast Fourier transform processor implementation," IEEE Journal of Solid-State Circuits, vol. 19, no. 5, pp. 702-709, Oct. 1984.
Moon-Key Lee, Kyung-Wook Shin, and Jang-Kyu Lee, "A VLSI array processor for 16-point FFT," IEEE Journal of Solid-State Circuits, vol. 26, no. 9, pp. 1286-1292, Sept. 1991.

3. Digital PLL

Phase-locked loops (PLLs) are used to recover timing information from a signal; they are ubiquitous in communications and are also used for timing recovery on boards and chips. Analog PLLs are very hard to design because they use feedback and are very sensitive to noise and operating parameters. The goal of this project is to design a purely digital PLL and compare its performance (measured in lock time and phase noise) and costs (in terms of area, power, and delay) to those of a traditional analog PLL. Some papers that can be consulted are:

Jim Dunning, Gerald Garcia, Jim Lundberg, and Ed Nuckolls, "An All-Digital Phase-Locked Loop with 50-Cycle Lock Time Suitable for High-Performance Microprocessors," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, April 1995.
R. E. Best, Phase-Locked Loops: Theory, Design and Applications, 2nd ed. New York: McGraw-Hill, 1993.
Thomas Olsson and Peter Nilsson, "A Digitally Controlled PLL for SoC Applications," IEEE Journal of Solid-State Circuits, vol. 39, no. 5, May 2004, p. 751.
T. Olsson and P. Nilsson, "A fully integrated standard-cell digital PLL," IEE Electron. Lett., vol. 37, pp. 211-212, Feb. 2001.

4. High-Speed N-bit Kogge-Stone Adder (N >= 32)

The KS adder uses a parallel-prefix topology to shorten the critical path in the adder. The critical path, which is the carry generation path, has a logarithmic dependence on the bit width, compared with the linear dependence in a ripple-carry adder. There are many ways to implement the carry generation tree for parallel-prefix adders, but the KS implementation is the most straightforward, and it has one of the shortest critical paths of all tree adders. The drawbacks of the KS implementation are the large area consumed and the somewhat complex routing of interconnects.

If you have a 16-bit adder, you will have 32 input pads and 16 output pads. This amounts to 48 pads, which is too many. Because of the limited number of pads, a bit-serial-to-parallel input/output interface (SPI) must be used to feed input vectors to the adder and read back the output. The inputs are fed to the circuit as a bit-serial data stream and are converted into N-bit vectors by the serial-to-parallel converters. The sum vector is read out through a parallel-to-serial interface. In addition to speed, use low-power techniques to minimize power as well.
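The logarithmic carry path described above can be illustrated with a bit-level software model. The Python sketch below is only a behavioral reference (for example, for cross-checking HDL simulation results), not a hardware description. The function name `kogge_stone_add` and its parameters are assumptions made for illustration and do not come from the project handout.

```python
def kogge_stone_add(a, b, n=32, cin=0):
    """Behavioral model of an n-bit Kogge-Stone adder (illustrative only).

    Returns (sum, carry_out, prefix_levels). Each integer holds one
    signal bit per bit position of the adder.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    cin &= 1

    p = a ^ b                  # per-bit propagate
    g = a & b                  # per-bit generate
    g |= p & cin               # fold the carry-in into bit 0's generate

    G, P = g, p
    distance = 1
    levels = 0
    # Each level combines a group with the group `distance` bits below it:
    #   G[i] = G[i] | (P[i] & G[i - distance])
    #   P[i] = P[i] & P[i - distance]
    while distance < n:
        G, P = (G | (P & (G << distance))) & mask, (P & (P << distance)) & mask
        distance <<= 1
        levels += 1

    carries = ((G << 1) | cin) & mask   # carry INTO each bit position
    total = p ^ carries
    carry_out = (G >> (n - 1)) & 1
    return total, carry_out, levels


if __name__ == "__main__":
    s, cout, levels = kogge_stone_add(0xFFFFFFFF, 1, n=32)
    print(hex(s), cout, levels)
```

For n = 32 the loop runs for distances 1, 2, 4, 8, and 16, so the carry tree has five prefix levels, whereas a ripple-carry adder would pass the carry through all 32 bit positions in sequence.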

References (Kogge-Stone adder):
J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2nd ed., Prentice Hall, 2003, ISBN 0-13-120764-4.
N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley, 1993.

5. Low Power N-bit Radix-4 Booth Multiplier (N >= 32)

Normal array multipliers compute partial products in a radix-2 manner, which leads to a large number of partial products. You can decrease the number of partial products by increasing the radix of the multiplication, which yields a smaller and faster CSA array. Radix-4 produces N/2 partial products, each of which is 0, 1, 2, or 3 times the multiplicand. Multiplication by 3 is hard. Booth encoding solves this by recoding the multiplier digits (into the set -2, -1, 0, +1, +2), which removes the need to multiply the multiplicand by 3. In this project you will design and lay out a Booth multiplier that is 16 or more bits wide.

If you have a 16-bit multiplier, you will have 32 input pads and 32 output pads. This amounts to 64 pads, which is too many. Because of the limited number of pads, a bit-serial-to-parallel input/output interface (SPI) must be used to feed input vectors to the multiplier and read back the output. The inputs are fed to the circuit as a bit-serial data stream and are converted into N-bit vectors by the serial-to-parallel converters. The outputs of the multiplier are read out through a parallel-to-serial interface. Use low-power techniques to minimize power.

References:
J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2nd ed., Prentice Hall, 2003, ISBN 0-13-120764-4.
N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley, 1993.

6. ECoG Processor

Brain-computer interfaces (BCIs) convert brain signals into outputs that communicate a user's intent. The electrocorticographic (ECoG) activity recorded from the cortical surface can serve as a modality for a non-invasive BCI method. The sensorimotor rhythms comprise three major frequency bands: Mu (8-12 Hz), Beta (18-26 Hz), and Gamma (30-200 Hz). Changes in the amplitudes of these rhythms correspond to a person's actions or imagined actions. The objective is to develop a processor that can estimate the energy in each frequency band over a specified time period. To estimate the power spectrum of the rhythm frequencies, bandpass filters and FFTs (>= 8 points) will be required. A snapshot of the power spectrum values should be available every 0.2 seconds. Ultra-low power consumption (~1 uW) is of utmost importance for bio-implantable circuits.

References (see also the references for the FFT Processor):
E. C. Leuthardt, G. Schalk, J. R. Wolpaw, J. G. Ojemann, and D. W. Moran, "A brain-computer interface using electrocorticographic signals in humans," J. Neural Eng., vol. 1, no. 2, pp. 63-71, 2004.
K. J. Miller, E. C. Leuthardt, G. Schalk, R. P. N. Rao, N. R. Anderson, D. W. Moran, J. W. Miller, and J. G. Ojemann, "Spectral changes in cortical surface potentials during motor movement," J. Neurosci., vol. 27, no. 9, pp. 2424-2432, 2007.

7. A High-Speed ADC Backend

For this project, the digital backend of a high-speed flash ADC is implemented. The desired sample rate is 4 GSamples/s, and the nominal resolution is 5 bits. Because of the extremely high throughput (20 Gb/s), it is impossible to test the ADC in real time at reasonable cost. There are two workarounds. The first is to decimate the ADC output until the data rate is within the equipment limit. The other is to store the data in a memory (shown as FIFO in the block diagram) and read it back later for offline post-processing. Both approaches need to be implemented in this project. The design goal is high throughput in processing the sampled data.
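The two ADC test workarounds can be modeled in software before committing to HDL. The Python sketch below is a minimal behavioral illustration under stated assumptions: the decimation factor, the FIFO depth, and the names `output_rate_bps`, `decimate`, and `CaptureFifo` are hypothetical and not taken from the handout. The decimation here is plain sample dropping with no filtering, which is only one possible interpretation of "decimate".

```python
from collections import deque


def output_rate_bps(sample_rate_hz, bits_per_sample, factor):
    """Data rate after keeping one sample out of every `factor`."""
    return sample_rate_hz * bits_per_sample / factor


def decimate(samples, factor):
    """Keep every `factor`-th sample (no filtering; an assumption)."""
    return list(samples[::factor])


class CaptureFifo:
    """Fixed-depth capture buffer: fills at full rate, read out later."""

    def __init__(self, depth):
        self.depth = depth
        self._buf = deque()

    def write(self, sample):
        """Store a sample; return False once the buffer is full."""
        if len(self._buf) >= self.depth:
            return False
        self._buf.append(sample)
        return True

    def read_all(self):
        """Drain the buffer in arrival order for offline processing."""
        out = list(self._buf)
        self._buf.clear()
        return out


if __name__ == "__main__":
    # 4 GS/s at 5 bits is 20 Gb/s; a hypothetical factor of 64 lowers it.
    print(output_rate_bps(4e9, 5, 1), output_rate_bps(4e9, 5, 64))

    codes = [i % 32 for i in range(16)]      # stand-in 5-bit ADC codes
    print(decimate(codes, 4))

    fifo = CaptureFifo(depth=4)
    accepted = [fifo.write(c) for c in codes[:6]]
    print(accepted, fifo.read_all())
```

The decimation path trades observability for a continuous low-rate stream, while the FIFO path keeps every sample but only for a burst whose length is set by the memory depth.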

Timeline

Date | Description | Points
03/10 | Project assigned |
03/17 | Form groups (5 students per group). Brainstorming phase: determine the topic and carry out a literature review |
03/24 | Submission of project topic (1-page description) | 5 pts
03/24-04/06 | Design and analysis phase: simulation, design, and analysis |
04/06-04/17 | Physical implementation phase: layout and I/O ring with full-chip DRC and LVS |
04/20 (by noon) | Report: 4-page paper |
04/21 | Final project report demo/presentation, 10:00 am, 2nd floor Comp. Lab | 100 pts

Important Dates:
March 24, 2015: Submit your design topic (5 pts).
April 20, 2015: Paper due along with LVS and DRC report.
April 21, 2015: Project check-off (100 pts).

Report

As a general guideline, first make sure you understand the specifications before starting the design implementation. Use an HDL (Verilog HDL or VHDL) as the design input. Go through the basic steps of the general VLSI design flow (from HDL to GDS). You will need to hand in both a soft copy and a hard copy of your source code:

1. Hand in the HDL source code, the result of each step, and the design report.
2. The design report should include the design description, implementation notes, simulation results, and a performance summary (power, area, speed, etc.).
3. Submit your DRC and LVS report without the pads. If you also have a clean DRC and LVS report with the I/O pads, you get extra credit.
4. Write a 4-page, double-column paper in IEEE format. Download a Word file template from: http://www.ieee.org/web/publications/authors/transjnl/index.html

The paper should include a title, an author list (group members), an abstract, an introduction, a section describing the design methodology, a section describing results and discussion, conclusions, and a reference list. All figures, including schematics, waveforms, plots, and layout, must be embedded within the paper. The paper cannot exceed four pages in length. Figures should be chosen to best explain the overall design and results.