Wave Pipelined Circuit with Self Tuning for Clock Skew and Clock Period Using BIST Approach


Technology Volume 1, Issue 1, July-September, 2013, pp. 41-46, IASTER 2013, www.iaster.com, Online: 2347-6109, Print: 2348-0017

Ramyashree.J (1), Meena Priya Dharshini (2)
(1) M.Tech. (VLSI and Embedded Systems), (2) Assistant Professor
Department of Electronics & Communication, CMR Institute of Technology, Bangalore

ABSTRACT

Wave-pipelined circuits are mainly used to raise the operating frequency of digital circuits. Wave pipelining is a high-performance approach that implements pipelining in logic without using intermediate registers. It improves logic utilization by minimizing idle time. Implementing a wave-pipelined circuit is complex because the output clock period and the clock skew must both be adjusted. Here the clock skew is the timing difference between the input clock and the output clock. In this project the clock period and clock skew are selected automatically using a BIST approach. The circuit is studied using a 16x16 multiplier. The frequency of operation of the wave-pipelined circuit is higher than that of the non-pipelined circuit, and its area requirement is lower than that of the conventional pipelined circuit. The design is developed in Verilog HDL, simulated in ModelSim 6.3 and synthesized using Xilinx 12.2. The design is implemented on a Spartan-3 FPGA.

Keywords: LFSR, Pipelining, Propagation delay, Self-tuning, Wave-pipelining.

I. INTRODUCTION

As the complexity of a circuit increases, the number of components increases and so does the gate count. Hardware components in an SoC include one or more processors, memories and dedicated components for accelerating critical tasks. Hence the power dissipation, the clock routing complexity and the clock skews between different parts of a synchronous system all increase.
These limitations can be overcome to a certain extent by using wave pipelining [1]. Wave pipelining enables a combinational circuit to be operated at a higher frequency without the intermediate registers used in a pipelined circuit. It also lowers the clock routing complexity and the power dissipation compared to pipelined circuits. Maximizing the operating speed of a wave-pipelined circuit requires the following three tasks:

- Adjustment of clock period
- Adjustment of clock skew
- Equalization of path delays

All three tasks are automated in this project. The effectiveness of the automation scheme is studied using a multiplier circuit.

II. WAVE-PIPELINED CIRCUIT

Pipelining is usually employed to increase the speed of digital circuits. Pipelining can be conventional or wave pipelining. Conventional pipelining partitions the combinational logic into smaller chunks and inserts registers at the boundaries. In conventional circuits, the clock period depends upon the propagation delay of the longest path between any two registers. In a wave-pipelined circuit, if the logic path is long enough and the data dispersion is small, the system can send multiple sets of data (waves) through the logic at a faster clock rate without latching the data on the way [2]. A wave-pipelined circuit is illustrated in Fig. 1.

Fig. 1 Illustration of a wave pipelined circuit

III. REVIEW OF PREVIOUS WORK

Wave-pipelined circuits achieve a speed-up of N, where N denotes the number of data waves that propagate simultaneously through the circuit. Conventional pipelined circuits achieve a similar speed-up, where N is the number of stages [2]. The clock period of a wave-pipelined circuit must satisfy

Tck > (Dmax - Dmin) + Tsu + Th + 2*Tskew

where Tck is the clock period, Dmax is the maximum propagation delay, Dmin is the minimum propagation delay, Tsu is the set-up time and Th is the hold time of the registers, and Tskew is the clock skew [2]. To minimize Tck, (Dmax - Dmin) should be minimized. This can be done by equalizing the maximum and minimum path delays. The maximum and minimum path delays, the input and output registers and the clock are indicated in Fig. 2.

Commercial tools have been used to automate the process of wave pipelining. Generating the netlist is the most complicated phase [2]. Wave pipelining is also more susceptible to process and environmental variation than conventional pipelining [2].

Fig. 2 The maximum and minimum path delays of a logic block

Stacked CMOS logic gates are used to obtain the same transistor depth for both p-logic and n-logic and hence balance the delay. Delay-balancing elements such as inverters and buffers are added to equalize the maximum and minimum delay paths. A look-up-table-based approach can also be employed to automate wave-pipelined circuits [3].
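The clock-period bound of Section III can be illustrated with a small worked example. The Python sketch below evaluates the right-hand side of the bound for a logic block before and after delay balancing. The function name and all delay values (in nanoseconds) are assumptions chosen for illustration; they are not measurements from this work.

```python
def min_clock_period(d_max, d_min, t_su, t_h, t_skew):
    """Lower bound on Tck: (Dmax - Dmin) + Tsu + Th + 2*Tskew."""
    if d_min > d_max:
        raise ValueError("d_min must not exceed d_max")
    return (d_max - d_min) + t_su + t_h + 2 * t_skew


# Hypothetical delays in nanoseconds (illustrative values only).
unbalanced = min_clock_period(d_max=40.0, d_min=10.0, t_su=0.5, t_h=0.5, t_skew=0.25)
balanced = min_clock_period(d_max=40.0, d_min=36.0, t_su=0.5, t_h=0.5, t_skew=0.25)

print(f"bound before balancing: {unbalanced} ns")
print(f"bound after balancing:  {balanced} ns")
```

With these assumed values, raising Dmin from 10 ns to 36 ns while Dmax stays at 40 ns shrinks the bound from 31.5 ns to 5.5 ns, which is why equalizing the path delays matters more than shortening the longest path.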

IV. PROPOSED SYSTEM

Wave pipelining can be automated by using a BIST (Built-In Self-Test) approach. Test vectors are applied to the logic circuit. A syndrome is generated from the outputs and compared with a pre-computed signature. The clock period is increased by means of a counter until the correct outputs are obtained. The clock period for which the correct outputs are obtained is chosen as the clock period of the digital circuit. Delay balancing is done using the following steps:

- The circuit is initially built using gates with an equal number of transistors in p-logic and n-logic.
- Inverters or buffers are added wherever required.

The block diagram of the proposed system is shown in Fig. 3. The blocks of the circuit are:

- Mux1, Mux2: 2-input multiplexers with a select signal.
- I/P Registers: latch the inputs to the combinational circuit.
- Combinational circuit: the 16x16 multiplier explained in Section V.
- O/P Register: the register that is clocked after a clock skew to obtain the correct output.
- Clock Skew: the circuit that generates the clock difference between the input and output registers of the combinational circuit.
- LFSR Block: generates the address for RAM1, RAM2 and the signature match circuit.
- RAM1, RAM2: memory units that store the inputs for the test mode of operation.
- Signature match circuit with RAM: the RAM stores the expected products corresponding to the addresses from the LFSR Block. The signature match circuit compares the obtained result with the result stored in RAM and generates the error signal in case of a mismatch.
- LOCK signal: when the error signal goes low, the clock skew has to be fixed; this is indicated by the LOCK signal.

Fig. 3 Block diagram

V. DESIGN OF THE LOGIC BLOCK

Digital signal processing is used for a variety of applications such as frequency-selective filters (low pass, band pass, high pass, band reject), adaptive filters, equalizers, block-matching algorithms for motion estimation, and computation of transforms like the DFT. In all these applications multipliers are one of the fundamental blocks [1].

Implementing a 16x16 multiplier using wave-pipelined circuits can improve the overall performance of the system. The block diagram of the multiplier circuit is shown in Fig. 4.

Fig. 4 Multiplier Circuit

For the non-pipelined circuit, no registers are inserted. For the conventional pipelined circuit, registers are inserted at all the stages; these are indicated by dotted lines in Fig. 4. For the wave-pipelined configuration, registers are present only at the input stage and the output stage, indicated as A and B in Fig. 4. Sixteen 2-input AND gates are required to generate each of the partial products PP0 to PP15. The partial products thus obtained are shifted and added successively using 16-bit adders to obtain the 32-bit product of the 16-bit multiplier and the 16-bit multiplicand. Fifteen 16-bit adders are used to add the partial products.

VI. SIMULATION RESULTS AND COMPARISON

The proposed system is implemented using Xilinx 12.2 and simulated using the ModelSim simulator. The circuit is implemented for a 16x16 multiplier. The results are tabulated in Table 1. The simulated waveform for the pipelined circuit is shown in Fig. 5.

Table 1 Implementation results of multipliers

Scheme                    No. of slices   Frequency (MHz)
Conventional pipelining   319             235
Non-pipelining            121             21
Wave-pipelining           138             203
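The shift-and-add structure of the multiplier described in Section V can be sketched as a behavioural model. The Python code below is an illustrative assumption, not the Verilog HDL design used in this work: it models only the arithmetic (AND-gated partial products followed by fifteen additions) and ignores the 16-bit adder wiring, the registers and all timing. The names `partial_products` and `multiply_16x16` are hypothetical.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def partial_products(a, b):
    """Return PP0..PP15: every bit of a ANDed with bit i of b."""
    a &= MASK
    b &= MASK
    return [a if (b >> i) & 1 else 0 for i in range(WIDTH)]


def multiply_16x16(a, b):
    """Add the 16 shifted partial products using 15 successive additions."""
    pps = partial_products(a, b)
    product = pps[0]
    for i in range(1, WIDTH):        # 15 additions, one per remaining PP
        product += pps[i] << i       # PPi carries weight 2**i
    return product                   # fits in 32 bits for 16-bit operands


print(hex(multiply_16x16(0xFFFF, 0xFFFF)))  # prints 0xfffe0001
```

A model of this kind is also a convenient way to pre-compute the expected products that the signature match circuit compares against in test mode.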

Fig. 5 Simulation Results for a Pipelined Multiplier

VII. FPGA IMPLEMENTATION

The wave-pipelined circuit with self-tuning for clock skew and clock period is implemented on a Xilinx Spartan-3 XC3S400 FPGA. An image of the FPGA implementation is shown in Fig. 6. Initially the circuit is operated in test mode. The test_in signal is used to provide the test inputs to the multiplier circuit. At first the clock skew and the clock period are not adjusted, so the obtained result does not match the stored result. This is indicated by the error signal going high. As the clock skew and output clock period are adjusted, the correct answer is obtained and the error signal goes low. After the clock skew is locked, the device can be operated in normal mode.

Fig. 6 FPGA Implementation
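The test-mode behaviour described above (error high until the clock period and skew are adjusted, then LOCK) can be sketched as a behavioural model. The Python code below is a simplified illustration under stated assumptions, not the FPGA design: the 4-bit LFSR polynomial, the RAM contents, the delay values and the rule that a mistimed capture flips one product bit are all hypothetical. The output register is assumed to capture correctly only when it samples inside the window in which one wave is stable.

```python
def lfsr4_sequence(seed=0b0001):
    """Yield the 15 states of a 4-bit LFSR (x^4 + x^3 + 1), used as addresses."""
    state = seed
    for _ in range(15):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF


# Test store: RAM1/RAM2 hold operands, a third RAM holds expected products.
RAM1 = {addr: (addr * 4099) & 0xFFFF for addr in range(1, 16)}
RAM2 = {addr: (addr * 9973) & 0xFFFF for addr in range(1, 16)}
EXPECTED = {addr: RAM1[addr] * RAM2[addr] for addr in range(1, 16)}

# Hypothetical timing of the logic block, in arbitrary time units.
D_MAX, D_MIN, T_SU, T_H = 40, 34, 1, 1


def captured_product(a, b, t_ck, skew):
    """Model the O/P register: correct only if sampled in the stable window."""
    stable = D_MAX + T_SU <= skew <= t_ck + D_MIN - T_H
    return a * b if stable else (a * b) ^ 1  # flip a bit to model a bad capture


def error_signal(t_ck, skew):
    """Signature match: True (error high) if any test vector mismatches."""
    return any(
        captured_product(RAM1[addr], RAM2[addr], t_ck, skew) != EXPECTED[addr]
        for addr in lfsr4_sequence()
    )


def self_tune(max_period=64):
    """Increase the clock period, sweeping the skew, until the error goes low."""
    for t_ck in range(1, max_period + 1):        # counter-driven clock period
        for skew in range(0, t_ck + D_MAX + 1):  # candidate output clock skews
            if not error_signal(t_ck, skew):
                return t_ck, skew                # LOCK: hold these settings
    return None


print(self_tune())  # prints (8, 41)
```

With the assumed delays, the model locks at a clock period of 8 units, which equals (Dmax - Dmin) + Tsu + Th from the bound in Section III with the skew uncertainty taken as zero.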

VIII. CONCLUSIONS

In this project, a 16x16 multiplier is considered as the combinational circuit. It is implemented in three configurations, namely non-pipelined, conventional pipelined and wave-pipelined. The frequency of operation and the area (in terms of number of slices) are compared. From Table 1, the frequency of operation rises from 21 MHz for the non-pipelined circuit to 203 MHz for the wave-pipelined circuit, an increase of 8.66 times (a ratio of about 9.67). The area requirement falls from 319 slices for the conventional pipelined circuit to 138 slices, a reduction by a factor of about 2.31. These results indicate that wave pipelining can be used in combinational circuits whose frequency of operation has to be increased. In addition, a circuit has been implemented that allows selection of the output clock and the clock skew with respect to the input clock. Because the clock skew and output clock can be self-tuned whenever the delay of the circuit changes with PVT (process, voltage and temperature) variations, such variations do not affect the performance of the circuit. The wave-pipelined circuit with self-tuning for clock skew and clock period is implemented on a Xilinx Spartan-3 FPGA and the results are verified.

IX. ACKNOWLEDGMENT

I would like to express my heartfelt gratitude to my guide Ms. Meena Priya Dharshini, Associate Professor, Electronics and Communication Engineering, CMR Institute of Technology, for her timely advice on the technical seminar and regular assistance throughout the project work. I extend my sincere thanks to Dr. Indumathi.G, Head of the Department, Electronics and Communication Engineering, CMR Institute of Technology, for her constant encouragement. I also extend my gratitude and sincere thanks to all the faculty members of Electronics and Communication Engineering, CMR Institute of Technology, for their constant encouragement and support.

REFERENCES

[1] Rengaprabhu Paramasivam, V. Adhinarayanan, S. Gopalakrishnan, "Design and implementation of Automated Wave-Pipelined Circuit using ASIC," IEEE, 2012.
[2] Woo Jin Kim, Yong-Bin Kim, "Automating Wave-pipelined Circuit Design," IEEE Design & Test of Computers, Vol. 20, Nov. 2003.
[3] E. I. Boemo, S. Lopez-Buedo and J. M. Meneses, "Wave pipelines via look-up tables," IEEE International Symposium on Circuits and Systems (ISCAS), 1996.
[4] J. Nyathi and J. G. Delgado-Frias, "A hybrid wave pipelined network router," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Dec. 2002.
[5] W. P. Burleson, M. Ciesielski, F. Klass, and Liu, "Wave pipelining: a tutorial and research survey," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Sep. 1998.
[6] Woo Jin Kim, Yong-Bin Kim, "Wave Pipelined Circuits Synthesis," Instrumentation and Measurement Technology Conference, IMTC 2005.
[7] Kevin J. Nowka and Michael J. Flynn, "Environmental limits on the performance of CMOS wave-pipelined circuits," Technical Report CSL-TR-94-600, Departments of Electrical Engineering and Computer Science, Stanford University, January 1994.
[8] Peter J. Ashenden, Digital Design: An Embedded Systems Approach Using Verilog, Morgan Kaufmann Publishers, Elsevier, 2008.
[9] Xilinx, Spartan-3 FPGA Family Data Sheet.