Wave Pipelined Circuit with Self Tuning for Clock Skew and Clock Period Using BIST Approach

Technology Volume 1, Issue 1, July-September, 2013, pp. 41-46, IASTER 2013 www.iaster.com, Online: 2347-6109, Print: 2348-0017

1 Ramyashree.J, 2 Meena Priya Dharshini
1 M.Tech. (VLSI and Embedded Systems), 2 Assistant Professor
Department of Electronics & Communication, CMR Institute of Technology, Bangalore

ABSTRACT

Wave-pipelined circuits are mainly used to raise the operating frequency of digital circuits. Wave pipelining is a high-performance approach that implements pipelining in logic without using intermediate registers. It improves logic utilization by minimizing idle time. Implementing a wave-pipelined circuit is complex because the output clock period and the clock skew must both be adjusted. Here, the clock skew is the delay between the input clock and the output clock. In this project, the clock period and clock skew are selected automatically using a BIST (Built-In Self-Test) approach. The circuit is studied using a 16x16 multiplier. The wave-pipelined circuit operates at a higher frequency than the non-pipelined circuit and requires less area than the conventional pipelined circuit. The design is developed in Verilog HDL, simulated in ModelSim 6.3, synthesized using Xilinx 12.2 and implemented on a Spartan-3 FPGA.

Keywords: LFSR, Pipelining, Propagation delay, Self-tuning, Wave-pipelining.

I. INTRODUCTION

As the complexity of a circuit increases, the number of components increases and so does the gate count. Hardware components in an SoC include one or more processors, memories and dedicated components for accelerating critical tasks. Hence the power dissipation, the clock routing complexity and the clock skews between different parts of a synchronous system all increase.
These limitations can be overcome to a certain extent by using wave pipelining [1]. Wave pipelining enables a combinational circuit to operate at a higher frequency without the intermediate registers that a pipelined circuit requires. It also lowers the clock routing complexity and the power dissipation compared to pipelined circuits. Maximizing the operating speed of a wave-pipelined circuit requires the following three tasks:

Adjustment of clock period
Adjustment of clock skew
Equalization of path delays

All three tasks are automated in this project. The effectiveness of the automation scheme is studied using a multiplier circuit.

II. WAVE-PIPELINED CIRCUIT

Pipelining is usually employed to increase the speed of digital circuits. It can be either conventional pipelining or wave pipelining. Conventional pipelining partitions the combinational logic into smaller chunks and inserts registers at the boundaries. In conventional circuits, the clock period depends upon the propagation delay of the longest path between any two registers. In a wave-pipelined circuit, the logic path is long and the data dispersion is kept small, so the system can send multiple sets of data (waves) through the logic at a faster clock rate without latching the data on the way [2]. A wave-pipelined circuit is illustrated in Fig. 1.

Fig. 1 Illustration of a wave pipelined circuit

III. REVIEW OF PREVIOUS WORK

Wave-pipelined circuits achieve a speed-up of N, where N denotes the number of data waves that propagate simultaneously through the circuit. Conventional pipelined circuits achieve a similar speed-up, with N being the number of stages [2]. The clock period must satisfy

Tck > (Dmax - Dmin) + Tsu + Th + 2*Tskew

where Tck is the clock period, Dmax is the maximum propagation delay, Dmin is the minimum propagation delay, Tsu is the setup time and Th is the hold time of the registers, and Tskew is the clock skew [2]. To minimize Tck, (Dmax - Dmin) should be minimized. This can be done by equalizing the maximum and minimum path delays. The maximum and minimum path delays and the input and output registers, along with the clock, are indicated in Fig. 2.

Fig. 2 The maximum and minimum path delays of a logic block

Commercial tools have been used to automate the process of wave pipelining. Generating the netlist is the most complicated phase [2]. Wave pipelining is also more susceptible to process and environmental variation than conventional pipelining [2].

Stacked CMOS logic gates are used to obtain the same transistor depth for both p-logic and n-logic and hence balance the delay. Delay-balancing elements such as inverters and buffers are added to equalize the maximum and minimum delay paths. A look-up-table-based approach can also be employed to automate wave-pipelined circuits [3].
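The clock period constraint quoted above can be made concrete with a small numerical sketch. The Python function below simply evaluates the right-hand side of the inequality. The function name and all delay values are hypothetical, chosen only to show how equalizing Dmax and Dmin lowers the bound; they are not measurements from this work.

```python
def min_clock_period(d_max, d_min, t_su, t_h, t_skew):
    """Lower bound on Tck from Tck > (Dmax - Dmin) + Tsu + Th + 2*Tskew.

    All arguments are in the same time unit (ns here).
    """
    if d_min > d_max:
        raise ValueError("d_min must not exceed d_max")
    return (d_max - d_min) + t_su + t_h + 2 * t_skew


# Hypothetical delays in ns, for illustration only.
unbalanced = min_clock_period(d_max=40.0, d_min=10.0, t_su=0.5, t_h=0.5, t_skew=0.25)
# Same logic after delay balancing: the shortest path has been padded.
balanced = min_clock_period(d_max=40.0, d_min=36.0, t_su=0.5, t_h=0.5, t_skew=0.25)

print(f"bound before balancing: {unbalanced} ns")
print(f"bound after balancing:  {balanced} ns")
```

With these assumed numbers, the bound falls from 31.5 ns to 5.5 ns even though the longest path is unchanged, which is why path-delay equalization is listed as one of the three tasks.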

IV. PROPOSED SYSTEM

Wave pipelining can be automated by using a BIST (Built-In Self-Test) approach. Test vectors are applied to the logic circuit. A syndrome is generated from the outputs and compared with a pre-computed signature. The clock period is increased by using a counter until the correct outputs are obtained. The clock period for which the correct outputs are obtained is chosen as the clock period of the digital circuit. Delay balancing is done using the following steps: the circuit is initially built using gates with an equal number of transistors in p-logic and n-logic, and inverters or buffers are then added wherever required. The block diagram of the proposed system is shown in Fig. 3. The various blocks of the circuit are:

Mux1, Mux2: 2-input multiplexers with a select signal.
I/P Registers: Used to latch the inputs to the combinational circuit.
Combinational circuit: The 16x16 multiplier explained in Section V.
O/P Register: The register that is clocked after a clock skew to obtain the correct output.
Clock Skew: The circuit that generates the clock difference between the input and the output registers of the combinational circuit.
LFSR Block: Used to generate the address for RAM1, RAM2 and the signature match circuit.
RAM1, RAM2: Memory units that store the inputs for the test mode of operation.
Signature match circuit with RAM: The RAM stores the expected products corresponding to the addresses of the LFSR Block. The signature match circuit compares the obtained result with the result stored in RAM and generates the error signal in case of a mismatch.
LOCK signal: Indicates that the error signal has gone low and that the clock skew is to be held fixed.

Fig. 3 Block diagram

V. DESIGN OF THE LOGIC BLOCK

Digital signal processing is used for a variety of applications such as frequency-selective filters (low pass, band pass, high pass, band reject), adaptive filters, equalizers, block matching algorithms for motion estimation, and computation of transforms such as the DFT. In all these applications, multipliers are used as one of the fundamental blocks [1].
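The logic block studied in this section is a 16x16 multiplier that forms sixteen partial products with AND gates and sums them with fifteen successive additions. As a rough behavioural model of that structure, the sketch below is written in Python rather than the Verilog HDL used for the actual design. The function names are hypothetical, and the model says nothing about gate-level delays.

```python
WIDTH = 16  # operand width in bits


def partial_product(a, b, i):
    """Row i of the array: every bit of a ANDed with bit i of b.

    ANDing a 16-bit word with a single bit gives either the word
    itself or zero, which is what the conditional below returns.
    """
    b_bit = (b >> i) & 1
    return a if b_bit else 0


def shift_add_multiply(a, b):
    """Sum the sixteen shifted partial products PP0..PP15."""
    mask = (1 << WIDTH) - 1
    a &= mask
    b &= mask
    acc = partial_product(a, b, 0)
    for i in range(1, WIDTH):  # fifteen additions, one per remaining row
        acc += partial_product(a, b, i) << i
    return acc  # fits in 32 bits


print(hex(shift_add_multiply(0xFFFF, 0xFFFF)))
```

A model of this kind can also serve to pre-compute the expected products that the signature match circuit compares against.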

Implementing a 16x16 multiplier as a wave-pipelined circuit can improve the overall performance of the system. The block diagram of the multiplier circuit is shown in Fig. 4.

Fig. 4 Multiplier Circuit (inputs at register boundary A; partial products PP0 to PP15; product outputs P[0] to P[14] and P[31:15] at register boundary B)

For the non-pipelined circuit, no registers are inserted. For the conventional pipelined circuit, registers are inserted at all the stages; these are indicated by dotted lines in Fig. 4. For the wave-pipelined configuration, registers are present only at the input stage and the output stage, indicated as A and B in Fig. 4. Sixteen 2-input AND gates are required to generate each of the partial products PP0 to PP15. The partial products thus obtained are shifted and added successively using 16-bit adders to obtain the 32-bit product of the 16-bit multiplier and the 16-bit multiplicand. Fifteen 16-bit adders are used to add the partial products.

VI. SIMULATION RESULTS AND COMPARISON

The proposed system is implemented using Xilinx 12.2 and simulated using the ModelSim simulator. The circuit is implemented for a 16x16 multiplier. The results are tabulated in Table 1. The simulated waveform for the pipelined circuit is shown in Fig. 5.

Table 1 Implementation results of multipliers

Scheme                     No. of slices   Frequency (MHz)
Conventional pipelining    319             235
Non-pipelining             121             21
Wave-pipelining            138             203
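The relative figures implied by Table 1 can be worked out directly. The short Python sketch below only restates the table values and divides them; the dictionary layout and variable names are assumptions made for illustration.

```python
# Values copied from Table 1 (slices, frequency in MHz).
table1 = {
    "conventional": {"slices": 319, "mhz": 235},
    "non_pipelined": {"slices": 121, "mhz": 21},
    "wave": {"slices": 138, "mhz": 203},
}

wave = table1["wave"]

# Frequency of the wave-pipelined circuit relative to the non-pipelined one.
speed_ratio = wave["mhz"] / table1["non_pipelined"]["mhz"]
# Area of the conventional pipeline relative to the wave-pipelined circuit.
area_ratio = table1["conventional"]["slices"] / wave["slices"]
# Fraction of slices saved with respect to the conventional pipeline.
area_saving = 1 - wave["slices"] / table1["conventional"]["slices"]
# Frequency reached relative to the conventional pipeline.
freq_vs_conventional = wave["mhz"] / table1["conventional"]["mhz"]

print(f"frequency vs non-pipelined: {speed_ratio:.2f}x")
print(f"area vs conventional:       {area_ratio:.2f}x smaller ({area_saving:.0%} saved)")
print(f"frequency vs conventional:  {freq_vs_conventional:.0%}")
```

On these numbers the wave-pipelined multiplier runs at about 9.67 times the non-pipelined frequency (an increase of about 8.67 times), uses about 57% fewer slices than the conventional pipeline, and reaches about 86% of the conventional pipeline's frequency.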

Fig. 5 Simulation Results for a Pipelined Multiplier

VII. FPGA IMPLEMENTATION

The wave-pipelined circuit with self-tuning for clock skew and clock period is implemented on a Xilinx Spartan-3 XC3S400 FPGA. An image of the FPGA implementation is shown in Fig. 6. Initially the circuit is operated in the test mode. The test_in signal is used to provide the test inputs to the multiplier circuit. At first the clock skew and clock period are not adjusted, so the obtained result does not match the stored result. This is indicated by the error signal going high. As the clock skew and output clock period are adjusted, the correct answer is obtained and the error signal goes low. After the clock skew is locked, the device can be operated in normal mode.

Fig. 6 FPGA Implementation
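The test-mode sequence described above (apply stored inputs, compare with stored products, adjust until the error signal goes low, then lock) can be sketched as a small behavioural model in Python. Everything here is an assumption made for illustration: the 4-bit LFSR, the RAM contents, the delay values D_MAX and D_MIN in arbitrary time steps, and the simplified rule that the output is captured correctly only when the sampling edge falls after the slowest path has settled and before the next wave's fastest path arrives. It is not the Verilog design implemented on the FPGA.

```python
D_MAX, D_MIN = 37, 30  # hypothetical path delays, arbitrary time steps


def lfsr4(seed=0b0001):
    """4-bit LFSR with feedback from the two most significant bits.

    Visits all 15 non-zero states once before repeating.
    """
    state = seed
    for _ in range(15):
        yield state
        bit = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | bit) & 0xF


ADDRESSES = list(lfsr4())
# Test operands (RAM1, RAM2) and pre-computed products; contents are made up.
RAM1 = {a: (a * 4099) & 0xFFFF for a in ADDRESSES}
RAM2 = {a: (a * 8191 + 7) & 0xFFFF for a in ADDRESSES}
EXPECTED = {a: RAM1[a] * RAM2[a] for a in ADDRESSES}


def captured_product(addr, period, skew):
    """Value latched by the output register for one test vector."""
    product = RAM1[addr] * RAM2[addr]
    stable = D_MAX <= skew < period + D_MIN
    return product if stable else product ^ 1  # wrong data if sampled badly


def error_signal(period, skew):
    """High (True) if any captured product mismatches the stored one."""
    return any(captured_product(a, period, skew) != EXPECTED[a] for a in ADDRESSES)


def self_tune(max_period=64, max_skew=64):
    """Increase the period, sweeping the skew, until the error goes low."""
    for period in range(1, max_period + 1):
        for skew in range(max_skew + 1):
            if not error_signal(period, skew):
                return period, skew  # LOCK: hold these settings
    return None


lock = self_tune()
print("locked (period, skew):", lock)
```

In this toy model the search locks at the smallest period that exceeds D_MAX - D_MIN, which agrees with the dominant term of the clock period constraint when setup time, hold time and skew uncertainty are neglected.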

VIII. CONCLUSIONS

In this project, a 16x16 multiplier is considered as the combinational circuit. It is implemented in all three configurations, namely the non-pipelined, conventional pipelined and wave-pipelined schemes. The frequency of operation and the area (in terms of number of slices) are compared. From Table 1, the wave-pipelined multiplier operates at 203 MHz against 21 MHz for the non-pipelined circuit, an increase of about 8.67 times (a ratio of about 9.67). It occupies 138 slices against 319 for the conventional pipelined circuit, a reduction of about 57% (a factor of about 2.3). This suggests that wave pipelining can be used in combinational circuits whose frequency of operation has to be increased. In addition, a circuit has been implemented that allows selection of the output clock and the clock skew with respect to the input clock. Because the clock skew and output clock can be self-tuned whenever the delay of the circuit changes, the circuit can compensate for PVT (Process, Voltage and Temperature) variations. The wave-pipelined circuit with self-tuning for clock skew and clock period is implemented on a Xilinx Spartan-3 FPGA and the results are verified.

IX. ACKNOWLEDGMENT

I would like to express my heartfelt gratitude to my guide, Ms. Meena Priya Dharshini, Associate Professor, Electronics and Communication Engineering, CMR Institute of Technology, for her timely advice on the technical seminar and regular assistance throughout the project work. I extend my sincere thanks to Dr. Indumathi.G, Head of the Department, Electronics and Communication Engineering, CMR Institute of Technology, for her constant encouragement. I also extend my gratitude and sincere thanks to all the faculty members of Electronics and Communication Engineering, CMR Institute of Technology, for their constant encouragement and support.

REFERENCES

[1] Rengaprabhu Paramasivam, V. Adhinarayanan, S. Gopalakrishnan, Design and implementation of Automated Wave-Pipelined Circuit using ASIC, IEEE 2012.
[2] Woo Jin Kim, Yong-Bin Kim, Automating Wave-pipelined Circuit Design, IEEE Design & Test of Computers, Vol. 20, Nov. 2003.
[3] E. I. Boemo, S. Lopez-Buedo and J. M. Meneses, Wave pipelines via look-up tables, IEEE International Symposium on Circuits and Systems (ISCAS), 1996.
[4] J. Nyathi and J. G. Delgado-Frias, A hybrid wave pipelined network router, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Dec. 2002.
[5] W. P. Burleson, M. Ciesielski, F. Klass, and Liu, Wave pipelining: a tutorial and research survey, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Sep. 1998.
[6] Woo Jin Kim, Yong-Bin Kim, Wave Pipelined Circuits Synthesis, Instrumentation and Measurement Technology Conference, IMTC 2005.
[7] Kevin J. Nowka and Michael J. Flynn, Environmental limits on the performance of CMOS wave-pipelined circuits, Technical Report CSL-TR-94-600, Departments of Electrical Engineering and Computer Science, Stanford University, January 1994.
[8] Peter J. Ashenden, Digital Design: An Embedded Systems Approach Using Verilog, Morgan Kaufmann Publishers, Elsevier, 2008.
[9] Xilinx, Spartan-3 FPGA Family Data Sheet.