On Built-In Self-Test for Adders


Mary D. Pulukuri and Charles E. Stroud
Dept. of Electrical and Computer Engineering, Auburn University, Alabama

Abstract - We evaluate some previously proposed test approaches for various types of adders in an attempt to find an architecture-independent algorithm for testing adders in embedded Digital Signal Processors (DSPs) in Field Programmable Gate Arrays (FPGAs). We find that a minor modification to a previously proposed Built-In Self-Test (BIST) approach provides the highest fault coverage for most types of adders and, equally important, it is simple to implement.

Keywords: built-in self-test, carry look ahead adders, ripple carry adders

Introduction

While developing Built-In Self-Test (BIST) approaches for Xilinx Virtex-4 Field Programmable Gate Arrays (FPGAs), we investigated various test approaches for adders. The target adder has three input ports of 48 bits each and is incorporated in the embedded Digital Signal Processor (DSP) cores in Virtex-4 FPGAs. Unfortunately, the architecture and implementation of the adder are not explicitly defined in the data sheet [1]. Based on the timing specifications, we rule out sequential logic implementations, leaving ripple carry, carry select, and carry look ahead (CLA) adders. Since the adder is used to sum the final partial products from the multiplier portion of the DSP, the CLA adder seems the most likely candidate based on data sheet timing specifications and the fact that CLA adders are typically used for summing the final partial products in modified-Booth/Wallace-tree multipliers [2].

Carry Look Ahead Adders

The basic structure of a CLA adder is summarized in Figure 1, where 4-bit implementations are typically used due to fan-in limitations. Each adder cell takes two inputs (A_i and B_i) and a carry-in (C_i) to produce sum (S_i), propagate (P_i), and generate (G_i) signals. The P_i and G_i signals from the adder cells are used in conjunction with the carry-in signal in the look ahead carry unit (LCU) to produce carry signals to subsequent adder cells, as summarized by the LCU logic equations in Figure 1. There are two approaches to implementing the adder cells, as summarized by the adder logic equations in Figure 1: generating the P_i signal via an OR gate (denoted in Figure 1 as the POR implementation), or generating the P_i signal via the exclusive-OR (XOR) gate used for the sum (denoted in Figure 1 as the PXOR implementation).

There are three basic approaches to constructing larger CLA adders from the basic 4-bit CLA adder illustrated in Figure 1. In the ripple CLA, the carry-out from one LCU is connected to the carry-in of the next LCU [3]. Alternatively, the LCU itself can also produce a propagate signal (PG) and a generate signal (GG), which can be fed to a second stage of LCU such that a 16-bit CLA adder can be constructed. For even larger adders, such as the 48-bit adder in the Virtex-4 DSP cores, one can either ripple the carry outputs of the second stage LCU (which we refer to as a ripple LCU) or include additional stages of LCUs (which we refer to as multi-stage LCU). Since the actual architecture is unknown, our goal is to find an architecture-independent test algorithm that provides high fault coverage regardless of the adder implementation.
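The adder cell and LCU equations of Figure 1 can be exercised with a short behavioural sketch. This is only an illustration under stated assumptions: the function names (`adder_cell`, `lcu`, `cla4`, `matches_reference`) and the `p_xor` flag are hypothetical, and the model works at the equation level rather than at the gate level used for fault simulation.

```python
from itertools import product


def adder_cell(a, b, c, p_xor=False):
    """One adder cell of Figure 1: returns (sum, propagate, generate)."""
    g = a & b
    if p_xor:
        p = a ^ b          # PXOR: propagate reuses the XOR gate of the sum
        s = p ^ c
    else:
        p = a | b          # POR: propagate comes from a separate OR gate
        s = a ^ b ^ c
    return s, p, g


def lcu(p, g, c0):
    """4-bit look ahead carry unit; p and g are lists indexed by bit position."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (g[0] & p[1]) | (p[1] & p[0] & c0)
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (p[2] & p[1] & p[0] & c0)
    gg = g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
    pg = p[0] & p[1] & p[2] & p[3]
    c4 = gg | (pg & c0)    # same as the C4 equation: GG + P3 P2 P1 P0 C0
    return [c0, c1, c2, c3], c4, pg, gg


def cla4(a, b, c0, p_xor=False):
    """4-bit CLA adder: returns (4-bit sum, carry-out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # P and G do not depend on the carry, so they are computed first.
    cells = [adder_cell(a_bits[i], b_bits[i], 0, p_xor) for i in range(4)]
    p = [cell[1] for cell in cells]
    g = [cell[2] for cell in cells]
    carries, c4, _, _ = lcu(p, g, c0)
    total = 0
    for i in range(4):
        s, _, _ = adder_cell(a_bits[i], b_bits[i], carries[i], p_xor)
        total |= s << i
    return total, c4


def matches_reference(p_xor):
    """Exhaustively compare the CLA model with integer addition."""
    return all(
        cla4(a, b, c, p_xor) == ((a + b + c) & 0xF, (a + b + c) >> 4)
        for a, b, c in product(range(16), range(16), range(2))
    )
```

Calling `matches_reference(False)` and `matches_reference(True)` checks all 512 input combinations for the POR and PXOR variants respectively. Both variants compute the same sums; they differ only in internal structure, which is what matters for stuck-at fault coverage.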

Adder logic equations (POR):
\[ S = A \oplus B \oplus C_{in}, \qquad P = A + B, \qquad G = A \cdot B \]

Adder logic equations (PXOR):
\[ S = P \oplus C_{in}, \qquad P = A \oplus B, \qquad G = A \cdot B \]

Look Ahead Carry Unit logic equations:
\[
\begin{aligned}
PG  &= P_0 P_1 P_2 P_3 \\
GG  &= G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 \\
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + G_0 P_1 + P_1 P_0 C_0 \\
C_3 &= G_2 + G_1 P_2 + G_0 P_1 P_2 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
\]

[Block diagram: four adder cells with inputs A_3..A_0, B_3..B_0 and carries C_3..C_0 produce S_3..S_0 and pass P_3..P_0 and G_3..G_0 to a 4-bit Look Ahead Carry Unit, which produces C_1..C_4, PG, and GG.]

Figure 1: Carry Look Ahead Adder

Prior Testing Approaches

A minimum set of 10 test vectors was proposed to detect all single stuck-at faults in a 4-bit CLA adder in which the P_i signal in the adder is produced by an OR gate (POR implementation) [3]. This was extended to a set of 11 test vectors to test a ripple CLA adder of any size [3]. Another test algorithm was proposed for BIST implementation, which produces a sequence of 2(N+1) test vectors to test an N-bit adder [4]. The test pattern generator (TPG) implementation for this algorithm requires an (N+1)-bit shift register and an inverter to form a twisted ring counter, in conjunction with N XOR gates and N XNOR gates, as shown in Figure 2. This BIST circuit is easy to implement, and the test vector sequence it produces is illustrated in Figure 2 for N=4.

[Schematic: (N+1)-bit serial shift register with reset; adjacent stages SReg_i and SReg_{i+1} drive the gates producing A_i and B_i for the adder inputs, and C_i drives the adder carry-in.]

    A3 A2 A1 A0   B3 B2 B1 B0   Ci
    1  1  1  1    0  0  0  0    1
    1  1  1  0    0  0  0  0    1
    1  1  0  1    0  0  0  1    1
    1  0  1  1    0  0  1  1    1
    0  1  1  1    0  1  1  1    1
    0  0  0  0    1  1  1  1    0
    0  0  0  1    1  1  1  1    0
    0  0  1  0    1  1  1  0    0
    0  1  0  0    1  1  0  0    0
    1  0  0  0    1  0  0  0    0

Figure 2: Test Algorithm [4]

When applying these two test algorithms to various implementations of CLA adders (as well as a simple ripple carry adder), we find that neither provides 100% single stuck-at fault coverage across all implementations, as summarized in Table 1 for 48-bit adder implementations. Note that all CLA implementations use the OR implementation for the propagate signal. While the minimum set of test vectors described in [3] provides 100% fault coverage for its target ripple CLA implementation as well as for the simple ripple carry adder, it fails to provide complete fault coverage for other CLA implementations. The BIST algorithm presented in [4] provides the best overall fault coverage for all CLA adder implementations but fails to provide 100% fault coverage.

Table 1: Stuck-at Fault Simulation Results for 48-bit Adders

    Adder             Gate     Number of   Test Algorithm
    Implementation    Delays   Faults      Vector set [3]   BIST [4]   Modified BIST
    Ripple Carry      96       1296        100%             99.9%      100%
    Ripple CLA        28       1392        100%             99.9%      100%
    Ripple LCU        12       1542        95.7%            99.9%      100%
    Multi-stage LCU   10       1506        95.9%            99.9%      100%

Modification to BIST Approach

Upon investigation of the faults left undetected by the BIST-based test vectors, we observed that two additional vectors were needed to detect the remaining faults and provide 100% fault coverage. These two missing vectors can be produced with a simple modification to the TPG implementation: the inverter in the twisted ring counter is replaced with a flip-flop, as illustrated in Figure 3, and the Q-bar output of the flip-flop provides the inversion for the twisted ring counter. This minor modification to the TPG produces a sequence of 2(N+2) test vectors, a sample of which is illustrated in Figure 3 for N=4 where the new test vectors are noted. With this simple modification, 100% stuck-at fault coverage is obtained for all adder implementations, as summarized in the Modified BIST column of Table 1.

Application to Embedded DSPs in FPGAs

Virtex-4 FPGAs incorporate from 32 to 512 embedded DSPs depending on the family and size of the device. Each DSP has an 18×18-bit 2's complement multiplier followed by three multiplexers (denoted X, Y and Z) and a 3-port adder/subtractor, as illustrated in Figure 4a. The select inputs to the X, Y and Z multiplexers are dynamically controlled by seven OPMODE input signals. The 3-port adder/subtractor performs P=Z±(X+Y+Cin) [1], where X, Y, and Z are 48-bit busses. We assume the implementation is a 2-stage CLA adder as illustrated in Figure 4b.
Only the C port input to the DSP provides 48-bit access to the adder via the Y and Z multiplexers. Unfortunately, the concatenation of input ports A and B (denoted A:B) provides only 36 bits since the A and B ports are 18 bits each. The only other 48-bit access to the adder/subtractor is through the accumulator register (denoted P), which provides feedback access to both the X and Z multiplexers [1]. Therefore, application of a single test vector requires two clock cycles: one to load a portion of the test vector into the P register and the second to apply the complete test vector to the adder.

[Schematic: serial shift register with reset, as in Figure 2 but with the feedback inverter replaced by a flip-flop; adjacent stages SReg_i and SReg_{i+1} drive the gates producing A_i and B_i for the adder inputs, and C_i drives the adder carry-in.]

    A3 A2 A1 A0   B3 B2 B1 B0   Ci
    1  1  1  1    0  0  0  0    1
    1  1  1  0    0  0  0  0    1
    1  1  0  1    0  0  0  1    1
    1  0  1  1    0  0  1  1    1
    0  1  1  1    0  1  1  1    1
    0  0  0  0    1  1  1  1    1    <- new
    0  0  0  0    1  1  1  1    0
    0  0  0  1    1  1  1  1    0
    0  0  1  0    1  1  1  0    0
    0  1  0  0    1  1  0  0    0
    1  0  0  0    1  0  0  0    0
    1  1  1  1    0  0  0  0    0    <- new

Figure 3: Modified Test Algorithm
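The N=4 sequences listed in Figures 2 and 3 can be reproduced with a small behavioural sketch of the twisted ring counter. The gate equations used here are not taken from [4]; they were worked out so that the output matches the listed vectors, and the function name `tpg_sequence`, the all-ones start state, and the choice of the last ring stage as the carry-in tap are assumptions made for illustration.

```python
def tpg_sequence(n, modified=False):
    """Return the test vectors for an n-bit adder as strings 'A..AB..BC'.

    The ring has n+1 stages in the original TPG and n+2 stages when the
    feedback inverter is replaced by a flip-flop (modified TPG).
    """
    length = n + 2 if modified else n + 1
    ring = [1] * length            # start state chosen to match the figure ordering
    vectors = []
    for _ in range(2 * length):    # a twisted ring counter has period 2 * length
        # Assumed output logic, fitted to the listings in Figures 2 and 3:
        a = [ring[i] ^ ring[i + 1] ^ ring[n] for i in range(n)]
        b = [1 - ring[i + 1] for i in range(n)]
        cin = ring[-1]
        # Print most significant bit first, as in the figures.
        a_str = "".join(str(bit) for bit in reversed(a))
        b_str = "".join(str(bit) for bit in reversed(b))
        vectors.append(a_str + b_str + str(cin))
        # Shift by one stage, feeding back the inverted last stage.
        ring = [1 - ring[-1]] + ring[:-1]
    return vectors
```

For example, `tpg_sequence(4)` yields the 10 vectors of the original algorithm and `tpg_sequence(4, modified=True)` yields 12 vectors, the two extra ones being `000011111` and `111100000`.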

With this limited access, each stage of the adder can be tested in turn, as summarized in Table 2. During the first clock cycle of each test vector application, 48 bits of the 97-bit test vector (which includes the CIN input for the stage 1 adder and the SUBTRACT input for the stage 2 adder) can be loaded into the P register via the Z or the Y multiplexer (depending on the stage of the adder that is being tested) while 0s are applied to the other two multiplexers under the control of the OPMODE signals. Logic 0s are also applied to the CIN and SUBTRACT inputs to facilitate passing the 48-bit portion of the test vector to the P register. Note, however, that when testing the second stage adder, the 48-bit vector sent to the P register must be inverted for those cases where the overall test vector will apply a logic 1 to the SUBTRACT input.

During the second clock cycle, the 48-bit vector in the P register is applied to the X multiplexer while the remaining bits of the test pattern are applied via the C port to either the Y multiplexer or the Z multiplexer (depending on the stage of the adder that is being tested). During this second clock cycle, logic 0s are applied through the third multiplexer and the appropriate test pattern values are applied to the CIN or SUBTRACT inputs (depending on the stage of the adder that is being tested). These 2-clock cycle test patterns can be generated by the TPG by simply incorporating a clock enable on the shift register, such that the complete test pattern is held for two clock cycles while the appropriate OPMODE values control the X, Y, and Z multiplexers to transmit the complete test vector to the adder stage under test.
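The two-cycle procedure can be illustrated with a behavioural sketch of the adder/subtractor. Everything in it is an assumption layered on the description above: the 2-stage structure is the one assumed in the text, subtraction is modelled as two's complement (invert the stage 1 result and add a carry of 1), and the names `two_stage_adder`, `apply_to_top`, and `apply_to_bottom` are hypothetical. The sketch only shows which operands each stage sees; it is not a model of the actual DSP slice.

```python
WIDTH = 48                      # width assumed from the 97-bit vector (48 + 48 + 1)
MASK = (1 << WIDTH) - 1


def two_stage_adder(x, y, z, cin, subtract):
    """Assumed model of P = Z +/- (X + Y + Cin) as two cascaded adders.

    Returns (operands seen by stage 1, operands seen by stage 2, result P).
    """
    stage1_sum = (x + y + cin) & MASK
    # Assumption: SUBTRACT inverts the stage 1 result and supplies a carry-in of 1.
    operand = stage1_sum ^ MASK if subtract else stage1_sum
    p = (z + operand + subtract) & MASK
    return (x, y, cin), (z, operand, subtract), p


def apply_to_top(a, b, cin):
    """Apply (a, b, cin) to the first stage using two clock cycles."""
    # Cycle 1: X = 0s, Y = 0s, Z = C port carrying a; P register captures a.
    _, _, p_reg = two_stage_adder(0, 0, a, 0, 0)
    # Cycle 2: X = P register, Y = C port carrying b, Z = 0s, CIN = cin.
    seen, _, _ = two_stage_adder(p_reg, b, 0, cin, 0)
    return seen


def apply_to_bottom(a, b, subtract):
    """Apply (a, b, subtract) to the second stage using two clock cycles."""
    # The preloaded half is inverted when SUBTRACT will be 1, so that the
    # inversion inside the adder restores the intended pattern b.
    preload = b ^ MASK if subtract else b
    # Cycle 1: X = 0s, Y = C port carrying the preload, Z = 0s.
    _, _, p_reg = two_stage_adder(0, preload, 0, 0, 0)
    # Cycle 2: X = P register, Y = 0s, Z = C port carrying a.
    _, seen, _ = two_stage_adder(p_reg, 0, a, 0, subtract)
    return seen
```

In this model `apply_to_bottom(a, b, 1)` delivers `b` unchanged to the second stage only because the preload was inverted, which is the reason the text requires the inversion when SUBTRACT is 1.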
[Figure 4a, access to adder [1]: the 36-bit A:B concatenation of ports A and B, 0s, the C port, and the P register feed the X, Y, and Z multiplexers of the ± adder/subtractor, which has CIN and SUBTRACT inputs and drives the P port. Figure 4b, 2-stage CLA adder: a 48-bit CLA with Cin followed by a 48-bit CLA with Subtract.]

Figure 4: Adder in Virtex-4 DSP

Table 2: Test Pattern Application to 2-Stage Adder

    Adder Under Test   Clock Cycle   X MUX        Y MUX    Z MUX
    Top                1             0s           0s       C port
    Top                2             P register   C port   0s
    Bottom             1             0s           C port   0s
    Bottom             2             P register   0s       C port

Conclusion

The BIST approach proposed by Al-Asaad, Hayes, and Murray in [4] has proven to be an excellent approach for testing adders; with the minor modification described here, it provides complete stuck-at fault coverage for many different types of adder implementations. Furthermore, the BIST approach is easy to implement as well as easy to modify, as we have shown here in our application to the 3-port, 48-bit adder/subtractor in the embedded DSPs in Virtex-4 FPGAs. A similar 3-port, 48-bit adder/subtractor is incorporated in DSP cores in Virtex-5 FPGAs [5], such that this BIST approach can again be used. It should be noted, however, that this BIST approach is not architecture independent; for example, it does not detect all faults in a CLA implementation that uses the XOR implementation for the propagate signal. On the other hand, the basic method described in [4] can be used to test other structures such as multipliers and arithmetic logic units (ALUs).

References

[1] Xilinx, XtremeDSP for Virtex-4 FPGAs, User Guide UG073 (v2.7), Xilinx Inc., 2008.
[2] A. Paschalis, N. Kranitis, M. Psarakis, D. Gizopoulos and Y. Zorian, An Effective BIST Architecture for Fast Multiplier Cores, Proc. Design, Automation and Test in Europe Conf., pp. 117-121, 1999.
[3] S. Kajihara and T. Sasao, On the Adders with Minimum Tests, Proc. IEEE VLSI Test Symp., pp. 10-15, 1997.
[4] H. Al-Asaad, J. Hayes, and B. Murray, Scalable Test Generators for High-Speed Datapath Circuits, J. Electronic Testing: Theory and Applications, vol. 12, pp. 111-125, 1998.
[5] Xilinx, Virtex-5 XtremeDSP Design Considerations, User Guide UG193 (v3.1), Xilinx Inc., 2008.