VLSI DESIGN OF DIGIT-SERIAL FPGA ARCHITECTURE


Journal of Circuits, Systems, and Computers
Vol. 13, No. 1 (2004) 17–52
© World Scientific Publishing Company

HANHO LEE
School of Information and Communication Engineering, Inha University, Incheon 402-751, Korea

GERALD E. SOBELMAN
Department of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA

Received May 2
Revised 28 October 2002

This paper presents a novel application-specific field-programmable gate array (FPGA) architecture that supports efficient implementation of digit-serial DSP architectures on a digit-wide basis. Digit-serial DSP designs have been an effective implementation method for FPGAs. To efficiently realize a digit-serial DSP design on FPGAs, one must create an FPGA architecture optimized for those types of systems. We examine the various circuits used in digit-serial DSP designs to extract the key features that should be reflected in the new FPGA architecture. We explain the design methodology, layout and implementation of the new digit-serial FPGA architecture. Digit-serial DSP designs using the digit-serial FPGA (DS-FPGA) are compared to those implemented on Xilinx FPGAs. We have estimated that the DS-FPGA are about times more efficient in area and faster than the equivalent digit-serial DSP architectures implemented using Xilinx FPGAs.

Keywords: VLSI; FPGA; DSP; digit-serial; architecture.

1. Introduction

Field-programmable gate arrays (FPGAs) are of interest for use in digital signal processing (DSP) systems due to their ability to implement custom hardware solutions while still maintaining flexibility through device reprogramming. FPGAs provide a configurable structure through an array of adjustable logic modules interconnected by programmable routing resources and surrounded by programmable input/output (I/O) blocks. The main constraints on FPGA architectures are limited routing resources, limited I/O resources and large routing delays. Under these circumstances, a digit-serial approach has been shown to be an effective implementation style for FPGAs. 5 In practical DSP applications, it may be desirable to combine the area-efficiency of a bit-serial architecture with the time-efficiency of a corresponding bit-parallel architecture into a single area/time-efficient digit-serial architecture. 6 9 Implementation methods for digit-serial architectures have been proposed in Refs. Moreover, digit-level pipelined and bit-level pipelined digit-serial multipliers that can further increase the performance of digit-serial architectures have been proposed in Refs. 7. It was demonstrated in Refs. 3 that the area-time efficiency and performance of digit-serial architectures on FPGAs are considerably above those of bit-serial and bit-parallel architectures. Several digit-serial arithmetic circuits and DSP architectures using FPGAs were presented in Refs. 5.

This paper shows that by focusing on a specific class of digit-serial DSP applications, we may increase the area and speed efficiency of FPGAs significantly. However, general-purpose FPGAs, such as those described in Refs. 4 and 5, do not offer an area-efficient realization for a certain class of digit-serial DSP applications, and are better suited for state machines and a wide range of logic functions. To efficiently realize digit-serial DSP designs on FPGAs, one must create an FPGA architecture targeted to digit-serial DSP architectures that can compensate for the weaknesses of general-purpose FPGAs and accelerate the performance substantially. 6 We examine the various circuits used in digit-serial DSP designs to extract the key features that should be reflected in the new FPGA architecture. Key to the suitability of the FPGA for these applications is the fact that each of its basic blocks is capable of processing a digit size of up to 4 bits.
The targeted digit-serial DSP systems may contain several digit-level and bit-level pipelined digit-serial datapaths of various digit sizes. They may also have irregular control logic in some portions. Thus, our digit-serial FPGA (DS-FPGA) architecture must contain some bit-level programmability, yet take advantage of the high degree of regularity that exists in digit-serial datapaths. The DS-FPGA architecture makes possible a more efficient realization of those digit-serial architectures, and static RAM programming technology is used to provide in-circuit reprogrammability.

This paper is organized as follows. Section 2 gives an overview of the digit-serial approach. The DS-FPGA architecture and its major components are presented in Sec. 3. In Sec. 4, we discuss the various circuit design issues that must be considered. Routing architecture design issues are addressed in Sec. 5. The layout style, performance and fabrication results for the DS-FPGA architecture are presented in Sec. 6. Example digit-serial DSP implementations using the proposed DS-FPGA are described in Sec. 7. Section 8 presents a performance comparison between the DS-FPGA and Xilinx FPGAs. Our conclusions are summarized in Sec. 9.

2. Digit-serial Approach

Previous architectures have primarily focused on two approaches: bit-serial and bit-parallel implementations. Bit-serial designs process one input bit of a word (or sample) at a time. The advantages of these systems include fewer interconnections, fewer pin-outs, less internal hardware, faster clock speed, and less power consumption. Their main disadvantage is that they are slow: for a word length of W bits, a bit-serial architecture requires W clock cycles to compute one word or sample. They are therefore primarily suited for low to medium speed applications. Bit-parallel systems process all input bits of a word in one clock cycle and are the most common implementation style. Their main advantage is that they compute one word in one clock cycle, so they can provide high performance and are ideal for high-speed applications. Their disadvantages include larger chip area, more interconnections and pin-outs, and higher power consumption.

To avoid the disadvantages of bit-serial and bit-parallel computation, the concept of digit-serial implementation has been proposed in recent years. 6 3,7 The digit-serial approach offers a flexible trade-off between the bit-serial and bit-parallel approaches, and between data throughput and the size of arithmetic operators. A system based on this approach can combine the high throughput of parallel computation with the small operator size of serial computation. Bit-serial systems, which process one bit of the input sample in one clock cycle, have very localized routing and are area-efficient in FPGAs. 8,9 On the other hand, bit-parallel systems, which process one whole word of the input sample in one clock cycle, require many modules, so the routing resources are often insufficient, which results in large routing delays.
However, in applications that require moderate sample rates, both of these systems may be ineffective: bit-serial systems may be too slow, while bit-parallel systems may be faster than necessary and occupy a considerable amount of area. Digit-serial systems are therefore best suited for implementing digital signal processing systems that require moderate sampling rates. In a digit-serial arithmetic implementation, the W bits of a data word are processed in digits of N bits, taking W/N clock cycles, one digit at a time with the least significant digit first. This leads to arithmetic operators that have smaller area than equivalent bit-parallel designs and higher throughput than equivalent bit-serial designs. Architectures based on the digit-serial approach may offer the best overall trade-off between speed, efficient area utilization, throughput, I/O pin limitations and power consumption. By considering a range of values for the digit size, one can search the design space to find the optimum implementation for a given application.

Implementation methods for digit-serial architectures have been proposed. 6 3,7 The first approach is to start with a bit-parallel structure and then use folding to obtain the digit-serial architecture. 6 8 The second approach is to start with a bit-serial architecture and then use unfolding to obtain the digit-serial architecture. 7 The major drawback of the architectures based on these approaches is that they cannot be pipelined at the bit level, which severely limits their throughput. This could be a major obstacle for high-speed applications. These structures cannot be pipelined because of their carry feedback loops, which are impossible to pipeline. Recently, digit-serial architectures that can be pipelined at the bit level have been reported. 2 The use of carry feed-forward removes the bottleneck caused by the carry feedback loops of conventional digit-serial designs. The high degree of pipelining offered by the structure in Refs. 2 increases the throughput rate of digit-serial architectures.

Fig. 1. (a) Digit-serial adder, (b) digit-serial subtractor, (c) digit-serial adder/subtractor, and (d) digit-serial comparator with N = 4 bits.

Fig. 2. (a) Bit-level pipelined digit-serial adder and (b) bit-level pipelined digit-serial subtractor with N = 4 bits.

A basic element in a digit-serial DSP implementation is the digit-serial adder shown in Fig. 1(a). A digit-serial adder with a digit size (N) of 4 bits is a circuit that adds four pairs of bits along with a previous carry bit and produces a sum digit and a new carry bit. The two operands, A and B, are fed one digit at a time into the digit-serial adder. The addition is done N bits at a time, with the carry rippling from one full adder to the next. The carry-out from the digit-serial adder is fed back into the first full adder during the next clock cycle, when the next pair of input digits has arrived. Several examples of digit-serial arithmetic circuits are shown in Figs. 1–3.
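The behavior of the digit-serial adder described above can be illustrated with a short behavioral model. This is only a sketch under stated assumptions: the function names `split_digits` and `digit_serial_add`, the unsigned operands and the default word length of 16 bits are hypothetical choices for illustration and do not come from the paper. Each loop iteration stands for one clock cycle, and the `carry` variable stands for the carry bit that is fed back into the first full adder on the next cycle.

```python
def split_digits(value, word_bits, digit_bits):
    """Split an unsigned word into N-bit digits, least significant digit first."""
    mask = (1 << digit_bits) - 1
    return [(value >> shift) & mask for shift in range(0, word_bits, digit_bits)]


def digit_serial_add(a, b, word_bits=16, digit_bits=4):
    """Add two unsigned words one digit per clock cycle.

    Returns (sum, carry_out, cycles). Behavioral model only, not a
    gate-level description of the circuit in the paper.
    """
    if word_bits % digit_bits != 0:
        raise ValueError("word_bits must be a multiple of digit_bits")
    mask = (1 << digit_bits) - 1
    carry = 0    # carry register, cleared before the first digit
    total = 0
    cycles = 0
    digits_a = split_digits(a, word_bits, digit_bits)
    digits_b = split_digits(b, word_bits, digit_bits)
    for i, (da, db) in enumerate(zip(digits_a, digits_b)):
        s = da + db + carry                      # N-bit add plus fed-back carry
        total |= (s & mask) << (i * digit_bits)  # sum digit for this cycle
        carry = s >> digit_bits                  # carry stored for next cycle
        cycles += 1
    return total, carry, cycles


if __name__ == "__main__":
    print(digit_serial_add(0x1234, 0x0FCD))
```

With W = 16 and N = 4 the loop runs W/N = 4 times. Setting `digit_bits=1` gives the bit-serial case (16 cycles) and `digit_bits=16` the bit-parallel case (1 cycle), which is the trade-off the digit size is meant to expose.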

Fig. 3. (a) Unsigned digit-level pipelined N = 2 digit-serial multiplier and (b) unsigned digit-level pipelined N = 4 digit-serial multiplier.

A digit-level pipelined digit-serial multiplier, shown in Fig. 3, can be implemented by unfolding the structure of a bit-serial multiplier. 7 One input of this multiplier is parallel while the other is digit-serial with the least significant digit presented first. The output is also digit-serial with the least significant digit first. Chang has presented a digit-serial multiplier design that can be pipelined at the bit level, which results in higher processing speeds. The bit-level pipelined digit-serial multiplier contains digit-cells, a digit-serial 3:2 compressor and a digit-serial adder. Each digit-cell consists of a partial product generator module and a carry-save adder (CSA) module. A simple digit-serial 3:2 compressor is used to reduce the carry-save adder array outputs to two digits. A digit-serial adder is then used to add these two digits to generate the final digit-serial outputs.

3. Digit-serial FPGA Architecture

The digit-serial FPGA (DS-FPGA) architecture is composed of digit-serial logic blocks (DLBs), a programmable interconnect architecture and I/O blocks, as shown in Fig. 4. The overall structure of the DLB and the routing architecture are explained in this section.

Fig. 4. DS-FPGA architecture.

3.1. Digit-serial logic block

The DLB is a logic-based core cell which is simple and straightforward. Figure 5(a) shows a simplified schematic of the DLB, which consists of four main parts: digit-serial logic array, fast-carry logic, carry-type select logic, and register array. The detailed DLB structure is shown in Fig. 5(b).

Fig. 5. (a) Digit-serial logic block (DLB) diagram and (b) detail of DLB of DS-FPGA.

Table 1. The configuration settings for the DLB.

Bit            Meaning
SIR            High if D(3:0) is the direct input to D-FFs
SRi            High if Ri(3:0) is the direct input to D-FFs
SRA            High if D-FFs with outputs SO(3:0) are used
SRA            High if D-FFs with outputs CO(2:0) are used
SCO3           High if D-FF with output CO3 is used
SbitM3(1:0)    Low-Low if N = 2 digit-serial circuits are mapped onto DLB
               High-Low if N = 3 digit-serial circuits are mapped onto DLB

The DLB is formed in a digit-serial structure and operates on N = 4 bit operands. This leads to direct implementation of basic mathematical functions, such as digit-serial adders, subtractors and multipliers, in the DLB. DLBs can be connected together through the interconnection resources to implement digit-serial adders, subtractors, and multipliers of any digit size. The DLB can also realize several bit-wise logical operations, including AND, OR, XOR, etc. It has 26 inputs and 9 outputs plus clock (CK), set/reset (SR) and clock enable (CE) inputs. The programming configuration bits in Table 1 are shared across identically programmed datapath slices in a DLB. This sharing reduces the total number of SRAM cells, resulting in higher density and faster reconfiguration. The low number of SRAM bits also simplifies the programming of the DLB, which is desirable for reconfigurable computing. The DLB supports efficient implementation of N = 2, 3 and 4 digit-serial DSP applications as well as random control circuits. The DLB architecture combines fine and coarse logic granularity for optimum logic utilization and high performance. High logic utilization is provided by the fine-grained logic modules, which can implement random logic functions without wasting device resources.
The coarse-grained structure of the four fully interconnected logic modules provides fast operation and efficient routing with minimal signal skew for digit-serial DSP architectures.

Digit-serial logic array

The digit-serial logic array is composed of four small logic modules (LMs), which are the smallest units of logic in the DLB and are logic-based core cells. The logic-based core cell method uses fewer transistors than the look-up table (LUT) method. Figure 6 depicts the structure of the LM, which has five data inputs, four data outputs and a configuration bit. The propagate (P) and generate (G) outputs are used as inputs to the fast-carry logic in the DLB. A large number of logic functions can be implemented by using an appropriate subset of the inputs and tying the remaining inputs of an LM high or low, as shown in Table 2. Each logic module can implement arithmetic functions such as a full-adder and a subtractor, and some combinational functions. Table 3 shows the programming table of configurable SRAM cells for mapping the arithmetic functions. A single logic array can implement N = 2, 3 and 4 digit-serial adders/subtractors, unsigned digit-serial multiplier modules with partial product, or two's complement digit-serial multiplier modules. Based on our observation, these operations are widely used in digit-serial DSP applications. The LM can also realize several bit-wise logical operations, including AND, OR, XOR, etc.

Fig. 6. The structure of logic module (LM).

Table 2. Logic functions which can be implemented by a logic module. Columns: function name; function at SO; function at CO; input pattern (A B C D E); configuration bit (Sbit). Functions listed: Full-Adder, Subtractor, Half-Adder, Multiplier cell, INV, AND2, NAND2, OR2, NOR2, XOR2, XNOR2, MUX 2:1, MUXB 2:1, AND-XOR.

Fast-carry and carry-type select logic

The DLB provides fast-carry logic that bypasses the ripple-carry interconnect structure for N = 4 digit-serial circuits by using a carry look-ahead method. Because the LMs already produce the propagate (P) and generate (G) signals, the fast-carry logic in the DLB is very simple. The fast-carry logic greatly increases the efficiency and performance of digit-serial adder, subtractor and multiplier building blocks with N = 4.
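The carry look-ahead idea behind the fast-carry logic can be sketched in software. The sketch below assumes, per bit, G = A AND B and P = A XOR B, and compares the flattened look-ahead expressions for a 4-bit digit against a plain ripple-carry chain. The function names are hypothetical, and the code models only the Boolean behavior, not the transistor-level circuit of the paper.

```python
def ripple_carries(a, b, cin, n=4):
    """Carries C0..C(n-1) computed bit by bit, as in a ripple-carry chain."""
    carries = []
    c = cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | ((ai ^ bi) & c)   # C_i = G_i + P_i * C_(i-1)
        carries.append(c)
    return carries


def lookahead_carries(a, b, cin):
    """Carries C0..C3 of a 4-bit digit from flattened P/G expressions."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate
    c0 = g[0] | (p[0] & cin)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c2 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin))
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c0, c1, c2, c3]


def check_all_inputs():
    """Exhaustively compare both methods over every 4-bit operand pair."""
    for a in range(16):
        for b in range(16):
            for cin in (0, 1):
                if lookahead_carries(a, b, cin) != ripple_carries(a, b, cin):
                    return False
    return True


if __name__ == "__main__":
    print(check_all_inputs())
```

Every look-ahead carry depends only on the P and G signals and the carry-in, so all four carries can be evaluated in parallel rather than waiting for the carry to ripple through each bit position.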

Table 3. Programming of configurable SRAM cells (DS = digit-serial). Columns: arithmetic function; configuration bits Sbit, SbitM3, SIR, SRI, SRA, SRA, SCO3. Functions listed: INV, AND2, NAND2, OR2, NOR2, XOR2, XNOR2, MUX 2:1, MUXB 2:1, AND-XOR, Full-Adder, flip-flop, UU, N = 4 DS adder, N = 4 DS subt., N = 4 DS mult. cell.

As shown in Fig. 5(b), three 3:1 multiplexers placed after the LMs select one of three possible carry options via two configurable bits. This structure constitutes the carry-type select logic, which selects the ripple-carry chains, the fast-carry logic or the carry-save array. The ripple-carry chains sequentially connect all the LMs in a DLB, supporting N = 2 digit-serial circuits. The carry from a lower-order bit moves forward to the higher-order bit via the carry chain.

Register array

Our DS-FPGA is a register-rich architecture, which makes possible a high degree of pipelining, leading to increased performance. The register array contains a multiplexer to select the output, an edge-triggered D flip-flop and output drivers. In total, there are nine D flip-flops in each DLB, which can be combined to form an 8-bit register. The eight multiplexers in front of the D flip-flops select either carry-save operation or direct inputs via configurable bits. The D flip-flops share a common clock (CK), clock enable (CE), and set/reset (SR). Alternatively, the inputs Ri(3:0) can be used as direct inputs to the registers, which are frequently used to implement shift registers. The final outputs SO(3:0) and CO(3:0) can be either the direct outputs from the multiplexers or the outputs from the D flip-flops.

Routing architecture

The routing framework contains predefined segmented wires in the vertical and horizontal directions, as shown in Fig. 7. The DS-FPGA has two groups of routing resources: internal routing resources and external routing resources.

Fig. 7. DS-FPGA routing architecture (connection blocks, switch blocks, buffered switches, and single-length, double-length, quad-length and long-length lines around a DLB).

Internal routing is the routing between logic block pins; it provides the rich, direct routing resources needed in digit-serial circuits. These lines provide very fast signal transmission with short delay. Connections between adjacent logic blocks, which occur frequently in digit-serial circuits, are implemented via direct lines without passing through any slow switch block. The feedback interconnect provides feedback paths from the DLB's outputs (CO(3:0)) to its inputs (Ci) without consuming any external routing resources. A signal routed to one of the DLB pins is buffered at the input or output. The buffers at the DLB pins effectively isolate the routing segments from the drain capacitance of the pass-transistors, so the routing delay through the routing segments is greatly reduced.

External routing employs connection blocks and switch blocks to permit the interconnection of individual DLBs. Switch blocks are connected to single-length, double-length, quad-length and long-length line segments in four directions. Switch blocks provide connectivity with the routing segments using 2 programmable switches. Connection blocks provide connectivity between DLBs and routing segments using programmable switches. Pass-transistor switches add series resistance to DS-FPGA routing paths, resulting in long delays for long paths. Longer interconnect line segments have been used to address this issue. At very large device sizes, lines are heavily loaded, and the wire resistance slows down signals on the line. Signal propagation delay depends on the number of switches that the signal passes through.

4. Circuit-Level Design Issues for DLB

In this section we discuss the circuit-level design of the DLB. Throughout the discussion, various trade-offs among supply voltage, logic style and performance are evaluated.
The most important parameter controlling power consumption is the supply voltage, because dynamic power is proportional to the square of the supply voltage. 2 Thus, supply voltage reduction is the most effective way to reduce power consumption. However, this method presents a tough challenge in the design of FPGAs, since most of these structures make extensive use of pass-transistor logic. The problem of supply voltage reduction is further exacerbated in processes that do not have low-threshold devices. In these processes, lowering the supply voltage below 2.5 V results in a dramatic loss of performance and even causes some circuits to malfunction. Therefore, such a supply voltage reduction requires new design methods for low-voltage and low-power integrated circuits. Circuit topologies that help reduce the supply voltage are discussed below.

4.1. Impact of logic style

The logic style used in logic gates influences the speed, size, power dissipation, and wiring complexity of a circuit. The circuit delay is determined by the number of transistors in series, the transistor sizes (i.e., channel widths), and the intra- and inter-cell wiring capacitances. Circuit size depends on the number of transistors and their sizes and on the wiring complexity. Power dissipation is determined by the switching activity and the node capacitances. All these characteristics may vary considerably from one logic style to another, which makes the proper choice of logic style crucial for circuit performance. Various investigations of logic styles with respect to low-power dissipation have recently been carried out and reported in the literature. 2,2 In these publications, CPL and related pass-transistor logic styles are promoted as low-power logic styles, because CPL gates have fewer transistors, smaller transistors and smaller capacitances, and are faster than gates in complementary CMOS. However, these circuits have limited drive capability at a low supply voltage. Although the degraded signal level can still drive other circuits correctly at a high supply voltage, it cannot guarantee proper operation at a low supply voltage. Therefore, the threshold voltage loss must be alleviated, and a full voltage swing is needed to obtain a correct signal level at a low supply voltage.

Logic module circuit

XOR and MUX constitute the critical part of the logic module (LM) in the DLB. A 7-transistor XOR circuit 22 has been used to implement the LM circuit. Performance comparisons of several XOR circuits are presented in Ref. 23. The results show that the 7-transistor XOR performs much better than CPL and complementary static CMOS XOR. A double pass-transistor logic (DPL) MUX is used to improve circuit performance at reduced supply voltages. Because both NMOS and PMOS devices are present, the output of the DPL MUX circuit has a full voltage swing and there is no static short-circuit current problem. The investigation results presented in Ref. 2 show that for all simple and complex logic gates such as two-input NAND (NAND2), two-input NOR (NOR2) and three-input and-or-invert (AOI), complementary static CMOS outperforms CPL and other pass-transistor logic styles with respect to circuit delay, power dissipation, power-delay product, and layout size. CMOS also shows the highest robustness and the smallest sensitivity to transistor and voltage scaling. This makes complementary CMOS the logic style of choice for low-power, low-voltage implementation of the LM circuit. However, other logic styles, such as pass-transistor XOR and MUX, are still viable candidates for low-power, high-speed implementation of the LM circuit.

A transistor-level schematic diagram of the proposed LM is depicted in Fig. 8. The addition of the complementary transistor allows the circuit to operate at V_dd = 3.3 V without the threshold loss that plagued the NMOS-only pass-transistor design. The LM circuit has 25% less delay and 37% less power consumption than an LM circuit using static CMOS circuits. The proposed LM has good signal output levels and low power consumption at a low supply voltage (V_dd = 2 V). Table 4 shows the circuit delays of the LM obtained from HSPICE simulation.

Fig. 8. Transistor-level schematic of logic module.

Table 4. Path delays of LM circuit.

Voltage    A to P     A to G     A to SO    A to Carry
2. V       .7 ns      .53 ns     .58 ns     2.39 ns
3.3 V      .54 ns     .7 ns      .67 ns     .8 ns

Fast-carry logic circuit

We have designed the fast-carry logic circuit using compact static CMOS carry look-ahead (CLA) circuits. This fast-carry look-ahead circuit is based on transistor sharing in multiple-output static CMOS complex gates to reduce the transistor count and improve the operation speed of the whole circuit. 24 We define $C_i$ as the carry of the ith stage, and $A_i$ and $B_i$ as the ith bits of the input data; then $C_i$ is expressed as

$C_i = G_i + P_i C_{i-1}, \quad G_i = A_i B_i, \quad P_i = A_i \oplus B_i.$

Expanding this yields

$C_i = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_0 C_{in}.$

Practically, the number of look-ahead stages is limited to four, and the term $C_3$ is expressed as

$C_3 = G_3 + P_3 \{ G_2 + P_2 [ G_1 + P_1 ( G_0 + P_0 C_{in} ) ] \}.$

Figure 9(a) shows the carry chain of the 4-bit CLA block, which yields $C_0$, $C_1$ and $C_2$ as well as $C_3$. By inserting a PMOS transistor with gate input $P_i$ into the pull-up part, we can isolate the pull-up part of carry $C_i$ from the pull-up parts of all carries $C_j$ (j > i), according to the logic redundancy method. In the pull-down part, however, $P_i$ and $G_i$ cannot both be high, because $P_i$ is the XOR and $G_i$ is the AND function of input operands $A_i$ and $B_i$ for i = 1, 2 and 3; therefore, there is no discharging path to ground other than the original pull-down part of carry $C_i$, which makes the addition of redundant transistors unnecessary. The fast-carry logic circuit is composed of chains of MOSFETs serially connected between a power supply rail and the output of the subcircuit. These serially connected MOSFETs are a major source of delay and power dissipation, so optimal sizing of these transistors is important in reducing the delay and power dissipation of these circuit structures. The channel width tapering method 25 has been used to reduce the delay and power dissipation of the serially connected MOSFET chains in the fast-carry logic circuit.

Glitch-free TSPC D flip-flop

The D flip-flop (D-FF) is used heavily throughout the DS-FPGA in order to make possible a high degree of pipelining, leading to increased performance. The D-FFs share a common clock (CK), clock enable (CE), and set/reset (SR), and they can be set/reset locally or globally. Enhancing D-FF speed can lead to a higher clock rate. The power dissipated in the clock distribution network is usually a substantial part of the total system power consumption. Therefore, it is important to minimize the number of global clocks, as well as the gate capacitance associated with the clock nets.
To accomplish this, the true single-phase clocking (TSPC) methodology has been proposed, with the basic register shown in Ref. 26. The TSPC scheme has the inherent advantage that clock skew problems are restricted to the proper distribution of only one clock phase. It is shown in Ref. 27 that, although the proportion of power consumption due to glitching varies significantly with the particular circuit (from 9% to 38%), the hazard/glitch power consumption cannot be neglected in static CMOS circuits. Therefore, the glitch-free TSPC D-FF presented in Ref. 28 is used to implement the D-FF with clock, set/reset and clock enable in the DLB, as shown in Fig. 9(b). To ensure that the output does not discharge when y is high at evaluation, an NMOS transistor MN4, controlled by y_b, is inserted into the output stage of the conventional TSPC D-FF. Transistor size optimization also improves circuit speed by a factor of.5.8.

Fig. 9. (a) Fast-carry logic circuit and (b) glitch-free TSPC D flip-flop with clock, set/reset and clock enable.

5. Routing Architecture Design Issues

A key aspect in the design of an FPGA is its routing architecture, which comprises the resources that are used to interconnect the device's logic blocks. A large number of different routing architecture issues were investigated in Refs. 29 and 3. Several architectural and circuit design choices impact the amount of capacitance that is charged and discharged within an FPGA design. The electrical design of FPGA interconnect circuits was investigated in Ref. 3. The number of metal layers available in the selected process technology influences the final area and power of an FPGA array. In addition, the sizing and number of switches within the interconnect fabric can seriously affect power. Lastly, capacitance is affected by the connection of the basic logic cell to the surrounding interconnect and to the neighboring cells. Given that interconnect wiring is a crucial resource in an FPGA, reducing the intrinsic wiring capacitance by using upper layers of metal can help lower net capacitances. However, the necessity of reaching the active layers to insert switches leads to the addition of several contacts and vias to lower levels. Another important design consideration that affects the resulting FPGA capacitance is the size and number of switches present on each interconnect segment.

The interconnect network consists of programmable switches that are organized in connection blocks and switch blocks. The performance of FPGAs is mainly limited by the delay through the interconnect's programmable switches. This delay increases quadratically with the number of series switches and linearly with the number of switches loading each node. It is especially a problem when the programmable switches are implemented using MOS transistors, since these have appreciable resistance and capacitance.
The FPGA has a higher signal delay because of (a) the channel resistance of the pass-transistors connecting segments of wire, (b) the parasitic capacitance of the off transistors, and (c) branches of extra wire that are not on the source-to-sink path. An approximate discrete analysis of a line modeled as n cascaded RC sections (of equal R and C) yields

$$t_n = \frac{RC\,n(n+1)}{2}. \tag{1}$$

As can be seen from the above equation, the total delay depends quadratically on the number of sections and linearly on the resistance and the capacitance of each section. The accumulation of quadratic delay can be limited by inserting repeaters that consist of pairs of unidirectional tristate buffers. A tradeoff in the switch size must be reached in order to obtain a good result. Given the projected delay and the prospective load to be driven, a tradeoff can be found between the buffer strength and the CMOS switch size in terms of the required speed and the area consumed. This tradeoff tries to minimize not only the total delay from the beginning of the interconnection to the end of the multi-switch line but also the local delays from the beginning of the line to each intermediate point after an interconnection element. As a consequence, the average delay is reduced and a significant area reduction is achieved.

This section discusses the tradeoff of the NMOS pass-transistor switch size used for programmable interconnections as a function of the buffer strength and the desired delay for a given load. The repeater interval and the optimum buffer size in the interconnection network are determined for maximum speed.

5.1. NMOS pass-transistor switch sizing

SRAM-based FPGAs normally use NMOS pass-transistors to implement routing switches, and this kind of switch has significant series resistance and parasitic capacitance. The sizing and implementation of switches throughout the FPGA interconnect has a large impact on the FPGA's power consumption and speed. As the switch size increases, there is a slow decrease in delay and a nearly linear increase in energy. To account for the wiring capacitance component, a worst-case estimate of wiring parasitics was made and added to each node. A series of simulations was performed to assess the trade-offs in switch size and implementation more accurately, using the interconnection network shown in Fig. 10. Figure 11 shows the measured delay and energy-delay product from the input port to the node point after 2 switching sections. Figure 11(a) shows that increasing the NMOS pass-transistor switch width from .2 to 3.3 µm significantly reduces delay, but there are diminishing returns beyond that point. The delay flattens out as transistor size increases because, although resistance drops as the transistor size is increased, parasitic capacitance increases.
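The flattening of delay with switch width can be reproduced with a first-order model. The sketch below is illustrative only: it assumes that the switch on-resistance scales as 1/W, that the switch parasitic capacitance scales as W, and that every node carries a fixed wiring capacitance. The constants `R_UNIT_OHM_UM`, `C_PAR_FF_PER_UM` and `C_WIRE_FF` are hypothetical values chosen for illustration and are not taken from the HSPICE simulations.

```python
# Toy first-order model of delay through a chain of NMOS pass-transistor switches.
# All constants are hypothetical and serve only to show the trend.
R_UNIT_OHM_UM = 10_000.0   # assumed on-resistance times width (ohm * um)
C_PAR_FF_PER_UM = 2.0      # assumed switch parasitic capacitance per um of width (fF/um)
C_WIRE_FF = 60.0           # assumed fixed wiring capacitance per node (fF)


def chain_delay_ns(width_um, n_sections):
    """Delay of a uniform RC ladder whose per-section R and C depend on switch width."""
    r_ohm = R_UNIT_OHM_UM / width_um                 # resistance falls as the switch widens
    c_ff = C_WIRE_FF + C_PAR_FF_PER_UM * width_um    # parasitic capacitance rises with width
    # Uniform RC ladder estimate t = R*C*n(n+1)/2; ohm * fF = 1e-15 s = 1e-6 ns.
    return r_ohm * c_ff * n_sections * (n_sections + 1) / 2 * 1e-6


if __name__ == "__main__":
    for w in (1.0, 2.0, 3.0, 4.0, 6.0, 8.0):
        print(f"width {w:4.1f} um -> delay {chain_delay_ns(w, 6):6.2f} ns")
```

In this model the product R·C equals `R_UNIT_OHM_UM * (C_WIRE_FF / W + C_PAR_FF_PER_UM)`, so widening the switch only shrinks the wiring-capacitance term while the parasitic term stays constant, which is why each further increase in width buys less.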
Assuming a wiring capacitance of 6 fF, a switch size of approximately 3.3 µm is optimal for pass-transistor interconnect with a high fanout.

5.2. Repeater interval and buffer optimum size

It is well known that an NMOS pass-transistor can transmit a logic 0 completely, but it performs poorly when transmitting a logic 1. In the latter case, a voltage drop of $V_{tn}$ is incurred, where $V_{tn}$ is the threshold voltage of the NMOS transistor. The NMOS pass-transistor is effective at pulling low, so the PMOS transistor of the following inverter is fully turned on, giving a solid low-to-high transition. The maximum voltage that can be passed through the NMOS transistor is $V_{dd} - V_{tn}$. Since the degraded $V_{dd} - V_{tn}$ level cannot fully turn on the NMOS transistor in the inverter, the fall time is longer than the rise time. To achieve the same rise and fall times at the buffer output, we need to increase the size of the NMOS transistor in the first stage of the buffer. Figure 12(a) shows the tristate buffer optimized for equal rise and fall times. Figure 10 shows an interconnection network with tristate buffer repeaters. The repeater interval k is defined as the number of switches between two nodes that contain repeaters.

Fig. 10. Interconnect network using pairs of unidirectional tristate buffers.

Fig. 11. (a) Delay versus switch size and (b) energy-delay product versus switch size through 2 switches (repeater interval = 6).

Fig. 12. (a) Tristate buffer for repeater and (b) delay through 36 loaded switch sections versus repeater interval.

Fig. 13. Delay as a function of the switch width and buffer width (length = .6 µm).

Figure 12(b) compares the simulated propagation delay through 36 switch sections for a variable repeater interval k. The curve shows the average propagation delay for a chain that uses a tristate buffer repeater optimized for minimum delay with equal rising and falling delay. Increasing the repeater interval from 2 to 6 significantly reduces delay, but there are diminishing returns beyond that point. When the repeater interval becomes too large, the signal is no longer transmitted from the input to the node point after 36 switch sections. Therefore, a repeater interval of 6 gives fast operation of the interconnect network.

Figure 13 shows the results of an experiment to measure the effect that varying the channel width of the switch and the buffer has on the speed of the interconnection network. The results show that increasing the switch width up to 3.3 µm significantly reduces delay, whereas increasing the buffer width yields no large delay reduction. Figure 14 shows the delay models of the DS-FPGA routing structure using the optimum-size pass-transistor switch and buffers.

We have tried to identify the tradeoff of the NMOS pass-transistor switch size used for programmable interconnections as a function of the buffer strength and the desired delay for a given load. A switch size of approximately 3.3 µm is optimal for pass-transistor interconnect with a high fanout. The repeater interval in the interconnection network was chosen for maximum speed. A tradeoff can be reached that minimizes the area penalty and the average delay without appreciably increasing the overall delay.
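The way repeaters cap the quadratic delay growth can be illustrated with a short calculation. The sketch below assumes the standard uniform RC-ladder estimate t = RC·k(k+1)/2 for each unbuffered run of k switches and adds a fixed delay per repeater. The values `rc_ns = 0.05` and `t_buf_ns = 1.0` are hypothetical, chosen so that the toy optimum falls near the interval reported above; they are not fitted to the simulation data. The model also ignores the threshold-voltage drop, so it cannot reproduce the loss of signal at large intervals.

```python
def segment_delay_ns(rc_ns, k):
    """Delay of k unbuffered switch sections, using the uniform RC-ladder estimate."""
    return rc_ns * k * (k + 1) / 2


def chain_delay_ns(rc_ns, t_buf_ns, n_sections, k):
    """Total delay when a repeater is placed after every k switch sections."""
    if n_sections % k != 0:
        raise ValueError("k must divide the number of sections in this simple model")
    n_segments = n_sections // k
    return n_segments * (segment_delay_ns(rc_ns, k) + t_buf_ns)


def best_interval(rc_ns, t_buf_ns, n_sections):
    """Repeater interval (among divisors of n_sections) with the lowest total delay."""
    candidates = [k for k in range(1, n_sections + 1) if n_sections % k == 0]
    return min(candidates, key=lambda k: chain_delay_ns(rc_ns, t_buf_ns, n_sections, k))


if __name__ == "__main__":
    rc_ns, t_buf_ns, n = 0.05, 1.0, 36   # hypothetical per-section RC and buffer delay
    for k in (1, 2, 3, 4, 6, 9, 12, 18, 36):
        print(f"k = {k:2d}: {chain_delay_ns(rc_ns, t_buf_ns, n, k):6.2f} ns")
    print("best interval:", best_interval(rc_ns, t_buf_ns, n))
```

Short intervals pay the buffer delay too often, long intervals let the quadratic term dominate, and the minimum lies where the two contributions balance.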

Fig. 14. Delay model of DS-FPGA routing structure.

6. Physical Design and Fabrication

6.1. Area and speed

The proposed DS-FPGA cell has been implemented as a full-custom VLSI design in order to extract its physical characteristics. The custom layout of the DS-FPGA cell was done in a .5 µm Hewlett-Packard (HP) CMOS process with three metal layers. Figure 15(a) shows the floorplan of the DS-FPGA tile and its major building blocks.

Fig. 15. (a) Floorplan of DS-FPGA tile and (b) layout of DS-FPGA prototype chip.

Table 5. Circuit delays of DS-FPGA cell measured from HSPICE simulation.

Path                                                                          Delay (ns)
Combinational logic
  LM inputs to LB outputs (SO(3:0)) (without D-FF)                            2.9
  LM inputs to LB outputs (SO(3:0)) (with D-FF)                               3.86
  LM inputs to LM output (cm)                                                 .85
  LM inputs to LM output (P)                                                  .54
  LM inputs to 2 th LM outputs (cm) (ripple-carry mode)                       3.53
LB fast-carry logic
  P () and Ci to fast-carry logic output (C)                                  .42
  P () and Ci to fast-carry logic output (C2)                                 .44
  P () and Ci to fast-carry logic output (C3)                                 .47
Sequential delays
  Fast-carry logic output to D-FF output to LB output                         2.57
Routing track delays
  Two connection blocks and four switch blocks including buffers
    (spanning 4 LBs)                                                          2.76
  Four switch blocks including buffered switch (spanning 4 LBs)               2.38
Path delays
  N = 4 path delay (A to CO3)                                                 4.33
  N = 2 critical path delay (using ripple-carry logic) (A to SO)              6.
  N = 3 critical path delay (using fast-carry logic) (A to SO2)               6.6
  N = 4 critical path delay (using fast-carry logic) (A to SO3)               6.

The major components include the LB, connection block and switch block. The area of the LB core is 23 µm × 2 µm and the area of each tile (which contains the LB, connection block and switch block) is 6 µm × 42 µm = 252 µm². The LB core makes up 19% of the total area while the routing resources make up the remaining 81%. In particular, routing resources such as the connection block and switch block take up significant area. We used HSPICE to measure the circuit and routing track delays for the proposed DS-FPGA cell and to verify the functionality of our layout. Table 5 shows the circuit and routing track delays for the proposed DS-FPGA cell at $V_{dd}$ = 3.3 V. The routing track delays have been estimated from HSPICE simulation. These results are used to determine the speed of digit-serial arithmetic circuits implemented using the proposed DS-FPGA cells. The critical path delay between the input and output pins of a LB, including direct-connection, is 6. ns.

6.2. Fabrication results

A prototype DS-FPGA chip has been fabricated in the .5 µm HP CMOS process with three metal layers, and the complete layout is shown in Fig. 15(b). It contains only four tiles, which were enough to build the digit-serial adder and digit-serial multiplier. This chip has 40 pads, including four power and ground pads, 28 signal pins into the DS-FPGA core, and eight programming pins. From this chip, we were able to make delay measurements that included one part of the logic block. Based on these measurements, the pad-to-pad delay through part of the critical path of the LB and two switch blocks was 9 ns, giving a delay of about 7.5 ns when the pad delay is eliminated. Consideration of the propagation delays for DS-FPGA suggests that digit-level pipelined digit-serial multipliers with throughput as fast as 5 MHz may be achieved.

7. Digit-Serial Datapath Circuit Implementations on DS-FPGA

In this section, we give an overview of the methodology used for technology mapping of digit-serial circuits onto DS-FPGA. DS-FPGA provides substantial support for implementing digit-serial arithmetic building blocks for digital systems such as FIR filters, DCT circuits and similar computationally intensive structures. Figure 16 shows how the LB can be used in various ways. Digit-serial arithmetic modules can be implemented using LBs in DS-FPGA. Each N = 4 digit-serial arithmetic module is typically implemented using 1 to 2 LBs with a logic depth of 1 or 2 LBs, which leads to high clock frequency operation. Each digit-serial arithmetic module consists of a single cell, as in adders and subtractors, or of multiple cells proportional to the word length, as in digit-serial multipliers and registers.

A single LB can be used to implement an unsigned N = 4 digit-serial multiplier module or a two's complement N = 4 digit-serial multiplier module, as shown in Figs. 16(a) and 16(b). The outputs can be registered for pipelining; otherwise the D-FFs are available for independent use, bypassing the logic modules. X(3:0) and Ybit(1:0) are the 4-bit and 2-bit data for the digit-serial multiplier modules. PI(3:0) and PO(3:0) are the input and output partial products. As shown in Figs. 16(a) and 16(b), the four AND gates and the 4-bit adder that are required for an N = 4 digit-serial multiplier module are contained in one LB. We can use the fast-carry logic for N = 4 digit-serial circuits to increase speed.
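The arithmetic performed by one such module can be sketched behaviorally: four AND gates form the bits X(i)·Ybit, and a 4-bit adder adds them to the incoming partial product PI. The function name `multiplier_module`, the use of a single Y bit, and the carry-in and carry-out handling are assumptions made for illustration. The actual LB configuration, including how both bits of Ybit(1:0) are used, is defined by the figures and is not modeled here.

```python
def multiplier_module(x, y_bit, pi, ci=0):
    """Behavioral model of one assumed N = 4 multiplier module.

    x, pi : 4-bit unsigned integers (parallel operand digit and input partial product)
    y_bit : serial multiplier bit (0 or 1)
    ci    : carry-in (0 or 1)
    Returns (po, co): 4-bit output partial product and carry-out.
    """
    if not (0 <= x < 16 and 0 <= pi < 16):
        raise ValueError("x and pi must be 4-bit values")
    if y_bit not in (0, 1) or ci not in (0, 1):
        raise ValueError("y_bit and ci must be 0 or 1")
    and_bits = x if y_bit else 0     # four AND gates: X(i) AND Ybit for i = 0..3
    total = and_bits + pi + ci       # 4-bit adder with carry-in
    return total & 0xF, total >> 4   # low four bits are PO, bit 4 is the carry-out


if __name__ == "__main__":
    po, co = multiplier_module(0b1011, 1, 0b0110)
    print(f"PO = {po:04b}, CO = {co}")
```

Because the carry-out is returned separately, two calls can be chained (carry-out of the low digit into the carry-in of the high digit) to model a wider operand.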
The digit-serial multiplier modules can be easily stacked together to form deeper and/or wider multipliers. For example, to implement an N = 8 digit-serial multiplier module, the carry output (CO3) of one LB is connected to the carry input (Ci) of the next LB, thus requiring two LBs. A single LB can also be used to implement two N = 2 digit-serial multiplier modules using the ripple-carry chain of the LB, as shown in Fig. 16(c), or four independent full-adders/subtractors, as in a row of a carry-save array, as shown in Fig. 16(d). Each logic module in a LB can also be configured to implement random logic without wasting device resources. We can configure each logic module to implement the following random logic gates: AND2, NAND2, OR2, NOR2, XOR2, XNOR2, MUX2, MUX, etc., as shown in Table 2. If a LB is used to implement bit-serial circuits, one LM can be used to implement the bit-serial circuit and the remaining LMs can be used for random logic gates.

Fig. 16. (a) Unsigned digit-level pipelined N = 4, (b) two's complement N = 4, (c) unsigned N = 2 and (d) bit-level pipelined N = 4 digit-serial multiplier module implementations onto LB.

Fig. 17. (a) Implementation of unsigned digit-level pipelined digit-serial multiplier and (b) digit-cell for unsigned bit-level pipelined N = 4 digit-serial multipliers using LB.

We have designed some digit-serial multipliers using LBs in DS-FPGA, as shown in Fig. 17. The digit-serial multipliers have been designed so that routing is kept regular and well-organized. An unsigned digit-level pipelined digit-serial multiplier implementation using LBs is shown in Fig. 17(a). Each block is replaced by the digit-serial multiplier module shown in Fig. 16(a). In order to increase the throughput of the digit-serial multiplier, the architecture is pipelined at the digit level. In the example shown, the pipelining limits the propagation to a 4-bit adder in the N = 4 digit-serial multiplier. The partial products presented to the shifting accumulator are generated by the logical AND of the input serial bit with each bit of the parallel input. However, the critical path delay of the unsigned digit-level pipelined N = 4 digit-serial multiplier is (AND + 4 full adders + D-FF) delay. The critical path delay cannot be reduced below this value because of the presence of feedback loops.

The critical path of this N = 4 digit-serial multiplier on DS-FPGA is found to be $(T_{LM} + T_{Fast} + T_{FF} + T_r)$. Here $T_{LM}$ represents the propagation delay associated with the LM within the LBs, and $T_{Fast}$ and $T_{FF}$ represent, respectively, the propagation delays associated with the fast-carry logic and the flip-flop within the LBs. $T_r$ is the delay incurred in the routing between LBs. Since the proposed DS-FPGA can implement significant digit-serial arithmetic functions within a single LB, routing is only required to support the implementation of wide operand structures. Finally, the critical path for the overall digit-serial multiplier has been determined in terms of the worst critical path associated with the constituent multiplier module. Therefore, the maximum possible sampling frequency associated with the N = 4 digit-serial multiplier can be obtained as

$$f = \frac{4}{W\,(T_{LM} + T_{Fast} + T_{FF} + T_r)}.$$

The unsigned bit-level pipelined digit-serial multiplier contains digit-cells, a digit-serial 3:2 compressor adder and a digit-serial adder.
A digit-cell can be configured using six LBs as shown in Fig. 17(b), and a simple digit-serial 3:2 compressor adder can first be used to reduce the digit-cell output digits to two digits. A digit-serial adder is then used to add these two digits to generate the final digit-serial outputs. However, when the design is mapped onto DS-FPGA, the critical path cannot be reduced below that of an N = 4 digit-serial adder because of the feedback loop in the final digit-serial adder. The resulting critical path of bit-level pipelined digit-serial multipliers would be $(2T_{LM} + T_{FF} + T_r)$.

8. Results

To evaluate the advantages of DS-FPGA, we need to compare the area and speed efficiency of the DS-FPGA architecture with those of general purpose FPGAs. To determine the number of FPGA logic blocks needed to implement a circuit, we have mapped several digit-serial DSP architectures onto DS-FPGA by direct hand-mapping, which ensures the most efficient logic usage, and then estimated the silicon area in each case. The area cost of FPGAs is estimated by the number of logic blocks required to implement a digit-serial DSP architecture.
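For the speed side of such a comparison, the critical-path expressions of the previous section can be turned into throughput estimates. The sketch below assumes that W denotes the word length in bits and N the digit size, so that one sample takes W/N clock cycles and the sampling frequency is N divided by W times the critical path. The function name and the delay values in the example are hypothetical and are not the measured figures from Table 5.

```python
def max_sampling_freq_mhz(digit_size, word_length, *delays_ns):
    """Sampling frequency of a digit-serial unit from its critical-path components.

    digit_size  : N, bits processed per clock cycle
    word_length : W, bits per sample (assumed meaning of W)
    delays_ns   : critical-path components in ns, e.g. T_LM, T_Fast, T_FF, T_r
    """
    if digit_size <= 0 or word_length <= 0 or not delays_ns:
        raise ValueError("digit size, word length and at least one delay are required")
    t_crit_ns = sum(delays_ns)                    # clock period is bounded by the critical path
    clock_mhz = 1e3 / t_crit_ns                   # 1 / ns = GHz, so multiply by 1e3 for MHz
    return clock_mhz * digit_size / word_length   # one sample every W/N clock cycles


if __name__ == "__main__":
    # Hypothetical delays for T_LM, T_Fast, T_FF and T_r (ns), N = 4, W = 16.
    print(f"{max_sampling_freq_mhz(4, 16, 2.0, 0.5, 1.5, 1.0):.1f} MHz")
```

The same function covers the bit-level pipelined case by passing the components of $(2T_{LM} + T_{FF} + T_r)$ instead.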


CHAPTER 3 NEW SLEEPY- PASS GATE 56 CHAPTER 3 NEW SLEEPY- PASS GATE 3.1 INTRODUCTION A circuit level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage po leepy-

More information

Learning Outcomes. Spiral 2 8. Digital Design Overview LAYOUT

Learning Outcomes. Spiral 2 8. Digital Design Overview LAYOUT 2-8.1 2-8.2 Spiral 2 8 Cell Mark Redekopp earning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as

More information

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Design of Wallace Tree Multiplier using Compressors K.Gopi Krishna *1, B.Santhosh 2, V.Sridhar 3 gopikoleti@gmail.com Abstract

More information

DESIGN AND SIMULATION OF A HIGH PERFORMANCE CMOS VOLTAGE DOUBLERS USING CHARGE REUSE TECHNIQUE

DESIGN AND SIMULATION OF A HIGH PERFORMANCE CMOS VOLTAGE DOUBLERS USING CHARGE REUSE TECHNIQUE Journal of Engineering Science and Technology Vol. 12, No. 12 (2017) 3344-3357 School of Engineering, Taylor s University DESIGN AND SIMULATION OF A HIGH PERFORMANCE CMOS VOLTAGE DOUBLERS USING CHARGE

More information

Lecture 11: Clocking

Lecture 11: Clocking High Speed CMOS VLSI Design Lecture 11: Clocking (c) 1997 David Harris 1.0 Introduction We have seen that generating and distributing clocks with little skew is essential to high speed circuit design.

More information

Digital Microelectronic Circuits ( ) CMOS Digital Logic. Lecture 6: Presented by: Adam Teman

Digital Microelectronic Circuits ( ) CMOS Digital Logic. Lecture 6: Presented by: Adam Teman Digital Microelectronic Circuits (361-1-3021 ) Presented by: Adam Teman Lecture 6: CMOS Digital Logic 1 Last Lectures The CMOS Inverter CMOS Capacitance Driving a Load 2 This Lecture Now that we know all

More information

Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse 1 K.Bala. 2

Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse 1 K.Bala. 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 07, 2015 ISSN (online): 2321-0613 Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse

More information

CHAPTER 6 GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS

CHAPTER 6 GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS 87 CHAPTER 6 GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS 6.1 INTRODUCTION In this approach, the four types of full adders conventional, 16T, 14T and 10T have been analyzed in terms of

More information

CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES

CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES 44 CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES 3.1 INTRODUCTION The design of high-speed and low-power VLSI architectures needs efficient arithmetic processing units,

More information

A Novel Low-Power Scan Design Technique Using Supply Gating

A Novel Low-Power Scan Design Technique Using Supply Gating A Novel Low-Power Scan Design Technique Using Supply Gating S. Bhunia, H. Mahmoodi, S. Mukhopadhyay, D. Ghosh, and K. Roy School of Electrical and Computer Engineering, Purdue University, West Lafayette,

More information

Implementation of High Performance Carry Save Adder Using Domino Logic

Implementation of High Performance Carry Save Adder Using Domino Logic Page 136 Implementation of High Performance Carry Save Adder Using Domino Logic T.Jayasimha 1, Daka Lakshmi 2, M.Gokula Lakshmi 3, S.Kiruthiga 4 and K.Kaviya 5 1 Assistant Professor, Department of ECE,

More information

UNIT-III GATE LEVEL DESIGN

UNIT-III GATE LEVEL DESIGN UNIT-III GATE LEVEL DESIGN LOGIC GATES AND OTHER COMPLEX GATES: Invert(nmos, cmos, Bicmos) NAND Gate(nmos, cmos, Bicmos) NOR Gate(nmos, cmos, Bicmos) The module (integrated circuit) is implemented in terms

More information

Design & Analysis of Low Power Full Adder

Design & Analysis of Low Power Full Adder 1174 Design & Analysis of Low Power Full Adder Sana Fazal 1, Mohd Ahmer 2 1 Electronics & communication Engineering Integral University, Lucknow 2 Electronics & communication Engineering Integral University,

More information

Low Power 32-bit Improved Carry Select Adder based on MTCMOS Technique

Low Power 32-bit Improved Carry Select Adder based on MTCMOS Technique Low Power 32-bit Improved Carry Select Adder based on MTCMOS Technique Ch. Mohammad Arif 1, J. Syamuel John 2 M. Tech student, Department of Electronics Engineering, VR Siddhartha Engineering College,

More information

EE584 Introduction to VLSI Design Final Project Document Group 9 Ring Oscillator with Frequency selector

EE584 Introduction to VLSI Design Final Project Document Group 9 Ring Oscillator with Frequency selector EE584 Introduction to VLSI Design Final Project Document Group 9 Ring Oscillator with Frequency selector Group Members Uttam Kumar Boda Rajesh Tenukuntla Mohammad M Iftakhar Srikanth Yanamanagandla 1 Table

More information

EC 1354-Principles of VLSI Design

EC 1354-Principles of VLSI Design EC 1354-Principles of VLSI Design UNIT I MOS TRANSISTOR THEORY AND PROCESS TECHNOLOGY PART-A 1. What are the four generations of integrated circuits? 2. Give the advantages of IC. 3. Give the variety of

More information

EE 330 Lecture 44. Digital Circuits. Ring Oscillators Sequential Logic Array Logic Memory Arrays. Final: Tuesday May 2 7:30-9:30

EE 330 Lecture 44. Digital Circuits. Ring Oscillators Sequential Logic Array Logic Memory Arrays. Final: Tuesday May 2 7:30-9:30 EE 330 Lecture 44 igital Circuits Ring Oscillators Sequential Logic Array Logic Memory Arrays Final: Tuesday May 2 7:30-9:30 Review from Last Time ynamic Logic Basic ynamic Logic Gate V F A n PN Any of

More information

Yet, many signal processing systems require both digital and analog circuits. To enable

Yet, many signal processing systems require both digital and analog circuits. To enable Introduction Field-Programmable Gate Arrays (FPGAs) have been a superb solution for rapid and reliable prototyping of digital logic systems at low cost for more than twenty years. Yet, many signal processing

More information

Integration of Optimized GDI Logic based NOR Gate and Half Adder into PASTA for Low Power & Low Area Applications

Integration of Optimized GDI Logic based NOR Gate and Half Adder into PASTA for Low Power & Low Area Applications Integration of Optimized GDI Logic based NOR Gate and Half Adder into PASTA for Low Power & Low Area Applications M. Sivakumar Research Scholar, ECE Department, SCSVMV University, Kanchipuram, India. Dr.

More information

Propagation Delay, Circuit Timing & Adder Design. ECE 152A Winter 2012

Propagation Delay, Circuit Timing & Adder Design. ECE 152A Winter 2012 Propagation Delay, Circuit Timing & Adder Design ECE 152A Winter 2012 Reading Assignment Brown and Vranesic 2 Introduction to Logic Circuits 2.9 Introduction to CAD Tools 2.9.1 Design Entry 2.9.2 Synthesis

More information

Propagation Delay, Circuit Timing & Adder Design

Propagation Delay, Circuit Timing & Adder Design Propagation Delay, Circuit Timing & Adder Design ECE 152A Winter 2012 Reading Assignment Brown and Vranesic 2 Introduction to Logic Circuits 2.9 Introduction to CAD Tools 2.9.1 Design Entry 2.9.2 Synthesis

More information

Design and Analysis of Row Bypass Multiplier using various logic Full Adders

Design and Analysis of Row Bypass Multiplier using various logic Full Adders Design and Analysis of Row Bypass Multiplier using various logic Full Adders Dr.R.Naveen 1, S.A.Sivakumar 2, K.U.Abhinaya 3, N.Akilandeeswari 4, S.Anushya 5, M.A.Asuvanti 6 1 Associate Professor, 2 Assistant

More information

COMPUTER ORGANIZATION & ARCHITECTURE DIGITAL LOGIC CSCD211- DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF GHANA

COMPUTER ORGANIZATION & ARCHITECTURE DIGITAL LOGIC CSCD211- DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF GHANA COMPUTER ORGANIZATION & ARCHITECTURE DIGITAL LOGIC LOGIC Logic is a branch of math that tries to look at problems in terms of being either true or false. It will use a set of statements to derive new true

More information

BICMOS Technology and Fabrication

BICMOS Technology and Fabrication 12-1 BICMOS Technology and Fabrication 12-2 Combines Bipolar and CMOS transistors in a single integrated circuit By retaining benefits of bipolar and CMOS, BiCMOS is able to achieve VLSI circuits with

More information

Low Power VLSI CMOS Design. An Image Processing Chip for RGB to HSI Conversion

Low Power VLSI CMOS Design. An Image Processing Chip for RGB to HSI Conversion REPRINT FROM: PROC. OF IRISCH SIGNAL AND SYSTEM CONFERENCE, DERRY, NORTHERN IRELAND, PP.165-172. Low Power VLSI CMOS Design An Image Processing Chip for RGB to HSI Conversion A.Th. Schwarzbacher and J.B.

More information

Pass Transistor and CMOS Logic Configuration based De- Multiplexers

Pass Transistor and CMOS Logic Configuration based De- Multiplexers Abstract: Pass Transistor and CMOS Logic Configuration based De- Multiplexers 1 K Rama Krishna, 2 Madanna, 1 PG Scholar VLSI System Design, Geethanajali College of Engineering and Technology, 2 HOD Dept

More information

A HIGH SPEED DYNAMIC RIPPLE CARRY ADDER

A HIGH SPEED DYNAMIC RIPPLE CARRY ADDER A HIGH SPEED DYNAMIC RIPPLE CARRY ADDER Y. Anil Kumar 1, M. Satyanarayana 2 1 Student, Department of ECE, MVGR College of Engineering, India. 2 Associate Professor, Department of ECE, MVGR College of Engineering,

More information

COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS

COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS ( 1 Dr.V.Malleswara rao, 2 K.V.Ganesh, 3 P.Pavan Kumar) 1 Professor &HOD of ECE,GITAM University,Visakhapatnam. 2 Ph.D

More information

Digital Microelectronic Circuits ( ) Pass Transistor Logic. Lecture 9: Presented by: Adam Teman

Digital Microelectronic Circuits ( ) Pass Transistor Logic. Lecture 9: Presented by: Adam Teman Digital Microelectronic Circuits (361-1-3021 ) Presented by: Adam Teman Lecture 9: Pass Transistor Logic 1 Motivation In the previous lectures, we learned about Standard CMOS Digital Logic design. CMOS

More information

Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology

Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology 1 Mahesha NB #1 #1 Lecturer Department of Electronics & Communication Engineering, Rai Technology University nbmahesh512@gmail.com

More information

A High Speed Low Power Adder in Multi Output Domino Logic

A High Speed Low Power Adder in Multi Output Domino Logic Journal From the SelectedWorks of Kirat Pal Singh Winter November 28, 2014 High Speed Low Power dder in Multi Output Domino Logic Neeraj Jain, NIIST, hopal, India Puran Gour, NIIST, hopal, India rahmi

More information

Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST

Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST ǁ Volume 02 - Issue 01 ǁ January 2017 ǁ PP. 06-14 Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST Ms. Deepali P. Sukhdeve Assistant Professor Department

More information

CMOS VLSI Design (A3425)

CMOS VLSI Design (A3425) CMOS VLSI Design (A3425) Unit III Static Logic Gates Introduction A static logic gate is one that has a well defined output once the inputs are stabilized and the switching transients have decayed away.

More information

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder High Speed Vedic Multiplier Designs Using Novel Carry Select Adder 1 chintakrindi Saikumar & 2 sk.sahir 1 (M.Tech) VLSI, Dept. of ECE Priyadarshini Institute of Technology & Management 2 Associate Professor,

More information

BASIC PHYSICAL DESIGN AN OVERVIEW The VLSI design flow for any IC design is as follows

BASIC PHYSICAL DESIGN AN OVERVIEW The VLSI design flow for any IC design is as follows Unit 3 BASIC PHYSICAL DESIGN AN OVERVIEW The VLSI design flow for any IC design is as follows 1.Specification (problem definition) 2.Schematic(gate level design) (equivalence check) 3.Layout (equivalence

More information

PHYSICAL STRUCTURE OF CMOS INTEGRATED CIRCUITS. Dr. Mohammed M. Farag

PHYSICAL STRUCTURE OF CMOS INTEGRATED CIRCUITS. Dr. Mohammed M. Farag PHYSICAL STRUCTURE OF CMOS INTEGRATED CIRCUITS Dr. Mohammed M. Farag Outline Integrated Circuit Layers MOSFETs CMOS Layers Designing FET Arrays EE 432 VLSI Modeling and Design 2 Integrated Circuit Layers

More information

A Literature Survey on Low PDP Adder Circuits

A Literature Survey on Low PDP Adder Circuits Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 12, December 2015,

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits

CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 24: Peripheral Memory Circuits [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11

More information

Gdi Technique Based Carry Look Ahead Adder Design

Gdi Technique Based Carry Look Ahead Adder Design IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 6, Ver. I (Nov - Dec. 2014), PP 01-09 e-issn: 2319 4200, p-issn No. : 2319 4197 Gdi Technique Based Carry Look Ahead Adder Design

More information

Chapter 4. Problems. 1 Chapter 4 Problem Set

Chapter 4. Problems. 1 Chapter 4 Problem Set 1 Chapter 4 Problem Set Chapter 4 Problems 1. [M, None, 4.x] Figure 0.1 shows a clock-distribution network. Each segment of the clock network (between the nodes) is 5 mm long, 3 µm wide, and is implemented

More information

Technology, Jabalpur, India 1 2

Technology, Jabalpur, India 1 2 1181 LAYOUT DESIGNING AND OPTIMIZATION TECHNIQUES USED FOR DIFFERENT FULL ADDER TOPOLOGIES ARPAN SINGH RAJPUT 1, RAJESH PARASHAR 2 1 M.Tech. Scholar, 2 Assistant professor, Department of Electronics and

More information

Application and Analysis of Output Prediction Logic to a 16-bit Carry Look Ahead Adder

Application and Analysis of Output Prediction Logic to a 16-bit Carry Look Ahead Adder Application and Analysis of Output Prediction Logic to a 16-bit Carry Look Ahead Adder Lukasz Szafaryn University of Virginia Department of Computer Science lgs9a@cs.virginia.edu 1. ABSTRACT In this work,

More information

Two New Low Power High Performance Full Adders with Minimum Gates

Two New Low Power High Performance Full Adders with Minimum Gates Two New Low Power High Performance Full Adders with Minimum Gates M.Hosseinghadiry, H. Mohammadi, M.Nadisenejani Abstract with increasing circuits complexity and demand to use portable devices, power consumption

More information

Two New Low Power High Performance Full Adders with Minimum Gates

Two New Low Power High Performance Full Adders with Minimum Gates Two New Low Power High Performance Full Adders with Minimum Gates M.Hosseinghadiry, H. Mohammadi, M.Nadisenejani Abstract with increasing circuits complexity and demand to use portable devices, power consumption

More information

A NOVEL 4-Bit ARITHMETIC LOGIC UNIT DESIGN FOR POWER AND AREA OPTIMIZATION

A NOVEL 4-Bit ARITHMETIC LOGIC UNIT DESIGN FOR POWER AND AREA OPTIMIZATION A NOVEL 4-Bit ARITHMETIC LOGIC UNIT DESIGN FOR POWER AND AREA OPTIMIZATION Mr. Snehal Kumbhalkar 1, Mr. Sanjay Tembhurne 2 Department of Electronics and Communication Engineering GHRAET, Nagpur, Maharashtra,

More information

CS302 Digital Logic Design Solved Objective Midterm Papers For Preparation of Midterm Exam

CS302 Digital Logic Design Solved Objective Midterm Papers For Preparation of Midterm Exam CS302 Digital Logic Design Solved Objective Midterm Papers For Preparation of Midterm Exam MIDTERM EXAMINATION 2011 (October-November) Q-21 Draw function table of a half adder circuit? (2) Answer: - Page

More information

Digital Integrated CircuitDesign

Digital Integrated CircuitDesign Digital Integrated CircuitDesign Lecture 13 Building Blocks (Multipliers) Register Adder Shift Register Adib Abrishamifar EE Department IUST Acknowledgement This lecture note has been summarized and categorized

More information

Low Power, Area Efficient FinFET Circuit Design

Low Power, Area Efficient FinFET Circuit Design Low Power, Area Efficient FinFET Circuit Design Michael C. Wang, Princeton University Abstract FinFET, which is a double-gate field effect transistor (DGFET), is more versatile than traditional single-gate

More information

DESIGN OF LOW POWER HIGH PERFORMANCE 4-16 MIXED LOGIC LINE DECODER P.Ramakrishna 1, T Shivashankar 2, S Sai Vaishnavi 3, V Gowthami 4 1

DESIGN OF LOW POWER HIGH PERFORMANCE 4-16 MIXED LOGIC LINE DECODER P.Ramakrishna 1, T Shivashankar 2, S Sai Vaishnavi 3, V Gowthami 4 1 DESIGN OF LOW POWER HIGH PERFORMANCE 4-16 MIXED LOGIC LINE DECODER P.Ramakrishna 1, T Shivashankar 2, S Sai Vaishnavi 3, V Gowthami 4 1 Asst. Professsor, Anurag group of institutions 2,3,4 UG scholar,

More information

DESIGNING powerful and versatile computing systems is

DESIGNING powerful and versatile computing systems is 560 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 5, MAY 2007 Variation-Aware Adaptive Voltage Scaling System Mohamed Elgebaly, Member, IEEE, and Manoj Sachdev, Senior

More information

On Built-In Self-Test for Adders

On Built-In Self-Test for Adders On Built-In Self-Test for s Mary D. Pulukuri and Charles E. Stroud Dept. of Electrical and Computer Engineering, Auburn University, Alabama Abstract - We evaluate some previously proposed test approaches

More information

DESIGN OF LOW POWER HIGH SPEED ERROR TOLERANT ADDERS USING FPGA

DESIGN OF LOW POWER HIGH SPEED ERROR TOLERANT ADDERS USING FPGA International Journal of Advanced Research in Engineering and Technology (IJARET) Volume 10, Issue 1, January February 2019, pp. 88 94, Article ID: IJARET_10_01_009 Available online at http://www.iaeme.com/ijaret/issues.asp?jtype=ijaret&vtype=10&itype=1

More information

IJMIE Volume 2, Issue 3 ISSN:

IJMIE Volume 2, Issue 3 ISSN: IJMIE Volume 2, Issue 3 ISSN: 2249-0558 VLSI DESIGN OF LOW POWER HIGH SPEED DOMINO LOGIC Ms. Rakhi R. Agrawal* Dr. S. A. Ladhake** Abstract: Simple to implement, low cost designs in CMOS Domino logic are

More information

Implementation of Efficient 5:3 & 7:3 Compressors for High Speed and Low-Power Operations

Implementation of Efficient 5:3 & 7:3 Compressors for High Speed and Low-Power Operations Volume-7, Issue-3, May-June 2017 International Journal of Engineering and Management Research Page Number: 42-47 Implementation of Efficient 5:3 & 7:3 Compressors for High Speed and Low-Power Operations

More information

Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery

Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery SUBMITTED FOR REVIEW 1 Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery Honglan Jiang*, Student Member, IEEE, Cong Liu*, Fabrizio Lombardi, Fellow, IEEE and Jie Han, Senior Member,

More information

COMPREHENSIVE ANALYSIS OF ENHANCED CARRY-LOOK AHEAD ADDER USING DIFFERENT LOGIC STYLES

COMPREHENSIVE ANALYSIS OF ENHANCED CARRY-LOOK AHEAD ADDER USING DIFFERENT LOGIC STYLES COMPREHENSIVE ANALYSIS OF ENHANCED CARRY-LOOK AHEAD ADDER USING DIFFERENT LOGIC STYLES PSowmya #1, Pia Sarah George #2, Samyuktha T #3, Nikita Grover #4, Mrs Manurathi *1 # BTech,Electronics and Communication,Karunya

More information

64 x 64 Bit Multiplier Using Pass Logic

64 x 64 Bit Multiplier Using Pass Logic Georgia State niversity ScholarWorks @ Georgia State niversity Computer Science Theses Department of Computer Science --6 6 6 Bit Multiplier sing Pass Logic Shibi Thankachan Follow this and additional

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 12, DECEMBER 2006 1309 Design of Robust, Energy-Efficient Full Adders for Deep-Submicrometer Design Using Hybrid-CMOS Logic

More information

An Efficient Low Power and High Speed carry select adder using D-Flip Flop

An Efficient Low Power and High Speed carry select adder using D-Flip Flop Journal From the SelectedWorks of Journal April, 2016 An Efficient Low Power and High Speed carry select adder using D-Flip Flop Basavva Mailarappa Konnur M. Sharanabasappa This work is licensed under

More information

SOLIMAN A. MAHMOUD Department of Electrical Engineering, Faculty of Engineering, Cairo University, Fayoum, Egypt

SOLIMAN A. MAHMOUD Department of Electrical Engineering, Faculty of Engineering, Cairo University, Fayoum, Egypt Journal of Circuits, Systems, and Computers Vol. 14, No. 4 (2005) 667 684 c World Scientific Publishing Company DIGITALLY CONTROLLED CMOS BALANCED OUTPUT TRANSCONDUCTOR AND APPLICATION TO VARIABLE GAIN

More information