CHAPTER 6
GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS
6.1 INTRODUCTION
In this approach, four types of full adders (conventional, 16T, 14T and 10T) have been analyzed in terms of speed, power consumption and power-delay product. The analysis showed that the 10T full adder is the most suitable for low power applications such as multipliers. Since the existing 10T SERF full adder suffers from glitches and threshold voltage loss, a new GDI full adder has been proposed, and a low power array multiplier has been designed using it. All these parametric analyses have been carried out using the Tanner CAD tool with varying supply voltage values.
In microprocessors and DSPs, addition is the most commonly used arithmetic operation, and it is often one of the speed-limiting elements (Weste and Eshraghian 1993). Hence the adder should be optimized in terms of speed and/or power consumption (Rabaey 1996). The design of an adder involves choices at two levels of abstraction. One concerns the adder's architecture, implemented with the one-bit full adder as a building block. The other concerns the specific transistor-level design style used to implement the one-bit full adder. This chapter focuses on the lower design level. The 28T, 16T, 14T, 10T SERF and
10T GDI (Gate Diffusion Input) adder topologies are analyzed and compared to select the most suitable adder cell for a low power multiplier.
The GDI method is based on the simple cell shown in Figure 6.1. At first glance the circuit resembles a standard CMOS inverter, but there are some important differences. The GDI cell has three inputs: G (common gate input of the NMOS and PMOS), P (input to the source/drain of the PMOS) and N (input to the source/drain of the NMOS). The bulks of the NMOS and PMOS are connected to N and P respectively, so they can be arbitrarily biased, in contrast to the CMOS inverter (Jiang et al 2004).
Figure 6.1 Basic Gate-Diffusion-Input Cell
The GDI cell, with its four ports, can be regarded as a new multifunctional device that realizes six functions through different combinations of the inputs G, P and N. Table 6.1 shows that simple configuration changes in the inputs G, P and N of the basic GDI cell lead to very different Boolean functions at the output (Callaway and Swartzlander 1996). Most of these functions are complex in CMOS (usually consuming 6 to 12 transistors), while they are very simple in the GDI design methodology (only 2 transistors per function). Moreover, multiple-input gates can be implemented by combining several GDI cells (Po-Ming Lee et al 2007).
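The functions listed in Table 6.1 can be reproduced with an idealized switch-level model of the GDI cell. When G = 0 the PMOS conducts and the output follows P; when G = 1 the NMOS conducts and the output follows N, so Out = Ḡ·P + G·N. The Python sketch below is a minimal illustration under that assumption only: it ignores threshold drops and all analog behaviour, and the names `gdi_cell`, `CONFIGS` and `check_table` are hypothetical.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: output follows P when G = 0 (PMOS on), N when G = 1 (NMOS on).

    Threshold-voltage loss and other analog effects are deliberately ignored.
    """
    return n if g else p


# (P, G, N) wiring for each row of Table 6.1, expressed in terms of inputs A, B, C.
# Constant '0' and '1' inputs are represented by the literals 0 and 1.
CONFIGS = {
    "F1":  lambda a, b, c: gdi_cell(g=a, p=b, n=0),
    "F2":  lambda a, b, c: gdi_cell(g=a, p=1, n=b),
    "OR":  lambda a, b, c: gdi_cell(g=a, p=b, n=1),
    "AND": lambda a, b, c: gdi_cell(g=a, p=0, n=b),
    "MUX": lambda a, b, c: gdi_cell(g=a, p=b, n=c),
    "NOT": lambda a, b, c: gdi_cell(g=a, p=1, n=0),
}

# Boolean reference for each row (the "Out" column of Table 6.1).
REFERENCE = {
    "F1":  lambda a, b, c: (1 - a) & b,               # A'B
    "F2":  lambda a, b, c: (1 - a) | b,               # A' + B
    "OR":  lambda a, b, c: a | b,                     # A + B
    "AND": lambda a, b, c: a & b,                     # AB
    "MUX": lambda a, b, c: ((1 - a) & b) | (a & c),   # A'B + AC
    "NOT": lambda a, b, c: 1 - a,                     # A'
}


def check_table() -> dict[str, bool]:
    """Compare each GDI configuration with its reference over all 8 input combinations."""
    return {
        name: all(CONFIGS[name](a, b, c) == REFERENCE[name](a, b, c)
                  for a, b, c in product((0, 1), repeat=3))
        for name in CONFIGS
    }


if __name__ == "__main__":
    print(check_table())
```

Every entry of the returned dictionary is True, which shows that only the wiring of P, G and N changes between functions while the two-transistor cell stays the same.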
Table 6.1 Functions of the Basic GDI Cell
P   G   N   Out          Function
B   A   0   Ā·B          F1
1   A   B   Ā + B        F2
B   A   1   A + B        OR
0   A   B   A·B          AND
B   A   C   Ā·B + A·C    MUX
1   A   0   Ā            NOT
The GDI XOR and XNOR gates are, in fact, applications of the GDI technique. As shown in Figure 6.2, each of them requires only four transistors, fewer than their conventional CMOS counterparts.
Figure 6.2 (a) GDI XOR gate (b) GDI XNOR gate
Owing to attractive features that allow improvements in design complexity, transistor count, static power dissipation and logic level swing, research on GDI has become active in the VLSI area. However, the GDI scheme requires a special CMOS process; specifically, it requires a twin-well CMOS or Silicon On Insulator (SOI) process (Bui et al 2002), which is more expensive than the standard p-well CMOS process. This limits its applicability in many CMOS circuits.
Two design strategies have been used to size each topology. The first aims to minimize power consumption by adopting minimum-size transistors, and the second achieves minimum PDP by suitable transistor sizing. The performance of both design strategies has then been compared for different supply voltage values.
6.2 POWER CONSUMPTION IN CMOS CIRCUITS
The generic 1-bit conventional CMOS full adder cell, which has 28 transistors, is shown in Figure 6.3 (Kumar and Bayoumi 2006). Different logic styles can be investigated from different points of view. Evidently, each tends to favour one performance aspect at the expense of the others, and a style appropriate for one function may not suit another. For example, the static approach is robust against noise effects (Shams and Bayoumi 2000) and therefore provides reliable operation, but ease of design is not always achieved, and the static CMOS style is not area efficient for complex gates with large fan-ins. Therefore, care must be taken when a static logic style is selected to realize a logic function. The pseudo-NMOS technique is straightforward (Veeramachaneni and Srinivas 2008), yet it compromises noise margin and suffers from static power dissipation. Pass transistor logic is a popular method for implementing specific circuits such as multiplexers and XOR-based circuits, like adders.
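The conventional 28T cell of Figure 6.3 is usually described in textbooks as two complex gates: a carry stage producing the inverted carry C̄out = ¬(A·B + Cin·(A + B)), and a sum stage that reuses it, Sum = A·B·Cin + C̄out·(A + B + Cin). Assuming Figure 6.3 follows that textbook structure, the Python sketch below checks exhaustively that these two expressions implement a full adder. It is a logic-level model only, and the function names are hypothetical.

```python
from itertools import product


def conventional_28t(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logic-level model of the textbook 28T CMOS full adder (assumed structure).

    The sum stage needs the inverted carry as an input, so in the circuit
    the sum can only settle after the carry gate has switched.
    """
    cout_n = 1 - ((a & b) | (cin & (a | b)))          # stage 1: inverted carry
    # Stage 2: the circuit forms the complement of this expression and then inverts it.
    sum_ = (a & b & cin) | (cout_n & (a | b | cin))
    return sum_, 1 - cout_n                            # carry after its output inverter


def is_full_adder(cell) -> bool:
    """True if cell(a, b, cin) returns (sum, carry) with 2*carry + sum == a + b + cin."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = cell(a, b, cin)
        if 2 * cout + s != a + b + cin:
            return False
    return True


if __name__ == "__main__":
    print(is_full_adder(conventional_28t))
```

The dependence of the sum on the inverted carry is one reason the carry path dominates the delay of this cell.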
Figure 6.3 Conventional CMOS Full Adder
On the other hand, dynamic logic facilitates the realization of fast, small and complex gates. However, this advantage is gained at the expense of parasitic effects such as charge sharing, which make the design process hazardous. Charge leakage necessitates periodic refreshing, which places a lower limit on the operating frequency of the circuit. In general, the static CMOS style is the best in terms of robustness and stability. The CMOS structure combines a PMOS pull-up network and an NMOS pull-down network to produce the desired outputs (Yano et al 1990). In this style, the PMOS and NMOS transistors are arranged in completely separate branches, each of which may consist of several sub-branches. Mutual exclusiveness of the pull-up and pull-down networks is of great concern: for every input combination, exactly one of them must conduct.
6.3 FULL ADDER TOPOLOGY ANALYSIS
In this thesis, different components have been combined to make modified conventional, new 16T, 14T and 10T full adder cells.
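Before the individual cells are examined, the mutual-exclusiveness requirement of Section 6.2 can be checked exhaustively at the switch level. The Python sketch below assumes the mirror-style carry stage often used in conventional adders, in which the PMOS pull-up network has the same series-parallel topology as the NMOS pull-down network. This works only because the carry (majority) function is self-dual. The function names are hypothetical, and the check covers conduction conditions only, not sizing or timing.

```python
from itertools import product


def pdn_conducts(a: int, b: int, c: int) -> bool:
    """NMOS pull-down: (A series B) in parallel with (C series (A parallel B))."""
    return bool((a and b) or (c and (a or b)))


def pun_conducts(a: int, b: int, c: int) -> bool:
    """PMOS pull-up with the same (mirrored) topology; a PMOS conducts when its gate is 0."""
    na, nb, nc = not a, not b, not c
    return bool((na and nb) or (nc and (na or nb)))


def check_static_gate(pun, pdn) -> bool:
    """Exactly one network conducts for every input: no short circuit, no floating node."""
    return all(pun(*v) != pdn(*v) for v in product((0, 1), repeat=3))


if __name__ == "__main__":
    print(check_static_gate(pun_conducts, pdn_conducts))
```

Passing the pull-down function for both arguments makes the check fail, as expected for a network that would short the supply rails.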
Figure 6.4 Modified Conventional CMOS Full Adder Cell
Figure 6.4 shows the transistor-level implementation of the modified conventional CMOS full adder, a 20-transistor design (Veeramachaneni and Srinivas 2008). The full adder is the heart of the arithmetic unit, and this type of CMOS full adder configuration has been widely used in numerous applications. It often exhibits a critical delay that limits system performance, since two or more full adders are cascaded to perform multiple-bit addition.
Figure 6.5 16T Full Adder Cell
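The cascading of full adders for multiple-bit addition can be sketched as a ripple-carry adder. The minimal Python model below is logic-level only. Alongside the sum, it reports the longest chain of cells through which a carry actually ripples for the given operands, which illustrates why the per-cell carry delay sets the critical path. The function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Ideal 1-bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int) -> tuple[int, int, int]:
    """Add two width-bit numbers with a chain of full adders (carry-in of cell 0 is 0).

    Returns (sum, carry_out, longest_chain), where longest_chain is the largest number
    of consecutive cells a single carry passed through, counting the cell that generated it.
    """
    carry, total = 0, 0
    run, longest = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry_next = full_adder(a, b, carry)
        total |= s << i
        if a & b:                  # generate: a new carry chain starts in this cell
            run = 1
        elif (a ^ b) and run:      # propagate: the incoming carry ripples one cell further
            run += 1
        else:                      # kill, or propagate with no incoming carry: chain ends
            run = 0
        longest = max(longest, run)
        carry = carry_next
    return total, carry, longest


if __name__ == "__main__":
    print(ripple_carry_add(0b0111, 0b0001, 4))
    print(ripple_carry_add(0b1111, 0b0001, 4))
```

Adding 0111 and 0001 makes the carry ripple through three cells, while 1111 + 0001 exercises all four cells, which is the worst-case (critical) path of a 4-bit chain.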
Because cascading degrades speed, a faster full adder consisting of 16 transistors has been designed, as shown in Figure 6.5. This type of CMOS full adder is based on transmission function theory, i.e. it is a Transmission Function Adder (TFA). The 16T TFA dissipates less power than conventional CMOS full adders because it has less short-circuit and dynamic power dissipation.
Figure 6.6 14T Full Adder Cell
Figure 6.6 shows the schematic of the 14-transistor full adder cell, which offers both low power and high speed. The power consumed by this circuit is lower than that of the 10T GDI full adder (Roy and Prasad 2000) and higher than that of the 10T SERF full adder (Matsuzawa 1994).
Figure 6.7 10T SERF Full Adder Cell
Another full adder configuration, the 10-transistor SERF (Static Energy Recovery Full adder) cell shown in Figure 6.7, also offers both low power and high speed (Dan Wang et al 2009). It occupies less area than the conventional, 16T and 14T full adder cells. However, its output voltage swing is low when these cells are chained for multiple-bit addition or multiplication (Mahmud and Bayoumi 1999).
Figure 6.8 10T GDI-XNOR Based Full Adder Cell
6.3.1 Simulation Result and Comparison of Full Adder Cells
To overcome the low output voltage swing of the SERF adder cell, the GDI XNOR based 10T adder cell shown in Figure 6.8 is proposed. The proposed design has the advantages of flexibility and a lower transistor count, and it can be realized using the standard p-well process.
To compare the performance of the conventional (20T), 16T, 14T, 10T SERF and 10T GDI full adders, delay and power dissipation were evaluated by simulation runs in the Tanner CAD tool using a 0.25-µm CMOS technology with the same input pattern. The results are reported in Table 6.2 for 3.3 V and Table 6.3 for 5.0 V.
Table 6.2 Comparison of Power Dissipation and Delay in Various Adder Cells under Vdd = 3.3 V
Full Adder Cell   Power (e-9 W)   Delay (e-9 s)   Power Delay Product (e-18 Ws)
20T Model         0.374           3.07            1.148
16T Model         0.105           6.03            0.633
14T Model         0.133           3.05            0.405
10T SERF Model    0.11            2.9             0.319
10T GDI Model     0.155           2.6             0.403
Table 6.3 Comparison of Power Dissipation and Delay in Various Adder Cells under Vdd = 5.0 V
Full Adder Cell   Power (e-6 W)   Delay (e-9 s)   Power Delay Product (e-15 Ws)
20T Model         1.4             1.5             2.1
16T Model         0.436           4.6             2.0
14T Model         0.504           2.2             1.10
10T SERF Model    0.369           2.4             0.885
10T GDI Model     0.784           1.3             1.01
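The power-delay products in Tables 6.2 and 6.3 are simply power multiplied by delay. The Python sketch below recomputes them from the transcribed table values and ranks the cells. With power in nW (3.3 V) or µW (5.0 V) and delay in ns, the products come out directly in units of e-18 Ws and e-15 Ws, respectively. The dictionary and function names are illustrative only.

```python
# Values transcribed from Table 6.2 (Vdd = 3.3 V): name -> (power in nW, delay in ns).
TABLE_3V3 = {
    "20T": (0.374, 3.07),
    "16T": (0.105, 6.03),
    "14T": (0.133, 3.05),
    "10T SERF": (0.11, 2.9),
    "10T GDI": (0.155, 2.6),
}

# Values transcribed from Table 6.3 (Vdd = 5.0 V): name -> (power in uW, delay in ns).
TABLE_5V0 = {
    "20T": (1.4, 1.5),
    "16T": (0.436, 4.6),
    "14T": (0.504, 2.2),
    "10T SERF": (0.369, 2.4),
    "10T GDI": (0.784, 1.3),
}


def pdp(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Power-delay product per cell (nW x ns = e-18 Ws; uW x ns = e-15 Ws)."""
    return {name: power * delay for name, (power, delay) in table.items()}


def ranking(table: dict[str, tuple[float, float]]) -> list[str]:
    """Cell names ordered from lowest to highest PDP."""
    scores = pdp(table)
    return sorted(scores, key=scores.get)


def gdi_vs_serf(table: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """(power penalty, speed-up) of the 10T GDI cell relative to the 10T SERF cell."""
    (p_gdi, d_gdi), (p_serf, d_serf) = table["10T GDI"], table["10T SERF"]
    return p_gdi / p_serf, d_serf / d_gdi


if __name__ == "__main__":
    print(ranking(TABLE_3V3))
    print(ranking(TABLE_5V0))
    print(gdi_vs_serf(TABLE_5V0))
```

At both supply voltages the order is 10T SERF, 10T GDI, 14T, 16T, 20T. At 5.0 V the GDI cell's power penalty over SERF (about 2.1 times) exceeds its speed-up (about 1.8 times), which is why its PDP is higher there.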
It can be seen that the 10T SERF cell has the lowest power at 5.0 V and the lowest power-delay product at both supply voltages, but it suffers from a severe threshold loss problem, which leads to circuit malfunction when it is cascaded in larger circuits. The proposed 10T GDI full adder, which does not have the threshold loss problem, has the lowest delay of all the cells and a lower PDP than the 20T, 16T and 14T cells. Figures 6.9 and 6.10 show the T-Spice output waveforms containing the input signals (A, B, Cin) and output signals (Sum, Carry and Power) of the 10T SERF and 10T GDI full adder cells; the performance of the other adders is reported for different Vdd values. The input pattern is varied from 000 to 111, with the input changing every 10 ns, so that the delay can be calculated from the waveform. The waveform shows that the carry output (Cout) of the 10T SERF cell suffers from threshold loss (4th bit) and glitches (1st and 2nd bits). This is the main drawback of the 10T SERF full adder.
Figure 6.9 Output Waveform of 10T SERF Full Adder Cell
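Both 10T cells are built around an XNOR of A and B. At the logic level, a common decomposition (assumed here; the transistor-level wiring of Figures 6.7 and 6.8 may differ) is Sum = (A ⊙ B) ⊙ Cin, with Cout = A when A ⊙ B = 1 and Cout = Cin otherwise. The carry is therefore a 2:1 multiplexer, which a GDI MUX cell from Table 6.1 can realize. The Python sketch below builds the ideal Sum and Cout for the 000 to 111 stimulus with the 10 ns step from the text, giving a reference against which waveforms such as Figures 6.9 and 6.10 can be checked. Taking A as the most significant bit of each pattern is an assumption, and the function names are hypothetical.

```python
def xnor(x: int, y: int) -> int:
    """1-bit XNOR."""
    return 1 - (x ^ y)


def xnor_based_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logic-level model of an XNOR-based 10T full adder (assumed decomposition)."""
    e = xnor(a, b)            # first XNOR stage: 1 when A == B
    s = xnor(e, cin)          # second XNOR stage yields A xor B xor Cin
    cout = a if e else cin    # 2:1 MUX: carry is A when A == B, otherwise Cin
    return s, cout


STEP_NS = 10  # the input vector changes every 10 ns, as in the simulation


def expected_waveform() -> list[tuple[int, int, int, int, int, int]]:
    """(start time in ns, A, B, Cin, Sum, Cout) for vectors 000 through 111.

    A is taken as the most significant bit of each 3-bit pattern (assumption).
    """
    rows = []
    for k in range(8):
        a, b, cin = (k >> 2) & 1, (k >> 1) & 1, k & 1
        s, cout = xnor_based_full_adder(a, b, cin)
        rows.append((k * STEP_NS, a, b, cin, s, cout))
    return rows


if __name__ == "__main__":
    for row in expected_waveform():
        print(row)
```

For example, the vector applied at 30 ns (A, B, Cin = 0, 1, 1) should give Sum = 0 and a full-swing Cout = 1; a degraded high level on Cout at such a point is the threshold-loss symptom described above.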
Figure 6.10 Output Waveform of 10T GDI Full Adder Cell
From Figures 6.11 and 6.12 it is clear that, at Vdd = 3.3 V, the 10T GDI full adder has a lower PDP than all the other adder cells except the 10T SERF full adder. Under Vdd = 5.0 V it has a strong output and fewer glitches on the Carry signal than the 10T SERF full adder, at the cost of a higher PDP.
Figure 6.11 Power, Delay and PDP Comparison of Full Adder Cells under Vdd = 3.3 V (bar chart of power in e-9 W, delay in e-9 s and PDP in e-18 Ws for the 20T, 16T, 14T, 10T SERF and 10T GDI cells)
Figure 6.12 Power and Delay Comparison of Full Adder Cells under Vdd = 5.0 V (bar chart of power, delay and PDP for the 20T, 16T, 14T, 10T SERF and 10T GDI cells)
Among the adder cells of different transistor counts, the 10T GDI based cell is found to offer the best balance of power, delay and output swing. The 16T cell has high driving capability, but in these simulations it has the longest delay at both supply voltages (Tables 6.2 and 6.3). The 10T GDI cell has the lowest delay of all the cells at 5.0 V (Table 6.3), so it is suitable for arithmetic circuits where no compromise on performance is allowed. It offers a reasonable trade-off between power and delay for high performance circuits: it is faster than the 20T conventional adder while dissipating less power, but relative to the 10T SERF cell its power penalty (about 2.1 times) is greater than its speed improvement (about 1.8 times), which is why its PDP is higher.
6.4 ARRAY MULTIPLIER
In this investigation, the Braun array multiplier is used to analyze the performance of the two different full adders. Multipliers are structures with many cascaded full adder stages. Therefore, the performance of the multiplier block, which consists of many cascaded full adder cells together with AND gates, is analyzed at different Vdd supplies using the 10T SERF and the 10T GDI adder cells.
Figure 6.13 Array Multiplier
The array multiplier structure shown in Figure 6.13 is used to implement parallel array multiplication with the 10T SERF and 10T GDI full adder cells. The two input waveforms (a3a2a1a0 and b3b2b1b0) for both the 10T SERF and 10T GDI based 4-bit multipliers are shown in Figure 6.14.
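The cascade of AND gates and full adders in the array multiplier can be illustrated with a bit-level Python model. This is a minimal functional sketch assuming ideal, full-swing full adders. It reproduces the carry-save rows of an unsigned n × n Braun array followed by a final ripple-carry row. For simplicity, every cell is modelled as a full adder, although cells with an input tied to 0 would be half adders or be removed in a real array. It says nothing about power or delay, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Ideal 1-bit full adder returning (sum, carry)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def braun_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiplication through a Braun-style carry-save array."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    product = [0] * (2 * n)

    # Row 0: partial products a_j * b_0 from the AND gates; no adders needed yet.
    # s[j] has weight j, c[j] has weight j + 1.
    s = [a_bits[j] & b_bits[0] for j in range(n)]
    c = [0] * n
    product[0] = s[0]

    # Carry-save rows: each cell adds a partial product, the sum from the row above
    # (shifted one column) and the carry from the row above (same column).
    for i in range(1, n):
        new_s, new_c = [0] * n, [0] * n
        for j in range(n):
            pp = a_bits[j] & b_bits[i]
            s_in = s[j + 1] if j + 1 < n else 0
            new_s[j], new_c[j] = full_adder(pp, s_in, c[j])
        s, c = new_s, new_c
        product[i] = s[0]

    # Final ripple-carry row merges the remaining sum and carry vectors.
    carry = 0
    for k in range(n):
        s_in = s[k + 1] if k + 1 < n else 0
        product[n + k], carry = full_adder(s_in, c[k], carry)
    assert carry == 0  # an n x n product always fits in 2n bits

    return sum(bit << k for k, bit in enumerate(product))


if __name__ == "__main__":
    print(braun_multiply(0b1011, 0b0110))
```

Checking all 256 operand pairs for n = 4 against ordinary multiplication confirms the wiring. In the simulated multipliers, each `full_adder` call corresponds to a 10T SERF or 10T GDI cell, which is where a degraded output level from one cell can propagate to the next.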
Figure 6.14 Common Input Waveform of 10T SERF and 10T GDI Based 4-Bit Multipliers
6.4.1 Simulation Result and Comparison of Multipliers
The output waveforms of the 4-bit multipliers using 10T SERF and 10T GDI based full adders are shown in Figures 6.15 and 6.16, respectively.
Figure 6.15 Output Waveform of 10T SERF Based 4-Bit Multiplier
Figure 6.16 Output Waveform of 10T GDI Based 4-Bit Multiplier
Table 6.4 Comparison of Power Dissipation, Delay and PDP of Array Multipliers
Multiplier Model    Vdd (V)   Power (e-6 W)   Delay (e-9 s)   Power Delay Product (e-15 Ws)
10T GDI Model       5.0       36.00           22.6            813.6
10T GDI Model       3.3       10.29           41.0            421.0
10T SERF Model      5.0       74.00           13.4            991.6
10T SERF Model      3.3       5.60            77.0            431.2
Table 6.4 compares the power dissipation, delay and PDP of the array multipliers using 10T SERF and 10T GDI based full adders at two different supply voltages. Figure 6.17 also indicates the suitability of the GDI based array multiplier for low power multiplier design.
Figure 6.17 Graph for Power, Delay and PDP of Multipliers (bar chart of power in e-6 W, delay in e-9 s and PDP in e-15 Ws for SERF (3.3 V), GDI (3.3 V), SERF (5.0 V) and GDI (5.0 V))
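The multiplier comparison can be recomputed in the same way as for the single cells. The Python sketch below assumes that the first two data rows of Table 6.4 belong to the 10T GDI multiplier and the last two to the 10T SERF multiplier. It recomputes each PDP as power × delay (µW × ns = e-15 Ws) and reports which model has the lower PDP at each supply voltage. The names are illustrative.

```python
# Table 6.4 rows: (model, Vdd in V) -> (power in uW, delay in ns, reported PDP in e-15 Ws).
# Row-to-model assignment is assumed from the layout of the original table.
TABLE_6_4 = {
    ("10T GDI", 5.0): (36.00, 22.6, 813.6),
    ("10T GDI", 3.3): (10.29, 41.0, 421.0),
    ("10T SERF", 5.0): (74.00, 13.4, 991.6),
    ("10T SERF", 3.3): (5.60, 77.0, 431.2),
}


def recomputed_pdp(power_uw: float, delay_ns: float) -> float:
    """PDP in e-15 Ws, since 1 uW x 1 ns = 1e-15 Ws."""
    return power_uw * delay_ns


def lower_pdp_model(vdd: float) -> str:
    """Model with the lower recomputed PDP at the given supply voltage."""
    candidates = {
        model: recomputed_pdp(power, delay)
        for (model, v), (power, delay, _reported) in TABLE_6_4.items()
        if v == vdd
    }
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    for vdd in (3.3, 5.0):
        print(vdd, lower_pdp_model(vdd))
```

Recomputing the products reproduces the reported PDPs except for the 10.29 µW / 41.0 ns row, where 10.29 × 41.0 ≈ 421.9 rather than 421.0. Either way, the GDI based multiplier has the lower PDP at both supply voltages under the assumed row assignment.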
6.5 SUMMARY
In this chapter the most suitable full adder topologies (20T, 16T, 14T and 10T) have been compared in terms of power, area and delay. Each full adder cell has been simulated as a single circuit in the T-Spice software tool. The comparison has been carried out both with minimum-size transistors, to minimize power consumption, and with transistors sized to optimize the power-delay product. The 20T and 16T topologies do not provide any advantage in terms of either power or speed; the 16T full adder, despite its driving capability, has the longest delay at both supply voltages. In the 14T topology, the power-delay product is lower than that of the 16T full adder cell, but the delay increases under the lower Vdd supply. The 10T GDI full adder cell, having the lowest delay, is therefore suitable for arithmetic circuits where no compromise on performance is allowed, and the topology offers a reasonable trade-off between power and delay for high performance circuits. The 10T cell uses one three-transistor XNOR and one three-transistor XOR circuit, which is the reason for its lower power consumption. Output load is one of the important parameters that affect the power and performance of circuits. In the final analysis, the 10T GDI based full adder is the best circuit for arithmetic operations in multipliers in terms of power consumption for all values of output load.