Investigating Delay-Power Tradeoff in Kogge-Stone Adder in Standby Mode and Active Mode

Design Review 2, VLSI Design ECE6332
Sadredini, Luonan Wang
November 11, 2014

1. Research

In this design review, we investigate the power-delay tradeoff in the Kogge-Stone adder and the basic carry look-ahead adder. We have read several papers on parallel prefix adders and on leakage reduction techniques. The Kogge-Stone adder has a regular, symmetric structure, which makes it a convenient vehicle for investigating power reduction techniques.

2. Design and Simulation

We started by designing and simulating 8-bit and 16-bit versions of the Kogge-Stone adder (KSA) and the carry look-ahead adder (CLA).

Kogge-Stone Adder: The KSA has three components: 1. PG generator, 2. dot block, and 3. sum generator [Appendix B]. There are more dot blocks than PG generators or sum generators, so we call the dot block the dominant block when discussing the leakage power consumption of the KSA. Table 1 presents the dot block leakage power for all possible input vectors.

P1 (V)   P2 (V)   G1 (V)   G2 (V)   Leakage power (nW)
0        0        0        0        41.51
0        0        0        1.1      44.02
0        0        1.1      0        55.53
0        0        1.1      1.1      42.77
0        1.1      0        0        60.04
0        1.1      0        1.1      51.78
0        1.1      1.1      0        50.66
0        1.1      1.1      1.1      39.77
1.1      0        0        0        51.02
1.1      0        0        1.1      48.53
1.1      0        1.1      0        60.04
1.1      0        1.1      1.1      47.28
1.1      1.1      0        0        70.72
1.1      1.1      0        1.1      62.46
1.1      1.1      1.1      0        61.34
1.1      1.1      1.1      1.1      50.45
Table 1. Dot block leakage power for all possible input vectors

P1 P2 G1 G2 = 0000 (I1) and P1 P2 G1 G2 = 0111 (I2) give the lowest leakage power for the dot block. We want the dot blocks to have these inputs while the circuit is in standby mode. If we apply I2 to the last level of dot blocks and trace back toward the inputs, we see that not all dot blocks can follow I2: some blocks end up with input patterns that produce high leakage power. In contrast, if we use I1, every dot block can have I1 as its input, and as a result all the primary inputs of the adder are zero (a symmetric input pattern). We concluded that setting all inputs to zero gives the lowest leakage power for the KSA, and the simulation results (Table 2) confirm it.

Input vector      Leakage power (uW)
                  8-bit KSA    16-bit KSA
All 0             1.89         4.48
All 1             1.96         4.61
101010...1010     1.96         5.97
Table 2. Leakage power for some input vectors (8-bit and 16-bit KSA)

In the next step, we applied three power gating techniques (an NMOS footer, a PMOS header, and both header and footer) with all inputs set to 0. We simulated all three techniques with different sleep transistor widths and calculated the leakage power and propagation delay for the 8-bit KSA.

a. Sleep transistor in footer b. Sleep transistor in header c. Sleep transistors in both header and footer
Figure 1. Leakage power and delay tradeoff for different power gating techniques with different sleep transistor widths (w = 50n, 100n, 150n, ..., 500n)

Figure 2. CLA

We observed that with only a footer sleep transistor the leakage power reduction is about 10x, whereas with a header or with both header and footer the reduction is about 1000x. This happened for both the KSA and the CLA. The footer result is unexpected, and we have not yet figured out why it happens.

3. Power reduction in active and standby mode for KSA

From Table 1, the input vector P1 P2 G1 G2 = 1100 (I3) gives the worst leakage power. We wrote a SystemC program (Appendix A explains why SystemC) that counts how often each possible input vector occurs at each dot block.
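The selection of I1, I2, and I3 from Table 1 can be reproduced with a short lookup. The Python sketch below is only an illustration, not part of the original flow. It keys the Table 1 values by (P1, P2, G1, G2) with 1 standing for 1.1 V, and it assumes that the fifth and sixth rows of the table correspond to P2 = 1.1 V, following the binary ordering of the other rows. The names DOT_LEAKAGE_NW and rank_vectors are hypothetical.

```python
# Dot block leakage power in nW, transcribed from Table 1.
# Keys are (P1, P2, G1, G2); 1 stands for 1.1 V and 0 for 0 V.
# Assumption: rows 5 and 6 of the table are read as 0100 and 0101.
DOT_LEAKAGE_NW = {
    (0, 0, 0, 0): 41.51, (0, 0, 0, 1): 44.02,
    (0, 0, 1, 0): 55.53, (0, 0, 1, 1): 42.77,
    (0, 1, 0, 0): 60.04, (0, 1, 0, 1): 51.78,
    (0, 1, 1, 0): 50.66, (0, 1, 1, 1): 39.77,
    (1, 0, 0, 0): 51.02, (1, 0, 0, 1): 48.53,
    (1, 0, 1, 0): 60.04, (1, 0, 1, 1): 47.28,
    (1, 1, 0, 0): 70.72, (1, 1, 0, 1): 62.46,
    (1, 1, 1, 0): 61.34, (1, 1, 1, 1): 50.45,
}


def rank_vectors(table):
    """Return (vector, leakage) pairs sorted from lowest to highest leakage."""
    return sorted(table.items(), key=lambda item: item[1])


if __name__ == "__main__":
    ranked = rank_vectors(DOT_LEAKAGE_NW)
    print("two lowest:", ranked[:2])
    print("worst:", ranked[-1])
```

Under these assumptions the two lowest entries are 0111 and 0000, and the highest entry is 1100, which matches the vectors the text calls I2, I1, and I3.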
Figure 3 shows how often each dot block in the 8-bit KSA receives I3 as its input vector. As the figure shows, the first-level dot blocks are the most likely to receive I3.
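The kind of count shown in Figure 3 can be sketched with an exhaustive enumeration. The Python code below is a minimal illustration and is not the SystemC program used in this review. It makes several assumptions that the text does not confirm: the PG generator uses propagate = XOR and generate = AND, every position i >= 2^(level-1) at each level is a full dot block, and each dot block combines the pair at bit i with the pair 2^(level-1) positions below it. Because I3 is symmetric in the two input pairs, the count does not depend on which pair is labeled 1 or 2. The function name count_i3_per_dot_block is hypothetical.

```python
from itertools import product


def count_i3_per_dot_block(n_bits=8):
    """Count, over all input pairs (a, b), how often each dot block sees I3.

    I3 is P1 = P2 = 1 and G1 = G2 = 0. The result is a dict keyed by
    (level, bit_index), with level starting at 1.
    """
    counts = {}
    for a, b in product(range(1 << n_bits), repeat=2):
        # PG generator (assumed): propagate = XOR, generate = AND.
        p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]
        g = [((a >> i) & (b >> i)) & 1 for i in range(n_bits)]
        level, dist = 1, 1
        while dist < n_bits:
            next_p, next_g = p[:], g[:]
            for i in range(dist, n_bits):
                # The dot block at bit i combines (p[i], g[i]) with
                # (p[i - dist], g[i - dist]).
                if p[i] == 1 and p[i - dist] == 1 and g[i] == 0 and g[i - dist] == 0:
                    counts[(level, i)] = counts.get((level, i), 0) + 1
                next_g[i] = g[i] | (p[i] & g[i - dist])
                next_p[i] = p[i] & p[i - dist]
            p, g = next_p, next_g
            level, dist = level + 1, dist * 2
    return counts


if __name__ == "__main__":
    for key, value in sorted(count_i3_per_dot_block(4).items()):
        print(key, value)
```

Calling count_i3_per_dot_block(8) enumerates all 65536 input pairs of the 8-bit adder. Under the assumptions above, each first-level dot block sees I3 for one quarter of all input pairs, and the fraction shrinks at deeper levels, which agrees with the trend described for Figure 3.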

Figure 3. Number of occurrences of P1 P2 G1 G2 = 1100 (P1 = P2 = 1, G1 = G2 = 0) at each dot block input in the 8-bit KSA (axes: count, dot block level, bit index)

The basic idea is to use the multi-mode power gating structure shown in Figure 4 [2].

Figure 4. Multi-mode sleep transistors: a. Normal mode, b. Cold mode, c. Park mode (intermediate power saving mode) [2]

When a level of dot blocks is in active mode, PG=1 and HLD=1. When a level of dot blocks has finished its computation (while the circuit is still in active mode), it can be put into Park mode: PG=0 and HLD=0, and the virtual ground (VGND) sits at Vthp. The advantage of this mode is that it retains the output of the dot block (the next-level dot blocks need this output), and because the bias voltage across the block is reduced, its power consumption decreases. Finally, when the circuit is in standby mode, PG=0 and HLD=1, and leakage power is reduced.

In the 8-bit KSA, we want to apply our low-power technique to the first-level dot blocks. There are three reasons for this. First, the simulation results show that the first-level dot blocks have the greatest potential for leakage power consumption. Second, the first level has the largest number of dot blocks, so applying the technique there is more likely to save both dynamic and static power. Third, from a timing standpoint, it is more feasible for the first-level dot blocks to enter Park mode while maintaining the critical path.

We examined three types of multi-mode sleep transistors in the first-level dot blocks of the 8-bit KSA (in the PDN, in the PUN, and in both) in Park mode with different transistor widths. Table 3 shows the delay and average power in active mode for the 8-bit KSA without power gating.

Average power consumption in active mode (uW)    28.72
Propagation delay (ps)                           62.25
PG generator delay (ps)                          22
Dot block delay (ps)                             15
Sum generation delay (ps)                        23
Table 3. Propagation delay and average power consumption in 8-bit KSA
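The PG and HLD settings for the three modes described above can be summarized as a small lookup. The Python sketch below is illustrative only. The signal values are the ones stated in the text, while the Mode enum, the control_signals helper, and the level_mode policy function are hypothetical names introduced here, not part of the structure in [2].

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    PARK = "park"
    STANDBY = "standby"


# (PG, HLD) values for each mode, as stated in the text.
CONTROL_SIGNALS = {
    Mode.ACTIVE: (1, 1),
    Mode.PARK: (0, 0),
    Mode.STANDBY: (0, 1),
}


def control_signals(mode):
    """Return the PG and HLD values that select the given mode."""
    pg, hld = CONTROL_SIGNALS[mode]
    return {"PG": pg, "HLD": hld}


def level_mode(circuit_active, level_finished):
    """Hypothetical policy: pick the mode for one level of dot blocks.

    A level that has finished while the circuit is still active is parked
    so that it keeps its output; an idle circuit goes to standby.
    """
    if not circuit_active:
        return Mode.STANDBY
    return Mode.PARK if level_finished else Mode.ACTIVE


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, control_signals(mode))
```

For example, level_mode(True, True) selects Park mode for a first-level dot block whose outputs are already being consumed by the next level.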

a. Power gating in footer b. Power gating in header c. Power gating in header and footer
Figure 5. Power-delay tradeoff in Park mode (first-level dot blocks) for 8-bit KSA

As we can see, the total power consumption has decreased. We need to scale this decrease to the portion of time taken by one level of dot block delay in order to report the actual power reduction. In the future, we want to simulate the technique in standby mode to see how much leakage power reduction is gained and how effective our method is. There are two main concerns with this technique. First, we have to generate a clock that is faster than the system clock. Second, we have to decide when to switch from active mode to Park mode. We want to investigate both.

4. Progress and remaining tasks

Progress table:
Task
Drawing the schematic of 4-bit, 8-bit, and 16-bit Kogge-Stone adder
Writing SystemC code for calculating possibilities in dot blocks
Simulating KSA with different techniques and different widths using Ocean
Power reduction technique in active and standby mode for KSA
Drawing the schematic of 4-bit, 8-bit, and 16-bit carry look-ahead adder
Simulating CLA with different techniques and different widths using Ocean
Summarizing papers
Writing Design Review 2
Creating wiki page

Remaining tasks:
Task
Drawing the schematic of 32-bit and 64-bit Kogge-Stone adder
Developing the idea of variation (if possible) with SystemC simulation
Simulating KSA with different techniques and widths using Ocean for 32 bit and 64 bit
More investigation of the multi-mode power gating technique for KSA
Drawing the schematic of 32-bit and 64-bit carry look-ahead adder
Simulating CLA with different techniques and widths using Ocean for 32 bit and 64 bit
Comparing with previous work
Solving the problems we faced during Design Review 2

Who: Luonan Luonan and Luanon Luonan
Who: Luonan Luonan and Luonan and Luonan

5. Challenges and questions about how to proceed

Challenges:
1. Working with Ocean scripts and running the simulations took some time in the beginning. We faced some weird errors, but we finally made it work.
2. Using Ocean to simulate every input combination is very time consuming for adders of 8 bits or more, so we wrote a SystemC-based simulation that calculates how likely each dot block input is and then estimates leakage power from the dot block leakage power for the different inputs. For example, for the 8-bit adder, Ocean takes around 8 days to complete.
3. For the Kogge-Stone adder, creating a 32-bit or 64-bit adder is very time consuming, because we cannot, for example, combine two 16-bit KSAs to create a 32-bit KSA.

Questions about how to proceed:
1. Will process variation cause glitches in the Kogge-Stone adder?
2. Why, when we use only a footer (Figure 1), is the leakage higher than with a header or with both header and footer? This happened for both KSA and CLA.
3. Why are there some unexpected results in Figure 5-b?

6. Paper summary

A New Optimized High-Speed Low-Power Data-Driven Dynamic (D3L) 32-Bit Kogge-Stone Adder

Using a fast logic design style, such as Domino Logic, can improve adder speed. However, despite the high speed reached by a Domino Logic parallel prefix adder, such a circuit dissipates a large amount of energy because of the clock distribution tree that delivers the clock signal to all the logic gates. Data-Driven Dynamic Logic (D3L) achieves considerable energy savings over conventional Domino Logic by removing the clock signal: the precharge and evaluation phases are controlled only by the input data. As a consequence, the power consumption is significantly reduced at the expense of a non-negligible speed penalty.

In an n-type (p-type) D3L gate, the clocked precharging PMOS (NMOS) transistor employed in Domino Logic is replaced by a pull-up PMOS (pull-down NMOS) network, which receives a subset of the input data signals (the so-called precharge inputs) instead of the clock signal. The evaluation network of the gate remains unchanged with respect to the equivalent Domino gate, and the clocked NMOS (PMOS) foot transistor is avoided. The precharge inputs need to satisfy the following conditions: 1) during the precharge phase, the pull-down network (PDN) is OFF, the pull-up network (PUN) is certainly turned ON, and the output node is charged to Vdd; 2) during the evaluation phase, the output node is eventually discharged to 0 by the PDN without any contention with the PUN.

In this paper, a new parallel-prefix structure is presented to efficiently exploit D3L in the design of low-power high-performance adders. Moreover, a new dynamic design style, named Split-Path D3L, is proposed to overcome the speed limitations of traditional D3L. When applied to the design of a 32-bit Kogge-Stone adder, the proposed approach halves the precharge propagation path with respect to the traditional D3L design style and allows a smaller sizing of the precharging PMOS transistors. As a consequence, the new technique leads to an Energy-Delay Product 25% and 20% lower than the traditional domino and D3L logic styles, respectively.

Resource Allocation and Binding Approach for Low Leakage Power

Static power dissipation due to sub-threshold leakage current in CMOS VLSI circuits becomes significant as technology descends into the deep sub-micron regime. Sub-threshold leakage current increases because the threshold voltage is reduced to compensate for performance loss. To solve this problem, the authors propose a resource allocation and binding approach for low leakage power.
An MTCMOS design style is briefly introduced, which uses sleep transistors to isolate the circuit from the supply voltage and ground rails during idle periods. Then an allocation and binding algorithm that attempts to maximize the contiguous idle times of resources is proposed. Because large modules such as the multiplier contribute most of the performance loss, performance recovery based on multicycling and slack is proposed. Using total execution time as the metric and taking 6 control steps for each input vector, the authors were able to reduce the performance penalty from 40% to 14.28%. Finally, an IIR filter is presented, and the leakage power savings in the IIR filter with regular and multicycled resources are measured.

7. References

[1] Rajani H.P., Srimannarayan Kulkarni, "Novel Sleep Transistor Techniques for Low Leakage Power Peripheral Circuits," International Journal of VLSI design & Communication Systems (VLSICS), Vol. 3, No. 4, August 2012.
[2] Suhwan Kim, S.V. Kosonocky, D.R. Knebel, K. Stawiasz, "A Multi-Mode Power Gating Structure for Low-Voltage Deep-Submicron CMOS ICs," IEEE Transactions, IEEE Circuits and Systems Society, Volume 54, Issue 7, July 2007.
[3] Y. Kumar, S. Paliwal, C.K. Rai, S.K. Balasubramanian, "A novel ground bounce reduction technique using four step power gating," Engineering and Systems (SCES), IEEE, 2013.
[4] Benton H. Calhoun, Frank A. Honoré, Anantha P. Chandrakasan, "A Leakage Reduction Methodology for Distributed MTCMOS," IEEE Journal of Solid-State Circuits, Vol. 39, No. 5, May 2004.
[5] Anup Jalan and Mamta Khosla, "Analysis of Leakage Power Reduction Techniques in Digital Circuits," India Conference (INDICON), IEEE, 2011.
[6] Sathyabama University, Chennai, "Leakage Power Reduction in CMOS Modulo4 Adder and Modulo4 Multiplier in Submicron Technology," International Conference on Sustainable Energy and Intelligent System (SEISCON), 2011.
[7] http://venividiwiki.ee.virginia.edu/mediawiki/index.php/toolscadencetutorialsbasic
[8] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits: A Design Perspective, Second Edition, 2003.
[9] Chandramouli Gopalakrishnan and Srinivas Katkoori, "Resource Allocation and Binding Approach for Low Leakage Power," Proceedings of the 16th International Conference on VLSI Design (VLSI 03).

Appendix A

At first, we wanted to use our SystemC simulation to find the best input vector for the circuit in standby mode, because it is much faster than Ocean. In addition, we wanted to have some timing information for possible later use. To investigate the input vectors of the 8-bit adder, Ocean takes about 8 days, whereas the SystemC simulation takes a couple of hours. We then used the simulation to obtain the information shown in Figure 3. We can also use it to study variation. Because of variation, glitches may occur in the circuit. Increasing transistor widths can alleviate the glitch effect. We want to use this simulation to see in which part of the circuit increasing the transistor width is most effective. We think that parallel prefix adders with higher fan-out are a good case study.

Appendix B

Figure 6. PG generator in KSA

Figure 7. Dot block in KSA

Figure 8. Sum generator in KSA

Figure 9. 8-bit KSA (carry)

Figure 10. 8-bit KSA (sum)

Figure 11. 16-bit KSA
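The three blocks listed in Figures 6 to 8 can also be checked at the functional level. The Python sketch below is a behavioral model only and says nothing about the transistor-level schematics. It assumes the usual textbook definitions (propagate = XOR, generate = AND, and a dot operator that maps two (P, G) pairs to (P_hi AND P_lo, G_hi OR (P_hi AND G_lo))), which this review does not spell out. The function names are hypothetical.

```python
def pg_generate(a, b, n_bits):
    """PG generator: per-bit propagate (XOR) and generate (AND) signals."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n_bits)]
    return p, g


def dot(p_hi, g_hi, p_lo, g_lo):
    """Dot block: combine the upper (P, G) pair with the lower one."""
    return p_hi & p_lo, g_hi | (p_hi & g_lo)


def kogge_stone_add(a, b, n_bits=8):
    """Return (sum, carry_out) of two n_bits-wide unsigned integers."""
    p, g = pg_generate(a, b, n_bits)
    group_p, group_g = p[:], g[:]
    dist = 1
    while dist < n_bits:
        next_p, next_g = group_p[:], group_g[:]
        for i in range(dist, n_bits):
            next_p[i], next_g[i] = dot(
                group_p[i], group_g[i], group_p[i - dist], group_g[i - dist]
            )
        group_p, group_g = next_p, next_g
        dist *= 2
    # After the last level, group_g[i] is the carry out of bit i.
    carry_in = [0] + group_g[:-1]
    # Sum generator: sum bit = propagate XOR carry into that bit.
    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ carry_in[i]) << i
    return total, group_g[-1]


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))
```

A model like this is useful as a reference when checking simulated waveforms of the 8-bit and 16-bit schematics, since the result can be compared directly against ordinary integer addition.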

Figure 12. 4-bit CLA

Figure 13. 8-bit CLA

[Plots: leakage power and delay trade-off curves using only header and using only footer (width = 50nm, 100nm, ..., 500nm); axes: propagation delay versus leakage power]