A Three-Port Adiabatic Register File Suitable for Embedded Applications

Stephen Avery, University of New South Wales (s.avery@computer.org)
Marwan Jabri, University of Sydney (marwan@sedal.usyd.edu.au)

Abstract

Adiabatic logic promises extremely low power consumption for applications where slower clock rates are acceptable. However, there have been very few adiabatic designs, and any circuit of even moderate complexity requires some form of RAM. This paper presents a register file implemented entirely with adiabatic logic and fabricated in a 1.2 µm CMOS technology. We compared it with a conventional CMOS logic implementation using both measured and simulated results. The comparison indicates that significant power savings have been realised.

1 Introduction

RAM is an important component of any circuit of significant size. It can also contribute substantially to the power consumption of such circuits, because large capacitances must be driven on the clock, word, and bit lines.

Slowing the clock rate can decrease power consumption. With conventional CMOS circuits, however, the relationship is linear, so the reduction in speed must be considerable to have any real impact. In adiabatic systems, the relationship between power and speed is quadratic [1]. This is achieved in two ways: by reducing the power dissipated in each individual FET, and by recycling the charge stored at each node in the circuit.

To realise any significant power savings, adiabatic logic must operate at clock rates much slower than the maximum possible for a given fabrication technology. Typically, this means operating in the range 1 to 20 MHz. Such speeds are acceptable in some embedded applications, such as biomedical implants [3]. There, the longer battery life that results from lower power consumption makes adiabatic logic attractive.

The use of adiabatic and energy recovery techniques in memory design is not new. The first published design, by Somasekhar, Ye and Roy, describes a quasi-static adiabatic RAM cell [8].
The circuit is based on a conventional 6T cell, but uses a complex adiabatic clocking scheme with different offsets and swings to generate all signals and power supplies. Simulations of a 64 × 64 array using a 1.2 µm technology show 84% energy recovery for reads and 85% for writes.

Dennard and Frank take a different approach, describing a technique for reducing the power consumption of DRAMs by adiabatically switching the bit lines [5]. The bit lines of a DRAM are highly capacitive. Charging and discharging them adiabatically allows the charge stored on those lines to be reclaimed and recycled, which leads to power savings.

Tzartzanis and Athas propose a similar technique for static RAM, replacing all latches and drivers in a conventional SRAM with energy recovery versions [9]. Simulations of a 256 × 256 SRAM array show energy savings of between 59% and 76% at 200 MHz over a conventional implementation.

Moon and Jeong go one step further, using energy recovery logic for all circuits in a register file except the RAM cells [7]. They claim power savings of approximately 70% at 5 MHz for a 32 × 32-bit two-port register file implemented in a 0.8 µm technology.

This paper describes a three-port register file implementation that uses only adiabatic logic, the natural progression from previous adiabatic/conventional hybrids. The exclusive use of adiabatic logic allows significant power savings and ready integration into larger adiabatic systems. We then compare the performance of an 8 × 16 register file with that of a conventional implementation, using simulated and measured results.

2 Memory Cell

Figure 1 illustrates a conventional implementation of a three-port cell by Chao and Wooley [2]. Devices M1 through M4 provide a storage element with negligible static power dissipation. Two pairs of nFETs, M5/M6 and M7/M8, allow reading from the cell by selectively discharging the bit lines. The pair M9/M10 allows data to be written to the cell.
[Figure 1: Conventional three-port cell]

Figure 2 shows the adiabatic cell used in this implementation. It is similar to the conventional cell, with two differences. First, like all adiabatic circuits, it is powered by the clock. Second, the cell outputs are not directly connected to the data buses. Devices M1 through M4, M9, M10, M13, and M14 form a 2N-2N2P logic gate [4].

Writing a value to the cell proceeds as follows:
1. The clock is ramped down to ground, reclaiming the charge stored in the cell.
2. The word line is asserted and the data placed on the write lines.
3. The clock is ramped up again. The cell takes on a state determined by which pair of nFETs, M9/M13 or M10/M14, provides a path to ground.
4. Once the clock has reached Vdd, the data and write signals may be removed. The cell then maintains its state indefinitely.

[Figure 2: Memory cell used in the register file]

The worst-case power consumption occurs when the cell state is toggled. M1/M2 cease conducting when the clock drops below Vth, so a charge of C·Vth remains on one of the storage nodes (where C is the node capacitance). If the cell state is toggled, this charge is dumped to ground, resulting in an energy loss of C·Vth². This is, however, considerably less than the C·Vdd² energy loss of a conventional SRAM cell.

3 Write Decoding

For the cell to retain its value, its clock must be held at Vdd whenever the cell is not selected for writing. To accomplish this, the word decoder is modified to gate between the clock and Vdd, as shown in Figures 3 and 4.

The word decoder is a simple decoder followed by two buffers, which makes the decode signal available across three phases. These signals drive the gating circuitry. The gating circuitry ensures that the row clock is connected either to Vdd through three pFETs, or to the power clock through one of three nFETs (Figure 5).

Full transmission gates cannot be used, because fully complementary signals cannot be generated. As a result, the potential across the pFETs is Vth when they are switched on. This causes a non-adiabatic transition from Vdd − Vth to Vdd, and a slight increase in power consumption over an ideal adiabatic implementation.

[Figure 3: Adiabatic word decoder]
[Figure 4: Full word decoding circuitry]
[Figure 5: Write operation timing]

4 Read Decoding

A simple decoder is all that is required for the read decoder, as illustrated in Figure 6. Because the gate must be able to sink the charge stored on the entire word line, decoders in large register files can be slow due to the number of nFETs in the discharge path.
To combat this problem, a buffer may be added at the output. This shortens the discharge path to one nFET, at the expense of an extra quarter cycle of latency. Figure 7 illustrates the read operation timing.

5 Read Data Output

Conventional RAM circuits use a sense amplifier here. This implementation uses an OR gate instead, with the evaluation tree distributed along the length of the bit lines. An additional nFET connected to the complementary output ensures that the gate reaches a defined state when no read is performed. The circuit is illustrated in Figure 8.
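The read data output can be made concrete with a minimal logic-level sketch in Python of one bit column. This is not the authors' circuit. It rests on three assumptions: a single read port, at most one selected row, and an additional nFET that makes the complementary side evaluate when no row is selected, so the output resolves to 0. The function name `read_output` is hypothetical, and the model ignores charge, timing and transistor sizing.

```python
from typing import Sequence, Tuple


def read_output(cells: Sequence[int], read_select: Sequence[bool]) -> Tuple[bool, bool]:
    """Logic-level sketch of one bit column of the adiabatic read output.

    Hypothetical abstraction: the "true" evaluation tree, distributed along the
    bit line, conducts if any selected row stores 1; the complementary tree
    conducts if a selected row stores 0. The extra nFET on the complementary
    side is assumed to conduct whenever no row is selected, so the gate still
    resolves to a defined value. Returns (data, data_bar) as logic values.
    """
    if len(cells) != len(read_select):
        raise ValueError("need exactly one read-select line per row")

    # Wired-OR of (select AND stored bit) along the bit line.
    true_tree = any(sel and bit == 1 for bit, sel in zip(cells, read_select))

    # Complementary tree: selected rows storing 0 ...
    comp_tree = any(sel and bit == 0 for bit, sel in zip(cells, read_select))
    # ... plus the additional nFET, assumed on when no read is performed.
    comp_tree = comp_tree or not any(read_select)

    # A cross-coupled dual-rail gate needs exactly one tree to conduct.
    if true_tree == comp_tree:
        raise ValueError("select at most one row per read port")
    return true_tree, comp_tree


if __name__ == "__main__":
    column = [0, 1, 0, 0]  # stored bits for rows 0..3
    print(read_output(column, [False, True, False, False]))  # row 1 read
    print(read_output(column, [False] * 4))                 # no read
```

For example, `read_output([0, 1, 0, 0], [False] * 4)` returns `(False, True)`. With no read performed, the assumed extra nFET still gives the gate a defined state.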

[Figure 6: Adiabatic read word decoder]

The bit lines are typically the most capacitive nodes in a RAM. Large FETs are therefore required in the cross-coupled portion of the gate, to reduce the losses associated with charging these lines from the clock. Because the cross-coupled FETs provide large gain, the nFETs in the evaluation branches can be relatively small. This helps keep the cell area reasonable. As with the read word decoder, the output should be buffered to minimise the current-sinking requirements of the evaluation branches.

[Figure 8: Adiabatic read output circuitry]

6 Integration

The register file is powered by a four-phase clock. phi1 lags the clock phase by 90°, phi2 lags phi1 by 90°, and phi3 lags phi2 by 90°. A trapezoidal waveform is ideal, but a sinusoidal clock may be used with only a slight increase in power dissipation.

The timing requirements are as follows:
- All address and read/write signals must be valid during phi1, i.e. they may only change when phi1 is low.
- The data to be written must be valid during phi3. This implementation, however, uses buffering to synchronise that data with all other input signals.
- The data read is valid during phi3.

All operations are performed within one cycle.

7 Implementation and Results

The circuits presented here were used to implement an 8-word, 16-bit 2R1W register file, fabricated using the MOSIS 1.2 µm CMOS process. To provide a basis for comparison, a similar register file was implemented with conventional CMOS logic. The conventional circuit, generated from symbolic layout, was provided by N. Weste of Macquarie University. It was then compacted by hand to ensure a fair comparison with the adiabatic implementation, which was laid out geometrically. Figure 12 is a photomicrograph of the fabricated chip. The final size of the adiabatic circuit is 644 × 1300 µm, comprising 2876 devices (588 pFETs).
By comparison, the conventional implementation is 880 × 883 µm and uses 2264 devices (888 pFETs). Despite its larger number of devices, the adiabatic circuit requires less than 10% more silicon. This is due to its smaller number of pFETs and the corresponding relaxation of n-well requirements.

Figure 9 shows the power consumption of both circuits over a range of frequencies with Vdd = 5 V. These results assume a 50% activity level, i.e. a 50% probability that the output of a gate will change. The activity level in practice, and hence the actual power consumption, will likely be much lower.

The measured results for the conventional circuit correspond closely to the simulation results. The results for the adiabatic circuit are from simulation only, as measured results were not available at the time of writing.

[Figure 7: Read operation timing]
[Figure 9: Power consumption versus frequency (1 to 6 MHz; log power axis from 10 µW to 100 mW). Measured and simulated curves for the conventional circuit, the conventional circuit without sense amps, and the adiabatic circuit]

Table 1 breaks down the power consumption of the circuits by function.

Function          Conventional    Adiabatic
Buffering         336 µW          8.6 µW
Read Decoding     139 µW          5.2 µW
Write Decoding    62 µW           4.5 µW
Read Output       20.7 mW         5.3 µW
Memory Cells      31 µW           4.2 µW

Table 1: Power consumption at 1 MHz (Vdd = 5 V)

As the table shows, the power consumption of the conventional sense amps swamps all other contributions in that implementation. The circuit schematics in Figures 10 and 11 show why. When the clock is high, the 20/1 nFET in the sense amp precharges the bit line to Vdd. If the cell stores a high value and is selected for reading, the bit line is dragged low through the 2/1 and 5/1 nFETs. The result is a short-circuit current of approximately 500 µA, with a 50% duty cycle, for each high bit in the word being read.

This sense-amp circuit is useful in its original application, the register file in a fast RISC core. It is not, however, particularly suitable for a low-power, low-speed application. (The problem could be fixed by ANDing the word select with the complement of the clock.) For this reason, Figure 9 also provides results for the conventional circuit without the sense-amp power consumption.

Even with the conventional sense amplifiers ignored, the adiabatic circuit produces substantial power savings over the conventional implementation. These savings range from 85% for the memory cells to more than 90% for the decoders.
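The savings quoted above can be cross-checked against Table 1. The sense-amp figure can also be roughly reproduced from the 500 µA short-circuit current. What follows is a back-of-envelope sketch, not the authors' method. The helper names are hypothetical, and `sense_amp_power_w`'s defaults are assumptions: both read ports are active every cycle and half of the 16 bits are high.

```python
# Power figures from Table 1 (1 MHz, Vdd = 5 V), in microwatts:
# function -> (conventional, adiabatic)
TABLE_1_UW = {
    "Buffering": (336.0, 8.6),
    "Read Decoding": (139.0, 5.2),
    "Write Decoding": (62.0, 4.5),
    "Read Output": (20_700.0, 5.3),
    "Memory Cells": (31.0, 4.2),
}


def savings_percent(conventional: float, adiabatic: float) -> float:
    """Percentage reduction of the adiabatic figure relative to the conventional one."""
    return 100.0 * (1.0 - adiabatic / conventional)


def sense_amp_power_w(i_short_a: float = 500e-6, vdd: float = 5.0, duty: float = 0.5,
                      word_bits: int = 16, frac_high: float = 0.5,
                      read_ports: int = 2) -> float:
    """Rough short-circuit power of the conventional sense amps, in watts.

    Assumptions (not from the paper): every read port reads each cycle and
    the given fraction of bits in the word is high.
    """
    high_bits = word_bits * frac_high
    return i_short_a * vdd * duty * high_bits * read_ports


if __name__ == "__main__":
    for name, (conv, adia) in TABLE_1_UW.items():
        print(f"{name:15s} {savings_percent(conv, adia):5.1f}% saved")
    print(f"Estimated sense-amp power: {sense_amp_power_w() * 1e3:.1f} mW")
```

Under these assumptions the estimate comes to about 20 mW, close to the 20.7 mW read-output figure in Table 1. The computed savings are about 86.5% for the memory cells and between 92.7% and 97.4% for the decoders and buffering.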
Large savings are to be expected in the decoders and the read data output circuitry, because the charge stored on the bit and word lines is recycled.

In simulation, the conventional circuit operated at more than 10 MHz. The adiabatic circuit, designed for 5 MHz, operated at just over 6 MHz. It could reach higher frequencies with wider FETs in the gating circuitry and stronger decoder drive. Note that the measured results are limited by failure of the I/O pads, not necessarily by failure of the circuits.

Energy recycling carries additional losses, which can be as low as 10 to 15% [6]. Even when these are taken into account, total power savings of more than 80% over conventional SRAM designs are still possible.

[Figure 10: Sense amplifier used in conventional circuit]
[Figure 11: Memory cell used in conventional circuit]

8 Summary

To realise their full power-saving potential, adiabatic systems require some form of adiabatic memory. We have demonstrated a register file implemented entirely with adiabatic logic, a natural progression from other hybrid energy-recovery systems. This circuit has produced significant power savings over a similar implementation using conventional CMOS logic. The savings are most notable where highly capacitive buses are driven, as in the read data output and the row decoders. This work shows that, in applications where slower clock rates are acceptable, adiabatic techniques can reduce power in memory circuitry by around 80%.

Acknowledgements

The authors thank S.-Y. Choe and G. Hellestrand of the University of NSW, M. Bickerstaff and N. Weste of Macquarie University, S. Reemeyer of Sydney University, and C. Nicol of Lucent Technologies for their contributions and assistance. Initial investigations were funded in part by a grant from the Australian Research Council.

References

[1] W. C. Athas, L. Svensson, J. G. Koller, N. Tzartzanis, E. Y.-C. Chou, "Low-power digital systems based on adiabatic-switching principles," IEEE Trans. VLSI Systems, vol. 2, no. 4, pp. 398-407, 1994.

[2] C.-C. Chao, B. A. Wooley, "A 1.3-ns 32-word × 32-bit three-port BiCMOS register file," IEEE J. Solid-State Circuits, vol. 31, no. 6, pp. 758-766, 1996.

[3] R. Coggins, M. Jabri, R. Wang, S. Avery, "An amplitude and shift invariant micropower template matcher," Proc. Int. Conf. Neural Information Processing, pp. 1257-1261, 1996.

[4] J. S. Denker, "A review of adiabatic computing," Proc. IEEE Symp. Low Power Electronics, pp. 94-97, 1994.

[5] R. H. Dennard, D. J. Frank, "Memory with adiabatically switched bit lines," U.S. Patent 5,526,319, 1996.

[6] A. G. Dickinson, J. S. Denker, "Adiabatic dynamic logic," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 311-315, 1995.

[7] Y. Moon, D.-K. Jeong, "A 32 × 32-bit adiabatic register file with supply clock generator," Symp. VLSI Circuits Dig. Tech. Papers, pp. 27-28, 1997.

[8] D. Somasekhar, Y. Ye, K. Roy, "An energy recovery static RAM core," Proc. IEEE Symp. Low Power Electronics, pp. 62-63, 1995.

[9] N. Tzartzanis, W. C. Athas, "Energy recovery for the design of high-speed, low-power static RAMs," Proc. Int. Symp. Low Power Electronics and Design, pp. 55-60, 1996.

[Figure 12: Photomicrograph of the conventional (top) and adiabatic (bottom) register files]