Energy Efficient and High Performance Current-Mode Neural Network Circuit using Memristors and Digitally Assisted Analog CMOS Neurons


Aranya Goswamy¹, Sagar Kumashi¹, Vikash Sehwag¹, Siddharth Kumar Singh¹, Manny Jain¹, Kaushik Roy², Mrigank Sharad¹
¹ Department of Electronics and ECE, IIT Kharagpur, West Bengal, India
² School of Electrical Engineering, Purdue University, West Lafayette, IN, USA

Abstract: Emerging nano-scale programmable Resistive RAM (RRAM) has been identified as a promising technology for implementing brain-inspired computing hardware. Several neural network architectures that essentially involve computing scalar products between input data vectors and stored network weights can be efficiently implemented using high-density crossbar arrays of RRAM integrated with CMOS. In such a design, the CMOS interface may be responsible for providing input excitations and for processing the RRAM's output. The design of the RRAM-CMOS interface can therefore play a major role in achieving high energy efficiency along with high integration density in RRAM-based neuromorphic hardware. In this work, we propose a high-performance, current-mode CMOS interface for RRAM-based neural networks. We present current-mode excitation for the input interface and a digitally assisted current-mode CMOS neuron circuit for the output interface. The proposed technique achieves ~10x improvement in both energy and performance over conventional approaches employed in the literature. Network-level simulations show that the proposed scheme can achieve two orders of magnitude lower energy dissipation than a digital ASIC implementation of a feed-forward neural network.

1. Introduction
As the demand for high-performance computation increases, the traditional von Neumann computer architecture becomes less efficient. In recent years, neuromorphic hardware systems have gained great attention. Such systems can potentially provide the capabilities of biological perception and information processing within a compact and energy-efficient platform. Much research has been carried out on neural network algorithm enhancement and/or system implementations built upon conventional CPUs, GPUs, or FPGAs [1]. In recent years, several device solutions have been proposed for fabricating nano-scale programmable resistive elements, generally categorized under the term memristor [1-9]. Of special interest are those that are amenable to integration with state-of-the-art CMOS technology, such as memristors based on Ag-Si filaments [6-8]. Such devices can be integrated into metallic crossbars to obtain high-density resistive crossbar memory (RCM) [1-8]. The continuous range of resistance values obtainable in these devices can facilitate the design of multi-level, non-volatile memory [1-3].

RCM technology has led to interesting possibilities of combining memory with computation [1-5]. RCM can be highly suitable for a class of non-Boolean computing applications that involve pattern matching [5, 11]. Such applications employ highly memory-intensive computing that may require correlating multidimensional input data with a large number of stored patterns or templates in order to find the best match [11]. Using conventional digital processing for such tasks incurs high energy and area cost due to the sheer number of computations involved. Structurally, RCM can be a much closer fit for this class of associative computation. Because the nano-scale memory array is used directly for associative computing, RCM can provide a very high degree of parallelism, apart from eliminating the overhead of memory reads [2]. Associative computing of practical complexity with RCM is essentially analog in nature, as it involves evaluating the degree of correlation between the inputs and the stored data.

In this project, we investigate the construction of a neuron circuit that takes the dot product produced by the crossbar array as its input and produces a corresponding voltage level, which a comparator compares with a reference voltage to produce a digital output. The neuron design is critical in several respects, since it must act as a transimpedance amplifier with low input impedance and high tolerance to variability due to process variations or mismatch.

2. Description of Elements of Circuit
2.1 Regulated Cascode Transimpedance Amplifier

FIGURE 2. (a) Basic triode transconductor structure. (b) Simple RGC triode transconductor.

In Figure 2(a), the regulating amplifier keeps the VDS of M1 at a constant value determined by VC, which is less than the overdrive voltage of M1. The concept is to drive the gate of M3 with an amplifier that forces VDS1 to equal VC, placing M3 in a feedback loop that increases the output impedance. Voltage variations at the drain of M3 therefore affect VDS1 to a lesser extent, because the amplifier regulates this voltage. With smaller variations in VDS1, the current through M1, and hence the output current, remains more constant, yielding a higher output impedance [Razavi, 2001]:

$$R_{out} \approx A\, g_{m3}\, r_{O3}\, r_{O1} \qquad (9)$$

One solution that overcomes the restrictions of the circuit in Figure 1 is to replace the auxiliary amplifier with a regulated cascode. The circuit in Figure 2(b), proposed in [Mahattanakul & Toumazou, 1998], uses a single transistor, M5, in place of the amplifier in Figure 2(a). This circuit is called a regulated cascode, abbreviated RGC. The RGC uses M5 to achieve gain boosting, increasing the output impedance without adding more cascode devices. Assuming M5 in Figure 2(b) operates in saturation, VDS1 can be calculated as follows:

$$I_C = \tfrac{1}{2}\beta_5 (V_{GS5} - V_{T5})^2, \qquad V_{GS5} = V_{DS1} - V_C = \sqrt{2 I_C/\beta_5} + V_{T5}, \qquad V_{DS1} = V_C + \sqrt{2 I_C/\beta_5} + V_{T5} \qquad (10)$$

From (6),

$$G_m = \beta_1 V_{DS1} = \beta_1 \left( V_C + \sqrt{2 I_C/\beta_5} + V_{T5} \right)$$

Thus, Gm can be tuned using either a controllable voltage source VC or a controllable current source IC. In practice, however, a controllable voltage source VC is preferable for lowering power consumption, since VDS1 varies only as the square root of IC. A simple RGC transconductor that uses a single transistor for gain boosting reduces the area and power consumed by auxiliary amplifiers [3].

The circuit in Fig. 5 [14] is usually referred to in the literature as a regulated cascode stage, and for this reason that designation is used in this paper. However, we point out that the circuit in Fig. 5 is not derived from a cascode stage (i.e., a common-source stage followed by a common-gate stage); instead, it may be viewed as a common-gate stage to which a loop containing a voltage amplifier is added, which divides the input impedance by the amplifier gain. Furthermore, the circuit in Fig. 5 differs from a well-known high-gain amplifier that is derived from a cascode stage and is properly called a regulated cascode. The common-source transistor M2 with active load IB2 is an amplifier stage with voltage gain

$$A = g_{m2}\,(r_{o2} \parallel R_{oB2}) \qquad (19)$$

where R_oB2 is the incremental resistance of the load current source IB2.
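To make these relations concrete, the following Python sketch evaluates eqs. (9), (10) and (19), together with the input-impedance estimate Z ≈ 1/(A·gm1) that the text gives next as eq. (20). Note that (9) and (10) refer to the transistors of Figure 2 while (19) and (20) refer to Fig. 5, so the sketch treats each as a standalone formula. All device values (BETA1, BETA5, VT5 and the gm/ro numbers in the demo) and all function names are hypothetical placeholders chosen for illustration, not parameters of the circuit described here. The loop illustrates the square-root dependence noted above: quadrupling IC only doubles the IC-dependent term of VDS1.

```python
import math

# Hypothetical device values for illustration only (not taken from this work).
BETA1 = 200e-6  # A/V^2, beta = mu*Cox*W/L of triode transistor M1
BETA5 = 100e-6  # A/V^2, beta of regulating transistor M5
VT5 = 0.35      # V, threshold voltage of M5


def output_resistance(a: float, gm3: float, ro3: float, ro1: float) -> float:
    """Eq. (9): R_out ~ A * g_m3 * r_O3 * r_O1."""
    return a * gm3 * ro3 * ro1


def vds1(vc: float, ic: float) -> float:
    """Eq. (10): V_DS1 = V_C + sqrt(2 I_C / beta5) + V_T5, with M5 in saturation."""
    return vc + math.sqrt(2.0 * ic / BETA5) + VT5


def gm_triode(vc: float, ic: float) -> float:
    """G_m = beta1 * V_DS1 for the triode transconductor."""
    return BETA1 * vds1(vc, ic)


def cs_gain(gm2: float, ro2: float, rob2: float) -> float:
    """Eq. (19): A = g_m2 * (r_o2 || R_oB2)."""
    return gm2 * (ro2 * rob2) / (ro2 + rob2)


def input_impedance(a: float, gm1: float) -> float:
    """Eq. (20): Z_in ~ 1 / (A * g_m1)."""
    return 1.0 / (a * gm1)


if __name__ == "__main__":
    vc = -0.4  # V, hypothetical control voltage
    # Quadrupling I_C only doubles the sqrt(2 I_C / beta5) term of V_DS1.
    for ic in (0.25e-6, 1e-6, 4e-6):
        print(f"IC = {ic * 1e6:4.2f} uA -> VDS1 = {vds1(vc, ic):.4f} V, "
              f"Gm = {gm_triode(vc, ic) * 1e6:.2f} uS")
    a = cs_gain(gm2=100e-6, ro2=1e6, rob2=1e6)  # 100 uS * 500 kOhm = 50
    print(f"A = {a:.1f}, Zin = {input_impedance(a, gm1=100e-6):.0f} Ohm, "
          f"Rout = {output_resistance(a, 100e-6, 1e6, 1e6):.2e} Ohm")
```

With these placeholder values the common-source gain is 50, so a 100 µS common-gate device gives an input impedance of about 200 Ω, which is the kind of low input impedance the neuron needs.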

The input impedance is

$$Z_{in} \approx \frac{1}{A\, g_{m1}} \qquad (20)$$

2.2 Successive Approximation Register
A successive approximation ADC is a type of analog-to-digital converter that converts a continuous analog waveform into a discrete digital representation via a binary search through all possible quantization levels before finally converging on a digital output for each conversion [3]. Although there are many variations for implementing a SAR ADC, the basic architecture is quite simple (see Figure 1). The analog input voltage (VIN) is held on a track/hold. To implement the binary search algorithm, the N-bit register is first set to midscale (that is, 100...00, where the MSB is set to 1). This forces the DAC output (VDAC) to VREF/2, where VREF is the reference voltage provided to the ADC. A comparison is then performed to determine whether VIN is less than or greater than VDAC. If VIN is greater than VDAC, the comparator output is a logic high, or 1, and the MSB of the N-bit register remains at 1. Conversely, if VIN is less than VDAC, the comparator output is a logic low and the MSB of the register is cleared to logic 0. The SAR control logic then moves to the next bit down, forces that bit high, and performs another comparison. The sequence continues down to the LSB. Once this is done, the conversion is complete and the N-bit digital word is available in the register.
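The binary search described above can be modelled behaviourally in a few lines of Python. This is a minimal sketch that assumes an ideal DAC (VDAC = code·VREF/2^N) and an ideal comparator; the function name sar_convert is hypothetical, and keeping the bit when VIN exactly equals VDAC is an assumption, since the text only specifies the strictly greater and strictly less cases.

```python
def sar_convert(vin: float, vref: float, n_bits: int) -> int:
    """Ideal N-bit SAR conversion by binary search, MSB first.

    Assumptions: ideal DAC with V_DAC = code * V_REF / 2**N, ideal comparator,
    and the trial bit is kept when VIN equals VDAC (tie case unspecified in text).
    """
    code = 0
    for bit in reversed(range(n_bits)):    # one comparison per bit: N in total
        trial = code | (1 << bit)          # force the current bit high
        vdac = trial * vref / (1 << n_bits)
        if vin >= vdac:                    # comparator high: bit stays at 1
            code = trial
        # comparator low: bit is cleared, i.e. code is left unchanged
    return code


if __name__ == "__main__":
    print(format(sar_convert(0.3, 1.0, 4), "04b"))
```

For example, sar_convert(0.3, 1.0, 4) returns 4 (binary 0100): the 0.5 V MSB trial is rejected, the 0.25 V trial is kept, and the 0.375 V and 0.3125 V trials are rejected, after exactly four comparisons.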

Notice that four comparison periods are required for a 4-bit ADC. Generally speaking, an N-bit SAR ADC requires N comparison periods and will not be ready for the next conversion until the current one is complete. Mathematically, let $V_{in} = x V_{ref}$, so that $x \in [-1, 1]$ is the normalized input voltage. The objective is to digitize x to an accuracy of $1/2^n$. The algorithm proceeds as follows, starting from the initial approximation $x_0$ and computing the ith approximation $x_i$:

$$x_0 = 0, \qquad x_i = x_{i-1} - \frac{s(x_{i-1} - x)}{2^i}, \quad i = 1, \dots, n$$

where s(x) is the signum function: s(x) = +1 for x ≥ 0 and s(x) = -1 for x < 0. It follows by mathematical induction that $|x_n - x| \le 1/2^n$ [3].
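The normalized recurrence can be checked numerically. The Python sketch below implements x_i = x_(i-1) - s(x_(i-1) - x)/2^i with the convention s(0) = +1 and returns every intermediate approximation, so the bound |x_n - x| ≤ 1/2^n can be verified directly; the function name is hypothetical.

```python
from typing import List


def sar_normalized(x: float, n: int) -> List[float]:
    """Return the approximations x_0..x_n for a normalized input x in [-1, 1].

    Implements x_i = x_{i-1} - s(x_{i-1} - x) / 2**i with x_0 = 0,
    where s is the signum function with s(0) = +1.
    """
    approximations = [0.0]                    # x_0 = 0
    for i in range(1, n + 1):
        prev = approximations[-1]
        s = 1.0 if prev - x >= 0 else -1.0    # s(prev - x)
        approximations.append(prev - s / 2 ** i)
    return approximations


if __name__ == "__main__":
    x = 0.3
    xs = sar_normalized(x, 4)
    print(xs, abs(xs[-1] - x) <= 1 / 2 ** 4)
```

For x = 0.3 and n = 4 it returns [0.0, 0.5, 0.25, 0.375, 0.3125]; the final error of 0.0125 is below 1/2^4 = 0.0625. Each step halves the correction, which is exactly why the induction bound holds.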

The SAR logic consists of a sequence shift register and a code register built from D flip-flops, as shown in Figure ix. Initially, the reset line goes low. This line drives the set input of FF1 and the reset inputs of all other sequencer flip-flops. The same reset signal also drives the reset inputs of the code-register flip-flops. Q and Qb of FF1 are therefore set to 1 and 0, respectively. Qb also drives the set input of CF1, so the CF1 output is forced to 1. This is the MSB, with weight VFSR/2. Note that since the sequence register is reset initially, the set input of every code-register flip-flop except CF1 is at logic 1, so all other code-register outputs are logic 0. This gives the code MSB = 1 with all other bits set to 0, and the DAC generates the analog equivalent of this code. When reset goes high and the clock is triggered, the Q output of FF1 becomes 0 and FF2 outputs a logic high. This low-to-high transition of FF2 clocks the code-register flip-flop CF1, which stores the control-bus value at its output. As the clock continues, CF1 retains the stored value when the FF2 output returns to zero. This process is repeated for each flip-flop until, after N clock cycles, a high state emerges from the sequencer flip-flop controlling the code-register LSB flip-flop [4].

2.3 Design of Circuit

The complete circuit consists of an array of neurons, each with an input and an output terminal. The input terminals are connected to the outputs of the crossbar array, so each neuron's input is the scalar (dot) product of the input vector and the weights stored in the memristor array. The SAR logic is shared among the neurons at both the input and the output. The buffers are connected to control logic so that only one neuron block is active at any time instant. The SAR logic takes a finite time to adjust the input DC voltage of a neuron by turning on and off a particular combination of the transistors that mirror the bias current into the feedback branch. After the input of one neuron has stabilized, the next neuron is selected via the control logic and the process repeats. Thus the stabilization time of the whole circuit depends on the number of neurons in the array and the operating frequency of the SAR logic block. The same technique is used to stabilize the DC point of the output. The circuit of the individual neuron is shown below.
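Two quantities this paragraph relies on, the crossbar dot product seen by each neuron and the total time needed to calibrate the neurons one at a time with a shared SAR, can be sketched in Python. This is a behavioural model under stated assumptions, not the circuit described here: the crossbar is treated as ideal resistors, the neuron input as a resistance z_in to a 0 V virtual ground, and the SAR as taking exactly one clock per bit with no neuron-selection overhead. The function names and all numeric values are hypothetical. The loading term also illustrates why the neuron needs a low input impedance: any voltage developed at the column node reduces every cell's current.

```python
def column_currents(v_in, g, z_in=0.0):
    """Column output currents of an idealised resistive crossbar.

    v_in : row input voltages V_i in volts
    g    : conductance matrix, g[i][j] in siemens (row i to column j)
    z_in : neuron input impedance in ohms, modelled as a resistor to a 0 V
           virtual ground (hypothetical simplification)

    For z_in = 0 this is the dot product I_j = sum_i V_i * G_ij. For z_in > 0
    the column node rises to I_j * z_in, which gives
    I_j = sum_i V_i * G_ij / (1 + z_in * sum_i G_ij).
    """
    currents = []
    for j in range(len(g[0])):
        dot = sum(v * row[j] for v, row in zip(v_in, g))
        g_column = sum(row[j] for row in g)
        currents.append(dot / (1.0 + z_in * g_column))
    return currents


def calibration_time(n_neurons, n_bits, f_sar):
    """Seconds to calibrate n_neurons sequentially with one shared SAR,
    assuming one SAR clock per bit and no neuron-selection overhead."""
    return n_neurons * n_bits / f_sar


if __name__ == "__main__":
    g = [[1e-5, 2e-5], [3e-5, 4e-5]]   # hypothetical 2x2 crossbar (S)
    v = [0.1, 0.2]                      # hypothetical input voltages (V)
    print(column_currents(v, g))                 # ideal dot products
    print(column_currents(v, g, z_in=1000.0))    # with 1 kOhm neuron input
    print(calibration_time(64, 6, 100e6))        # 64 neurons, 6 bits, 100 MHz
```

With this example matrix the ideal column currents are 7 µA and 10 µA; a 1 kΩ input impedance lowers column 0 by about 3.8% (the factor 1/(1 + 1000 Ω · 40 µS)). Likewise, 64 neurons with a 6-bit SAR clocked at 100 MHz would need 3.84 µs under these assumptions.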

In this neuron circuit, a regulated cascode transimpedance amplifier is used, which is essentially a common-gate amplifier with feedback. This reduces the input impedance, decreases the variability at the input node, and increases the output impedance. The bias current of the feedback amplifier is set by a binary-weighted array of PMOS transistors. The SAR logic block compares the input voltage with the reference voltage and turns successive transistors in the array on or off. An analog MUX is used at the gates of the PMOS transistors for proper biasing. Changing the bias current changes the VGS of the feedback amplifier and thereby adjusts the input voltage. The voltage stabilization can be seen in the following diagram.

The variability of the DC point due to process variations and transistor mismatch was simulated using Monte Carlo analysis in Cadence, for the circuit both with and without SAR stabilization. As shown in the following figures, the standard deviation of the DC point dropped from an initial 10.3519 mV to 2.63019 mV and 2.62507 mV with SAR stabilization, for 100 and 500 runs, respectively. The variability is thus reduced by a factor of about four, and the stable DC point makes the subsequent comparison against the reference more reliable. The simulated power consumption of this circuit is 43 µW at VDD = 1 V, and the bias current through the main transistor is 5 µA.
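The trimming loop described above (SAR logic stepping a binary-weighted bias-current array until the input DC level reaches the reference) can be illustrated with a small Monte Carlo model in Python. It is a sketch under loudly stated assumptions, not a reproduction of the Cadence analysis: the input DC level is modelled as the square-law VGS of the feedback transistor, only threshold-voltage mismatch is included, the mismatch distribution is clipped to ±4σ so every sample stays within the trim range, and every parameter value and function name is hypothetical. The standard deviations it prints come from this toy model and should not be compared numerically with the reported figures; the model only shows the mechanism by which the spread collapses to roughly one trim step.

```python
import math
import random
import statistics

# Hypothetical model parameters (illustrative assumptions, not values from this work)
BETA2 = 400e-6      # A/V^2, feedback transistor M2
I_BASE = 0.5e-6     # A, always-on part of the bias current
I_UNIT = 0.075e-6   # A, LSB current of the binary-weighted PMOS array
N_BITS = 6
V_REF = 0.45        # V, target DC level at the neuron input


def input_dc(vt2: float, code: int) -> float:
    """Input node voltage = V_GS of M2 carrying the trimmed bias current (square law)."""
    i_bias = I_BASE + code * I_UNIT
    return vt2 + math.sqrt(2.0 * i_bias / BETA2)


def sar_trim(vt2: float) -> int:
    """Binary search for the largest code whose input DC level does not exceed V_REF."""
    code = 0
    for bit in reversed(range(N_BITS)):
        trial = code | (1 << bit)
        if input_dc(vt2, trial) <= V_REF:   # comparator: still at or below target
            code = trial
    return code


def monte_carlo(runs: int, sigma_vt: float = 10e-3, seed: int = 1):
    """Return (std before trim, std after trim) of the input DC level."""
    rng = random.Random(seed)
    mid = 1 << (N_BITS - 1)                  # untrimmed neuron: fixed mid-scale code
    before, after = [], []
    for _ in range(runs):
        # Threshold mismatch, clipped to +/-4 sigma so every sample stays in trim range
        vt2 = min(max(rng.gauss(0.35, sigma_vt), 0.35 - 4 * sigma_vt), 0.35 + 4 * sigma_vt)
        before.append(input_dc(vt2, mid))
        after.append(input_dc(vt2, sar_trim(vt2)))
    return statistics.stdev(before), statistics.stdev(after)


if __name__ == "__main__":
    sd_before, sd_after = monte_carlo(500)
    print(f"std before trim: {sd_before * 1e3:.2f} mV, after trim: {sd_after * 1e3:.2f} mV")
```

Because the search keeps a bit only while the input stays at or below V_REF, every trimmed neuron ends up within one current step of the reference, so the residual spread is set by the LSB current of the array rather than by the device mismatch.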

The input and output characteristics are shown below. The response is that of a linear neuron. The bandwidth is also high: the circuit shows a response time on the order of nanoseconds.

The neuron model with SAR-stabilized input, and the complete circuit with two neurons sharing SAR logic at the input and output, are shown below (Cadence schematic).

3. Conclusion and Future Work
In simulation, the designed circuit achieves low power, low input impedance, a stabilized DC operating point, and high bandwidth, characteristics that make it well suited as the interface between the crossbar structure and the output. The next stage of this project is to prepare the layout of the circuit and fabricate it using commercial cleanroom facilities, after which the circuit will be tested and validated.

References:
1) http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6252563
2) http://arxiv.org/ftp/arxiv/papers/1304/1304.2281.pdf
3) Wikipedia.org