CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits


CMPEN 411 VLSI Digital Circuits, Spring 2011
Lecture 24: Peripheral Memory Circuits
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp11 CMPEN 411 L24 S.1

Review: Read-Write Memories (RAMs)
Static (SRAM)
- data is stored as long as supply is applied
- large cells (6 FETs/cell), so fewer bits/chip
- fast, so used where speed is important (e.g., caches)
- differential outputs (BL and !BL)
- uses sense amps for performance
- compatible with CMOS technology
Dynamic (DRAM)
- periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
- small cells (1 to 3 FETs/cell), so more bits/chip
- slower, so used for main memories
- single-ended output (BL only)
- needs sense amps for correct operation
- not typically compatible with CMOS technology

Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section (source, floating gate, gate, drain, n+ regions in a p substrate, oxide thickness t_ox) and schematic symbol (D, G, S)]

Floating-Gate Transistor Programming
[Figure: three panels showing the terminal voltages at each step]
- Avalanche injection
- Removing programming voltage leaves charge trapped
- Programming results in higher V_T

A Programmable-Threshold Transistor
[Figure: I_D versus V_GS for the "0" state and the "1" state, separated by ΔV_T; at V_GS = V_WL one state is ON and the other is OFF]

Peripheral Memory Circuitry
- Row and column decoders
- Read bit line precharge logic
- Sense amplifiers
- Timing and control
Design concerns:
- Speed
- Power consumption
- Area (pitch matching)

Row Decoders
Collection of 2^M complex logic gates organized in a regular, dense fashion
(N)AND decoder for 8 address bits
  WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
  WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0
NOR decoder for 8 address bits
  WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
  WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)
Goals: pitch matched, fast, low power
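The two decoder forms above should select the same word line for every address. The following is a minimal behavioral sketch that checks this by enumeration; the function names (`nand_form`, `nor_form`, `active_rows`, `forms_agree`) are illustrative and not part of the lecture.

```python
N_BITS = 8  # address width used on the slide


def nand_form(address_bits, row):
    """AND of the address bits, each complemented where the row index has a 0."""
    return all(bit == ((row >> i) & 1) for i, bit in enumerate(address_bits))


def nor_form(address_bits, row):
    """NOR of the address bits, each complemented where the row index has a 1."""
    terms = [bit ^ ((row >> i) & 1) for i, bit in enumerate(address_bits)]
    return not any(terms)


def active_rows(address, form):
    """List every word line that the given decoder form asserts for an address."""
    bits = [(address >> i) & 1 for i in range(N_BITS)]  # bits[i] is A_i
    return [row for row in range(2 ** N_BITS) if form(bits, row)]


def forms_agree():
    """True if both forms assert exactly one word line, the addressed one."""
    return all(
        active_rows(a, nand_form) == active_rows(a, nor_form) == [a]
        for a in range(2 ** N_BITS)
    )


if __name__ == "__main__":
    print(active_rows(0, nor_form), active_rows(255, nand_form), forms_agree())
```

This only models the logic function. It says nothing about pitch matching, speed, or power, which are the actual design goals.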

Implementing a Wide NOR Function
Single stage 8x256 bit decoder (as in Lecture 22)
- One 8-input NOR gate per row x 256 rows = 256 x (8+8) = 4,096 transistors
- Pitch match and speed/power issues
Decompose logic into multiple levels
  !WL(0) = !(!(A7 | A6) & !(A5 | A4) & !(A3 | A2) & !(A1 | A0))
- First level is the predecoder (for each pair of address bits, form Ai | Ai-1, Ai | !Ai-1, !Ai | Ai-1, and !Ai | !Ai-1)
- Second level is the word line driver
Predecoders reduce the number of transistors required
- Four sets of four 2-bit NOR predecoders = 4 x 4 x (2+2) = 64
- 256 word line drivers, each a four-input NAND = 256 x (4+4) = 2,048
- 4,096 vs 2,112 = almost a 50% savings
- Number of inputs to the gates driving the WLs is halved, so the propagation delay is reduced by a factor of ~4 (gate delay grows roughly quadratically with fan-in)
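The transistor counts on this slide can be reproduced with a short calculation. This is a sketch that assumes static CMOS gates (two transistors per gate input) and 2-bit predecoders. Feeding a leftover bit of an odd-width address straight into the word line driver is an assumption of this sketch, and the function names are illustrative.

```python
def single_stage_count(address_bits):
    """One gate with `address_bits` inputs per word line, 2 transistors per input."""
    rows = 2 ** address_bits
    return rows * 2 * address_bits


def two_stage_count(address_bits):
    """2-bit predecoders followed by one word line driver gate per row."""
    pairs = address_bits // 2      # address bits are predecoded two at a time
    leftover = address_bits % 2    # assumption: an odd bit goes straight to the driver
    predecoders = pairs * 4 * (2 + 2)   # four 2-input gates per pair
    driver_inputs = pairs + leftover
    drivers = (2 ** address_bits) * 2 * driver_inputs
    return predecoders + drivers


if __name__ == "__main__":
    single = single_stage_count(8)
    double = two_stage_count(8)
    print(single, double, f"savings = {1 - double / single:.1%}")
```

For the 8-bit case this gives 4,096 and 2,112 transistors, matching the slide.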

Hierarchical Decoders
Multi-stage implementation improves performance
[Figure: NAND decoder using 2-input pre-decoders. Address bits A0, A1 and A2, A3 (true and complement) are predecoded in pairs, and the predecoder outputs drive the word lines WL0, WL1, ...]

Dynamic Decoders
[Figure: 2-input NOR decoder and 2-input NAND decoder, each clocked by φ with precharge devices, address inputs A0 and A1 (true and complement), and word lines WL0 to WL3]
Which one is faster? Smaller? Lower power?

Pass Transistor Based Column Decoder
[Figure: a 2-input NOR decoder on A0, A1 generates select signals S0 to S3, which gate pass transistors between the bit line pairs BL0/!BL0 to BL3/!BL3 and data_out/!data_out]
- Read: connect BLs to the sense amps (SA)
- Write: drive one of the BLs low to write a 0 into the cell
- Fast, since there is only one transistor in the signal path
- However, there is a large transistor count: (K+1) x 2^K + 2 x 2^K
- For K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20

Tree Based Column Decoder
[Figure: binary tree of pass transistors controlled by A0, !A0, A1, !A1, between the bit line pairs BL0/!BL0 to BL3/!BL3 and data_out/!data_out]
- Number of transistors reduced to 2 x 2 x (2^K - 1); for K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12
- Delay increases quadratically with the number of sections (K), so prohibitive for large decoders
- Can fix with buffers, progressive sizing, or a combination of tree and pass transistor approaches
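To compare the two column decoder styles for other values of K, the tree formula above and the pass-transistor formula from the previous slide can be evaluated side by side. This is a sketch with illustrative function names; both counts are per differential output (one BL/!BL pair of data_out lines).

```python
def pass_transistor_count(k):
    """(K+1) x 2^K for the decoder plus 2 x 2^K pass transistors."""
    return (k + 1) * 2 ** k + 2 * 2 ** k


def tree_count(k):
    """2 x 2 x (2^K - 1) pass transistors, no separate decoder."""
    return 2 * 2 * (2 ** k - 1)


def tree_series_devices(k):
    """Pass transistors in series between a bit line and the output in the tree."""
    return k


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, pass_transistor_count(k), tree_count(k), tree_series_devices(k))
```

The tree saves transistors, but its signal path has K devices in series instead of one, which is where the quadratic delay growth comes from.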

Decoder Complexity Comparisons
Consider a memory with a 10b address and 8b data (T = transistors)

Conf. | Data/Row | Row Decoder | Column Decoder
1D | 8b | 10b = 10x2^10 decoder: single stage = 20,480 T, two stage = 10,320 T | (none)
2D | 32b (32x256 core) | 8b = 8x2^8 decoder: single stage = 4,096 T, two stage = 2,112 T | 2b = 2x2^2 decoder: PT = 76 T, tree = 96 T
2D | 64b (64x128 core) | 7b = 7x2^7 decoder: single stage = 1,792 T, two stage = 1,072 T | 3b = 3x2^3 decoder: PT = 160 T, tree = 224 T
2D | 128b (128x64 core) | 6b = 6x2^6 decoder: single stage = 768 T, two stage = 432 T | 4b = 4x2^4 decoder: PT = 336 T, tree = 480 T
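The core dimensions in the comparison follow from how the 10 address bits are split between the row and column decoders. A minimal sketch of that bookkeeping, with illustrative names (`core_organization` is not from the lecture):

```python
def core_organization(address_bits, data_bits, column_bits):
    """Return (bits per row, number of rows, row address bits) for a given split.

    Each row holds 2^column_bits words of `data_bits` bits; the remaining
    address bits select the row.
    """
    row_bits = address_bits - column_bits
    bits_per_row = data_bits * 2 ** column_bits
    rows = 2 ** row_bits
    return bits_per_row, rows, row_bits


if __name__ == "__main__":
    for k in (0, 2, 3, 4):  # k = 0 is the 1D organization
        width, rows, row_bits = core_organization(10, 8, k)
        print(f"column bits = {k}: {width}x{rows} core, {row_bits}b row decoder")
```

Every split stores the same 8,192 bits; only the aspect ratio of the core and the decoder sizes change.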

Bit Line Precharge Logic
- First step of a Read cycle is to precharge (PC) the bit lines to VDD
  - every differential signal in the memory must be equalized to the same voltage level before a Read
- Then turn off PC and enable the WL
  - the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
- Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line
[Figure: precharge circuit on BL and !BL controlled by !PC, with the equalization transistor between the two bit lines]

Sense Amplifiers
- Amplification: resolves data with small bit line swings (in some DRAMs required for proper functionality)
- Delay reduction: compensates for the limited drive capability of the memory cell to accelerate the BL transition
  t_p = (C * ΔV) / I_av
  C is large and I_av is small, so make ΔV as small as possible
- Power reduction: eliminates a large part of the power dissipation due to charging and discharging bit lines
- Signal restoration: for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
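A quick numeric illustration of t_p = (C * ΔV) / I_av shows why a small bit line swing matters. The component values below (1 pF bit line, 100 µA cell current, 2.5 V supply, 250 mV sensed swing) are hypothetical numbers chosen for the sketch, not figures from the lecture.

```python
def bitline_delay(capacitance_f, swing_v, current_a):
    """Time for a cell sourcing `current_a` to move the bit line by `swing_v`."""
    return capacitance_f * swing_v / current_a


if __name__ == "__main__":
    C_BL = 1e-12      # hypothetical bit line capacitance, farads
    I_CELL = 100e-6   # hypothetical average cell current, amperes
    full_swing = bitline_delay(C_BL, 2.5, I_CELL)    # no sense amp
    small_swing = bitline_delay(C_BL, 0.25, I_CELL)  # sense amp resolves 250 mV
    print(f"full swing: {full_swing * 1e9:.1f} ns, "
          f"small swing: {small_swing * 1e9:.1f} ns")
```

With C and I_av fixed by the array and the cell, the delay scales directly with the swing the sense amplifier needs.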

Classes of Sense Amplifiers
Differential SA
- takes small-signal differential inputs (BL and !BL) and amplifies them to a large-signal single-ended output
- common-mode rejection: rejects noise that is equally injected to both inputs
- only suitable for SRAMs (with BL and !BL)
- types: current mirroring, two-stage, latch based
Single-ended SA
- needed for DRAMs

Differential Sense Amplifier
[Figure: differential amplifier with transistors M1 to M5, inputs bit and !bit on M1 and M2, sense enable SE on M5, supply VDD, internal node y, and output Out]
Directly applicable to SRAMs

Differential Sensing SRAM
[Figure: (a) SRAM sensing scheme: precharge (PC) and equalization (EQ) devices on BL and !BL, SRAM cell i selected by WL_i, and a differential sense amplifier enabled by SE producing Output; (b) two-stage differential amplifier built from M1 to M5 stages]

Read/Write Circuitry
[Figure: local R/W block between the precharged bit lines BL, !BL and the buses D, R, !R, with controls W and CS and the SA on the read bus]
- D: data (write) bus
- R: read bus
- W: write signal
- CS: column select (column decoder)
- Local W (write): BL = D, !BL = !D, enabled by W & CS
- Local R (read): R = BL, !R = !BL, enabled by !W & CS
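The enable conditions above can be captured in a small behavioral model. This is a sketch only: `local_rw` is an illustrative name, and returning `None` for a bus that the block does not drive is a modeling assumption.

```python
def local_rw(bl, bl_b, d, w, cs):
    """Return (BL, !BL, R, !R) after the local R/W block acts.

    bl, bl_b: current bit line values; d: write data; w: write signal;
    cs: column select. None means this block is not driving that bus.
    """
    if cs and w:
        # Local write: BL = D, !BL = !D
        return d, 1 - d, None, None
    if cs and not w:
        # Local read: R = BL, !R = !BL
        return bl, bl_b, bl, bl_b
    # Column not selected: bit lines untouched, read bus not driven
    return bl, bl_b, None, None


if __name__ == "__main__":
    print(local_rw(bl=1, bl_b=0, d=0, w=1, cs=1))  # write a 0
    print(local_rw(bl=1, bl_b=0, d=0, w=0, cs=1))  # read
    print(local_rw(bl=1, bl_b=0, d=0, w=1, cs=0))  # unselected column
```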

Approaches to Memory Timing
DRAM timing: multiplexed addressing
- msb's and lsb's share one address bus: row address, then column address
- RAS-CAS timing
SRAM timing: self-timed
- address transition initiates memory operation
[Figure: timing diagrams for both schemes showing the address bus, RAS, and CAS]
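Multiplexed addressing amounts to sending one address in two halves over the same pins. A minimal sketch, assuming the msb's form the row address (sent with RAS) and the lsb's form the column address (sent with CAS); the function names and the 4-bit column width in the example are illustrative.

```python
def split_address(address, column_bits):
    """Split a full address into (row, column) for multiplexed addressing."""
    row = address >> column_bits                 # msb's, latched on RAS
    column = address & ((1 << column_bits) - 1)  # lsb's, latched on CAS
    return row, column


def join_address(row, column, column_bits):
    """Rebuild the full address from the two halves."""
    return (row << column_bits) | column


if __name__ == "__main__":
    row, column = split_address(0b10110110, column_bits=4)
    print(bin(row), bin(column))
```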

Reliability and Yield
Memories operate under low signal-to-noise conditions
- word line to bit line coupling can vary substantially over the memory array
  - folded bit line architecture (routing BL and !BL next to each other ensures a closer match between parasitics and bit line capacitances)
- interwire bit line to bit line coupling
  - transposed (or twisted) bit line architecture (turns the noise into a common-mode signal for the SA)
- leakage (in DRAMs) requiring refresh operation
Memories suffer from low yield due to high density and structural defects
- increase yield by using error correction (e.g., parity bits) and redundancy
Memories are also susceptible to soft errors due to alpha particles and cosmic rays

Redundancy in the Memory Structure
[Figure: memory array with a fuse bank, a redundant row, and redundant columns, addressed by the row address and column address]

Row Redundancy
[Figure: the functional address is compared (==?) against fused repair addresses; a match selects a redundant wordline, and the enable signal controls the normal wordline decoder that drives the normal wordlines]
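The address comparison in the row redundancy scheme can be sketched behaviorally. The model below assumes that a match with a fused repair address selects the corresponding redundant wordline and disables the normal decoder; the names and the use of `None` for an unprogrammed fuse entry are illustrative.

```python
def select_wordline(address, fused_repair_addresses):
    """Return ("redundant", index) on a fuse match, else ("normal", address)."""
    for index, repair in enumerate(fused_repair_addresses):
        # None models a fuse entry that was never programmed
        if repair is not None and address == repair:
            return ("redundant", index)
    return ("normal", address)


if __name__ == "__main__":
    fuses = [0x2A, None]  # hypothetical: row 0x2A was found defective at test
    print(select_wordline(0x2A, fuses))
    print(select_wordline(0x10, fuses))
```

In hardware the comparisons run in parallel with the normal decode, so the cost is mostly area and a small delay for the enable signal.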

Column Redundancy
[Figure: normal data columns and one redundant data column, with one fuse per normal column, connected to the data lines Data 0 to Data 7]

Error-Correcting Codes
Example: Hamming codes
- e.g., if B3 flips, the check bits evaluate to binary 011 = 3, which points at bit position 3
- 2^k >= m + k + 1, where m = number of data bits and k = number of check bits
- For 64 data bits, 7 check bits are needed
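Both claims on this slide can be checked with a few lines of code. The sketch below computes the smallest k satisfying 2^k >= m + k + 1 and demonstrates the syndrome for a 7-bit Hamming code with check bits at positions 1, 2, and 4. That bit layout is the standard textbook one and is assumed here; the function names are illustrative.

```python
def min_check_bits(m):
    """Smallest k with 2^k >= m + k + 1 for m data bits."""
    k = 1
    while 2 ** k < m + k + 1:
        k += 1
    return k


def encode(data4):
    """Place 4 data bits at positions 3, 5, 6, 7 and fill check bits 1, 2, 4."""
    word = [0] * 8  # index 0 unused so that indices equal bit positions
    for position, bit in zip((3, 5, 6, 7), data4):
        word[position] = bit
    for p in (1, 2, 4):
        # Check bit p makes the parity of all positions containing p even
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    return word


def syndrome(word):
    """Position of a single flipped bit, or 0 if every parity check passes."""
    s = 0
    for p in (1, 2, 4):
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            s |= p
    return s


if __name__ == "__main__":
    print(min_check_bits(64))
    word = encode([1, 0, 1, 1])
    word[3] ^= 1  # flip B3
    print(syndrome(word))
```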

Performance and area overhead for ECC

Redundancy and Error Correction

Soft Errors
Nonrecurrent and nonpermanent errors from
- alpha particles (from the packaging materials)
- neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD), and thus Q_critical (the charge necessary to cause a bit flip) decreases, leading to an increase in the soft error rate (SER)
[Figure: system FITs (log scale, 1 to 10,000) versus process technology (0.25, 0.18, 0.13, 0.09, 0.05 µm). From Semico Research Corp.]

MTBF (hours) | 0.13 µm | 0.09 µm
Ground-based | 895 | 448
Civilian Avionics System | 324 | 162
Military Avionics System | 18 | 9
From Actel

CELL Processor!
See class website for web links


Embedded SRAM (4.6 GHz)
- Each SRAM cell is 0.99 µm²
- Each block has 32 sub-arrays
- Each sub-array has 128 WLs plus 4 redundant lines
- Each block has 2 redundant BLs

Multiplier in CELL

Next Lecture and Reminders
Next lecture
- Power consumption in datapaths and memories
- Reading assignment: Rabaey, et al, 11.7; 12.5