Lecture 12-1: Lecture 12, Memory Circuits
Peter Cheung
Department of Electrical & Electronic Engineering, Imperial College London
Reading: Weste Ch 8.3.1-8.3.2; Rabaey Ch. 10, p. 551-595
URL: www.ee.ic.ac.uk/pcheung/
E-mail: p.cheung@ic.ac.uk

Lecture 12-2: Semiconductor Memory Classification
RWM, Random Access: SRAM, DRAM
RWM, Non-Random Access: FIFO, LIFO, Shift Register, CAM
NVRWM: EPROM, E2PROM, FLASH
ROM: Mask-Programmed, Programmable (PROM)

Lecture 12-3: Memory Architecture: Decoders
[Figure: two organisations of N words of M bits each, built from storage cells, with a shared Input-Output port (M bits).]
Direct selection: select signals S_0 ... S_{N-1} pick Word 0 ... Word N-1.
- N words => N select signals
- Too many select signals
With a decoder: address bits A_0 ... A_{K-1} drive a decoder that generates the select signals.
- Decoder reduces # of select signals
- K = log2 N

Lecture 12-4: Array-Structured Memory Architecture
Problem: ASPECT RATIO, or HEIGHT >> WIDTH
[Figure: storage-cell array with 2^{L-K} rows (word lines) and M.2^K columns (bit lines).]
- Row Decoder: address bits A_K, A_{K+1}, ..., A_{L-1} select one word line
- Sense Amplifiers / Drivers: amplify swing to rail-to-rail amplitude
- Column Decoder: address bits A_0 ... A_{K-1} select the appropriate word
- Input-Output (M bits)
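The array organisation on Lecture 12-4 splits an L-bit address into K column bits (A_0 ... A_{K-1}) and L-K row bits (A_K ... A_{L-1}). The following is a minimal Python sketch of that split, not part of the original slides. The function name `split_address` and its parameters are hypothetical, and it assumes both the number of words and the number of words per row are powers of two.

```python
def split_address(address, n_words, words_per_row):
    """Split a word address into row and column fields (illustrative helper).

    Assumes n_words = 2**L and words_per_row = 2**K, as in the array-structured
    organisation: the low K bits feed the column decoder and the remaining
    L-K bits feed the row decoder.
    """
    if words_per_row < 1 or n_words < words_per_row:
        raise ValueError("need 1 <= words_per_row <= n_words")
    if n_words & (n_words - 1) or words_per_row & (words_per_row - 1):
        raise ValueError("sizes must be powers of two")
    if not 0 <= address < n_words:
        raise ValueError("address out of range")

    total_bits = n_words.bit_length() - 1      # L = log2(N)
    col_bits = words_per_row.bit_length() - 1  # K
    row_bits = total_bits - col_bits           # L - K

    column = address & (words_per_row - 1)     # A_0 .. A_{K-1}
    row = address >> col_bits                  # A_K .. A_{L-1}
    return row, column, row_bits, col_bits


if __name__ == "__main__":
    # 256 words arranged 4 words per row: 64 word lines instead of 256 selects.
    print(split_address(0b10110110, n_words=256, words_per_row=4))
```

With these assumed sizes the row decoder needs only 2^6 = 64 word lines, and the 2-bit column field picks one of the four words on the selected row.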
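Lecture 12-5: Hierarchical Memory Architecture
[Figure: memory split into blocks. Inputs: Row Address, Column Address, Block Address. Blocks: Control Circuitry, Block Selector, Global Amplifier/Driver, Global Data Bus, I/O.]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings

Lecture 12-6: Memory Timing: Definitions
[Figure: timing diagram of the READ, WRITE and DATA signals.]
- Read Cycle: time between successive reads
- Read Access: from READ asserted to Data Valid
- Write Cycle: time between successive writes
- Write Access: from WRITE asserted to Data Written

Lecture 12-7: Memory Timing: Approaches
DRAM Timing: Multiplexed Addressing
- The address bus carries the Row Address (MSB part), then the Column Address (LSB part)
- RAS and CAS strobes latch the two halves (RAS-CAS timing)
SRAM Timing: Self-timed
- The full address is presented on the address bus
- Address transition initiates memory operation

Lecture 12-8: MOS NOR ROM
[Figure: NOR ROM array with pull-up devices; word lines and bit lines indexed [0] to [3].]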
Lecture 12-9: MOS NOR ROM Layout
[Figure: layout with lines [0] to [3]; layers: polysilicon, Metal1 on top of diffusion, diffusion.]
Basic cell: 10 λ x 7 λ
- Only 1 layer (contact mask) is used to program the memory array
- Programming of the memory can be delayed to one of the last process steps

Lecture 12-10: MOS NOR ROM Layout
[Figure: layout with lines [0] to [3]; layers: polysilicon, Metal1 over diffusion, diffusion, threshold-raising implant; 2 λ dimension marked.]
Basic cell: 8.5 λ x 7 λ
- Threshold-raising implants disable transistors

Lecture 12-11: MOS NAND ROM
[Figure: NAND ROM array with pull-up devices; lines indexed [0] to [3].]
- All word lines are high by default, with the exception of the selected row

Lecture 12-12: MOS NAND ROM Layout
[Figure: layout with lines [0] to [3]; layers: diffusion, polysilicon, threshold-lowering implant.]
Basic cell: 5 λ x 6 λ
- No contact to VDD or GND necessary; drastically reduced cell size
- Loss in performance compared to NOR ROM
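The NOR and NAND ROM slides differ in how a stored bit reaches the bit line. A behavioural Python sketch of the two read-outs is given below. It is an illustration under simple assumptions, not a circuit model: `has_transistor[row][col]` is a hypothetical table marking where an active (programmable) transistor sits, bit lines are pulled high by default, and a disabled or absent device in the NAND chain is treated as a short.

```python
def read_nor_rom(has_transistor, row):
    """NOR ROM: the selected word line goes high; any cell with a transistor
    on that row pulls its (pulled-up) bit line low."""
    return [0 if present else 1 for present in has_transistor[row]]


def read_nand_rom(has_transistor, row):
    """NAND ROM: all word lines are high except the selected row.

    Each bit line is a series chain. The chain conducts (output 0) unless an
    active transistor sits on the selected, low word line and breaks it.
    """
    n_rows = len(has_transistor)
    n_cols = len(has_transistor[0])
    word_line_high = [r != row for r in range(n_rows)]

    outputs = []
    for col in range(n_cols):
        chain_conducts = all(
            (not has_transistor[r][col]) or word_line_high[r]
            for r in range(n_rows)
        )
        outputs.append(0 if chain_conducts else 1)
    return outputs


if __name__ == "__main__":
    layout = [[True, False],
              [False, True]]
    print(read_nor_rom(layout, 0))   # transistor present reads as 0
    print(read_nand_rom(layout, 0))  # transistor present reads as 1
```

The same transistor placement therefore stores complementary data in the two styles, which is one reason the programming masks differ between them.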
Lecture 12-13: Decreasing Word Line Delay
(a) Driving the word line from both sides
    [Figure: drivers at both ends of a polysilicon word line, fed by a metal word line.]
(b) Using a metal bypass
    [Figure: metal bypass strapped to the polysilicon word line every K cells.]
(c) Use silicides

Lecture 12-14: Precharged MOS NOR ROM
[Figure: NOR ROM array with precharge devices clocked by the precharge signal (pre); lines indexed [0] to [3].]
- The PMOS precharge device can be made as large as necessary, but the clock driver becomes harder to design.

Lecture 12-15: Floating-Gate Transistor Programming
[Figure: three panels showing a floating-gate transistor (S = source, D = drain) with terminal voltages; labels 20 V, 10 V, 5 V, 2.5 V and 0 V survive, but their signs and placement are not recoverable from the extraction.]
1. Avalanche injection.
2. Removing programming voltage leaves charge trapped.
3. Programming results in higher V_T.

Lecture 12-16: Flash EEPROM
[Figure: cell cross-section: control gate, floating gate, thin tunneling oxide, n+ source (erasure), n+ drain (programming), p-substrate.]
Lecture 12-17: Cross-sections of NVM cells
[Figure: cross-sections of Flash and EPROM cells. Courtesy Intel.]

Lecture 12-18: Characteristics of State-of-the-art NVM
[Table not recovered from the slide.]

Lecture 12-19: Read-Write Memories (RAM)
STATIC (SRAM)
- Data stored as long as supply is applied
- Large (6 transistors/cell)
- Fast
- Differential
DYNAMIC (DRAM)
- Periodic refresh required
- Small (1-3 transistors/cell)
- Slower
- Single-ended

Lecture 12-20: Basic RAM Cell
[Figure only.]
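Lecture 12-21: 6-transistor CMOS SRAM Cell
[Figure: cell schematic with transistors M1 to M6 and storage nodes Q and Q-bar.]

Lecture 12-22: SRAM Read/Write
[Figure only.]

Lecture 12-23: RAM Cell Design
[Figure: cell schematic with transistors M1 to M6 and storage nodes Q and Q-bar.]

Lecture 12-24: 6T-SRAM Layout
[Figure only.]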
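Lecture 12-25: More Cell Layout
[Figure only.]

Lecture 12-26: Resistance-load SRAM Cell
[Figure: cell schematic with load resistors R_L, transistors M1 to M4 and storage nodes Q and Q-bar.]
- Static power dissipation: want R_L large
- Bit lines precharged to V_DD to address the t_p problem

Lecture 12-27: Dual Port RAM
[Figure only.]

Lecture 12-28: Multiport RAM Cell
[Figure only.]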
Lecture 12-29: 3-Transistor DRAM Cell
[Figure: cell schematic with transistors M1, M2, M3, storage node X and capacitance C_S, separate write (W) and read (R) lines, and waveforms with levels one V_T below the supply.]
- No constraints on device ratios
- Reads are non-destructive
- Value stored at node X when writing a "1" = V_W - V_Tn

Lecture 12-30: 3T-DRAM Layout
[Figure only.]

Lecture 12-31: 1-Transistor DRAM Cell
[Figure: cell schematic with access transistor M1, storage node X, storage capacitance C_S and bit-line capacitance C_BL; waveforms for Write "1" and Read "1" showing the V_T loss and sensing around the mid-supply level.]
- Write: C_S is charged or discharged by asserting the word line (WL) and the bit line (BL).
- Read: charge redistribution takes place between the bit line and the storage capacitance:

  $\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \dfrac{C_S}{C_S + C_{BL}}$

- Voltage swing is small; typically around 250 mV.

Lecture 12-32: DRAM Cell Observations
- 1T DRAM requires a sense amplifier for each bit line, due to charge-redistribution read-out.
- DRAM memory cells are single-ended, in contrast to SRAM cells.
- The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
- Unlike the 3T cell, the 1T cell requires an extra capacitance that must be explicitly included in the design.
- When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a value higher than V_DD.
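The charge-redistribution read-out and the threshold loss on writing a "1" can be put into numbers. The Python sketch below is a worked illustration only: the function names are hypothetical, and the component values in the example (50 fF storage capacitance, 1 pF bit-line capacitance, 5 V supply, 1 V threshold, bit line precharged to 2.5 V) are assumptions chosen for the arithmetic, not values from the lecture.

```python
def stored_high_level(v_dd, v_t, v_word_line):
    """Voltage written for a '1': the NMOS access transistor passes at most
    v_word_line - v_t, and never more than the bit-line level v_dd."""
    return min(v_dd, v_word_line - v_t)


def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bit-line change after charge redistribution:
    delta_V = (V_BIT - V_PRE) * C_S / (C_S + C_BL)."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    V_DD, V_T, V_PRE = 5.0, 1.0, 2.5   # assumed example values
    C_S, C_BL = 50e-15, 1e-12          # assumed example values

    one_plain = stored_high_level(V_DD, V_T, v_word_line=V_DD)
    one_boot = stored_high_level(V_DD, V_T, v_word_line=V_DD + 1.5)

    for label, v_bit in [("0", 0.0), ("1", one_plain), ("1 boosted", one_boot)]:
        dv = bitline_swing(v_bit, V_PRE, C_S, C_BL)
        print(f"read {label}: delta V = {dv * 1e3:.1f} mV")
```

Because C_BL is much larger than C_S, the swing is only a small fraction of the stored voltage, which is why every bit line needs a sense amplifier. Bootstrapping the word line restores the full "1" level and widens the read margin.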
Lecture 12-33: 1-T DRAM Cell
(a) Cross-section
    [Figure labels: metal word line, n+ regions, poly, SiO2, field oxide, inversion layer induced by plate bias.]
(b) Layout
    [Figure labels: M1 word line, diffused bit line, polysilicon gate, polysilicon plate, capacitor.]
- Uses polysilicon-diffusion capacitance
- Expensive in area

Lecture 12-34: SEM of poly-diffusion capacitor 1T-DRAM
[Figure only.]

Lecture 12-35: Advanced 1T DRAM Cells
Trench Cell
    [Figure labels: cell plate Si, capacitor insulator, storage node poly, refilling poly, 2nd field oxide, Si substrate.]
Stacked-capacitor Cell
    [Figure labels: word line, insulating layer, cell plate, capacitor dielectric layer, transfer gate, isolation, storage electrode.]

Lecture 12-36: Address Transition Detection
[Figure: each address bit A_0, A_1, ..., A_{N-1} passes through a DELAY t_d element; the per-bit transition pulses are combined into the ATD signal.]
Lecture 12-37: Row Decoders
- Collection of 2^M complex logic gates
- Organized in regular and dense fashion
- Two styles: (N)AND decoder, NOR decoder

Lecture 12-38: Dynamic Decoders
[Figure: dynamic 2-to-4 NOR decoder with precharge devices, and 2-to-4 MOS dynamic NAND decoder; word lines 0 to 3 driven from A_0, A_1 and their complements.]
- Propagation delay is the primary concern

Lecture 12-39: A NAND decoder using 2-input pre-decoders
[Figure: pre-decoders combine A_0, A_1 and A_2, A_3 (true and complement) ahead of the final gates.]
- Splitting the decoder into two or more logic layers produces a faster and cheaper implementation

Lecture 12-40: 4 input pass-transistor based column decoder
[Figure: a 2-input NOR decoder on A_0, A_1 generates S_0 ... S_3, which gate bit lines 0 to 3 onto the data line D.]
- Advantage: speed (t_pd does not add to overall memory access time); only 1 extra transistor in signal path
- Disadvantage: large transistor count
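The pre-decoding idea on Lecture 12-39 can be checked at the logic level. The sketch below is a behavioural Python model, not the transistor circuit from the slide: the function names are hypothetical, and it assumes a 4-bit address split into the pairs (A_0, A_1) and (A_2, A_3), each pre-decoded into four one-hot lines that are then combined pairwise.

```python
def predecode_pair(a_low, a_high):
    """Pre-decode two address bits into four one-hot lines.
    Line i is high when the pair encodes the value i (a_low is the LSB)."""
    value = a_low + 2 * a_high
    return [int(i == value) for i in range(4)]


def decode_4_to_16(address):
    """Two-level decoder: each word line is the AND of one line from the
    (A0, A1) pre-decoder and one line from the (A2, A3) pre-decoder."""
    if not 0 <= address < 16:
        raise ValueError("address must fit in 4 bits")
    bits = [(address >> i) & 1 for i in range(4)]
    low = predecode_pair(bits[0], bits[1])
    high = predecode_pair(bits[2], bits[3])
    return [low[w & 3] & high[w >> 2] for w in range(16)]


if __name__ == "__main__":
    for addr in (0, 5, 15):
        print(addr, decode_4_to_16(addr))
```

Each final gate in this model has two inputs instead of four, which is the source of the speed and area benefit the slide attributes to splitting the decoder into layers.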
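Lecture 12-41: 4-to-1 tree based column decoder
[Figure: bit lines 0 to 3 routed through a binary tree of pass transistors controlled by A_0, A_1 and their complements onto the data line D.]
- Number of devices drastically reduced
- Delay increases quadratically with # of sections; prohibitive for large decoders
- Solutions:
  - buffers
  - progressive sizing
  - combination of tree and pass-transistor approaches

Lecture 12-42: Decoder for circular shift-register
[Figure: chain of register stages (R) selecting lines 0, 1, 2, ...]

Lecture 12-43: Bitline I/O Circuit - Read
[Figure only.]

Lecture 12-44: Bitline I/O Circuit - Write
[Figure only.]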
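The two claims on the tree-decoder slide (Lecture 12-41), fewer devices but quadratically growing delay, can be illustrated with a short Python sketch. It is a rough model under stated assumptions: the function names are hypothetical, every pass transistor is treated as an identical resistance r with an identical node capacitance c, and the delay is the first-order Elmore estimate for such a chain.

```python
def tree_pass_transistors(k):
    """Pass transistors in a binary tree decoder for 2**k bit lines:
    2**k at the first level, halving at each level down to 2."""
    return sum(2 ** level for level in range(1, k + 1))


def elmore_chain_delay(sections, r=1.0, c=1.0):
    """Elmore delay at the end of a chain of identical RC sections.
    Node i is charged through i resistors, so the sum is r*c*n*(n+1)/2."""
    return sum(i * r * c for i in range(1, sections + 1))


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(
            f"k={k}: {tree_pass_transistors(k)} pass transistors, "
            f"relative delay {elmore_chain_delay(k):.0f}"
        )
```

Doubling the number of series sections roughly quadruples the delay in this model, which is why the slide lists buffers, progressive sizing, or a mixed tree and pass-transistor structure as remedies.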
Lecture 12-45: Write Drivers
[Figure only.]

Lecture 12-46: Alternative Write Circuit
[Figure only.]

Lecture 12-47: Bitline Multiplexing
[Figure only.]

Lecture 12-48: Bitline Mux - Option 1
[Figure only.]
Lecture 12-49: Bitline Mux - Option 2
[Figure only.]

Lecture 12-50: Sense Amplifiers
[Figure only.]

Lecture 12-51: Sense Amp Circuit
[Figure only.]

Lecture 12-52: Latch-Based Sense Amplifier
[Figure: cross-coupled latch with equalisation signal EQ and sense-enable signal SE.]
- Initialized in its meta-stable point with EQ
- Once an adequate voltage gap is created, the sense amp is enabled with SE
- Positive feedback quickly forces the output to a stable operating point
Lecture 12-53: Single-to-Differential Conversion
[Figure: a single-ended cell output x drives one input (+) of a differential sense amplifier (Diff. S.A.); the other input (-) is tied to V_ref; output y.]
- How to make a good V_ref?

Lecture 12-54: Open bitline architecture
[Figure: sense amplifier with EQ and SE between a left and a right bit-line half; word lines L, L_0, L_1 on the left and R, R_0, R_1 on the right; cells with storage capacitance C_S; one dummy cell on each side.]

Lecture 12-55: DRAM Read Process with Dummy Cell
[Figure: three waveform plots, V (Volt) versus t (nsec), 0 to 5 ns: (a) reading a zero, (b) reading a one, (c) control signals EQ and SE.]

Lecture 12-56: Single-Ended Cascode Amplifier
[Figure labels: VDD, V_casc, C.]
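Lecture 12-57: Programmable Logic Array
[Figure: inputs x_0, x_1, x_2 feed the AND PLANE, which generates the product terms; the OR PLANE combines them into outputs f_0, f_1.]

Lecture 12-58: Pseudo-Static PLA
[Figure: AND-PLANE driven by x_0, x_1, x_2 and their complements; OR-PLANE producing f_0, f_1.]

Lecture 12-59: Dynamic PLA
[Figure: AND-PLANE and OR-PLANE with separate AND-plane and OR-plane clock signals; inputs x_0, x_1, x_2 and their complements; outputs f_0, f_1.]

Lecture 12-60: Clock Signal Generation for self-timed dynamic PLA
(a) Clock signals
    [Figure: AND-plane and OR-plane clock waveforms.]
(b) Timing generation circuitry
    [Figure: a dummy AND row is used to derive the OR-plane clock from the AND-plane clock.]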
Lecture 12-61: PLA Layout
[Figure: And-Plane and Or-Plane layout with pull-up devices on each plane; inputs x_0, x_1, x_2 and their complements; outputs f_0, f_1.]

Lecture 12-62: PLA versus ROM
Programmable Logic Array
- structured approach to random logic
- "two level logic implementation"
  - NOR-NOR (product of sums)
  - NAND-NAND (sum of products)
- IDENTICAL TO ROM!
Main difference
- ROM: fully populated
- PLA: one element per minterm
Note: the importance of PLAs has drastically reduced
1. slow
2. better software techniques (multi-level logic synthesis)
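The two-level structure and the PLA versus ROM comparison can be sketched in a few lines of Python. This is a behavioural illustration under stated assumptions, not a model of the NOR-NOR or NAND-NAND circuits: `evaluate_pla`, `AND_PLANE` and `OR_PLANE` are hypothetical names, and the example functions f_0 = x_0 x_1 + NOT x_2 and f_1 = x_0 x_1 are invented for the demonstration.

```python
def evaluate_pla(and_plane, or_plane, inputs):
    """Evaluate a two-level PLA.

    and_plane: one tuple per product term; entry 1 means the input is used
               true, 0 means complemented, None means not connected.
    or_plane:  one list per output, naming the product terms it sums.
    """
    products = [
        all(want is None or bit == want for bit, want in zip(inputs, term))
        for term in and_plane
    ]
    return [int(any(products[i] for i in terms)) for terms in or_plane]


# Assumed example: p0 = x0 AND x1, p1 = NOT x2; f0 = p0 OR p1, f1 = p0.
AND_PLANE = [(1, 1, None), (None, None, 0)]
OR_PLANE = [[0, 1], [0]]

if __name__ == "__main__":
    for x in [(0, 0, 0), (0, 1, 1), (1, 1, 1)]:
        print(x, evaluate_pla(AND_PLANE, OR_PLANE, x))
    # A ROM with the same 3 inputs decodes all 2**3 input combinations,
    # while this PLA needs only len(AND_PLANE) product-term rows.
    print(2 ** 3, "ROM word lines versus", len(AND_PLANE), "PLA product terms")
```

The comparison at the end reflects the slide's point that a ROM is fully populated, whereas a PLA carries only the product terms the functions actually need.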