Lecture 8: Memory Peripherals


Digital Integrated Circuits (83-313)
Lecture 8: Memory Peripherals
Semester B, 2016-17
Lecturer: Dr. Adam Teman
TAs: Itamar Levi, Robert Giterman
20 May 2017

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited; however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed, please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.

Lecture Content

Memory Peripherals Overview

Memory Architecture
- Memory size: $W$ words of $C$ bits = $W \times C$ bits
- Address bus: $A$ bits, so $W = 2^A$
- Number of words in a row: $2^M$
- Multiplexing factor: $M$
- Number of rows: $2^{A-M}$
- Number of columns: $C \times 2^M$
- Row decoder: $A-M \rightarrow 2^{A-M}$, driven by address bits $ADD_{A-1} : ADD_M$
- Column decoder: $M \rightarrow 2^M$, driven by address bits $ADD_{M-1} : ADD_0$
[Figure: array of storage cells on word lines and bit lines, with row decoder, column decoder, sense amplifiers/drivers, and a $C$-bit input/output.]
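The relations above can be checked with a short calculation. This is only a sketch: the function name `array_organization` and the example values ($A = 10$, $C = 8$, $M = 2$) are illustrative assumptions, not taken from the slides.

```python
def array_organization(address_bits, word_bits, mux_bits):
    """Array dimensions for A address bits, C-bit words and M column-address bits."""
    if not 0 <= mux_bits <= address_bits:
        raise ValueError("M must satisfy 0 <= M <= A")
    words = 2 ** address_bits                    # W = 2^A
    words_per_row = 2 ** mux_bits                # 2^M words share one row
    rows = 2 ** (address_bits - mux_bits)        # 2^(A-M) word lines
    columns = word_bits * words_per_row          # C * 2^M bit-line columns
    return {
        "words": words,
        "words_per_row": words_per_row,
        "rows": rows,
        "columns": columns,
        "row_decoder": (address_bits - mux_bits, rows),   # inputs -> outputs
        "column_decoder": (mux_bits, words_per_row),      # inputs -> outputs
    }


# Hypothetical example: 1K words of 8 bits with M = 2.
org = array_organization(address_bits=10, word_bits=8, mux_bits=2)
print(org)
```

Whatever $M$ is chosen, rows times columns equals $W \times C$; $M$ only trades array height against width.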

Memory Timing: Definitions

Major Peripheral Circuits
- Row Decoder
- Column Multiplexer
- Sense Amplifier
- Write Driver
- Precharge Circuit
[Figure: storage-cell array with word lines and bit lines; row decoder on address bits $A_{W-1} : A_M$, column decoder on $A_{M-1} : A_0$, sense amplifiers/drivers, and a $C$-bit input/output.]

Row Decoder Design

Row Decoders
A decoder reduces the number of select signals from $N$ to $\log_2 N$.
- Number of rows: $N$
- Number of row address bits: $\log_2 N$

Row Decoders
Standard decoder design:
- Each output row is driven by an AND gate with $k = \log_2 N$ inputs.
- Each gate has a unique combination of address inputs (or their inverted values).
- For example, an 8-bit row address has 256 8-input AND gates, such as:
$WL_0 = \bar{A}_7 \bar{A}_6 \bar{A}_5 \bar{A}_4 \bar{A}_3 \bar{A}_2 \bar{A}_1 \bar{A}_0$
$WL_{255} = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$
NOR decoder:
- De Morgan's law provides us with a NOR decoder. In the previous example, we'll get 256 8-input NOR gates:
$WL_0 = \overline{A_7 + A_6 + A_5 + A_4 + A_3 + A_2 + A_1 + A_0}$
$WL_{255} = \overline{\bar{A}_7 + \bar{A}_6 + \bar{A}_5 + \bar{A}_4 + \bar{A}_3 + \bar{A}_2 + \bar{A}_1 + \bar{A}_0}$
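A behavioural sketch can confirm that the AND and NOR forms select the same word line. The helper names (`literals`, `and_decoder`, `nor_decoder`) are hypothetical, and the model assumes $A_0$ is the least significant address bit.

```python
def literals(address, row, bits=8):
    """Inputs of the AND gate for word line `row`: A_i if bit i of row is 1, else NOT A_i."""
    out = []
    for i in range(bits):
        a_i = bool((address >> i) & 1)
        out.append(a_i if (row >> i) & 1 else not a_i)
    return out


def and_decoder(address, row, bits=8):
    """WL_row as a k-input AND of true/complemented address bits."""
    return all(literals(address, row, bits))


def nor_decoder(address, row, bits=8):
    """Same function via De Morgan: a NOR of the complemented literals."""
    return not any(not lit for lit in literals(address, row, bits))


# Exhaustive check for an 8-bit row address (256 word lines).
for address in range(256):
    selected = [row for row in range(256) if and_decoder(address, row)]
    assert selected == [address]                      # exactly one word line is high
    assert all(and_decoder(address, r) == nor_decoder(address, r) for r in range(256))
print("AND and NOR decoders agree for all 256 addresses")
```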

How should we build it?
Let's build a row decoder for a 256x256 SRAM array.
- We need 256 8-input AND gates.
- Each gate drives 256 bitcells.
We have various options. Which one is best?

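Reminder: Logical Effort
$t_{pd,i} = t_{pinv} \left( p_i + EF_i \right)$
$EF_i = LE_i \cdot f_i = LE_i \cdot \frac{b_i \cdot C_{in,i+1}}{C_{in,i}}$
$PE = F \cdot \prod LE_i \cdot B = \frac{C_L}{C_{in,1}} \cdot \prod LE_i \cdot \prod b_i$
$EF_{opt} = \sqrt[N]{PE} = \sqrt[N]{F \cdot \prod LE_i \cdot \prod b_i}$
$N_{opt} = \log_{EF_{opt}} PE = \log_{EF_{opt}} \left( F \cdot \prod LE_i \cdot B \right)$
$t_{pd} = t_{pinv} \sum \left( p_i + EF_i \right) = t_{pinv} \left( \sum p_i + N \cdot \sqrt[N]{PE} \right)$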

Problem Setup
For the LE calculation we need to start with:
- Output load ($C_L$)
- Input capacitance ($C_{in}$)
- Branching ($B$)
What is the load capacitance? There are 256 bitcells on each word line:
$C_{WL} = 256 \cdot C_{cell} + C_{wire}$
Let's ignore the wire for now.
What is the input capacitance? Let's assume our address drivers can drive a bit more than a bitcell, so:
$C_{in,addr\_driver} = 4 \cdot C_{cell}$

Problem Setup
What is the branching effort? Let's take another look at the Boolean expressions:
$WL_0 = \bar{A}_7 \bar{A}_6 \bar{A}_5 \bar{A}_4 \bar{A}_3 \bar{A}_2 \bar{A}_1 \bar{A}_0$
$WL_{255} = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$
We see that half of the word-line expressions use $A_i$ and half use $\bar{A}_i$. So each address driver drives 128 8-input AND gates, but only one is on the selected WL path.
$C_{on\text{-}path} = C_{nand}; \quad C_{off\text{-}path} = 127 \cdot C_{nand}$
$B_{addr\_driver} = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}} = \frac{C_{nand} + 127 \cdot C_{nand}}{C_{nand}} = 128$

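Number of Stages
Altogether the path effort is:
$PE = \prod LE_i \cdot B \cdot F = \prod LE_i \cdot \prod b_i \cdot \frac{C_{WL}}{C_{address}} = \prod LE_i \cdot 128 \cdot \frac{256 \cdot C_{cell}}{4 \cdot C_{cell}} = 2^{13} \cdot \prod LE_i = 8\text{k} \cdot \prod LE_i$
The best-case logical effort is $\prod LE_i = 1$, so the minimum number of stages for optimal delay is:
$N_{opt} = \log_{3.6} PE = \log_{3.6} 2^{13} \approx 7$
That's a lot of stages!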

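So which implementation should we use?
The one with the minimum logical effort:
- 8-input NAND followed by an inverter: $LE = \frac{10}{3} \cdot 1 = \frac{10}{3}; \quad p = 8 + 1 = 9$
- 4-input NAND followed by a 2-input NOR: $LE = 2 \cdot \frac{5}{3} = \frac{10}{3}; \quad p = 4 + 2 = 6$
- 2-input NAND, 2-input NOR, 2-input NAND, inverter: $LE = \frac{4}{3} \cdot \frac{5}{3} \cdot \frac{4}{3} \cdot 1 = \frac{80}{27}; \quad p = 2 + 2 + 2 + 1 = 7$
- Three 2-input NAND and inverter pairs: $LE = \left( \frac{4}{3} \right)^3 = 2.37; \quad p = 2 \cdot 3 + 1 \cdot 3 = 9$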

New Optimal Number of Stages
So now we can calculate the actual path effort:
$PE = F \cdot \prod b_i \cdot \prod LE_i = 2^{13} \cdot 2.37 = 19.418\text{k}$
$N_{opt} = \log_{3.6} PE \approx 7.7$
We could add another inverter or two to get closer to the optimal number of stages.
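The arithmetic of the last few slides can be reproduced with a short script. It is a sketch under stated assumptions: the gate chains named in `OPTIONS` are one reading of the LE and p values on the slide (the figures are not part of this transcript), and the function names are hypothetical.

```python
import math

EF_OPT = 3.6      # optimal stage effort used in the slides
F = 256 / 4       # C_WL / C_in = 256*C_cell / (4*C_cell), wire ignored
B = 128           # branching effort of the address driver


def optimal_stages(path_effort, ef_opt=EF_OPT):
    """N_opt = log base EF_opt of the path effort."""
    return math.log(path_effort) / math.log(ef_opt)


# Per-stage (logical efforts, parasitic delays). Gate names are assumed from
# the numbers: NAND-n has LE (n+2)/3, NOR2 has LE 5/3, an inverter has LE 1.
OPTIONS = {
    "NAND8 + INV": ([10 / 3, 1], [8, 1]),
    "NAND4 + NOR2": ([2, 5 / 3], [4, 2]),
    "NAND2 + NOR2 + NAND2 + INV": ([4 / 3, 5 / 3, 4 / 3, 1], [2, 2, 2, 1]),
    "3 x (NAND2 + INV)": ([4 / 3, 1] * 3, [2, 1] * 3),
}


def summarize(name):
    """Total logical effort, parasitic delay, path effort and N_opt for one option."""
    le_list, p_list = OPTIONS[name]
    le = math.prod(le_list)
    pe = F * B * le
    return {"LE": le, "p": sum(p_list), "PE": pe, "N_opt": optimal_stages(pe)}


print(f"Ideal case (LE = 1): N_opt = {optimal_stages(F * B):.2f}")
for name in OPTIONS:
    s = summarize(name)
    print(f"{name:28s} LE={s['LE']:.2f}  p={s['p']}  PE={s['PE']:.0f}  N_opt={s['N_opt']:.1f}")
```

The lowest-LE option has six stages against an optimum of about 7.7, which is why another inverter or two moves it closer to the optimal stage count.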

Implementation Problems
Address line capacitance:
- Our assumption was that $C_{in,addr\_driver} = 4 \cdot C_{cell}$.
- But each address drives 128 gates. That's a really long wire with high capacitance.
- This means that we will need to buffer the address lines.
- This will probably ruin our whole analysis...
Bit-cell pitch:
- Each signal drives one row of bitcells.
- How will we fit 8 address signals into this pitch?

Predecoding - Concept
Solution: Let's look at two decoder paths, $WL_{254}$ and $WL_{255}$.
- We see that there are many shared gates.
- So why not share them?
For instance, the purple output in the slide's figure can be used for both gates.

Predecoding - Method
How do we do this?
- If we look at the final Boolean expression, it has combinations of groups of inputs.
- By grouping together a few inputs, we actually create a small decoder.
- Then we just AND the outputs of all the predecoders.
For example, with two 4:16 predecoders:
$D = dec(A_0, A_1, A_2, A_3); \quad E = dec(A_4, A_5, A_6, A_7)$
$WL_0 = D_0 E_0; \quad WL_{255} = D_{15} E_{15}; \quad WL_{254} = D_{15} E_{14}$

Predecoding - Example
Let's look at our example:
$D = dec(A_0, A_1, A_2, A_3)$
$E = dec(A_4, A_5, A_6, A_7)$
$WL_0 = D_0 E_0$
$WL_{255} = D_{15} E_{15}$
$WL_{254} = D_{15} E_{14}$
What is our new branching effort?
- As before, each address drives half the lines of the small decoder.
- Each predecoder output drives 256/16 post-decoder gates.
- Altogether, the branching effort is:
$B = b_{addr\_driver} \cdot b_{predecoder} = \frac{16}{2} \cdot \frac{256}{16} = 128$
Same as before!
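A small behavioural model illustrates the two-predecoder structure and the branching arithmetic. The names `predecode` and `word_lines` are hypothetical. Which address bits feed $D$ and $E$ is an assumption here: word line $16d + e$ is driven by $D_d \cdot E_e$, chosen so that $WL_{254} = D_{15} E_{14}$ as in the slide.

```python
def predecode(value):
    """4:16 predecoder: one-hot list with exactly one of 16 outputs high."""
    return [value == k for k in range(16)]


def word_lines(address):
    """256 word lines, each a 2-input AND of one D output and one E output."""
    d = predecode(address >> 4)     # assumed: D selects the group of 16 rows
    e = predecode(address & 0xF)    # assumed: E selects the row inside the group
    return [d[wl // 16] and e[wl % 16] for wl in range(256)]


# Every address must raise exactly its own word line.
for address in range(256):
    wl = word_lines(address)
    assert sum(wl) == 1 and wl[address]

# Branching effort: each address line drives half of the 16 predecoder gates,
# and each predecoder output drives 256/16 final gates.
b_addr_driver = 16 // 2
b_predecoder = 256 // 16
branching = b_addr_driver * b_predecoder
print("B =", branching)
```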

Predecoding - Solution
Why is this a better solution?
- Each address driver is only driving four gates, so there is less capacitance.
- We saved a ton of area by sharing gates.
- We can pitch fit 2-input NAND gates.

Another Predecoding Example
We can try using four 2-input predecoders. This will require us to use 256 4-input NAND gates.
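The gate counts of the two predecoding examples can be tabulated with a few lines. The function `predecode_config` and its dictionary keys are hypothetical, and the model assumes the address bits split into equal groups with one final gate per word line.

```python
def predecode_config(address_bits, group_bits):
    """Gate counts when the address is split into equal predecoder groups."""
    if address_bits % group_bits != 0:
        raise ValueError("address_bits must be a multiple of group_bits")
    groups = address_bits // group_bits
    outputs_per_group = 2 ** group_bits
    return {
        "predecoders": groups,
        "predecoder_gates": groups * outputs_per_group,  # one gate per predecoded line
        "predecoder_gate_fan_in": group_bits,
        "final_gates": 2 ** address_bits,                # one per word line
        "final_gate_fan_in": groups,                     # one input from each predecoder
    }


for group_bits in (4, 2):
    print(f"{group_bits}-input predecoders:", predecode_config(8, group_bits))
```

For an 8-bit row address this gives two 4:16 predecoders feeding 256 2-input gates, or four 2:4 predecoders feeding 256 4-input gates, matching the two examples.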

How do we choose a configuration?
- Pitch fitting: 2-input NANDs vs. 4-input NANDs.
- Switching capacitance: How many wires switch at each transition?
- Stages before the large cap: Distribution of the load along the delay.
Conclusion: Usually do as much predecoding as possible!
[Figure: two decoder configurations for address inputs $A_0$ to $A_7$, one built from 2:4 predecoders and one from 4:16 predecoders, driving the word lines.]

Alternative Solution: Dynamic Decoders
[Figures: 2-input NOR decoder; 2-input NAND decoder.]

Column Multiplexer

Column Multiplexer
First option: PTL mux with decoder
- Fast: only 1 transistor in the signal path.
- Large transistor count.
[Figure: 4:1 pass-transistor mux with select inputs A1, A0, inputs B0 to B3, and output Y.]

4-to-1 tree decoder
Second option: tree decoder
- For a $2^k$:1 mux, it uses $k$ series transistors.
- Delay increases quadratically with $k$.
- No external decode logic, so there is a big area reduction.
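The quadratic growth can be seen from a first-order Elmore estimate of a chain of series pass transistors. This is only a sketch: it assumes each transistor contributes the same resistance `r` and each node along the chain the same capacitance `c` (normalized to 1 here), and the function name `elmore_delay` is hypothetical.

```python
def elmore_delay(k, r=1.0, c=1.0):
    """Elmore delay of k identical series pass transistors (uniform RC ladder).

    Node j sees j resistors back to the driver, so the delay is
    sum(j*r*c for j = 1..k) = r*c*k*(k+1)/2.
    """
    return sum(j * r * c for j in range(1, k + 1))


for k in range(1, 5):
    print(f"2^{k}:1 tree mux, {k} series transistors: delay = {elmore_delay(k):.0f} RC")
```

Doubling the number of select bits from 2 to 4 raises the estimate from 3 RC to 10 RC, while a single pass transistor stays at 1 RC.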

Combining the Two

Precharge and Sense Amp

Precharge Circuitry
- Precharge bitlines high before reads.
- Equalize bitlines to minimize voltage difference when using sense amplifiers.
[Figures: precharge and equalization circuits on the bit / bit_b pair.]

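Sense Amplifiers
$t_p = \frac{C \cdot \Delta V}{I_{av}}$
- $C$ is large and $I_{av}$ is small, so make $\Delta V$ as small as possible.
Idea: Use a sense amplifier.
[Figure: a small transition at the sense amplifier (s.a.) input, and the resulting output.]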
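A quick numeric example shows how the required swing sets the delay. The values used (100 fF bitline capacitance, 20 µA average cell current, 1 V full swing against a 100 mV sensed swing) are illustrative assumptions only, and `bitline_delay` is a hypothetical helper.

```python
def bitline_delay(c_bitline, delta_v, i_avg):
    """t_p = C * dV / I_av, in seconds for SI inputs."""
    return c_bitline * delta_v / i_avg


C_BITLINE = 100e-15   # assumed bitline capacitance: 100 fF
I_AVG = 20e-6         # assumed average cell current: 20 uA

full_swing = bitline_delay(C_BITLINE, 1.0, I_AVG)    # wait for a full 1 V swing
small_swing = bitline_delay(C_BITLINE, 0.1, I_AVG)   # sense after only 100 mV

print(f"full swing:  {full_swing * 1e9:.2f} ns")
print(f"small swing: {small_swing * 1e9:.2f} ns")
```

Because $t_p$ is linear in $\Delta V$, sensing a tenth of the swing cuts this part of the delay by a factor of ten under the same assumptions.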

Differential Sense Amplifier
- A non-clocked sense amp has high static power.
- A clocked sense amp saves power.
- It requires sense_clk after enough bitline swing.
- Isolation transistors cut off the large bitline capacitance.

Further Reading
- Rabaey, et al., Digital Integrated Circuits (2nd Edition)
- Elad Alon, Berkeley ee141 (online)
- Weste, Harris, CMOS VLSI Design (4th Edition)