Process Variation Aware SRAM/Cache for Aggressive Voltage-Frequency Scaling

Avesta Sasan (Mohammad A Makhzan), Houman Homayoun, Ahmed Eltawil, Fadi Kurdahi
{mmakhzan,hhomayou,aeltawil,kurdahi}@uci.edu
University of California Irvine

Abstract: This paper proposes a novel Process Variation Aware SRAM architecture designed to inherently support voltage scaling. The peripheral circuitry of the SRAM is modified to selectively allow overdriving a wordline that contains weak cell(s). This architecture allows reducing the power on the entire array; however, it selectively trades power for correctness when rows containing weak cells are accessed. The cell sizing is designed to assure successful read operations, which avoids flipping the content of the cells when the wordline is overdriven. Our simulations report 23% to 30% improvement in cell access time and 31% to 51% improvement in cell write time in overdriven wordlines. Total area overhead is negligible (4%). Low voltage operation achieves more than 40% reduction in dynamic power consumption and approximately 50% reduction in leakage power consumption.

I. INTRODUCTION
With fabricated device dimensions approaching the limits of process technology capabilities, a rapid increase in defects induced by manufacturing process variation is observed [1-5]. This in turn makes the defect rate in the memory device sensitive to changes in operating parameters including temperature, voltage and frequency. Due to the random nature of local process variation, the resulting defects have a random and uniform distribution [1-5] that adversely affects the expected system yield. Furthermore, voltage scaling exponentially increases the impact of process variation on memory cell reliability, resulting in an exponential increase in the fault rate. In this paper, we propose the Selective Charge Pumping Cache (SCPC) architecture with an improved wordline driver. The architecture allows aggressively scaling the supply voltage on the entire memory structure, and selectively charge pumping the wordlines with weak cells to higher voltages.
The cell is appropriately sized to safeguard against read failures, thus avoiding flipping the content of the cells when the wordline is in overdrive mode. The increase in leakage and area overhead as a result of this resizing is negligible (< 4%). The selective nature of the circuit allows for significant savings in both leakage and dynamic power consumption while only targeting failing cells with selective overdriving.

II. PRIOR WORK
There exists a multitude of techniques to handle failing cells in SRAM structures. Among these techniques, row and column redundancy are widely used. However, these techniques are limited to a small fixed percentage of the memory array cells [7][8], and thus are poorly suited to dynamic parametric errors. Another commonly used technique is Error Correcting Codes (ECC) [10][11] to deal with transient defects. ECC guarded memories can handle dynamic faults, albeit at a heavy cost in power consumption, area and complexity [17]. Statistical sizing and optimization of the SRAM cell for yield enhancement is suggested in [12]. This pre-silicon technique could improve production yield; however, it is limited by conflicts in the sizing requirements for different types of failures [9].

In addition to design and circuit level techniques, architectural level techniques have also been proposed to address manufacturing induced process variation. The Inquisitive Defect Cache (IDC) in [6] is a small direct or associative cache that works in parallel with the L1 cache and provides a defect free view of the cache for the processor in the current window of execution. However, the basic assumption in that work is that the data, if lost, could be recovered from a lower level cache or memory, so it could only work for hierarchical structures. Finally, the area overhead of this method (about 12%) is large. Resizable caches are suggested in [9]. In this technique, it is assumed that in a cache layout two or more blocks are laid out in one row; the column decoders are therefore altered to choose another block in the same row if the original block is defective.
However, for tightly coupled loops this approach will always result in a miss [6].

In the following section, we discuss voltage scaling as the most viable approach for low power consumption. Section 4 then presents a discussion of the proposed architecture. Section 5 discusses the design considerations associated with overdriving the wordline drivers. Section 6 presents the performance improvement while section 7 discusses area and power impact. The paper is concluded in section 8.

III. PROCESS VARIATION AND MEMORY CELL OPERATION UNDER VOLTAGE SCALING
Process variation in SRAM under voltage scaling was studied in [6][18], in which variation in process parameters was lumped into an independent Gaussian distribution characterizing the Vth fluctuations of each transistor. To set up a baseline for comparison with our proposed solution, a simulation was set up reproducing the observations in [1-5]. In this simulation, the circuit under test is a standard six transistor SRAM memory bit cell. The SPICE models used for the simulation were obtained from the Predictive Technology Model (PTM) [14] website in 32nm. In the simulation, Vdd is lowered from its nominal voltage (1 V) to 0.6 V. To obtain a probability of failure we assumed a constant access time and used Monte Carlo simulations to calculate the probability of failure of a cell (accumulated read, write and destructive read probabilities). The SRAM cell under consideration was designed for a 48.1 ps access time and 39.8 ps write time at nominal voltage (1 V). A 120 ps operation (access or write) time was used to access the cell regardless of the voltage level.

Another important issue to consider is the frequency scaling policy, which has a major effect on the probability of failure. As the voltage is scaled, the frequency may or may not be scaled with it. If the frequency is constant and the voltage is scaled, the memory frequency management policy will be referred to as a Fixed Frequency Voltage Scalable (FVS) policy. On the other hand, if the frequency is scaled along with the voltage, it will be referred to as a Frequency Scalable and Voltage Scalable (FSVS) policy. In the example previously given, the FVS policy was used: the cycle time is kept constant, and an increase in the mean delay of the memory cells shifts a larger portion of the access time distribution out of range. For future reference in the paper, we define the safe margin for both of these policies as the increase in the access time from the mean access time that reduces the probability of failure below a controlled threshold and provides immunity from process variation.

978-3-9810801-5-5/DATE09 2009 EDAA

IV. PROPOSED ARCHITECTURE
To counter the effect of process variations, designers typically overdrive the entire memory array. This leads to extra power consumption, as both leakage and dynamic power increase. We propose using a modified wordline driver peripheral circuit to allow selective wordline overdriving utilizing a small one step charge pump. The wordline peripheral circuit drives the wordline in two phases. In the first phase, the wordline is driven to Vdd using the supplied Vdd. In the second phase, the charge pump overdrives the wordline voltage, increasing the Vgs above the supply Vdd. The increase in Vgs improves both access and write time to the cell, as described in the following sections.

A. Improved wordline driver
Figure 1 outlines the proposed wordline driver architecture. It consists of two consecutive NAND gates (possibly preceded by two or more inverters to increase fanout).
The second NAND gate's pull-up PMOS (P1), as shown in Figure 1, is connected to the supply voltage. Pull-up PMOS (P2) is connected to the charge pump output. Using this driver, P1 first drives the wordline to Vdd and, conditioned upon activation of P2, the charge pump and wordline start charge sharing. Because the charge pump's capacitor is at a higher voltage, its charge migrates to the wordline, effectively raising the wordline voltage as shown in Figure 2. The proposed two phase configuration for wordline overdriving has two advantages over single phase charge pumping. First, charge sharing is a slow process; driving the wordline solely by charge sharing requires a longer time for the wordline to reach its peak value. Second, the two phase configuration results in a higher wordline peak value compared to a wordline driver with a one phase charge pump configuration. The delay unit in Figure 1 controls the timing difference between the first and second phase of driving the wordline. The delay unit consists of small inverters (in increasing order of driving power, i.e. fan-out) and its delay is controlled by channel length and/or the number of inverters in the chain. The output of the delay unit is used to switch from P1 to P2. Figure 2 illustrates the timing and operation of the wordline driver.

The selective behavior of this circuit is controlled by the input to the NAND gate in the delay unit of the wordline driver. If the external input to this NAND gate is high, the wordline overdrive is inactive, whereas a low input initiates the overdriving behavior. Typically, the input to the NAND gate comes from a defect map that stores the results of a BIST run performed for each new setting of voltage, frequency and temperature. This approach assures support for a large number of operation modes. Alternatively, if the system is known to have a small number of operating modes, the configuration information can be stored in fuses that are configured for each mode.

Figure 1: Proposed wordline driver
Figure 4: Signals timing order in the proposed charge pumped wordline driver

V. WORDLINE DRIVER DESIGN CONSIDERATIONS
In this paper we used the charge pump basic cell introduced in [15], as illustrated in Figure 1. For the purpose of our proposed circuit we only use one stage of this charge pump cell. CP and CPB have opposite polarities (phases). Note that two capacitors are used in this charge pump, and depending on the frequency of CP and the output capacitance load, the range of output fluctuation could be controlled. The voltage of the wordline after the second phase of wordline driving by the charge pump is related to the applied supply voltage, the size of the charge pump capacitances, the CP frequency and the related device sizes of the NMOS and PMOS transistors used within the charge pump. If CP and CPB stay constant during cell access time and the transistor sizes are chosen large enough, the wordline voltage after overdrive is determined by:

$$V_{OverDrive} = V_{dd} + \frac{C_{cp}}{C_{wl} + C_{cp}} (V_{dd} - V_{th}) \qquad (1)$$

$$C_{wl} = \sum_{i=1}^{N} \left[ Cgl_i + Cgh_i \right] + C_{wire} \qquad (2)$$

In which N is the number of cells in each word-line. $Cgl_i$ and $Cgh_i$ are the gate capacitances of the i-th cell's access transistors connected to the low ($Cgl$) and high ($Cgh$) side of the cell. $C_{wire}$ is the word-line wire capacitance. $Cgl$ and $Cgh$ are obtained from PTM in 32 nm and the wire capacitance was obtained from CACTI 5.0 [16] wire models in 32 nm technology.

VI. WORDLINE OVERDRIVING AND CELL STABILITY
Overdriving the wordline driver may cause cell stability problems. The access and pull down transistors in an SRAM cell during a read, as shown in Figure 3, form a voltage divider. The size of the pull down transistor is chosen large enough to assure that the rise in the intermediate node of the voltage divider is smaller than the threshold voltage of the devices used in the cell. Increasing the gate voltage of the access transistor lowers its resistance, effectively increasing the voltage at this intermediate node. This in turn increases the likelihood of a bit flip during a read operation. To prevent this effect, one could trade cell area, increasing the size of the pull down transistor to counter the impact of the voltage overdrive on cell stability. Increasing the size of the pull down device also impacts the Static Noise Margin (SNM) and the leakage of the SRAM cell. To establish a fair comparison, we linearly upsize the traditional cell and compare the failure probability of both architectures with equivalent cell areas. The following analysis is based on a charge pump architecture that effectively increases the wordline voltage by 40% over the supplied cell voltage. The charge pump capacitances needed to achieve this level of overdrive could be calculated from equation 1. The rise in the intermediate point of the voltage divider (during a read operation) can be expressed as:

$$\Delta V = \frac{V_{DSAT} + CR\,(V_{DD} - V_{Tn}) - \sqrt{\Gamma}}{CR} \qquad (3)$$

$$\Gamma = V_{DSAT}^{2}\,(1 + CR) + CR^{2}\,(V_{DD} - V_{Tn})^{2} + 2\,CR\,V_{DSAT}\,(V_{DD} - V_{WL}) \qquad (4)$$

In which CR is the cell ratio of the pull down transistor to the access transistor and is obtained from:

$$CR = \frac{W_{PD} \cdot L_{AC}}{L_{PD} \cdot W_{AC}} \qquad (5)$$

Figure 4 illustrates the voltage rise in node L (the node storing a 0) in both the charge pumped and the traditional architecture as a function of the pull-down size. In this simulation the threshold voltage of the devices is 0.31 V. If a 0.27 V voltage rise is chosen in the traditional design (to allow a proper SNM read margin), the pull down to access transistor ratio in the traditional cell, based on Figure 4, should be 1.2. For the same voltage rise when overdriving the wordline, the ratio should be 1.4. This means increasing the size of the pull down device from 38.4nm to 44.8nm in a 32nm design, or from 78nm to 91nm in 65nm. This in turn means a larger layout and a larger leakage. It is important to note that the change in cell area is strongly dependent on the cell's layout. In the rest of this paper we build our discussion on the cell layout illustrated in Figure 3; however, a similar methodology could be developed for any other SRAM layout. Changing the ratio of the pull up to pull down (λ) degrades the SNM; however, for the slight change in the pull down device (from 1.2 to 1.4) the change in the SNM is measured to be less than 4%.

Figure 3: SRAM circuit and layout
Figure 4: Voltage rise in node with stored value of 0 during a read operation.

VII. IMPACT OF CHARGE PUMP ON ACCESS/READ TIME
Overdriving the wordline using a charge pump improves both the mean and the standard deviation of the access/write time distribution. As previously mentioned, to guarantee read stability during a read operation via wordline overdriving, the pull down transistor is upsized to assure the same voltage rise as in the traditional cell, while the area of the modified cell and the proposed cell is equivalent.

A. Mean access/write time improvement
We previously discussed that by charge pumping the wordlines we could improve the read/write time of the cell and make it less sensitive to process variation. In this section, the gains associated with this procedure are quantified.
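As a reminder of how much overdrive is available to produce these gains, equations (1) and (2) of Section V can be evaluated numerically. The sketch below is illustrative only: it reads equation (1) as V_OverDrive = Vdd + Ccp/(Cwl + Ccp) * (Vdd - Vth), and the per-cell gate capacitances and wire capacitance are hypothetical placeholders, not the PTM or CACTI values used in the paper. Only the 0.31 V threshold and the 40% overdrive target are taken from the text.

```python
def wordline_capacitance(cgl, cgh, c_wire):
    """Equation (2): sum of the access-transistor gate caps plus the wire cap."""
    return sum(lo + hi for lo, hi in zip(cgl, cgh)) + c_wire


def overdrive_voltage(vdd, vth, c_cp, c_wl):
    """Equation (1) as read here: wordline voltage after one charge-sharing step."""
    return vdd + c_cp / (c_wl + c_cp) * (vdd - vth)


def pump_capacitance_for(target_ratio, vdd, vth, c_wl):
    """Invert equation (1): C_cp needed so that V_OverDrive = target_ratio * Vdd."""
    share = (target_ratio - 1.0) * vdd / (vdd - vth)  # required C_cp / (C_wl + C_cp)
    if not 0.0 <= share < 1.0:
        raise ValueError("target overdrive not reachable with one charge-sharing step")
    return share * c_wl / (1.0 - share)


if __name__ == "__main__":
    n_cells = 512                      # cells per wordline
    gate_cap = 0.05e-15                # F per access transistor (hypothetical)
    wire_cap = 20e-15                  # F (hypothetical)
    c_wl = wordline_capacitance([gate_cap] * n_cells, [gate_cap] * n_cells, wire_cap)
    c_cp = pump_capacitance_for(1.4, vdd=1.0, vth=0.31, c_wl=c_wl)
    print(f"C_wl = {c_wl:.3e} F, C_cp for 40% overdrive = {c_cp:.3e} F")
    print(f"check: V_OverDrive = {overdrive_voltage(1.0, 0.31, c_cp, c_wl):.3f} V")
```

Under this reading, the required pump capacitance grows quickly as Vdd approaches Vth, because the term (Vdd - Vth) that the charge pump contributes shrinks; below some supply voltage a single charge-sharing step cannot reach a 40% overdrive at all, which the helper reports as an error.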
A simulation was set up using the Berkeley 32nm PTM, in which the mean access/write times of the cells in the proposed scheme were compared to those of the traditional architecture. Figure 5 illustrates the simulation results, comparing the access time and write time of the proposed architecture in charge pumping and regular access mode to those of the traditional architecture. It is interesting to note that decreasing the voltage increases the percentage improvement in the access time, but only up to a point (0.63 V), after which a further decrease in the voltage does not yield an improvement in the charge pump architecture. This is because at lower voltage the charge pump requires a longer time to charge up, and therefore when the charge pump is activated it does not reach full charge.

B. Improvements in the standard deviation from mean
To understand the effectiveness of the proposed wordline driver in mitigating process variation effects, a Monte Carlo simulation was set up in which the threshold voltage values, which are based on a Gaussian distribution [1-5], were varied. The normalized result of this simulation is illustrated in Figure 6. As discussed in the previous section, voltage scaling increases the mean access time. In addition, results obtained from this simulation show that not only does the mean shift, but the standard deviation from the mean is also modified by voltage scaling. This poses a limitation on defining a safe margin for an FSVS frequency management policy. In other words, the safety margin can no longer be specified as a fixed value from the mean, since the standard deviation varies depending on the applied supply voltage. For example, referring to Figure 6, it is clear that the same safety margin that is used for the nominal 1 V will result in a higher failure rate if used for the 0.8 V setting, due to the larger standard deviation.

Figure 5: Percentage improvement in access time (top) and write time (bottom) of the charge pumped architecture compared to the traditional architecture.

As discussed in the previous section, using the proposed wordline driver results in a smaller shift in the mean access time of the cell upon voltage scaling. Figure 6 also reveals another interesting characteristic of wordline overdriving: although the charge pumped architecture is also affected by process variation, the standard deviation from the mean access time is reduced. Therefore, for a fixed safe margin policy the number of weak cells is decreased. The change in the mean and standard deviation of the access time for the charge pumped and traditional architectures is summarized in Table 1.

C. Improvements in the probability of failure at lower voltages
As explained in section 3, voltage scaling could be accomplished with either FSVS or FVS frequency management policies.
In the following section we investigate how the probability of failure changes with each of these frequency management policies.

Probability of failure in a system with FVS frequency management policy
Reducing the voltage increases the cell access and write time. Therefore, in a system with an FVS frequency policy, voltage scaling reduces the gap between the maximum realizable and the clocked frequency. Reducing this gap increases the sensitivity to process variation: if at nominal voltage a 6σ variation in Vth results in faulty behavior in a cell, then at lower voltages, due to both mean shifting (as explained in 6.1) and the increase in the deviation of the access time (as explained in section 6.2), a much smaller variation could result in defective behavior.

TABLE 1: CHANGE IN THE MEAN AND STANDARD DEVIATION OF ACCESS TIME
(Architecture labels in the source header, in source order: 1-PHASE CP, TRAD, CP 1-PHASE, TRAD, 2-PHASE, 2-PHASE.)

VOLTAGE   μ SHIFT   σ SHIFT   μ SHIFT   σ SHIFT
1.0 V     11.53     1.72      8.11      1.52
0.9 V     15.3      2.37      11.4      2.28
0.8 V     19.9      6.97      16        5.98
0.7 V     31.4      10.46     27.5      17.14
0.6 V     34.7      15.27     28        26.13

To quantify how the probability of failure of an FVS system changes due to voltage scaling, a simulation was set up in which an SRAM with a maximum realizable access time of 48.1 ps and write time of 39.8 ps experiences voltage scaling. The SRAM operates at a fixed 120 ps and the access time is quantified from nominal voltage down to a lower voltage (close to the Vth voltage). An error occurs if the access time of 120 ps is not honored. The process variation is modeled as a variation in the Vth of each transistor. Process variation from -6σ to +6σ [1-5] for each transistor is considered when the Monte Carlo simulations are performed. The obtained results are used to produce a probability of failure for each voltage, as shown in Figure 7 (left) for both the traditional and the charge pump based approach. The probability of failure of the charge pump architecture is several orders of magnitude smaller than that of the traditional architecture.
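The FVS experiment above can be sketched in a few lines. This is a toy stand-in, not the SPICE/PTM flow used in the paper: access time is approximated with an alpha-power-law delay model, and the exponent `ALPHA`, the spread `VTH_SIGMA`, and the way the overdrive enters the model (`wl_boost`) are assumptions made for illustration. The 0.31 V threshold, the 48.1 ps nominal access time, the 120 ps cycle, and the 40% overdrive are taken from the text.

```python
import random

ALPHA = 1.3            # assumed alpha-power-law exponent (not from the paper)
VTH_MEAN = 0.31        # V, device threshold quoted in the text
VTH_SIGMA = 0.03       # V, assumed Gaussian spread of Vth
T_NOMINAL_PS = 48.1    # ps, access time at 1 V quoted in the text
CYCLE_PS = 120.0       # ps, fixed operation time under the FVS policy

# Calibrate the model so that a nominal cell at Vdd = 1 V takes 48.1 ps.
K = T_NOMINAL_PS * (1.0 - VTH_MEAN) ** ALPHA


def access_time_ps(vdd, vth, wl_boost=1.0):
    """Toy delay model: t = K * Vdd / (V_wordline - Vth)^alpha."""
    gate_drive = wl_boost * vdd - vth
    if gate_drive <= 0.0:
        return float("inf")  # access transistor never turns on
    return K * vdd / gate_drive ** ALPHA


def failure_probability(vdd, wl_boost=1.0, samples=20000, seed=0):
    """Fraction of sampled cells whose access time exceeds the fixed cycle."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(samples):
        vth = rng.gauss(VTH_MEAN, VTH_SIGMA)
        if access_time_ps(vdd, vth, wl_boost) > CYCLE_PS:
            fails += 1
    return fails / samples


if __name__ == "__main__":
    for vdd in (1.0, 0.9, 0.8, 0.7, 0.6):
        p_trad = failure_probability(vdd)
        p_cp = failure_probability(vdd, wl_boost=1.4)  # 40% wordline overdrive
        print(f"Vdd={vdd:.1f} V  traditional={p_trad:.4f}  overdriven={p_cp:.4f}")
```

With these assumed parameters the estimate at nominal voltage is zero within the sample size, while at 0.6 V the non-overdriven cell fails in a measurable fraction of samples. Resolving the very small probabilities plotted in Figure 7 would need far more samples or an importance-sampling scheme, which this sketch does not attempt.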
Probability of failure in a system with FSVS frequency management policy
In a Frequency Scaling and Voltage Scaling (FSVS) frequency management policy, the memory access time (frequency) is scaled as the voltage is scaled. In this case a Monte Carlo simulation was set up with a fixed safety margin of 30 ps. Figure 7 (right) illustrates the result of this simulation. Again, a similar trend is observed, where the charge pump architecture outperforms the traditional architecture by several orders of magnitude.
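The effect of a fixed safety margin under FSVS can be illustrated with a simple Gaussian model of access time. This is a sketch under stated assumptions: the access-time distribution is taken to be exactly Gaussian, and the two sigma values below are hypothetical, chosen only to show the trend; the 30 ps margin is the one quoted above.

```python
import math
from statistics import NormalDist


def tail_probability(margin_ps, sigma_ps):
    """P(access time > mean + margin) for a Gaussian access-time distribution."""
    return 0.5 * math.erfc(margin_ps / (sigma_ps * math.sqrt(2.0)))


def required_margin(sigma_ps, p_target):
    """Margin above the mean needed to keep the tail probability at p_target."""
    # By symmetry, the upper-tail quantile is minus the lower-tail quantile.
    return -NormalDist().inv_cdf(p_target) * sigma_ps


if __name__ == "__main__":
    margin = 30.0  # ps, fixed safety margin from the text
    for label, sigma in (("smaller sigma", 4.0), ("larger sigma", 8.0)):  # hypothetical
        print(f"{label}: sigma={sigma} ps -> P(fail)={tail_probability(margin, sigma):.3e}")
    print(f"margin for 1e-6 at sigma=8 ps: {required_margin(8.0, 1e-6):.1f} ps")
```

Because the failure probability depends on the margin measured in units of sigma, any architecture that reduces the spread of the access time, as the overdriven wordline is reported to do, lowers the failure rate for the same fixed margin.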

Figure 6: Comparison of the effect of process variation on access time in charge pumped and traditional architecture.
Figure 7: Total probability of failure (W+A+B) for FVS policy (left) and FSVS policy (right)

VIII. POWER CONSUMPTION SIMULATION RESULTS AND DISCUSSION
The charge pumped architecture consumes more power than the traditional architecture when operated at the same supply voltage, assuming that the charge pumping circuit is not power gated. However, the ability to tolerate process variation defects allows the charge pumped architecture to scale down the supplied voltage and operate at lower voltages or, alternatively, achieve a higher production yield when operated at nominal voltage.

The extra power consumption is composed of two parts: leakage and dynamic power consumption. Leakage power consumption is increased not only because the number of transistors in each wordline driver is increased, but also because the capacitor used for charge pumping adds to the leakage when it is idle and not power gated. Furthermore, the charge pumped architecture requires a small defect map to identify the faulty locations. The extra dynamic power consumption, on the other hand, is a result of the defect map query, overdriving the wordline driver and the extra switching activity of the charge pumping circuit. Overdriving the wordline driver from Vdd to Vdd + ΔV is made possible by charge sharing between the charge pump capacitor and the wordline capacitor. During the charge sharing process (phase 2 of wordline driving) there is no path to the voltage source, but in the non-active part of the clock cycle the capacitors are recharged and therefore drain current from the supply voltage. As previously stated, a policy to control the CP and CPB signals is needed, since the frequency of the CP and CPB signals defines the number of times charge sharing is performed within one access. In this simulation we used a simple implementation in which the CP and CPB signals switch only after charge pumping. Using this method, charge sharing is performed only once, and the capacitor that was previously discharged has the off cycle of the current clock period and the entire next clock cycle to recharge. This behavior is controlled by the latch and XOR unit in the circuit illustrated in Figure 3.

To demonstrate the power savings obtained as a result of voltage scaling, a SPICE simulation was set up. In this SPICE simulation the total power consumption of two memory banks, identical in all aspects except the wordline drivers, is compared. At each voltage level the average power consumption over 10,000 read operations is obtained. Voltage scaling in each structure is possible for as long as the inequality in (6) holds:

$$E(P_F[WL]) = N_{Row} \cdot P_F[WL] \le R \qquad (6)$$

Where R is the redundancy budget (8 in this case), $N_{Row}$ is the number of rows in the bank and $P_F(WL)$ is obtained from:

$$P_F(WL) = 1 - P_H(WL) \qquad (7)$$

$$P_H(WL) = \left[ 1 - P_F \right]^{N_{cells}} \qquad (8)$$

In which $P_F$ is the probability of failure (and could be replaced by $P_{cp}$ or $P_{trad}$), where $P_{cp}$ is the probability of failure of the charge pumped architecture while $P_{trad}$ is the probability of failure of the traditional architecture. $N_{cells}$ is the number of cells on each wordline. At each voltage the number of wordlines $N_w$ that require charge pumping is expected to be:

$$N_w = E(P_{Trad}[WL]) = P_{Trad}[WL] \cdot N_{Row} \qquad (9)$$
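Equations (6) to (9) can be evaluated directly. The sketch below assumes the reading of (6) as "expected failing rows must not exceed R" and uses the bank geometry described later in this section (256 rows of 512 cells, 8 redundant rows); the function names are illustrative and the per-cell failure probabilities fed to them are hypothetical.

```python
import math


def p_row_healthy(p_cell_fail, n_cells):
    """Equation (8): probability that every cell on a wordline works."""
    return math.exp(n_cells * math.log1p(-p_cell_fail))


def p_row_fail(p_cell_fail, n_cells):
    """Equation (7): probability that a wordline has at least one failing cell."""
    return -math.expm1(n_cells * math.log1p(-p_cell_fail))


def expected_failing_rows(p_cell_fail, n_cells, n_rows):
    """Left side of (6), and equation (9) when fed the traditional P_F."""
    return n_rows * p_row_fail(p_cell_fail, n_cells)


def max_cell_failure_probability(n_cells, n_rows, redundancy):
    """Largest per-cell P_F for which inequality (6) still holds."""
    return -math.expm1(math.log1p(-redundancy / n_rows) / n_cells)


if __name__ == "__main__":
    n_rows, n_cells, redundancy = 256, 512, 8
    p_max = max_cell_failure_probability(n_cells, n_rows, redundancy)
    print(f"largest tolerable per-cell failure probability: {p_max:.3e}")
    for p in (1e-6, 1e-5, 1e-4):  # hypothetical per-cell failure probabilities
        rows = expected_failing_rows(p, n_cells, n_rows)
        ok = "within" if rows <= redundancy else "exceeds"
        print(f"P_F={p:.0e}: expected failing rows={rows:.2f} ({ok} budget)")
```

For this geometry the bound works out to a per-cell failure probability of roughly 6.2e-5, so the lowest usable supply voltage is the one at which the curve in Figure 7 crosses that level. `log1p` and `expm1` are used only to keep the arithmetic accurate for very small probabilities.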

For the purpose of this simulation we used an FSVS policy for voltage scaling, with $P_{cp}$ and $P_{trad}$ drawn from Figure 7. For the charge pumped architecture we need a defect map to store the faulty locations. In practice many configurations could be used for the defect map, such as a very small SRAM always operating at full supply voltage, flash bits implemented at each wordline, or a set of latches updated externally. In the following simulation we set up our defect map as a set of latches implemented on each wordline. To create a fair comparison we have included the power consumption and area of these latches as part of the analysis. Note that at very low voltages, the cache/SRAM architecture might have all the wordlines in charge pumping mode. Although this means increased dynamic power in almost all accesses, we still save significantly on leakage, since the leakage is managed using the lower supply voltage.

Figure 8 depicts the power savings of the two architectures. Based on the probabilities of failure, the expected number of weak rows (non operational at that voltage level) is obtained and compared between the traditional and the charge pumped architecture, as depicted in Figure 8 (top). In all the simulations presented in Figure 8 a redundancy of 8 rows is assumed. Figure 8 (center) illustrates the dynamic power savings for the two architectures. It can be seen that a savings in dynamic power of 34% (on top of traditional voltage scaled approaches) is possible using the charge pumped architecture by running the array at a lower voltage. In a 16KB SRAM, arranged in 256 rows with each row containing 512 cells and using a redundancy of 8 rows, the traditional architecture, given the probability of failure in Figure 7, could only scale the voltage down to 0.87 V. However, the charge pumped architecture can be scaled down to 0.67 V while maintaining the same performance.
The savings in leakage power are even more pronounced due to the exponential dependence on supply voltage, as shown in Figure 8 (bottom), where a savings of more than 43% is achieved compared to the traditional voltage scaling approaches. Finally, using the charge pump architecture results in a savings of 50% in dynamic power and 62% in leakage power when compared to a traditional architecture working at full voltage (1 V). These savings in power consumption could be improved by introducing power gating to the wordline drivers and sharing a charge pump between multiple wordlines.

IX. CONCLUSION
In this paper we presented a novel architecture for low power and high yielding memory arrays. The proposed approach utilizes a charge pump wordline driver and selectively overdrives the wordlines containing weak cells. This architecture achieves power savings of more than 50% in dynamic power and 60% in leakage power compared to the traditional architecture running at nominal voltage. Alternatively, when operated at the same voltage as a traditional memory, it provides an improvement in memory array yield.

Figure 8: (Top): Expected number of rows containing defective weak cells (failing cells) for an FV voltage scaling policy with 120ps access time. (Center): Dynamic power savings. (Bottom): Comparison of leakage power savings.

REFERENCES
[1] S. R. Nassif, Modeling and Analysis of manufacturing variation, in Proc. CICC, 2001, pp. 223-228.
[2] S. Borkar, T. Karnik, et al., Process Variation and impact on circuits and micro architectures, in Proc. DAC, 2003, pp. 338-342.
[3] S. Mukhopadhyay, H. Mahmoodi, K. Roy, Modeling of Failure Probability and Statistical Design of SRAM Array for Yield Enhancement in NanoScaled CMOS, CADICS, vol. 24, no. 12, Dec. 2005.
[4] A. Bhavnagarwala, X. et al., The impact of intrinsic device fluctuation on CMOS SRAM cell stability, IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[5] H. Mahmoodi, et al., Modeling of failure probability and statistical design of SRAM array for yield enhancement in nano-scaled CMOS, IEEE Trans. CAD, 2003.
[6] Avesta Sasan (Mohammad A Makhzan), Amin Khajeh, Ahmed Eltawil, Fadi Kurdahi, Limits of Voltage Scaling for Caches Utilizing Fault Tolerant Techniques, ICCD 2007.
[7] S. E. Schuster, Multiple word/bit line redundancy for semiconductor memories, IEEE J. Solid-State Circuits, vol. SC-13, no. 5, pp. 698-703, Oct. 1978.
[8] M. Horiguchi, Redundancy techniques for high-density DRAMS, in Proc. 2nd Annu. IEEE Int. Conf. Innovative Systems in Silicon, Oct. 1997, pp. 22-29.
[9] A. Agarwal, B. C. Paul, S. Mukhopadhyay, K. Roy, Process Variation in Embedded Memories: Failure Analysis and Variation Aware Architecture, IEEE Journal of Solid State Circuits, vol. 40, no. 9, Sep. 2005.
[10] H. L. Kalter et al., A 50-ns 16-Mb DRAM with a 10 ns data rate and on chip ECC, IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1118-1128, Oct. 1990.
[11] D. Weiss, J. J. Wuu, and V. Chin, The on-chip 3-MB subarray-based third level cache on an Itanium microprocessor, IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1523-1529, Nov. 2002.
[12] S. Mukhopadhyay, et al., Statistical design and optimization of SRAM cell for yield enhancement, in Proc. Int. Conf. Computer Aided Design (ICCAD), Nov. 2004, pp. 10-13.
[13] P. P. Shirvani and E. J. McCluskey, PADded Cache: A New Fault-Tolerance Technique for Cache Memories, in Proc. of 17th IEEE VLSI Test Symposium, pp. 440-445, April 1999.
[14] http://www.eas.asu.edu/~ptm
[15] Bhalerao, et al., A CMOS Low Voltage Charge Pump, VLSID 2007.
[16] http://quid.hpl.hp.com:9082/cacti/
[17] G. Sohi, Cache Memory Organization to Enhance the Yield of High Performance VLSI Processors, IEEE Trans. Comp., vol. 38(4), pp. 484-492, April 1989.
[18] Avesta Sasan (Mohammad A Makhzan), Houman Homayoun, Ahmed Eltawil, Fadi Kurdahi, Architectural and Algorithm Level Fault Tolerant Techniques for Low Power High Yield Multimedia Devices, ICCD 2007.