RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
1 RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM. Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei. Institute of Microelectronics, Tsinghua University. The 45th International Symposium on Computer Architecture (ISCA 2018).
2 Ubiquitous Deep Neural Networks (DNNs). Applications: image classification, object detection, video surveillance, speech recognition.
3 DNN Requires Large On-Chip Buffer. The layer data storage of modern DNNs can reach 0.3~6.27 MB, and it grows further when the network processes higher-resolution images or larger batch sizes.
[1] Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.
[2] Simonyan et al., Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015.
[3] Szegedy et al., Going Deeper with Convolutions, CVPR 2015.
[4] He et al., Deep Residual Learning for Image Recognition, CVPR 2016.
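The following sketch makes the storage figures concrete. It counts the bytes one convolutional layer needs when its whole input, output, and weight tensors are buffered on chip. It is a minimal sketch under stated assumptions: 16-bit fixed-point values, no padding, and a hypothetical helper name `conv_layer_storage`. It does not reproduce the slide's per-network numbers.

```python
def conv_layer_storage(n_in, h_in, w_in, m_out, h_out, w_out, k,
                       batch=1, bytes_per_value=2):
    """Return (input_bytes, output_bytes, weight_bytes) for one CONV layer.

    Assumes every value occupies `bytes_per_value` bytes (2 = 16-bit fixed
    point) and that whole tensors are buffered on chip (hypothetical model).
    """
    input_bytes = batch * n_in * h_in * w_in * bytes_per_value
    output_bytes = batch * m_out * h_out * w_out * bytes_per_value
    weight_bytes = m_out * n_in * k * k * bytes_per_value  # shared by the whole batch
    return input_bytes, output_bytes, weight_bytes


if __name__ == "__main__":
    # Hypothetical layer: 3 -> 16 channels, 32x32 input, 3x3 kernel, 30x30 output.
    for batch in (1, 2):
        i, o, w = conv_layer_storage(3, 32, 32, 16, 30, 30, 3, batch=batch)
        print(f"batch={batch}: inputs={i} B, outputs={o} B, weights={w} B")
```

Activation storage scales linearly with batch size and spatial resolution, while weight storage does not. In this toy layer, going from batch 1 to batch 2 raises inputs from 6144 B to 12288 B and outputs from 28800 B to 57600 B, while the weights stay at 864 B.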
4 SRAM-based DNN Accelerators. The small chip footprint limits the on-chip buffer size of conventional SRAM-based DNN accelerators: usually < 500 KB, at an area cost of 3~20 mm² (normalized).
Figure: representative accelerators and their on-chip buffers (normalized): Thinker, 348 KB, 19.4 mm²; DianNao, 44 KB, 3.0 mm²; Eyeriss, 182 KB, 12.3 mm²; Envision, 77 KB, 10.1 mm².
Thinker: Yin et al., A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications, JSSC 2018.
DianNao: Chen et al., DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning, ASPLOS 2014.
Eyeriss: Chen et al., Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, ISSCC 2016.
Envision: Moons et al., ENVISION: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI, ISSCC 2017.
5 SRAM vs. eDRAM (Embedded DRAM). eDRAM has higher density than SRAM, but it needs refresh for data retention: charge leaks over time and can cause retention failures.
6 Refresh is an Energy Bottleneck. Figures: eDRAM power breakdown [1] and system power breakdown [2]. The overhead is eDRAM refresh energy.
[1] Chang et al., Technology Comparison for Large Last-Level Caches (L3Cs): Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized eDRAM, HPCA 2013.
[2] Wilkerson et al., Reducing Cache Power with Low-Cost, Multi-bit Error-Correcting Codes, ISCA 2010.
7 Opportunity to Remove eDRAM Refresh. Refresh interval = retention time.
Ghosh, Modeling of Retention Time for High-Speed Embedded Dynamic Random Access Memories, TCAS-I.
8 Opportunity to Remove eDRAM Refresh. Refresh is unnecessary if data lifetime < retention time.
Opportunity 1: increase the retention time by training.
Opportunity 2: reduce the data lifetime by scheduling.
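The refresh condition above can be checked with a small sketch. The sketch shows how both levers (a longer tolerable retention time, or a shorter data lifetime) drive the refresh count to zero. It is a simplified model: a value is written at t = 0, must stay valid until its lifetime ends, and is refreshed at every interval boundary while live. The 45 μs and 734 μs intervals come from later slides. The 500 μs lifetime and the function name are hypothetical.

```python
import math


def refreshes_needed(data_lifetime_us, refresh_interval_us):
    """Count refresh operations a stored value needs before it is consumed.

    Simplified model (an assumption, not the paper's exact accounting): the
    value is written at t = 0, must stay valid until t = data_lifetime_us, and
    its bank is refreshed every refresh_interval_us while the value is live.
    """
    if data_lifetime_us < refresh_interval_us:
        return 0  # lifetime shorter than retention: no refresh at all
    # Conservative count: one refresh at every interval boundary up to the lifetime.
    return math.floor(data_lifetime_us / refresh_interval_us)


if __name__ == "__main__":
    lifetime = 500  # hypothetical data lifetime in microseconds
    print(refreshes_needed(lifetime, 45))   # worst-case refresh interval
    print(refreshes_needed(lifetime, 734))  # longer tolerable retention time
```

With a 500 μs lifetime, the worst-case 45 μs interval needs 11 refreshes, whereas a 734 μs tolerable retention time needs none.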
9 RANA: Retention-Aware Neural Acceleration Framework. Strengthen DNN accelerators with refresh-optimized eDRAM: increase the on-chip buffer size by replacing SRAM with eDRAM, and reduce the energy overhead by removing unnecessary eDRAM refresh.
Figure: framework overview, spanning a compilation phase and an execution phase. Inputs: (1) the DNN accelerator and (2) the target DNN model. Training: the retention-aware training method uses (1) the accuracy constraint and (2) the eDRAM retention time distribution to produce a tolerable retention time. Scheduling: (1) energy modeling, (2) data lifetime analysis, and (3) buffer storage analysis yield a hybrid computation pattern and layerwise configurations. Architecture: (1) data mapping and (2) memory controller modification yield a refresh-optimized eDRAM controller and optimized energy consumption.
10 RANA: Retention-Aware Neural Acceleration Framework. Figure: the framework overview with its scheduling flow and eDRAM controller. Scheduling flow: for each layer, the scheduling scheme takes the layer description and the hardware constraints and outputs a computation pattern <OD/WD, Tm, Tn, Tr, Tc>, then moves on to the next layer until the last layer is configured. Controller: a reference clock feeds a programmable clock divider, and refresh issuers with eDRAM refresh flags drive the eDRAM banks of the unified buffer system. The configurations for each layer (retention time, data lifetime) determine refresh control.
11 Tech1: Retention-Aware Training Method. Retention time varies across cells. The retention failure rate is the fraction of cells whose retention time is shorter than a given time. The weakest cell appears at the 45 μs point.
Figure: typical eDRAM retention time distribution (32 KB). Kong et al., Analysis of Retention Time Distribution of Embedded DRAM - A New Method to Characterize Across-Chip Threshold Voltage Variation, ITC.
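The retention failure rate, and its inverse, can be sketched from a list of per-cell retention times. The inverse is the longest refresh interval that keeps the failure rate at or below a target r. It is a minimal sketch: the cell list is hypothetical, not measured data, and the function names are illustrative. With r = 0 the result is the weakest cell's retention time, which is the worst-case refresh interval.

```python
import bisect
import math


def failure_rate(retention_times_us, t_us):
    """Fraction of cells whose retention time is shorter than t_us."""
    ordered = sorted(retention_times_us)
    return bisect.bisect_left(ordered, t_us) / len(ordered)


def tolerable_retention_time(retention_times_us, max_failure_rate):
    """Longest refresh interval whose failure rate stays <= max_failure_rate.

    With cells sorted by retention time, at most floor(r * n) cells may fail,
    so the interval can grow up to the retention time of the next-weakest cell.
    """
    ordered = sorted(retention_times_us)
    allowed_failures = math.floor(max_failure_rate * len(ordered))
    if allowed_failures >= len(ordered):
        return math.inf  # every cell may fail: no finite bound
    return ordered[allowed_failures]


if __name__ == "__main__":
    # Hypothetical per-cell retention times in microseconds (NOT measured data).
    cells = [45, 50, 60, 75, 90, 120, 150, 200]
    print(failure_rate(cells, 90))                # 4 of 8 cells fail
    print(tolerable_retention_time(cells, 0))     # weakest cell
    print(tolerable_retention_time(cells, 0.25))  # 2 failures allowed
```

For this toy list, a 90 μs interval gives a failure rate of 0.5. A target of r = 0 gives 45 μs, and r = 0.25 gives 60 μs. Tolerating more failures lengthens the refresh interval.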
12 Tech1: Retention-Aware Training Method. Retrain the network to tolerate a higher failure rate and thereby obtain a longer tolerable retention time.
Figure: training flow. The target DNN model is pretrained in fixed point to give a fixed-point DNN model. Random bit-level errors at failure rate r are then injected by adding layer masks, and retraining adjusts the weights, producing the retention-aware DNN model.
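The error-injection step of this flow can be sketched as follows. Stored fixed-point values are corrupted by independent random bit flips at rate r, and retraining would then adjust the weights to absorb the errors. It is a minimal sketch, assuming 16-bit words with 8 fractional bits and symmetric bit flips (real retention failures may be direction-dependent). The helper names are hypothetical, and the retraining loop itself (forward pass with corrupted data, backpropagation, weight update) is omitted.

```python
import random


def to_fixed(x, frac_bits=8, bits=16):
    """Quantize a float to a signed fixed-point integer, saturating at the range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def inject_bit_errors(values, failure_rate, bits=16, rng=None):
    """Flip each stored bit independently with probability failure_rate.

    `values` are signed fixed-point integers; a corrupted copy is returned.
    Modelling retention failures as symmetric random bit flips is an
    assumption made for this sketch.
    """
    rng = rng or random.Random()
    word_mask = (1 << bits) - 1
    corrupted = []
    for v in values:
        pattern = v & word_mask  # two's-complement bit pattern of the stored word
        error_mask = 0
        for b in range(bits):
            if rng.random() < failure_rate:
                error_mask |= 1 << b  # this cell lost its value
        pattern ^= error_mask
        if pattern >= 1 << (bits - 1):  # reinterpret the word as signed
            pattern -= 1 << bits
        corrupted.append(pattern)
    return corrupted


if __name__ == "__main__":
    weights = [to_fixed(x) for x in (0.5, -0.25, 1.0)]
    print(weights)
    print(inject_bit_errors(weights, 1e-2, rng=random.Random(0)))
```

At r = 0 the values pass through unchanged. At r = 1 every bit flips, which is a bitwise NOT of each word.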
13 Tech1: Retention-Aware Training Method. At a failure rate of 10^-5 there is no accuracy loss, with a tolerable retention time of 734 μs. At a failure rate of 10^-4, accuracy decreases.
Figure: relative accuracy under different retention failure rates (marked retention times: 45 μs, 734 μs, 1030 μs).
14 Tech2: Hybrid Computation Pattern. The computation pattern is expressed as a loop nest. Data lifetime and buffer storage depend on the loop ordering, especially the outermost loop.
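The sketch below writes a convolution as a tiled loop nest, using the tile parameters Tm, Tn, Tr, Tc that appear in the scheduling output. The order of the four outer tile loops is what the slide calls the computation pattern. It is a minimal sketch under stated assumptions: stride 1, no padding, nested Python lists, and one particular loop order (output tiles outermost). It is a generic loop-tiling example, not RANA's exact pattern definitions, and is checked against a direct convolution.

```python
def conv_reference(x, w, k):
    """Direct convolution (stride 1, no padding). x[n][h][c], w[m][n][i][j]."""
    n_ch, height, width, m_ch = len(x), len(x[0]), len(x[0][0]), len(w)
    rows, cols = height - k + 1, width - k + 1
    y = [[[0] * cols for _ in range(rows)] for _ in range(m_ch)]
    for m in range(m_ch):
        for n in range(n_ch):
            for r in range(rows):
                for c in range(cols):
                    for i in range(k):
                        for j in range(k):
                            y[m][r][c] += w[m][n][i][j] * x[n][r + i][c + j]
    return y


def conv_tiled(x, w, k, tm, tn, tr, tc):
    """Same convolution, tiled by (Tm, Tn, Tr, Tc).

    Output tiles are outermost and input-channel tiles innermost, so each
    output tile is accumulated to completion before the next one starts.
    """
    n_ch, height, width, m_ch = len(x), len(x[0]), len(x[0][0]), len(w)
    rows, cols = height - k + 1, width - k + 1
    y = [[[0] * cols for _ in range(rows)] for _ in range(m_ch)]
    for m0 in range(0, m_ch, tm):              # output-channel tiles
        for r0 in range(0, rows, tr):          # output-row tiles
            for c0 in range(0, cols, tc):      # output-column tiles
                for n0 in range(0, n_ch, tn):  # input-channel tiles
                    # One tile step: buffers hold Tm*Tn*K*K weights, a Tn-channel
                    # input window, and a Tm*Tr*Tc output tile being accumulated.
                    for m in range(m0, min(m0 + tm, m_ch)):
                        for n in range(n0, min(n0 + tn, n_ch)):
                            for r in range(r0, min(r0 + tr, rows)):
                                for c in range(c0, min(c0 + tc, cols)):
                                    y[m][r][c] += sum(
                                        w[m][n][i][j] * x[n][r + i][c + j]
                                        for i in range(k) for j in range(k))
    return y
```

Moving the `n0` loop outermost computes the same result. In that order, however, every output tile holds partial sums for the whole layer, which changes both data lifetime and buffer storage.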
15 Tech2: Hybrid Computation Pattern. Outputs are updated dynamically by accumulation, and each update recharges the cells, much like a periodic refresh. Different computation patterns have different data lifetimes and buffer storage requirements.
Figure: input-dependent, output-dependent (OD), and weight-dependent (WD) patterns.
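The effect of loop order on partial sums can be sketched by simulating the tile-step trace of a channel-tiled layer. For each output tile, the simulation reports the longest wait between consecutive accumulations. Because each accumulation rewrites, and thus recharges, the cells, this wait is the lifetime that must stay below the retention time. The simulation also reports how many output tiles are live at once. It is a simplified model: only channel tiling, one time step per tile computation, and hypothetical order names (`output_outer`, `input_outer`) that are not RANA's exact OD/WD definitions.

```python
def output_tile_lifetimes(order, num_m_tiles, num_n_tiles):
    """Return (max_gap, peak_live) for a two-level channel-tiled loop.

    max_gap: longest run of steps between two accumulations into the same
             output tile (time its cells go without being recharged).
    peak_live: largest number of output tiles holding partial sums at once.
    """
    if order == "output_outer":
        steps = [(m, n) for m in range(num_m_tiles) for n in range(num_n_tiles)]
    elif order == "input_outer":
        steps = [(m, n) for n in range(num_n_tiles) for m in range(num_m_tiles)]
    else:
        raise ValueError(f"unknown order: {order}")
    first_write, last_write = {}, {}
    max_gap = 0
    for t, (m, _n) in enumerate(steps):
        if m in last_write:
            max_gap = max(max_gap, t - last_write[m])  # steps since last recharge
        else:
            first_write[m] = t
        last_write[m] = t
    peak_live = max(
        sum(1 for m in first_write if first_write[m] <= t <= last_write[m])
        for t in range(len(steps)))
    return max_gap, peak_live


if __name__ == "__main__":
    for order in ("output_outer", "input_outer"):
        print(order, output_tile_lifetimes(order, num_m_tiles=4, num_n_tiles=3))
```

With 4 output tiles and 3 input tiles, `output_outer` gives (1, 1): one live tile, rewritten every step. `input_outer` gives (4, 4): four live tiles, each waiting four steps between updates. Multiplying the gap by the time per tile step gives a lifetime to compare against the tolerable retention time.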
16 Tech2: Hybrid Computation Pattern. Scheduling scheme. Input: the parameters of the DNN accelerator and the network. Optimization: minimize total system energy. Output: layerwise configurations. For each layer, the scheme takes the layer description and the hardware constraints and outputs a computation pattern <OD/WD, Tm, Tn, Tr, Tc>, then switches to the next layer until the last layer is configured. The optimization problem is:
$$
\begin{aligned}
\min\quad & \mathit{Energy} \\
\text{s.t.}\quad & \mathit{Energy} = \text{Eq. (14)}, \\
& T_n T_h T_l \le R_i, \quad T_m T_r T_c \le R_o, \quad T_m T_n K^2 \le R_w, \\
& 1 \le T_m \le M, \quad 1 \le T_n \le N, \quad 1 \le T_r \le R, \quad 1 \le T_c \le C.
\end{aligned}
$$
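One way to solve a per-layer problem of this shape is exhaustive search over <pattern, Tm, Tn, Tr, Tc> under the buffer constraints. The sketch below does this, and several parts of it are assumptions. Th and Tl are taken to be the input window height and width, (Tr - 1)S + K and (Tc - 1)S + K. Ri, Ro, and Rw are read as input, output, and weight buffer capacities in words. Eq. (14) is not reproduced: `toy_energy` is a hypothetical placeholder that counts off-chip words under a crude reuse model. Brute-force search is this sketch's choice, not necessarily the paper's solver.

```python
import itertools
import math


def input_tile_dims(tr, tc, k, s):
    """Input window (Th, Tl) for a Tr x Tc output tile (assumed definition)."""
    return (tr - 1) * s + k, (tc - 1) * s + k


def feasible(tm, tn, tr, tc, layer, buf):
    """Buffer constraints: input tile <= Ri, output tile <= Ro, weight tile <= Rw."""
    th, tl = input_tile_dims(tr, tc, layer["K"], layer["S"])
    return (tn * th * tl <= buf["Ri"]
            and tm * tr * tc <= buf["Ro"]
            and tm * tn * layer["K"] ** 2 <= buf["Rw"])


def toy_energy(pattern, tm, tn, tr, tc, layer):
    """Placeholder for Eq. (14): off-chip words moved under a crude reuse model.

    This is NOT the paper's energy model; it only makes the search runnable.
    """
    M, N, R, C, K, S = (layer[key] for key in ("M", "N", "R", "C", "K", "S"))
    H, W = input_tile_dims(R, C, K, S)
    tiles_m, tiles_n = math.ceil(M / tm), math.ceil(N / tn)
    tiles_rc = math.ceil(R / tr) * math.ceil(C / tc)
    inputs = N * H * W * tiles_m  # inputs re-read for every output-channel tile
    if pattern == "OD":  # outputs finished on chip; weights re-read per spatial tile
        return inputs + M * N * K * K * tiles_rc + M * R * C
    # "WD": weights read once; partial sums spilled and re-read across input tiles
    return inputs + M * N * K * K + M * R * C * (2 * tiles_n - 1)


def schedule_layer(layer, buf, patterns=("OD", "WD")):
    """Exhaustively search <pattern, Tm, Tn, Tr, Tc> for the lowest toy energy."""
    best = None
    ranges = [range(1, layer[key] + 1) for key in ("M", "N", "R", "C")]
    for pattern in patterns:
        for tm, tn, tr, tc in itertools.product(*ranges):
            if not feasible(tm, tn, tr, tc, layer, buf):
                continue
            energy = toy_energy(pattern, tm, tn, tr, tc, layer)
            if best is None or energy < best[0]:
                best = (energy, (pattern, tm, tn, tr, tc))
    return best  # None if no tiling fits the buffers


if __name__ == "__main__":
    layer = {"M": 8, "N": 4, "R": 6, "C": 6, "K": 3, "S": 1}  # hypothetical layer
    print(schedule_layer(layer, {"Ri": 100, "Ro": 40, "Rw": 60}))
```

Running this once per layer and collecting the winners mirrors the slide's loop that produces layerwise configurations. Swapping in a real energy model only requires replacing `toy_energy`.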
17 Tech3: Refresh-Optimized eDRAM Controller. The eDRAM controller contains a programmable clock divider, which sets the refresh interval, and a refresh issuer and refresh flag for each eDRAM bank. It is configured with the results of Tech1 and Tech2.
Figure: eDRAM controller (reference clock, programmable clock divider, refresh issuers, eDRAM refresh flags) driving the eDRAM banks of the unified buffer system.
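The controller can be sketched as a small behavioral model. A programmable divider turns the reference clock into refresh ticks, and per-bank flags decide which banks' issuers fire on each tick. The 200 MHz clock comes from the evaluation platform, and the 45 μs and 734 μs intervals come from earlier slides. The class name, method names, bank count, and bank selection are hypothetical. This is not the paper's RTL.

```python
from dataclasses import dataclass, field


@dataclass
class RefreshController:
    """Behavioral model of a refresh-optimized eDRAM controller (hypothetical).

    A programmable clock divider turns the reference clock into refresh ticks,
    and a per-bank refresh flag decides whether that bank's issuer fires.
    """
    ref_clock_mhz: int
    num_banks: int
    divider: int = 1
    refresh_flags: list = field(default_factory=list)

    def __post_init__(self):
        if not self.refresh_flags:
            self.refresh_flags = [True] * self.num_banks  # conventional: refresh all banks

    def configure(self, refresh_interval_us, banks_needing_refresh):
        """Apply a layer's configuration: refresh interval and which banks to refresh."""
        # Integer MHz * microseconds = reference-clock cycles per refresh tick.
        self.divider = self.ref_clock_mhz * refresh_interval_us
        self.refresh_flags = [b in banks_needing_refresh for b in range(self.num_banks)]

    def banks_to_refresh(self):
        """Banks whose refresh issuer fires on each divided-clock tick."""
        return [b for b, flag in enumerate(self.refresh_flags) if flag]

    def refresh_ops(self, cycles):
        """Total bank refreshes issued over `cycles` reference-clock cycles."""
        return (cycles // self.divider) * len(self.banks_to_refresh())


if __name__ == "__main__":
    ctrl = RefreshController(ref_clock_mhz=200, num_banks=4)
    ctrl.configure(45, range(4))  # worst case: every bank, every 45 us
    print(ctrl.refresh_ops(1_000_000))
    ctrl.configure(734, {0, 2})   # hypothetical: only banks 0 and 2 need refresh
    print(ctrl.refresh_ops(1_000_000))
```

Over one million reference cycles, the worst-case setting issues 444 bank refreshes. The relaxed setting, with a longer interval on two flagged banks, issues 12. This mirrors the slide's two levers: a longer interval and fewer refreshed banks.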
18 Evaluation Platform. RTL-level cycle-accurate simulation for performance estimation and memory access tracing; system-level energy estimation based on synthesis, Destiny, and CACTI.
Platform configurations:
DNN accelerator: 256 MACs, 384 KB SRAM, 200 MHz, 5.682 mm², 65 nm.
eDRAM: 1.454 MB, retention time = 45 μs, 65 nm.
Kong et al., Analysis of Retention Time Distribution of Embedded DRAM - A New Method to Characterize Across-Chip Threshold Voltage Variation, ITC.
19 Experimental Results. Reductions achieved by RANA: eDRAM refresh operations, 99.7%; off-chip memory access, 41.7%; system energy consumption, 66.2%.
20 Scalability to Other Architectures. DaDianNao: 4096 MACs, 36 MB eDRAM, 606 MHz. Reductions: eDRAM refresh operations, 99.9%; system energy consumption, 69.4%.
Chen et al., DaDianNao: A Machine-Learning Supercomputer, MICRO 2014.
21 Takeaway. RANA: Retention-Aware Neural Acceleration Framework.
Training: retention-aware training method, which exploits DNNs' error resilience to improve the tolerable retention time.
Scheduling: hybrid computation pattern; different computing orders and parallelism lead to different data lifetimes and buffer storage requirements.
Architecture: refresh-optimized eDRAM controller; there is no need to refresh all the banks, nor to always use the worst-case refresh interval.
Not limited to applying eDRAM to DNN acceleration: approximate computing can also exploit the link between retention and error resilience.
22 Thank you for your attention!
More informationA 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology
UDC 621.3.049.771.14:621.396.949 A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology VAtsushi Tsuchiya VTetsuyoshi Shiota VShoichiro Kawashima (Manuscript received December 8, 1999) A 0.9
More informationDesign and FPGA Implementation of an Adaptive Demodulator. Design and FPGA Implementation of an Adaptive Demodulator
Design and FPGA Implementation of an Adaptive Demodulator Sandeep Mukthavaram August 23, 1999 Thesis Defense for the Degree of Master of Science in Electrical Engineering Department of Electrical Engineering
More informationLearning Pixel-Distribution Prior with Wider Convolution for Image Denoising
Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising Peng Liu University of Florida pliu1@ufl.edu Ruogu Fang University of Florida ruogu.fang@bme.ufl.edu arxiv:177.9135v1 [cs.cv]
More informationDatorstödd Elektronikkonstruktion
Datorstödd Elektronikkonstruktion [Computer Aided Design of Electronics] Zebo Peng, Petru Eles and Gert Jervan Embedded Systems Laboratory IDA, Linköping University http://www.ida.liu.se/~tdts80/~tdts80
More informationCherry Picking: Exploiting Process Variations in the Dark Silicon Era
Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark
More informationUltralow-Power and Robust Embedded Memory for Bioimplantable Microsystems
2013 26th International Conference on VLSI Design and the 12th International Conference on Embedded Systems Ultralow-Power and Robust Embedded Memory for Bioimplantable Microsystems Maryam S. Hashemian
More informationDesign and Implementation of Signal Processing Systems: An Introduction
Design and Implementation of Signal Processing Systems: An Introduction Yu Hen Hu (c) 1997-2013 by Yu Hen Hu 1 Outline Course Objectives and Outline, Conduct What is signal processing? Implementation Options
More informationVLSI System Testing. Outline
ECE 538 VLSI System Testing Krish Chakrabarty System-on-Chip (SOC) Testing ECE 538 Krish Chakrabarty 1 Outline Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test
More informationMULTI-PORT MEMORY DESIGN FOR ADVANCED COMPUTER ARCHITECTURES. by Yirong Zhao Bachelor of Science, Shanghai Jiaotong University, P. R.
MULTI-PORT MEMORY DESIGN FOR ADVANCED COMPUTER ARCHITECTURES by Yirong Zhao Bachelor of Science, Shanghai Jiaotong University, P. R. China, 2011 Submitted to the Graduate Faculty of the Swanson School
More informationArchitectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance
Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University
More informationGates Hall Phone: +1(650) Serra Mall, Room
Mingyu Gao Gates Hall Phone: +1(650)862-0664 353 Serra Mall, Room 318 Email: mgao12@stanford.edu Stanford, CA, 94305 https://www.stanford.edu/ mgao12 Research Interests Computer architecture and systems
More informationA fully synthesizable injection-locked PLL with feedback current output DAC in 28 nm FDSOI
LETTER IEICE Electronics Express, Vol.1, No.15, 1 11 A fully synthesizable injection-locked PLL with feedback current output DAC in 8 nm FDSOI Dongsheng Yang a), Wei Deng, Aravind Tharayil Narayanan, Rui
More informationLecture #29. Moore s Law
Lecture #29 ANNOUNCEMENTS HW#15 will be for extra credit Quiz #6 (Thursday 5/8) will include MOSFET C-V No late Projects will be accepted after Thursday 5/8 The last Coffee Hour will be held this Thursday
More information