RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM

1 RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei
Institute of Microelectronics, Tsinghua University
The 45th International Symposium on Computer Architecture (ISCA 2018)

2 Ubiquitous Deep Neural Networks (DNNs)
Image classification, object detection, video surveillance, speech recognition.

3 DNNs Require Large On-Chip Buffers
A modern DNN's per-layer data storage can reach 0.3~6.27 MB. These numbers increase if the network processes higher-resolution images or a larger batch size.
[1] Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, NIPS'12.
[2] Simonyan et al., Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR'15.
[3] Szegedy et al., Going Deeper with Convolutions, CVPR'15.
[4] He et al., Deep Residual Learning for Image Recognition, CVPR'16.

4 SRAM-based DNN Accelerators
The small footprint limits the on-chip buffer size of conventional SRAM-based DNN accelerators: usually <500 KB, with an area cost of 3~20 mm² (normalized).
[Figure: block diagrams of four accelerators, with buffer size and normalized area.]
Thinker: 348 KB, 19.4 mm². DianNao: 44 KB, 3.0 mm². Eyeriss: 182 KB, 12.3 mm². Envision: 77 KB, 10.1 mm².
Thinker: Yin et al., A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications, JSSC'18.
DianNao: Chen et al., DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning, ASPLOS'14.
Eyeriss: Chen et al., Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, ISSCC'16.
Envision: Moons et al., ENVISION: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI, ISSCC'17.

5 SRAM vs. eDRAM (Embedded DRAM)
eDRAM has higher density than SRAM, but it requires refresh for data retention: charge leaks over time and may cause retention failures.

6 Refresh is an Energy Bottleneck
[Figures: eDRAM power breakdown [1] (HPCA'13) and system power breakdown [2] (ISCA'10), with eDRAM refresh energy marked as overhead.]
[1] Chang et al., Technology Comparison for Large Last-Level Caches (L3Cs): Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized eDRAM, HPCA'13.
[2] Wilkerson et al., Reducing Cache Power with Low-Cost, Multi-bit Error-Correcting Codes, ISCA'10.

7 Opportunity to Remove eDRAM Refresh
Refresh interval = retention time.
Ghosh, Modeling of Retention Time for High-Speed Embedded Dynamic Random Access Memories, TCASI

8 Opportunity to Remove eDRAM Refresh
Refresh is unnecessary if data lifetime < retention time.
Opportunity 1: Increase retention time by training.
Opportunity 2: Reduce data lifetime by scheduling.
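The condition on this slide can be sketched as a one-line check. This is only an illustration: the function name and the 300 μs data lifetime are hypothetical, while 45 μs and 734 μs are the retention times quoted elsewhere in the slides.

```python
def refresh_needed(data_lifetime_us: float, retention_time_us: float) -> bool:
    """Return True when data outlives the retention time and must be refreshed."""
    # Slide 8: refresh is unnecessary if data lifetime < retention time.
    return data_lifetime_us >= retention_time_us


# Hypothetical layer whose buffered data lives for 300 us.
lifetime_us = 300.0
print(refresh_needed(lifetime_us, 45.0))   # against the weakest-cell retention time
print(refresh_needed(lifetime_us, 734.0))  # against the longer tolerable retention time
```

Both opportunities act on this inequality: training raises the right-hand side, scheduling lowers the left-hand side.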

9 RANA: Retention-Aware Neural Acceleration Framework
Strengthen DNN accelerators with refresh-optimized eDRAM: increase on-chip buffer size by replacing SRAM with eDRAM, and reduce energy overhead by removing unnecessary eDRAM refresh.
[Figure: framework overview, spanning a compilation phase and an execution phase. Inputs: DNN accelerator, target DNN model.]
Training: Retention-Aware Training Method (accuracy constraint, eDRAM retention time distribution), producing the tolerable retention time.
Scheduling: Hybrid Computation Pattern (energy modeling, data lifetime analysis, buffer storage analysis), producing layerwise configurations.
Architecture: Refresh-Optimized eDRAM Controller (data mapping, memory controller modification), producing optimized energy consumption.

10 RANA: Retention-Aware Neural Acceleration Framework
[Figure: the three stages in detail.]
Training: Retention-Aware Training Method, producing the tolerable retention time.
Scheduling: Hybrid Computation Pattern. From the DNN accelerator's hardware constraints and the DNN model's layer descriptions, run the scheduling scheme on each layer to obtain a computation pattern <OD/WD, Tm, Tn, Tr, Tc>, switching to the next layer until the last layer is reached. The result is a configuration for each layer.
Architecture: Refresh-Optimized eDRAM Controller. The unified buffer system consists of eDRAM banks. The controller contains a reference clock, a programmable clock divider, a refresh issuer and eDRAM refresh flags. Refresh control is driven by the retention time and the data lifetime.

11 Tech1: Retention-Aware Training Method
Retention time varies across cells.
Retention failure rate: the fraction of cells whose retention time is shorter than a given retention time.
The weakest cell appears at the 45 μs point.
[Figure: typical eDRAM retention time distribution (32 KB).]
Kong et al., Analysis of Retention Time Distribution of Embedded DRAM: A New Method to Characterize Across-Chip Threshold Voltage Variation, ITC
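To make the failure-rate definition concrete, the sketch below computes it from a list of per-cell retention times and inverts it to find the longest interval that keeps the failure rate within a budget. The function names and the eight-cell sample are hypothetical. Only the 45 μs weakest cell is taken from the slide; the real distribution comes from the cited measurement.

```python
def failure_rate(cell_retention_us, interval_us):
    """Fraction of cells whose retention time is shorter than the interval."""
    failing = sum(1 for t in cell_retention_us if t < interval_us)
    return failing / len(cell_retention_us)


def tolerable_retention_time(cell_retention_us, max_failure_rate):
    """Longest interval whose failure rate stays within max_failure_rate."""
    ordered = sorted(cell_retention_us)
    # Number of cells that are allowed to fail, capped at the last index.
    allowed = min(int(max_failure_rate * len(ordered)), len(ordered) - 1)
    return ordered[allowed]


# Synthetic sample (microseconds); the weakest cell sits at 45 us.
cells = [45, 100, 200, 400, 734, 900, 1030, 2000]
print(tolerable_retention_time(cells, 0.0))   # no failures tolerated
print(tolerable_retention_time(cells, 0.25))  # a quarter of the cells may fail
```

With no failures tolerated, the answer collapses to the weakest cell, which is why the worst-case refresh interval is so short.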

12 Tech1: Retention-Aware Training Method
Retrain the network to tolerate a higher failure rate and obtain a longer tolerable retention time.
[Figure: training flow. Target DNN model, fixed-point pretraining, fixed-point DNN model, adding layer masks that inject random bit-level errors at failure rate r, retraining with weight adjustment, retention-aware DNN model.]
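The error-injection step can be illustrated with a small sketch. It assumes that a retention failure shows up as an independent bit flip and that values are stored as 16-bit unsigned fixed-point words. Both are assumptions made here for illustration, and `inject_bit_errors` is a hypothetical helper, not the authors' layer mask.

```python
import random


def inject_bit_errors(words, failure_rate, bits=16, seed=0):
    """Flip each stored bit independently with probability failure_rate."""
    rng = random.Random(seed)       # seeded so that runs are repeatable
    word_mask = (1 << bits) - 1
    noisy = []
    for w in words:
        word = w & word_mask        # keep only the stored bits
        for b in range(bits):
            if rng.random() < failure_rate:
                word ^= 1 << b      # assumed error model: a bit flip
        noisy.append(word)
    return noisy


weights = [0x0123, 0x7FFF, 0x8000]
print(inject_bit_errors(weights, 1e-5))  # almost always unchanged at this rate
```

During retraining, a mask like this would be applied to the data in the forward pass so that the weights adapt to the errors.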

13 Tech1: Retention-Aware Training Method
Failure rate of 10^-5: no accuracy loss, 734 μs.
Failure rate of 10^-4: accuracy decreases.
[Figure: relative accuracy under different retention failure rates, with 45 μs, 734 μs and 1030 μs marked.]

14 Tech2: Hybrid Computation Pattern
A computation pattern is expressed as a loop nest.
Data lifetime and buffer storage depend on the loop ordering, especially on the outermost loop.

15 Tech2: Hybrid Computation Pattern
Outputs are dynamically updated by accumulation, which recharges the cells like a periodic refresh.
Different computation patterns have different data lifetimes and buffer storage requirements.
[Figure: input-dependent, output-dependent and weight-dependent patterns.]

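16 Tech2: Hybrid Computation Pattern
Scheduling scheme:
Input: the DNN accelerator's and the network's parameters.
Optimization: minimize total system energy.
Output: layerwise configurations.
[Figure: scheduling flow. From the DNN accelerator's hardware constraints and the DNN model's layer descriptions, run the scheduling scheme on each layer to obtain a computation pattern <OD/WD, Tm, Tn, Tr, Tc>, switching to the next layer until the last layer is reached. The result is a configuration for each layer.]
Scheduling scheme:
$$
\begin{aligned}
\min\ & \mathit{Energy} \\
\text{s.t.}\ & \mathit{Energy} = \text{Equation (14)}, \\
& T_n \cdot T_h \cdot T_l \le R_i, \\
& T_m \cdot T_r \cdot T_c \le R_o, \\
& T_m \cdot T_n \cdot K^2 \le R_w, \\
& 1 \le T_m \le M,\ 1 \le T_n \le N,\ 1 \le T_r \le R,\ 1 \le T_c \le C.
\end{aligned}
$$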
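A brute-force version of this per-layer search might look like the sketch below. Several parts are assumptions. The three buffer constraints are read as upper bounds. Th and Tl are derived from Tr and Tc with the usual convolution relation (T - 1) * S + K, which the slide does not state. The energy function is a placeholder that counts tiles, because Equation (14) is not reproduced in the slides. The choice between the OD and WD patterns is left out.

```python
from itertools import product
from math import ceil


def schedule_layer(M, N, R, C, K, S, Ri, Ro, Rw, energy):
    """Return (energy, (Tm, Tn, Tr, Tc)) for the cheapest feasible tiling."""
    best = None
    ranges = (range(1, M + 1), range(1, N + 1), range(1, R + 1), range(1, C + 1))
    for Tm, Tn, Tr, Tc in product(*ranges):
        Th = (Tr - 1) * S + K   # assumed input tile height
        Tl = (Tc - 1) * S + K   # assumed input tile width
        if Tn * Th * Tl > Ri or Tm * Tr * Tc > Ro or Tm * Tn * K * K > Rw:
            continue            # tile does not fit in a buffer
        e = energy(Tm, Tn, Tr, Tc)
        if best is None or e < best[0]:
            best = (e, (Tm, Tn, Tr, Tc))
    return best


def make_tile_count_energy(M, N, R, C):
    """Placeholder cost: number of tiles. NOT the paper's Equation (14)."""
    def energy(Tm, Tn, Tr, Tc):
        return ceil(M / Tm) * ceil(N / Tn) * ceil(R / Tr) * ceil(C / Tc)
    return energy


# Hypothetical 4x4x4x4 layer with 3x3 kernels and a small output buffer.
cost = make_tile_count_energy(4, 4, 4, 4)
print(schedule_layer(4, 4, 4, 4, K=3, S=1, Ri=1000, Ro=16, Rw=1000, energy=cost))
```

Exhaustive search is only practical for small layers. It is used here to keep the constraint handling visible, not as a claim about how the authors solve the problem.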

17 Tech3: Refresh-Optimized eDRAM Controller
eDRAM controller:
Programmable clock divider: sets the refresh interval.
Refresh issuers and flags for each eDRAM bank.
Configuration comes from Tech1 and Tech2.
[Figure: eDRAM controller (reference clock, programmable clock divider, refresh issuer, eDRAM refresh flags) attached to the eDRAM banks of the unified buffer system.]
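A toy behavioral model of this controller is sketched below, assuming one shared divider and one flag per bank. The class, its method names and the use of an integer MHz reference clock are all hypothetical. The slides only name the hardware blocks.

```python
class RefreshController:
    """Toy model: one programmable clock divider, one refresh flag per bank."""

    def __init__(self, ref_clock_mhz: int, num_banks: int):
        self.ref_clock_mhz = ref_clock_mhz
        self.flags = [False] * num_banks
        self.divider = 1
        self.counter = 0

    def configure(self, refresh_interval_us: int, flags):
        """Load a layer's configuration: refresh interval and per-bank flags."""
        # Reference-clock cycles per refresh pulse (MHz * us = cycles).
        self.divider = max(1, self.ref_clock_mhz * refresh_interval_us)
        self.flags = list(flags)
        self.counter = 0

    def tick(self):
        """Advance one reference-clock cycle; return banks refreshed this cycle."""
        self.counter += 1
        if self.counter < self.divider:
            return []
        self.counter = 0
        # Only banks whose flag is set receive a refresh.
        return [bank for bank, on in enumerate(self.flags) if on]


ctrl = RefreshController(ref_clock_mhz=1, num_banks=4)
ctrl.configure(refresh_interval_us=3, flags=[True, False, True, False])
print([ctrl.tick() for _ in range(6)])
```

Clearing a bank's flag models the case where its data lifetime is shorter than the tolerable retention time, so that bank is never refreshed.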

18 Evaluation Platform
RTL-level cycle-accurate simulation, for performance estimation and memory access tracing.
System-level energy estimation, based on synthesis, Destiny and CACTI.
Platform configurations:
DNN accelerator: 256 MACs, 384 KB SRAM, 200 MHz, 5.682 mm², 65 nm.
eDRAM: 1.454 MB, retention time = 45 μs, 65 nm.
Kong et al., Analysis of Retention Time Distribution of Embedded DRAM: A New Method to Characterize Across-Chip Threshold Voltage Variation, ITC

19 Experimental Results
eDRAM refresh operations: reduced by 99.7%.
Off-chip memory access: reduced by 41.7%.
System energy consumption: reduced by 66.2%.

20 Scalability to Other Architectures
DaDianNao: 4096 MACs, 36 MB eDRAM, 606 MHz.
eDRAM refresh operations: reduced by 99.9%.
System energy consumption: reduced by 69.4%.
Chen et al., DaDianNao: A Machine-Learning Supercomputer, MICRO

21 Takeaway
RANA: Retention-Aware Neural Acceleration Framework
Training: retention-aware training method. Exploit the DNN's error resilience to improve the tolerable retention time.
Scheduling: hybrid computation pattern. Different computing orders and parallelism lead to different data lifetimes and buffer storage requirements.
Architecture: refresh-optimized eDRAM controller. No need to refresh all the banks. No need to always use the worst-case refresh interval.
Not limited to applying eDRAM to DNN acceleration. Approximate computing: retention and error resilience.

22 Thank you for your attention!


Low Power VLSI Circuit Synthesis: Introduction and Course Outline Low Power VLSI Circuit Synthesis: Introduction and Course Outline Ajit Pal Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302 Agenda Why Low

More information

A Data Remanence based Approach to Generate 100% Stable Keys from an SRAM Physical Unclonable Function

A Data Remanence based Approach to Generate 100% Stable Keys from an SRAM Physical Unclonable Function A Data Remanence based Approach to Generate 100% Stable Keys from an SRAM Physical Unclonable Function Muqing Liu, Chen Zhou, Qianying Tang, Keshab K. Parhi and Chris H. Kim University of Minnesota, Twin

More information

A Low Power and High Speed Viterbi Decoder Based on Deep Pipelined, Clock Blocking and Hazards Filtering

A Low Power and High Speed Viterbi Decoder Based on Deep Pipelined, Clock Blocking and Hazards Filtering Int. J. Communications, Network and System Sciences, 2009, 6, 575-582 doi:10.4236/ijcns.2009.26064 Published Online September 2009 (http://www.scirp.org/journal/ijcns/). 575 A Low Power and High Speed

More information

Enhancing System Architecture by Modelling the Flash Translation Layer

Enhancing System Architecture by Modelling the Flash Translation Layer Enhancing System Architecture by Modelling the Flash Translation Layer Robert Sykes Sr. Dir. Firmware August 2014 OCZ Storage Solutions A Toshiba Group Company Introduction This presentation will discuss

More information

A FFT/IFFT Soft IP Generator for OFDM Communication System

A FFT/IFFT Soft IP Generator for OFDM Communication System A FFT/IFFT Soft IP Generator for OFDM Communication System Tsung-Han Tsai, Chen-Chi Peng and Tung-Mao Chen Department of Electrical Engineering, National Central University Chung-Li, Taiwan Abstract: -

More information

DESIGN FOR LOW-POWER USING MULTI-PHASE AND MULTI- FREQUENCY CLOCKING

DESIGN FOR LOW-POWER USING MULTI-PHASE AND MULTI- FREQUENCY CLOCKING 3 rd Int. Conf. CiiT, Molika, Dec.12-15, 2002 31 DESIGN FOR LOW-POWER USING MULTI-PHASE AND MULTI- FREQUENCY CLOCKING M. Stojčev, G. Jovanović Faculty of Electronic Engineering, University of Niš Beogradska

More information

Multi-core Platforms for

Multi-core Platforms for 20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio

More information

Computer Aided Design of Electronics

Computer Aided Design of Electronics Computer Aided Design of Electronics [Datorstödd Elektronikkonstruktion] Zebo Peng, Petru Eles, and Nima Aghaee Embedded Systems Laboratory IDA, Linköping University www.ida.liu.se/~tdts01 Electronic Systems

More information

A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs

A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs 1 Introduction Alexander Neckar with David Gal, Eric Glass, and Matt Murray (from EE382a) Whether due to injury

More information

Embedded Systems. 9. Power and Energy. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 9. Power and Energy. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 9. Power and Energy Lothar Thiele Computer Engineering and Networks Laboratory General Remarks 9 2 Power and Energy Consumption Statements that are true since a decade or longer: Power

More information

Image Processing Architectures (and their future requirements)

Image Processing Architectures (and their future requirements) Lecture 16: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Example SoC: Qualcomm Snapdragon Image credit: Qualcomm Apple A7 (iphone

More information

Fault Tolerance and Reliability Techniques for High-Density Random-Access Memories (Hardcover) by Kanad Chakraborty, Pinaki Mazumder

Fault Tolerance and Reliability Techniques for High-Density Random-Access Memories (Hardcover) by Kanad Chakraborty, Pinaki Mazumder 1 of 6 12/10/06 10:11 PM Fault Tolerance and Reliability Techniques for High-Density Random-Access Memories (Hardcover) by Kanad Chakraborty, Pinaki Mazumder (1 customer review) To learn more about the

More information

A Multiple SIMD Mesh Architecture for Multi-Channel Radar Processing

A Multiple SIMD Mesh Architecture for Multi-Channel Radar Processing A Multiple SIMD Mesh Architecture for Multi-Channel Radar Processing Mikael Taveniku 2,3, Anders Åhlander 1, Magnus Jonsson 1 and Bertil Svensson 1,2 1. Centre for Computer Architecture, Halmstad University,

More information

THE content-addressable memory (CAM) is one of the most

THE content-addressable memory (CAM) is one of the most 254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 40, NO. 1, JANUARY 2005 A 0.7-fJ/Bit/Search 2.2-ns Search Time Hybrid-Type TCAM Architecture Sungdae Choi, Kyomin Sohn, and Hoi-Jun Yoo Abstract This paper

More information

A Novel Technique to Reduce Write Delay of SRAM Architectures

A Novel Technique to Reduce Write Delay of SRAM Architectures A Novel Technique to Reduce Write Delay of SRAM Architectures SWAPNIL VATS AND R.K. CHAUHAN * Department of Electronics and Communication Engineering M.M.M. Engineering College, Gorahpur-73 010, U.P. INDIA

More information

Exploring Computation- Communication Tradeoffs in Camera Systems

Exploring Computation- Communication Tradeoffs in Camera Systems Exploring Computation- Communication Tradeoffs in Camera Systems Amrita Mazumdar Thierry Moreau Sung Kim Meghan Cowan Armin Alaghi Luis Ceze Mark Oskin Visvesh Sathe IISWC 2017 1 Camera applications are

More information

A wide-range all-digital duty-cycle corrector with output clock phase alignment in 65 nm CMOS technology

A wide-range all-digital duty-cycle corrector with output clock phase alignment in 65 nm CMOS technology A wide-range all-digital duty-cycle corrector with output clock phase alignment in 65 nm CMOS technology Ching-Che Chung 1a), Duo Sheng 2, and Sung-En Shen 1 1 Department of Computer Science & Information

More information

Semantic Segmentation on Resource Constrained Devices

Semantic Segmentation on Resource Constrained Devices Semantic Segmentation on Resource Constrained Devices Sachin Mehta University of Washington, Seattle In collaboration with Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi Project

More information

Detection and Segmentation. Fei-Fei Li & Justin Johnson & Serena Yeung. Lecture 11 -

Detection and Segmentation. Fei-Fei Li & Justin Johnson & Serena Yeung. Lecture 11 - Lecture 11: Detection and Segmentation Lecture 11-1 May 10, 2017 Administrative Midterms being graded Please don t discuss midterms until next week - some students not yet taken A2 being graded Project

More information

A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology

A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology UDC 621.3.049.771.14:621.396.949 A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology VAtsushi Tsuchiya VTetsuyoshi Shiota VShoichiro Kawashima (Manuscript received December 8, 1999) A 0.9

More information

Design and FPGA Implementation of an Adaptive Demodulator. Design and FPGA Implementation of an Adaptive Demodulator

Design and FPGA Implementation of an Adaptive Demodulator. Design and FPGA Implementation of an Adaptive Demodulator Design and FPGA Implementation of an Adaptive Demodulator Sandeep Mukthavaram August 23, 1999 Thesis Defense for the Degree of Master of Science in Electrical Engineering Department of Electrical Engineering

More information

Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising

Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising Peng Liu University of Florida pliu1@ufl.edu Ruogu Fang University of Florida ruogu.fang@bme.ufl.edu arxiv:177.9135v1 [cs.cv]

More information

Datorstödd Elektronikkonstruktion

Datorstödd Elektronikkonstruktion Datorstödd Elektronikkonstruktion [Computer Aided Design of Electronics] Zebo Peng, Petru Eles and Gert Jervan Embedded Systems Laboratory IDA, Linköping University http://www.ida.liu.se/~tdts80/~tdts80

More information

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark

More information

Ultralow-Power and Robust Embedded Memory for Bioimplantable Microsystems

Ultralow-Power and Robust Embedded Memory for Bioimplantable Microsystems 2013 26th International Conference on VLSI Design and the 12th International Conference on Embedded Systems Ultralow-Power and Robust Embedded Memory for Bioimplantable Microsystems Maryam S. Hashemian

More information

Design and Implementation of Signal Processing Systems: An Introduction

Design and Implementation of Signal Processing Systems: An Introduction Design and Implementation of Signal Processing Systems: An Introduction Yu Hen Hu (c) 1997-2013 by Yu Hen Hu 1 Outline Course Objectives and Outline, Conduct What is signal processing? Implementation Options

More information

VLSI System Testing. Outline

VLSI System Testing. Outline ECE 538 VLSI System Testing Krish Chakrabarty System-on-Chip (SOC) Testing ECE 538 Krish Chakrabarty 1 Outline Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test

More information

MULTI-PORT MEMORY DESIGN FOR ADVANCED COMPUTER ARCHITECTURES. by Yirong Zhao Bachelor of Science, Shanghai Jiaotong University, P. R.

MULTI-PORT MEMORY DESIGN FOR ADVANCED COMPUTER ARCHITECTURES. by Yirong Zhao Bachelor of Science, Shanghai Jiaotong University, P. R. MULTI-PORT MEMORY DESIGN FOR ADVANCED COMPUTER ARCHITECTURES by Yirong Zhao Bachelor of Science, Shanghai Jiaotong University, P. R. China, 2011 Submitted to the Graduate Faculty of the Swanson School

More information

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University

More information

Gates Hall Phone: +1(650) Serra Mall, Room

Gates Hall Phone: +1(650) Serra Mall, Room Mingyu Gao Gates Hall Phone: +1(650)862-0664 353 Serra Mall, Room 318 Email: mgao12@stanford.edu Stanford, CA, 94305 https://www.stanford.edu/ mgao12 Research Interests Computer architecture and systems

More information

A fully synthesizable injection-locked PLL with feedback current output DAC in 28 nm FDSOI

A fully synthesizable injection-locked PLL with feedback current output DAC in 28 nm FDSOI LETTER IEICE Electronics Express, Vol.1, No.15, 1 11 A fully synthesizable injection-locked PLL with feedback current output DAC in 8 nm FDSOI Dongsheng Yang a), Wei Deng, Aravind Tharayil Narayanan, Rui

More information

Lecture #29. Moore s Law

Lecture #29. Moore s Law Lecture #29 ANNOUNCEMENTS HW#15 will be for extra credit Quiz #6 (Thursday 5/8) will include MOSFET C-V No late Projects will be accepted after Thursday 5/8 The last Coffee Hour will be held this Thursday

More information