Creating Intelligence at the Edge
|
|
- Sherilyn Dorsey
- 5 years ago
- Views:
Transcription
1 Creating Intelligence at the Edge Vladimir Stojanović E3S Retreat September 8, 2017
2 The growing importance of machine learning Page 2
3 Applications exploding in the cloud Huge interest to move to the edge Big issues speed, available power for local computation Opportunity for new technologies to help enter E3S Page 3
4 Deep Neural Network Operation Each layer learns to do a single thing (extract feature, compose objects, etc). Page 4
5 Deep Neural Network Example: AlexNet Key operations Feature extraction Classification Convolution (feature extraction - convolutional layers) Matrix-Vector multiply (classification - fully-connected layers) Page 5
6 Classes and Uses of Neural Nets Multi-layer Perceptron LeNet-300 hand-written digit recognition, fully-connected layers, feed-forward Recurrent Neural Networks Long-short term memory, voice recognition, natural language processing (FC layers with feedback) Convolutional Neural Networks Image processing, computer vision (e.g. AlexNet) [Jouppi et al, ISCA 2017] Page 6
7 Intelligence confined to the Cloud However, even in the cloud, special hardware is needed [Facebook BigSur] Page 7
8 [Nvidia Volta] 120TFlops/s Hardware specialization in the Cloud: Repurposing Graphics Processing Units Specialized TensorCores Optimize Matrix-vector multiplies High-bandwidth memory close to GPU chip on the interposer Specialized GPU-GPU interconnect (Nvlink) [Microsoft HGX-1] Page 8
9 Hardware specialization in the Cloud: Custom ASICs (Tensor Processing Unit) Custom chip specializing in Matrix-vector operations, lots of onchip memory Optimized for inference 8-bit integer arithmetic Custom interconnect Page 9
10 Why specialized hardware in the Cloud? [Jouppi et al, ISCA 2017] TPU GPU CPU Saves energy (primary OpEx) Speeds-up discovery of new networks (faster training) Offer as a service (inference TPUs, FPGAs, training GPUs, FPGAs) Page 10
11 TPU architecture [Jouppi et al, ISCA 2017] Weight coefficients streamed from the outside Significant energy burned on weight fetch Large-on chip memory for intermediate results 8-bit systolic datapath partially saves energy on SRAM writes Page 11
12 Limitations of centralized intelligence Costs at the mobile terminal communicating with central server/cloud Local energy/op ~1pJ Energy per bit To register - 100fJ To SRAM - 1pJ To DRAM - 20pJ Latency To register 100ps To SRAM 1ns To DRAM 100ns Over radio: >1uJ To local node >1ms To Cloud >50ms Page 12
13 Opportunity for Autonomous Intelligence at the Edge The rise of autonomous systems Great need for local intelligence Need to be extremely energy-efficient and low-latency Page 13
14 How to justify specialization? Volumes are huge (starting with smartphones) every major hw manufacturer looking into these now Lifetime is short (IoT and smartphone chips change at most every 3 years) Can afford to create custom chips that are very efficient Page 14
15 How hard do we need to work to make it happen? TPU2 has 40W for sustained ~10TOp/s At 20Hz, and 10 9 weights, and 200 Ops/weight Need 50 x 10 9 x 200 Op/s = 10 TOp/s 50ms x 40W = 2J/inference Radio energy to send an HD image for inference (~ a few J/image) How to make this work with 0.5W of power budget (e.g. a minidrone, a phone or 0.05W for an IoT device)? TPU2 still fetches all weights from DRAM (approx. third the power above) The rest spent on logic and local SRAM stores Page 15
16 Opportunity for E3S: Cross-layer design E3S strength in seeing the big picture from devices to systemlevel Cross-layer design needed to solve this problem Page 16
17 AlexNet1000 revisited >90% parameters in FC layers Speed bottleneck in FC layers (large matrices) Problems: 1) weight fetching 2) storing of partial results >90% operations in Conv layers Page 17
18 Algorithmic improvements: Making the problem smaller with compression FC layers are huge and sparse: use compression (pruning) [Han et al. 2015] Running a 1 billion connection neural network (a bit larger than AlexNet), at 20Hz would require: (20Hz)(1G)(640pJ) = 12.8W! Page 18
19 Problems with focusing on single layer e.g. threshold pruning (Han et al) Resulting matrix is sparse random (unpredictable locations) locations Sparsity adds significant hw overhead, prevents systolic implementation with local coefs Techniques that are oblivious to underlying hardware do not yield desired improvements Threshold pruning example Shrinks the coefficient storage Marginally improves the speed due to random nature of the resulting coefficient locations Page 19
20 Randomized sparsity affects performance Techniques that are oblivious to underlying hardware do not yield desired improvements Improvements only a fraction of the pruning ratio Threshold pruning example Shrinks the coefficient storage Marginally improves the speed due to random nature of the resulting coefficient locations Page 20
21 A better way: Architecture-aware compression Need an approach that exposes hardware features at the FC layer level Example sub-block matrix Good fit for parallel hardware (multiple-threads, systolic, etc) Example 8x structured FC layer compression However, accuracy significantly degraded with this FC layer topology Page 21
22 Scaffold pruning example: Permutation-based pruning Need a transformation that randomizes the hardware-friendly structure to capture the FC information Simple example block permutation Architecture-friendly template (efficient implementation) Random row-column permutation Algorithm-friendly template (good accuracy) Page 22
23 Scaffold training: PBP training results Many random permutations work Example - LeNet-300 on MNIST dataset (10x compression) Page 23
24 PBP impact on micro-architecture Input vector shuffle MxV Multiply-accumulates with local weight storage (dense sub-blocks) Output vector shuffle Algorithmic transformations enable simple, systolic (flow-through) architecture with in-situ coefficient storage (minimal energy) Fixed or reconfigurable input/output shuffle (permutation) Already runs 3-7x faster than sparse on mobile GPUs Page 24
25 Need for local storage new devices Spin devices or vertical NEM relays great candidates 10 9 weights more than enough to represent today s and future networks At 10x structured compression, 10 8 weights at 4-8 bits, still 0.5-1GB! Need to look across layers to optimize the resolution Local, NV storage to minimize fetch energy during run-time and power-cycling Page 25
26 NV reconfigurable logic Potentially use NV logic to fuse weight storage and multiply Input string: Answers: 1 0 (a) BL 1 I A 0 I A 1 I B 0 I B 1 I C 0 I C 1 I D 0 I D 1 I E 0 I E 1 O X 0 O X 1 O Y 0 O Y 1 Read enable (RE) PG BL BL BL 32 : Current to pull up BL : Current to pull down OL PG PG Pass gate (PG) S S S ( ) h f ( d ) b d K. Kato et al Embedded Nano-Electro-Mechanical Memory (b) d for Energy-Efficient Reconfigurable Logic EDL 2016 Page 26
27 Need for reconfigurable interconnect new devices Reconfigurable shuffle allows infrequent but efficient network change Adds flexibility to the custom chip Can utilize vertical NEM relays Page 27
28 Where next? Cross-layer design, model and compare with optimized CMOS A concrete system benchmark for E3S Page 28
Binary Neural Network and Its Implementation with 16 Mb RRAM Macro Chip
Binary Neural Network and Its Implementation with 16 Mb RRAM Macro Chip Assistant Professor of Electrical Engineering and Computer Engineering shimengy@asu.edu http://faculty.engineering.asu.edu/shimengyu/
More informationAn energy-efficient coarse grained spatial architecture for convolutional neural networks AlexNet
LETTER IEICE Electronics Express, Vol.14, No.15, 1 12 An energy-efficient coarse grained spatial architecture for convolutional neural networks AlexNet Boya Zhao a), Mingjiang Wang b), and Ming Liu Harbin
More informationDeep Learning. Dr. Johan Hagelbäck.
Deep Learning Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Image Classification Image classification can be a difficult task Some of the challenges we have to face are: Viewpoint variation:
More informationConvolution Engine: Balancing Efficiency and Flexibility in Specialized Computing
Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing Paper by: Wajahat Qadeer Rehan Hameed Ofer Shacham Preethi Venkatesan Christos Kozyrakis Mark Horowitz Presentation by:
More informationExploring Computation- Communication Tradeoffs in Camera Systems
Exploring Computation- Communication Tradeoffs in Camera Systems Amrita Mazumdar Thierry Moreau Sung Kim Meghan Cowan Armin Alaghi Luis Ceze Mark Oskin Visvesh Sathe IISWC 2017 1 Camera applications are
More informationDEEP LEARNING ON RF DATA. Adam Thompson Senior Solutions Architect March 29, 2018
DEEP LEARNING ON RF DATA Adam Thompson Senior Solutions Architect March 29, 2018 Background Information Signal Processing and Deep Learning Radio Frequency Data Nuances AGENDA Complex Domain Representations
More informationProposers Day Workshop
Proposers Day Workshop Monday, January 23, 2017 @srcjump, #JUMPpdw Cognitive Computing Vertical Research Center Mandy Pant Academic Research Director Intel Corporation Center Motivation Today s deep learning
More informationHarnessing the Power of AI: An Easy Start with Lattice s sensai
Harnessing the Power of AI: An Easy Start with Lattice s sensai A Lattice Semiconductor White Paper. January 2019 Artificial intelligence, or AI, is everywhere. It s a revolutionary technology that is
More informationArtificial Neural Networks. Artificial Intelligence Santa Clara, 2016
Artificial Neural Networks Artificial Intelligence Santa Clara, 2016 Simulate the functioning of the brain Can simulate actual neurons: Computational neuroscience Can introduce simplified neurons: Neural
More informationComputational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs
5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs
More informationREVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
December 3-6, 2018 Santa Clara Convention Center CA, USA REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND. https://tmt.knect365.com/risc-v-summit @risc_v ACCELERATING INFERENCING ON THE EDGE WITH RISC-V
More informationDesign of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm
Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,
More informationDeep Learning Overview
Deep Learning Overview Eliu Huerta Gravity Group gravity.ncsa.illinois.edu National Center for Supercomputing Applications Department of Astronomy University of Illinois at Urbana-Champaign Data Visualization
More informationA NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS
G RAMESH et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. A NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS G.Ramesh 1*, K.Naga Lakshmi 2* 1. II. M.Tech (VLSI), Dept of ECE, AM Reddy Memorial College
More informationCreating the Right Environment for Machine Learning Codesign. Cliff Young, Google AI
Creating the Right Environment for Machine Learning Codesign Cliff Young, Google AI 1 Deep Learning has Reinvigorated Hardware GPUs AlexNet, Speech. TPUs Many Google applications: AlphaGo and Translate,
More informationLecture Perspectives. Administrivia
Lecture 29-30 Perspectives Administrivia Final on Friday May 18 12:30-3:30 pm» Location: 251 Hearst Gym Topics all what was covered in class. Review Session Time and Location TBA Lab and hw scores to be
More information[Devi*, 5(4): April, 2016] ISSN: (I2OR), Publication Impact Factor: 3.785
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY DESIGN OF HIGH SPEED FIR FILTER ON FPGA BY USING MULTIPLEXER ARRAY OPTIMIZATION IN DA-OBC ALGORITHM Palepu Mohan Radha Devi, Vijay
More informationNano-device and Architecture Interaction in Machine/deep Learning
Nano-device and Architecture Interaction in Machine/deep Learning Assistant Professor of Electrical Engineering and Computer Engineering shimengy@asu.edu http://faculty.engineering.asu.edu/shimengyu/ 12/13/2017
More informationLecture 30. Perspectives. Digital Integrated Circuits Perspectives
Lecture 30 Perspectives Administrivia Final on Friday December 15 8 am Location: 251 Hearst Gym Topics all what was covered in class. Precise reading information will be posted on the web-site Review Session
More informationCPSC 340: Machine Learning and Data Mining. Convolutional Neural Networks Fall 2018
CPSC 340: Machine Learning and Data Mining Convolutional Neural Networks Fall 2018 Admin Mike and I finish CNNs on Wednesday. After that, we will cover different topics: Mike will do a demo of training
More informationResearch on Hand Gesture Recognition Using Convolutional Neural Network
Research on Hand Gesture Recognition Using Convolutional Neural Network Tian Zhaoyang a, Cheng Lee Lung b a Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China E-mail address:
More informationEmbracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs
Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs Andrew Boutros, Sadegh Yazdanshenas, and Vaughn Betz Department of Electrical and Computer Engineering, University of Toronto,
More informationEmbedding Artificial Intelligence into Our Lives
Embedding Artificial Intelligence into Our Lives Michael Thompson, Synopsys D&R IP-SOC DAYS Santa Clara April 2018 1 Agenda Introduction What AI is and is Not Where AI is being used Rapid Advance of AI
More informationResearch Statement. Sorin Cotofana
Research Statement Sorin Cotofana Over the years I ve been involved in computer engineering topics varying from computer aided design to computer architecture, logic design, and implementation. In the
More informationNeural Networks The New Moore s Law
Neural Networks The New Moore s Law Chris Rowen, PhD, FIEEE CEO Cognite Ventures December 216 Outline Moore s Law Revisited: Efficiency Drives Productivity Embedded Neural Network Product Segments Efficiency
More informationCourse Outcome of M.Tech (VLSI Design)
Course Outcome of M.Tech (VLSI Design) PVL108: Device Physics and Technology The students are able to: 1. Understand the basic physics of semiconductor devices and the basics theory of PN junction. 2.
More informationEE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling
EE241 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolic Lecture 15 Low-Power Design: Supply Voltage Scaling Announcements Homework #2 due today Midterm project reports due next Thursday
More informationRadio Deep Learning Efforts Showcase Presentation
Radio Deep Learning Efforts Showcase Presentation November 2016 hume@vt.edu www.hume.vt.edu Tim O Shea Senior Research Associate Program Overview Program Objective: Rethink fundamental approaches to how
More informationVideo Enhancement Algorithms on System on Chip
International Journal of Scientific and Research Publications, Volume 2, Issue 4, April 2012 1 Video Enhancement Algorithms on System on Chip Dr.Ch. Ravikumar, Dr. S.K. Srivatsa Abstract- This paper presents
More informationRANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei Institute of Microelectronics Tsinghua University The 45th International
More informationRamon Canal NCD Master MIRI. NCD Master MIRI 1
Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/
More informationMultiband NFC for High-Throughput Wireless Computer Vision Sensor Network
Multiband NFC for High-Throughput Wireless Computer Vision Sensor Network Fei Y. Li, Jason Y. Du 09212020027@fudan.edu.cn Vision sensors lie in the heart of computer vision. In many computer vision applications,
More informationA NOVEL VISION SYSTEM-ON-CHIP FOR EMBEDDED IMAGE ACQUISITION AND PROCESSING
A NOVEL VISION SYSTEM-ON-CHIP FOR EMBEDDED IMAGE ACQUISITION AND PROCESSING Neuartiges System-on-Chip für die eingebettete Bilderfassung und -verarbeitung Dr. Jens Döge, Head of Image Acquisition and Processing
More informationLecture 17 Convolutional Neural Networks
Lecture 17 Convolutional Neural Networks 30 March 2016 Taylor B. Arnold Yale Statistics STAT 365/665 1/22 Notes: Problem set 6 is online and due next Friday, April 8th Problem sets 7,8, and 9 will be due
More informationEECS 427 Lecture 22: Low and Multiple-Vdd Design
EECS 427 Lecture 22: Low and Multiple-Vdd Design Reading: 11.7.1 EECS 427 W07 Lecture 22 1 Last Time Low power ALUs Glitch power Clock gating Bus recoding The low power design space Dynamic vs static EECS
More informationExploring High-Speed Low-Power Hybrid Arithmetic Units at Scaled Supply and Adaptive Clock-Stretching
Exploring High-Speed Low-Power Hybrid Arithmetic Units at Scaled Supply and Adaptive Clock-Stretching Swaroop Ghosh and Kaushik Roy School of Electrical and Computer Engineering, Purdue University, West
More informationMICROELECTRONIC IMPLEMENTATIONS OF CONNECTIONIST NEURAL NETWORKS
515 MICROELECTRONIC IMPLEMENTATIONS OF CONNECTIONIST NEURAL NETWORKS Stuart Mackie, Hans P. Graf, Daniel B. Schwartz, and John S. Denker AT&T Bell Labs, Holmdel, NJ 07733 Abstract In this paper we discuss
More informationCHAPTER 4 FIELD PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF FIVE LEVEL CASCADED MULTILEVEL INVERTER
87 CHAPTER 4 FIELD PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF FIVE LEVEL CASCADED MULTILEVEL INVERTER 4.1 INTRODUCTION The Field Programmable Gate Array (FPGA) is a high performance data processing general
More informationUSING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS
USING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS DENIS F. WOLF, ROSELI A. F. ROMERO, EDUARDO MARQUES Universidade de São Paulo Instituto de Ciências Matemáticas e de Computação
More informationKeywords SEFDM, OFDM, FFT, CORDIC, FPGA.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Future to
More informationVector Arithmetic Logic Unit Amit Kumar Dutta JIS College of Engineering, Kalyani, WB, India
Vol. 2 Issue 2, December -23, pp: (75-8), Available online at: www.erpublications.com Vector Arithmetic Logic Unit Amit Kumar Dutta JIS College of Engineering, Kalyani, WB, India Abstract: Real time operation
More informationLow Transistor Variability The Key to Energy Efficient ICs
Low Transistor Variability The Key to Energy Efficient ICs 2 nd Berkeley Symposium on Energy Efficient Electronic Systems 11/3/11 Robert Rogenmoser, PhD 1 BEES_roro_G_111103 Copyright 2011 SuVolta, Inc.
More informationJDT EFFECTIVE METHOD FOR IMPLEMENTATION OF WALLACE TREE MULTIPLIER USING FAST ADDERS
JDT-002-2013 EFFECTIVE METHOD FOR IMPLEMENTATION OF WALLACE TREE MULTIPLIER USING FAST ADDERS E. Prakash 1, R. Raju 2, Dr.R. Varatharajan 3 1 PG Student, Department of Electronics and Communication Engineeering
More informationarxiv: v1 [cs.ce] 9 Jan 2018
Predict Forex Trend via Convolutional Neural Networks Yun-Cheng Tsai, 1 Jun-Hao Chen, 2 Jun-Jie Wang 3 arxiv:1801.03018v1 [cs.ce] 9 Jan 2018 1 Center for General Education 2,3 Department of Computer Science
More informationSno Projects List IEEE. High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations
Sno Projects List IEEE 1 High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations 2 A Generalized Algorithm And Reconfigurable Architecture For Efficient And Scalable
More informationHardware-Software Co-Design Cosynthesis and Partitioning
Hardware-Software Co-Design Cosynthesis and Partitioning EE8205: Embedded Computer Systems http://www.ee.ryerson.ca/~courses/ee8205/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationIntroduction to Machine Learning
Introduction to Machine Learning Deep Learning Barnabás Póczos Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2
More informationCHAPTER 3 NEW SLEEPY- PASS GATE
56 CHAPTER 3 NEW SLEEPY- PASS GATE 3.1 INTRODUCTION A circuit level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage po leepy-
More informationBuilding Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics
Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics Christopher Batten 1, Ajay Joshi 1, Jason Orcutt 1, Anatoly Khilo 1 Benjamin Moss 1, Charles Holzwarth 1, Miloš Popović 1,
More informationChallenges in Transition
Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org
More informationAn Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors
An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors T.N.Priyatharshne Prof. L. Raja, M.E, (Ph.D) A. Vinodhini ME VLSI DESIGN Professor, ECE DEPT ME VLSI DESIGN
More informationImage processing. Case Study. 2-diemensional Image Convolution. From a hardware perspective. Often massively yparallel.
Case Study Image Processing Image processing From a hardware perspective Often massively yparallel Can be used to increase throughput Memory intensive Storage size Memory bandwidth -diemensional Image
More informationData-Starved Artificial Intelligence
Data-Starved Artificial Intelligence Data-Starved Artificial Intelligence This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract
More informationEnergy efficient multi-granular arithmetic in a coarse-grain reconfigurable architecture
Eindhoven University of Technology MASTER Energy efficient multi-granular arithmetic in a coarse-grain reconfigurable architecture Louwers, S.T. Award date: 216 Link to publication Disclaimer This document
More information신경망기반자동번역기술. Konkuk University Computational Intelligence Lab. 김강일
신경망기반자동번역기술 Konkuk University Computational Intelligence Lab. http://ci.konkuk.ac.kr kikim01@kunkuk.ac.kr 김강일 Index Issues in AI and Deep Learning Overview of Machine Translation Advanced Techniques in
More informationOpen Source Digital Camera on Field Programmable Gate Arrays
Open Source Digital Camera on Field Programmable Gate Arrays Cristinel Ababei, Shaun Duerr, Joe Ebel, Russell Marineau, Milad Ghorbani Moghaddam, and Tanzania Sewell Dept. of Electrical and Computer Engineering,
More informationCHAPTER 4 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED MULTIPLIER TOPOLOGIES
69 CHAPTER 4 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED MULTIPLIER TOPOLOGIES 4.1 INTRODUCTION Multiplication is one of the basic functions used in digital signal processing. It requires more
More informationAI Application Processing Requirements
AI Application Processing Requirements 1 Low Medium High Sensor analysis Activity Recognition (motion sensors) Stress Analysis or Attention Analysis Audio & sound Speech Recognition Object detection Computer
More informationTable of Contents HOL EMT
Table of Contents Lab Overview - - Machine Learning Workloads in vsphere Using GPUs - Getting Started... 2 Lab Guidance... 3 Module 1 - Machine Learning Apps in vsphere VMs Using GPUs (15 minutes)...9
More informationMACHINE LEARNING Games and Beyond. Calvin Lin, NVIDIA
MACHINE LEARNING Games and Beyond Calvin Lin, NVIDIA THE MACHINE LEARNING ERA IS HERE And it is transforming every industry... including Game Development OVERVIEW NVIDIA Volta: An Architecture for Machine
More informationElectronic Circuits EE359A
Electronic Circuits EE359A Bruce McNair B206 bmcnair@stevens.edu 201-216-5549 1 Memory and Advanced Digital Circuits - 2 Chapter 11 2 Figure 11.1 (a) Basic latch. (b) The latch with the feedback loop opened.
More informationEfficient Deep Learning in Communications
Fraunhofer Image Processing Heinrich Hertz Institute Efficient Deep Learning in Communications Dr. Wojciech Samek Fraunhofer HHI, Machine Learning Group Fraunhofer Heinrich Hertz Institute, Einsteinufer
More informationArtificial Intelligence and Deep Learning
Artificial Intelligence and Deep Learning Cars are now driving themselves (far from perfectly, though) Speaking to a Bot is No Longer Unusual March 2016: World Go Champion Beaten by Machine AI: The Upcoming
More informationCS4617 Computer Architecture
1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 1, January 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Design of Digital
More informationGESTURE RECOGNITION WITH 3D CNNS
April 4-7, 2016 Silicon Valley GESTURE RECOGNITION WITH 3D CNNS Pavlo Molchanov Xiaodong Yang Shalini Gupta Kihwan Kim Stephen Tyree Jan Kautz 4/6/2016 Motivation AGENDA Problem statement Selecting the
More informationThe Metrics and Designs of an Arithmetic Logic Function over
The Metrics and Designs of an Arithmetic Logic Function over 2002-2015 Jimmy Vallejo Department of Electrical and Computer Engineering University of Central Flida Orlando, FL 32816-2362 Abstract There
More informationUNIT-III POWER ESTIMATION AND ANALYSIS
UNIT-III POWER ESTIMATION AND ANALYSIS In VLSI design implementation simulation software operating at various levels of design abstraction. In general simulation at a lower-level design abstraction offers
More informationDESIGN & IMPLEMENTATION OF SELF TIME DUMMY REPLICA TECHNIQUE IN 128X128 LOW VOLTAGE SRAM
DESIGN & IMPLEMENTATION OF SELF TIME DUMMY REPLICA TECHNIQUE IN 128X128 LOW VOLTAGE SRAM 1 Mitali Agarwal, 2 Taru Tevatia 1 Research Scholar, 2 Associate Professor 1 Department of Electronics & Communication
More informationEXACT SIGNAL RECOVERY FROM SPARSELY CORRUPTED MEASUREMENTS
EXACT SIGNAL RECOVERY FROM SPARSELY CORRUPTED MEASUREMENTS THROUGH THE PURSUIT OF JUSTICE Jason Laska, Mark Davenport, Richard Baraniuk SSC 2009 Collaborators Mark Davenport Richard Baraniuk Compressive
More informationBiologically Inspired Computation
Biologically Inspired Computation Deep Learning & Convolutional Neural Networks Joe Marino biologically inspired computation biological intelligence flexible capable of detecting/ executing/reasoning about
More informationTraining Steps Files File Type File Count Total Size L3 embedding knowledge distillation (SONYC) Google audioset (environmental)
PI: Justification for 30 TB Storage Request (1) Project space needs and file sizes This storage request is in relation to our ongoing effort in training deep learning models on large datasets in non-speech
More informationUnderstanding Neural Networks : Part II
TensorFlow Workshop 2018 Understanding Neural Networks Part II : Convolutional Layers and Collaborative Filters Nick Winovich Department of Mathematics Purdue University July 2018 Outline 1 Convolutional
More informationAn Optimized Design for Parallel MAC based on Radix-4 MBA
An Optimized Design for Parallel MAC based on Radix-4 MBA R.M.N.M.Varaprasad, M.Satyanarayana Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India Abstract In this paper a novel architecture
More informationA SCALABLE ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye
A SCALABLE ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS Theepan Moorthy and Andy Ye Department of Electrical and Computer Engineering Ryerson University 350
More informationLecture 6: Electronics Beyond the Logic Switches Xufeng Kou School of Information Science and Technology ShanghaiTech University
Lecture 6: Electronics Beyond the Logic Switches Xufeng Kou School of Information Science and Technology ShanghaiTech University EE 224 Solid State Electronics II Lecture 3: Lattice and symmetry 1 Outline
More informationPrototype Fabrication of Field-Programmable Digital Filter LSIs Using Multiple-Valued Current-Mode Logic Device Scaling and Future Prospects
D36(Degawa) J. of Mult.-Valued Logic & Soft Computing. April 3, 5 4:39 J. of Mult.-Valued Logic & Soft Computing., Vol., pp. Reprints available directly from the publisher Photocopying permitted by license
More informationGPU-accelerated track reconstruction in the ALICE High Level Trigger
GPU-accelerated track reconstruction in the ALICE High Level Trigger David Rohr for the ALICE Collaboration Frankfurt Institute for Advanced Studies CHEP 2016, San Francisco ALICE at the LHC The Large
More informationCROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS. Kuan-Chuan Peng and Tsuhan Chen
CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS Kuan-Chuan Peng and Tsuhan Chen Cornell University School of Electrical and Computer Engineering Ithaca, NY 14850
More information11/13/18. Introduction to RNNs for NLP. About Me. Overview SHANG GAO
Introduction to RNNs for NLP SHANG GAO About Me PhD student in the Data Science and Engineering program Took Deep Learning last year Work in the Biomedical Sciences, Engineering, and Computing group at
More informationSmall World Network Architectures. NIPS 2017 Workshop
Small World Network Architectures NIPS 2017 Workshop Small World Networks We'd like to explore training models with very wide hidden states. More active memory, more information bandwidth, more easily
More informationAuthor(s) Corr, Philip J.; Silvestre, Guenole C.; Bleakley, Christopher J. The Irish Pattern Recognition & Classification Society
Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Open Source Dataset and Deep Learning Models
More informationLow Power Embedded Systems in Bioimplants
Low Power Embedded Systems in Bioimplants Steven Bingler Eduardo Moreno 1/32 Why is it important? Lower limbs amputation is a major impairment. Prosthetic legs are passive devices, they do not do well
More informationDESIGN OF LOW POWER MULTIPLIER USING COMPOUND CONSTANT DELAY LOGIC STYLE
DESIGN OF LOW POWER MULTIPLIER USING COMPOUND CONSTANT DELAY LOGIC STYLE 1 S. DARWIN, 2 A. BENO, 3 L. VIJAYA LAKSHMI 1 & 2 Assistant Professor Electronics & Communication Engineering Department, Dr. Sivanthi
More informationProgrammable Interconnect. CPE/EE 428, CPE 528: Session #13. Actel Programmable Interconnect. Actel Programmable Interconnect
Programmable Interconnect CPE/EE 428, CPE 528: Session #13 Department of Electrical and Computer Engineering University of Alabama in Huntsville In addition to programmable cells, programmable ASICs must
More informationASIC Implementation of High Speed Area Efficient Arithmetic Unit using GDI based Vedic Multiplier
INTERNATIONAL JOURNAL OF APPLIED RESEARCH AND TECHNOLOGY ISSN 2519-5115 RESEARCH ARTICLE ASIC Implementation of High Speed Area Efficient Arithmetic Unit using GDI based Vedic Multiplier 1 M. Sangeetha
More informationA HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION
A HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION Sinan Yalcin and Ilker Hamzaoglu Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Tuzla,
More informationAim. Unit abstract. Learning outcomes. QCF level: 6 Credit value: 15
Unit T3: Microelectronics Unit code: A/503/7339 QCF level: 6 Credit value: 15 Aim The aim of this unit is to give learners an understanding of the manufacturing processes for and the purposes and limitations
More informationISSN Vol.03,Issue.02, February-2014, Pages:
www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.02, February-2014, Pages:0239-0244 Design and Implementation of High Speed Radix 8 Multiplier using 8:2 Compressors A.M.SRINIVASA CHARYULU
More informationMultiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1 Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator
More informationOn Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital VLSI
ELEN 689 606 Techniques for Layout Synthesis and Simulation in EDA Project Report On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital
More informationA Technical Perspective on Cognitive Architectures
A Technical Perspective on Cognitive Architectures March 14, 2015 Guna Seetharaman Ph.D., FIEEE Information Intelligence and Analysis Division Information Directorate, Rome, NY Gunasekaran.seetharaman@us.af.mil
More informationHigh Speed Binary Counters Based on Wallace Tree Multiplier in VHDL
High Speed Binary Counters Based on Wallace Tree Multiplier in VHDL E.Sangeetha 1 ASP and D.Tharaliga 2 Department of Electronics and Communication Engineering, Tagore College of Engineering and Technology,
More informationDIGITAL INTEGRATED CIRCUITS A DESIGN PERSPECTIVE 2 N D E D I T I O N
DIGITAL INTEGRATED CIRCUITS A DESIGN PERSPECTIVE 2 N D E D I T I O N Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic CONTENTS PART I: THE FABRICS Chapter 1: Introduction (32 pages) 1.1 A Historical
More informationA Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics
A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics Ossia, SA; Shamsabadi, AS; Taheri, A; Rabiee, HR; Lane, N; Haddadi, H The Author(s) 2017 For additional information about this
More informationLeakage Power Minimization in Deep-Submicron CMOS circuits
Outline Leakage Power Minimization in Deep-Submicron circuits Politecnico di Torino Dip. di Automatica e Informatica 1019 Torino, Italy enrico.macii@polito.it Introduction. Design for low leakage: Basics.
More informationHigh Performance Computing for Engineers
High Performance Computing for Engineers David Thomas dt10@ic.ac.uk / https://github.com/m8pple Room 903 http://cas.ee.ic.ac.uk/people/dt10/teaching/2014/hpce HPCE / dt10/ 2015 / 0.1 High Performance Computing
More informationBricken Technologies Corporation Presentations: Bricken Technologies Corporation Corporate: Bricken Technologies Corporation Marketing:
TECHNICAL REPORTS William Bricken compiled 2004 Bricken Technologies Corporation Presentations: 2004: Synthesis Applications of Boundary Logic 2004: BTC Board of Directors Technical Review (quarterly)
More informationDeep Neural Network Architectures for Modulation Classification
Deep Neural Network Architectures for Modulation Classification Xiaoyu Liu, Diyu Yang, and Aly El Gamal School of Electrical and Computer Engineering Purdue University Email: {liu1962, yang1467, elgamala}@purdue.edu
More informationTopics. Memory Reliability and Yield Control Logic. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut
Topics Memory Reliability and Yield Control Logic Reliability and Yield Noise Sources in T DRam BL substrate Adjacent BL C WBL α-particles WL leakage C S electrode C cross Transposed-Bitline Architecture
More information