REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
|
|
- Imogene Barton
- 5 years ago
- Views:
Transcription
1 December 3-6, 2018 Santa Clara Convention Center CA, USA REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
2 ACCELERATING INFERENCING ON THE EDGE WITH RISC-V Russell Klein Technical Director Mentor, A Siemens Company
3 Accelerating inferencing Inferencing algorithms exceed the performance capabilities of embedded processors We examine creating accelerators as: RoCC co-processor for RISC-V Bus-based peripheral Instruction extensions were considered Was not a good fit for the algorithms we were working on Would need to contort things too far
4 Driving automation Already have 6 HD cameras at 30 frames/sec in ADAS 1920 x 1080 pixels x 30 x 3 colors (RGB) 186,624,000 pixels/sec for a single camera >1 billion pixels per second for the car More Radar-Lidar-cameras added at every level Camera resolutions and frame rates expected to double every few years
5 All embedded systems are not equal Should get smart enough not to wake me when the neighbor s cat is on the front porch at 4 a.m. but wake me when a person is there Inferencing rates of 1 or 2 seconds is fine <1 million pixels per second 5
6 Yolo Tiny (v3) Algorithm for detecting and classifying objects in pictures Used on cell phones and computationally limited systems Over 5.5 billion floating point operations per inference Over 36 million weight and bias values Neural network has 23 layers Full Yolo has 106 layers
7 Yolo-Tiny Profile
8 What is convolution? Multiply one array by another, element by element, and sum the results Source: Embedded-Vision.com
9 Algorithm Description 3x3 convolution with an optional 2x2 max pooling Kernel Image
10 Algorithm Description 3x3 convolution with an optional 2x2 max pooling Kernel 0 x 1 = 0 1 x 2 = 2 2 x 3 = 6 10 x 4 = x 5 = x 6 = x 7 = x 8 = x 9 = sum 681 Image
11 Algorithm Description 3x3 convolution with an optional 2x2 max pooling Kernel 1 x 1 = 1 2 x 2 = 4 3 x 3 = 9 11 x 4 = x 5 = x 6 = x 7 = x 8 = x 9 = sum 726 Image
12 Algorithm Description 3x3 convolution with an optional 2x2 max pooling Kernel 10 x 1 = x 2 = x 3 = x 4 = x 5 = x 6 = x 7 = x 8 = x 9 = sum 1131 Image
13 Algorithm Description 3x3 convolution with an optional 2x2 max pooling Kernel 11 x 1 = x 2 = x 3 = x 4 = x 5 = x 6 = x 7 = x 8 = x 9 = sum 1174 Image
14 Algorithm Description Repeat across the entire image Kernel Image
15 Algorithm Description Repeat across the entire image Kernel Image
16 Algorithm Description Repeat across the entire image Kernel Image
17 Algorithm Description Repeat across the entire image Kernel Image
18 Some observations For the 416x416 pixel image a 3x3 convolution requires 1,557,504 multiplications It is embarrassingly parallel, all multiplies could be done in parallel If you could get 1.5 million multipliers on a piece of silicon If you could get the data to and from the multipliers This is 0.002% of the multiplies for the inference
19 High Level Synthesis Enables faster design Write more abstractly, i.e. less coding Synthesis handles many details you probably don t want to Creates control/data flow graph from original C/C++ Automatically constructs: parallelism pipelining resource sharing Infers/interfaces memories
20 High Level Synthesis Makes verification easier Reuse stimulus from original C/C++ code Original C/C++ can be an oracle for created RTL Can enable formal verification CDFG can be used to compare against implemented RTL Even if complete equivalence cannot be proved, reduces verification space Original Algorithm Original Testbench Compare Transactor HLS RTL Transactor
21 80% reduction in verification effort Time Traditional RTL Functional Regression 3 months 1000 CPUs Time HLS C++ Functional Regression using formal 2 weeks 14 CPUs Resources Resources NVIDIA Xavier 12nFF SoC NVIDIA Case Study available on mentor.com
22 Yolo Tiny 1000 image verification suite Original C implementation on desktop computer 1.5 seconds per inference ~25 minutes Behavioral C implementation with overhead 5 seconds per inference - ~1 hour 25 minutes RTL implementation 389 minutes per inference - ~9 months Verifying the behavioral code and proving equivalence is much, much faster
23 3x3 convolve + 2x2 max pool
24 High Level Synthesis By unrolling loops and pipelining created various implementation with no changes to source code: 1 84 clocks 4 22 clocks 9 8 clocks 36 3 clocks 216 2/clock 36 multiplier schematic
25 Operation sizing Move from floating point to fixed point Select fixed point precision to meet needs of the algorithm For Yolo-Tiny accuracy improves moving from 32 bit floating point to 20 bit fixed point Fixed point algorithms can be verified on the desk-top Open source fixed point library: Multiplier costs in area and power a roughly proportional to the square of the size of the operands. Moving from 32 bits to 12 bits is about 1/7 th the area and power
26 RoCC accelerator Created accelerator in HDL using HLS Interface was AC channel (ready/valid/data) Good match for RISC-V protocols Instantiated protocol converter between accelerator and RoCC interface Interacted with custom0 instruction No change to compliers Accessed kernel and image data through L1 data cache port
27 RoCC Accelerator Details Followed excellent directions from Colin Schmidt from 2015 rocket chip tutorial bootcamp And course notes from CS250 at Berkeley and github examples. Defined method for creating and interfacing an RoCC accelerator They explain it so much better than I can Many thanks
28 Software vs. Accelerator Software was executing 1 multiply in 7 instructions Found by examination of object code from optimized compile Accelerator was executing 36 parallel multiplies in one clock Then adds and compares in 2 more clocks Accelerator was about 100x faster As expected: 3 clock cycles per conv/max_pool for accelerator 7 clock cycles x 36 multiplies = 252, plus some loop control for software
29 But wait, where s my performance Team member computed 3 x number of conv/map_pools per inference and clock cycles and found we were 4x too slow Looked at bus, nice burst cycles, almost 100 utilization Looked at cache misses and found lots of them
30 What s going on?
31 What s going on?
32 What s going on?
33 What s going on?
34 What s going on? All these pixels are on different cache lines At 100% of bandwidth on bus we would keep accelerator fed. With cache thrashing, accelerator was starving.
35 What to do about it? Create a shift register 3 lines + 4 pixels long Add 2 pixels for each computation
36 What to do about it? Create a shift register 3 lines + 4 pixels long Add 2 pixels for each computation
37 What to do about it? Create a shift register 3 lines + 4 pixels long Add 2 pixels for each computation Makes caches much happier
38 memory memory memory What to do about it? Too any registers to be efficient (1260) Add memories where there are no multiplier taps * * * Requires 7 array declarations in C++, the large arrays are automatically put into memories
39 Peripheral Accelerator Alternate approach for accelerator is to attach it to the bus. Instead of custom0 instructions, read and write to memory mapped control registers Accelerator has master interface to read image and kernel data directly from memory, but no longer shares caches with Rocket Core Core tile to interconnect was bottleneck 3x3 conv 2x2 max accel
40 Memory Architecture and Power Considerations Keeping data local is key to minimizing power consumption Very important for ASIC Floating-point is costly Used in training of networks Not needed in network inference engine, use fixed point Fixed-point doesn't need to be power-of-two For custom hardware can be anything *NVIDIA 2017
41 Yolo-Tiny Profile Small number of coefficients in early layers Small(ish) number of features in later layers
42 Reordering Loops to Minimize Weight Reads Loops are organized so that weights are only read once from system DRAM Weights are held stationary across feature maps Feature maps are computed in order and stored in local SRAM Original Algorithm OUT_CHAN:for(int oc=0;oc<out_channels;oc++){ FMAP_HEIGHT:for(int r=0;r<in_height;r++){ FMAP_WIDTH:for(int c=0;c<in_width+1;c++){ IN_CHAN:for(int ic=0;ic<in_channels;ic++){ KERNEL_Y:for(int i=0;i<3;i++){ KERNEL_X:for(int j=0;j<3;j++){ acc+=fmap[ic][r-i/2][c-j/2]*kernel[ic][oc][i][j] } } } fmap_out[d][r][c] = acc; } } OUT_CHAN:for(int oc=0;oc<max_out_chan;oc++){ KERNEL_Y:for(int i=-1;i<2;i++){ KERNEL_X:for(int j=-1;j<2;j++){ FMAP_HEIGHT:for(int r=0;r<max_height;r++){ FMAP_WIDTH:for(int c=0;c<max_width;c++){ IN_CHAN:for(int ic=0;ic!=max_in_chan;ic+=pack){ < FMAP ping-pong memory read PACK values > < Read and cache weights once for reuse > MAC:for(int p=0;p<pack;p++) acc += fmap_data[p] * kernel_data[p]; } acc_mem[r][c] += acc;//fmap accumulator memory } } }
43 In-place Architecture Layers processed one after another Feature maps stored locally in SRAM Weights read from system memory Layer Control Kernel I/F Feature memory A Multiply/Accumul ate Engine Feature memory B Ping-Pong memory
44 Reorganize Feature Maps Feature maps are interleaved by input channels Example: PACK=4, input chan=8, fmaps come in order Pack 4 input channels side-by-side i0 i1 i2 i3 i4 i5 i6 i7 mem Feature maps
45 Conclusion Built accelerator for 3x3 convolution and 2x2 max pool Comprises 96% of the load for object recognition algorithm 36 parallel multipliers, pipelined adder tree and comparitors Implemented as a RoCC accelerator Shares L1 cache with core, so a good choice when co-operating with software on common data Implemented as a bus based peripheral Has independent bus interface, so a good choice when communication bound High-level synthesis Enables highly customized accelerators Accommodates last minute changes Targets ASICs, efpgas, and FPGAs
46 THANK YOU
Digital Systems Design
Digital Systems Design Digital Systems Design and Test Dr. D. J. Jackson Lecture 1-1 Introduction Traditional digital design Manual process of designing and capturing circuits Schematic entry System-level
More informationLecture 3, Handouts Page 1. Introduction. EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Simulation Techniques.
Introduction EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Techniques Cristian Grecu grecuc@ece.ubc.ca Course web site: http://courses.ece.ubc.ca/353/ What have you learned so far?
More informationHardware-based Image Retrieval and Classifier System
Hardware-based Image Retrieval and Classifier System Jason Isaacs, Joe Petrone, Geoffrey Wall, Faizal Iqbal, Xiuwen Liu, and Simon Foo Department of Electrical and Computer Engineering Florida A&M - Florida
More informationCourse Outcome of M.Tech (VLSI Design)
Course Outcome of M.Tech (VLSI Design) PVL108: Device Physics and Technology The students are able to: 1. Understand the basic physics of semiconductor devices and the basics theory of PN junction. 2.
More informationHardware Implementation of Automatic Control Systems using FPGAs
Hardware Implementation of Automatic Control Systems using FPGAs Lecturer PhD Eng. Ionel BOSTAN Lecturer PhD Eng. Florin-Marian BÎRLEANU Romania Disclaimer: This presentation tries to show the current
More informationLecture 1: Introduction to Digital System Design & Co-Design
Design & Co-design of Embedded Systems Lecture 1: Introduction to Digital System Design & Co-Design Computer Engineering Dept. Sharif University of Technology Winter-Spring 2008 Mehdi Modarressi Topics
More informationTechnology Timeline. Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs. FPGAs. The Design Warrior s Guide to.
FPGAs 1 CMPE 415 Technology Timeline 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs FPGAs The Design Warrior s Guide
More informationConvolution Engine: Balancing Efficiency and Flexibility in Specialized Computing
Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing Paper by: Wajahat Qadeer Rehan Hameed Ofer Shacham Preethi Venkatesan Christos Kozyrakis Mark Horowitz Presentation by:
More informationChapter 6: DSP And Its Impact On Technology. Book: Processor Design Systems On Chip. By Jari Nurmi
Chapter 6: DSP And Its Impact On Technology Book: Processor Design Systems On Chip Computing For ASICs And FPGAs By Jari Nurmi Slides Prepared by: Omer Anjum Introduction The early beginning g of DSP DSP
More informationIntroduction to co-simulation. What is HW-SW co-simulation?
Introduction to co-simulation CPSC489-501 Hardware-Software Codesign of Embedded Systems Mahapatra-TexasA&M-Fall 00 1 What is HW-SW co-simulation? A basic definition: Manipulating simulated hardware with
More informationAudio Sample Rate Conversion in FPGAs
Audio Sample Rate Conversion in FPGAs An efficient implementation of audio algorithms in programmable logic. by Philipp Jacobsohn Field Applications Engineer Synplicity eutschland GmbH philipp@synplicity.com
More informationDatorstödd Elektronikkonstruktion
Datorstödd Elektronikkonstruktion [Computer Aided Design of Electronics] Zebo Peng, Petru Eles and Gert Jervan Embedded Systems Laboratory IDA, Linköping University http://www.ida.liu.se/~tdts80/~tdts80
More informationImage processing. Case Study. 2-diemensional Image Convolution. From a hardware perspective. Often massively yparallel.
Case Study Image Processing Image processing From a hardware perspective Often massively yparallel Can be used to increase throughput Memory intensive Storage size Memory bandwidth -diemensional Image
More informationComputer Aided Design of Electronics
Computer Aided Design of Electronics [Datorstödd Elektronikkonstruktion] Zebo Peng, Petru Eles, and Nima Aghaee Embedded Systems Laboratory IDA, Linköping University www.ida.liu.se/~tdts01 Electronic Systems
More informationUSING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS
USING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS DENIS F. WOLF, ROSELI A. F. ROMERO, EDUARDO MARQUES Universidade de São Paulo Instituto de Ciências Matemáticas e de Computação
More informationLecture 1. Tinoosh Mohsenin
Lecture 1 Tinoosh Mohsenin Today Administrative items Syllabus and course overview Digital systems and optimization overview 2 Course Communication Email Urgent announcements Web page http://www.csee.umbc.edu/~tinoosh/cmpe650/
More informationEE382V-ICS: System-on-a-Chip (SoC) Design
EE38V-CS: System-on-a-Chip (SoC) Design Hardware Synthesis and Architectures Source: D. Gajski, S. Abdi, A. Gerstlauer, G. Schirner, Embedded System Design: Modeling, Synthesis, Verification, Chapter 6:
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 Project Background High speed multiplication is another critical function in a range of very large scale integration (VLSI) applications. Multiplications are expensive and slow
More informationEECS150 - Digital Design Lecture 28 Course Wrap Up. Recap 1
EECS150 - Digital Design Lecture 28 Course Wrap Up Dec. 5, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)
More informationNeural Networks The New Moore s Law
Neural Networks The New Moore s Law Chris Rowen, PhD, FIEEE CEO Cognite Ventures December 216 Outline Moore s Law Revisited: Efficiency Drives Productivity Embedded Neural Network Product Segments Efficiency
More informationHarnessing the Power of AI: An Easy Start with Lattice s sensai
Harnessing the Power of AI: An Easy Start with Lattice s sensai A Lattice Semiconductor White Paper. January 2019 Artificial intelligence, or AI, is everywhere. It s a revolutionary technology that is
More informationAn area optimized FIR Digital filter using DA Algorithm based on FPGA
An area optimized FIR Digital filter using DA Algorithm based on FPGA B.Chaitanya Student, M.Tech (VLSI DESIGN), Department of Electronics and communication/vlsi Vidya Jyothi Institute of Technology, JNTU
More informationCreating Intelligence at the Edge
Creating Intelligence at the Edge Vladimir Stojanović E3S Retreat September 8, 2017 The growing importance of machine learning Page 2 Applications exploding in the cloud Huge interest to move to the edge
More information(VE2: Verilog HDL) Software Development & Education Center
Software Development & Education Center (VE2: Verilog HDL) VLSI Designing & Integration Introduction VLSI: With the hardware market booming with the rise demand in chip driven products in consumer electronics,
More informationVLSI System Testing. Outline
ECE 538 VLSI System Testing Krish Chakrabarty System-on-Chip (SOC) Testing ECE 538 Krish Chakrabarty 1 Outline Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test
More informationFPGA Circuits. na A simple FPGA model. nfull-adder realization
FPGA Circuits na A simple FPGA model nfull-adder realization ndemos Presentation References n Altera Training Course Designing With Quartus-II n Altera Training Course Migrating ASIC Designs to FPGA n
More informationA Survey on Power Reduction Techniques in FIR Filter
A Survey on Power Reduction Techniques in FIR Filter 1 Pooja Madhumatke, 2 Shubhangi Borkar, 3 Dinesh Katole 1, 2 Department of Computer Science & Engineering, RTMNU, Nagpur Institute of Technology Nagpur,
More informationRANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei Institute of Microelectronics Tsinghua University The 45th International
More information[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Design of Wallace Tree Multiplier using Compressors K.Gopi Krishna *1, B.Santhosh 2, V.Sridhar 3 gopikoleti@gmail.com Abstract
More informationVideo Enhancement Algorithms on System on Chip
International Journal of Scientific and Research Publications, Volume 2, Issue 4, April 2012 1 Video Enhancement Algorithms on System on Chip Dr.Ch. Ravikumar, Dr. S.K. Srivatsa Abstract- This paper presents
More informationA HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION
A HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION Sinan Yalcin and Ilker Hamzaoglu Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Tuzla,
More informationAn Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors
An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors T.N.Priyatharshne Prof. L. Raja, M.E, (Ph.D) A. Vinodhini ME VLSI DESIGN Professor, ECE DEPT ME VLSI DESIGN
More informationDesign of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm
Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,
More informationPhysics Based Sensor simulation
Physics Based Sensor simulation Jordan Gorrochotegui - Product Manager Software and Services Mike Phillips Software Engineer Restricted Siemens AG 2017 Realize innovation. Siemens offers solutions across
More informationDesign of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing
Design of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing Yelle Harika M.Tech, Joginpally B.R.Engineering College. P.N.V.M.Sastry M.S(ECE)(A.U), M.Tech(ECE), (Ph.D)ECE(JNTUH), PG DIP
More informationASIC Implementation of High Throughput PID Controller
ASIC Implementation of High Throughput PID Controller 1 Chavan Suyog, 2 Sameer Nandagave, 3 P.Arunkumar 1,2 M.Tech Scholar, 3 Assistant Professor School of Electronics Engineering VLSI Division, VIT University,
More informationExploring Computation- Communication Tradeoffs in Camera Systems
Exploring Computation- Communication Tradeoffs in Camera Systems Amrita Mazumdar Thierry Moreau Sung Kim Meghan Cowan Armin Alaghi Luis Ceze Mark Oskin Visvesh Sathe IISWC 2017 1 Camera applications are
More informationENHANCING SPEED AND REDUCING POWER OF SHIFT AND ADD MULTIPLIER
ENHANCING SPEED AND REDUCING POWER OF SHIFT AND ADD MULTIPLIER 1 ZUBER M. PATEL 1 S V National Institute of Technology, Surat, Gujarat, Inida E-mail: zuber_patel@rediffmail.com Abstract- This paper presents
More informationDetector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen
GIGA seminar 11.1.2010 Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen janne.janhunen@ee.oulu.fi 2 Outline Introduction Benefits and Challenges
More informationDigital Logic, Algorithms, and Functions for the CEBAF Upgrade LLRF System Hai Dong, Curt Hovater, John Musson, and Tomasz Plawski
Digital Logic, Algorithms, and Functions for the CEBAF Upgrade LLRF System Hai Dong, Curt Hovater, John Musson, and Tomasz Plawski Introduction: The CEBAF upgrade Low Level Radio Frequency (LLRF) control
More informationPolicy-Based RTL Design
Policy-Based RTL Design Bhanu Kapoor and Bernard Murphy bkapoor@atrenta.com Atrenta, Inc., 2001 Gateway Pl. 440W San Jose, CA 95110 Abstract achieving the desired goals. We present a new methodology to
More informationREAL TIME DIGITAL SIGNAL PROCESSING. Introduction
REAL TIME DIGITAL SIGNAL Introduction Why Digital? A brief comparison with analog. PROCESSING Seminario de Electrónica: Sistemas Embebidos Advantages The BIG picture Flexibility. Easily modifiable and
More informationAn energy-efficient coarse grained spatial architecture for convolutional neural networks AlexNet
LETTER IEICE Electronics Express, Vol.14, No.15, 1 12 An energy-efficient coarse grained spatial architecture for convolutional neural networks AlexNet Boya Zhao a), Mingjiang Wang b), and Ming Liu Harbin
More informationIJCSIET--International Journal of Computer Science information and Engg., Technologies ISSN
An efficient add multiplier operator design using modified Booth recoder 1 I.K.RAMANI, 2 V L N PHANI PONNAPALLI 2 Assistant Professor 1,2 PYDAH COLLEGE OF ENGINEERING & TECHNOLOGY, Visakhapatnam,AP, India.
More informationMulti-core Platforms for
20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio
More informationEmbedding Artificial Intelligence into Our Lives
Embedding Artificial Intelligence into Our Lives Michael Thompson, Synopsys D&R IP-SOC DAYS Santa Clara April 2018 1 Agenda Introduction What AI is and is Not Where AI is being used Rapid Advance of AI
More informationCMOS VLSI IC Design. A decent understanding of all tasks required to design and fabricate a chip takes years of experience
CMOS VLSI IC Design A decent understanding of all tasks required to design and fabricate a chip takes years of experience 1 Commonly used keywords INTEGRATED CIRCUIT (IC) many transistors on one chip VERY
More informationCyclone II Filtering Lab
May 2005, ver. 1.0 Application Note 376 Introduction The Cyclone II filtering lab design provided in the DSP Development Kit, Cyclone II Edition, shows you how to use the Altera DSP Builder for system
More informationPerformance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL
Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL E.Deepthi, V.M.Rani, O.Manasa Abstract: This paper presents a performance analysis of carrylook-ahead-adder and carry
More informationDesign of High-Performance HOG Feature Calculation Circuit for Real-Time Pedestrian Detection *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 31, 2055-2073 (2015) Design of High-Performance HOG Feature Calculation Circuit for Real-Time Pedestrian Detection * SOOJIN KIM AND KYEONGSOON CHO + Department
More informationHARDWARE ACCELERATION OF THE GIPPS MODEL
HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu
More informationDESIGN OF MULTIPLE CONSTANT MULTIPLICATION ALGORITHM FOR FIR FILTER
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 3, March 2014,
More informationDesign and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors
Design and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors M.Satheesh, D.Sri Hari Student, Dept of Electronics and Communication Engineering, Siddartha Educational Academy
More informationECE6332 VLSI Eric Zhang & Xinfei Guo Design Review
Summaries: [1] Xiaoxiao Zhang, Amine Bermak, Farid Boussaid, "Dynamic Voltage and Frequency Scaling for Low-power Multi-precision Reconfigurable Multiplier", in Proc. of 2010 IEEE International Symposium
More informationEnhancing System Architecture by Modelling the Flash Translation Layer
Enhancing System Architecture by Modelling the Flash Translation Layer Robert Sykes Sr. Dir. Firmware August 2014 OCZ Storage Solutions A Toshiba Group Company Introduction This presentation will discuss
More informationCS Computer Architecture Spring Lecture 04: Understanding Performance
CS 35101 Computer Architecture Spring 2008 Lecture 04: Understanding Performance Taken from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [Adapted from Computer Organization and Design, Patterson
More information4.4 Implementation Structures in FPGAs and DSPs. Presented by Lee Pucker President, ForwardLink Consulting
4.4 Implementation Structures in FPGAs and DSPs Presented by Lee Pucker President, ForwardLink Consulting Agenda Case Study on Implementation Structures Synchronization in a GSM Network Option 1: DSP Implementation
More informationDesign of High-Performance Intra Prediction Circuit for H.264 Video Decoder
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.9, NO.4, DECEMBER, 2009 187 Design of High-Performance Intra Prediction Circuit for H.264 Video Decoder Jihye Yoo, Seonyoung Lee, and Kyeongsoon Cho
More informationBricken Technologies Corporation Presentations: Bricken Technologies Corporation Corporate: Bricken Technologies Corporation Marketing:
TECHNICAL REPORTS William Bricken compiled 2004 Bricken Technologies Corporation Presentations: 2004: Synthesis Applications of Boundary Logic 2004: BTC Board of Directors Technical Review (quarterly)
More informationPerformance Analysis of FIR Filter Design Using Reconfigurable Mac Unit
Volume 4 Issue 4 December 2016 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org Performance Analysis of FIR Filter Design Using Reconfigurable
More informationCHAPTER 4 FIELD PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF FIVE LEVEL CASCADED MULTILEVEL INVERTER
87 CHAPTER 4 FIELD PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF FIVE LEVEL CASCADED MULTILEVEL INVERTER 4.1 INTRODUCTION The Field Programmable Gate Array (FPGA) is a high performance data processing general
More informationImplementing Logic with the Embedded Array
Implementing Logic with the Embedded Array in FLEX 10K Devices May 2001, ver. 2.1 Product Information Bulletin 21 Introduction Altera s FLEX 10K devices are the first programmable logic devices (PLDs)
More information10. DSP Blocks in Arria GX Devices
10. SP Blocks in Arria GX evices AGX52010-1.2 Introduction Arria TM GX devices have dedicated digital signal processing (SP) blocks optimized for SP applications requiring high data throughput. These SP
More information5G R&D at Huawei: An Insider Look
5G R&D at Huawei: An Insider Look Accelerating the move from theory to engineering practice with MATLAB and Simulink Huawei is the largest networking and telecommunications equipment and services corporation
More informationChapter 1 Introduction
Chapter 1 Introduction 1.1 Introduction There are many possible facts because of which the power efficiency is becoming important consideration. The most portable systems used in recent era, which are
More informationSimulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka
Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka Abstract Virtual prototyping is becoming increasingly important to embedded software developers, engineers, managers
More informationStudy on Digital Multiplier Architecture Using Square Law and Divide-Conquer Method
Study on Digital Multiplier Architecture Using Square Law and Divide-Conquer Method Yifei Sun 1,a, Shu Sasaki 1,b, Dan Yao 1,c, Nobukazu Tsukiji 1,d, Haruo Kobayashi 1,e 1 Division of Electronics and Informatics,
More informationUsing an FPGA based system for IEEE 1641 waveform generation
Using an FPGA based system for IEEE 1641 waveform generation Colin Baker EADS Test & Services (UK) Ltd 23 25 Cobham Road Wimborne, Dorset, UK colin.baker@eads-ts.com Ashley Hulme EADS Test Engineering
More informationA New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm
A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm V.Sandeep Kumar Assistant Professor, Indur Institute Of Engineering & Technology,Siddipet
More informationEnabling High-Performance DSP Applications with Arria V or Cyclone V Variable-Precision DSP Blocks
Enabling HighPerformance DSP Applications with Arria V or Cyclone V VariablePrecision DSP Blocks WP011591.0 White Paper This document highlights the benefits of variableprecision digital signal processing
More informationModified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen
Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Abstract A new low area-cost FIR filter design is proposed using a modified Booth multiplier based on direct form
More informationCS4617 Computer Architecture
1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement
More informationCHAPTER 4 GALS ARCHITECTURE
64 CHAPTER 4 GALS ARCHITECTURE The aim of this chapter is to implement an application on GALS architecture. The synchronous and asynchronous implementations are compared in FFT design. The power consumption
More informationDESIGN OF LOW POWER MULTIPLIER USING COMPOUND CONSTANT DELAY LOGIC STYLE
DESIGN OF LOW POWER MULTIPLIER USING COMPOUND CONSTANT DELAY LOGIC STYLE 1 S. DARWIN, 2 A. BENO, 3 L. VIJAYA LAKSHMI 1 & 2 Assistant Professor Electronics & Communication Engineering Department, Dr. Sivanthi
More informationDesign of Mixed-Signal Microsystems in Nanometer CMOS
Design of Mixed-Signal Microsystems in Nanometer CMOS Carl Grace Lawrence Berkeley National Laboratory August 2, 2012 DOE BES Neutron and Photon Detector Workshop Introduction Common themes in emerging
More informationAdvanced FPGA Design. Tinoosh Mohsenin CMPE 491/691 Spring 2012
Advanced FPGA Design Tinoosh Mohsenin CMPE 491/691 Spring 2012 Today Administrative items Syllabus and course overview Digital signal processing overview 2 Course Communication Email Urgent announcements
More informationLow Power Design Part I Introduction and VHDL design. Ricardo Santos LSCAD/FACOM/UFMS
Low Power Design Part I Introduction and VHDL design Ricardo Santos ricardo@facom.ufms.br LSCAD/FACOM/UFMS Motivation for Low Power Design Low power design is important from three different reasons Device
More informationModel checking in the cloud VIGYAN SINGHAL OSKI TECHNOLOGY
Model checking in the cloud VIGYAN SINGHAL OSKI TECHNOLOGY Views are biased by Oski experience Service provider, only doing model checking Using off-the-shelf tools (Cadence, Jasper, Mentor, OneSpin Synopsys)
More informationJournal of Engineering Science and Technology Review 9 (5) (2016) Research Article. L. Pyrgas, A. Kalantzopoulos* and E. Zigouris.
Jestr Journal of Engineering Science and Technology Review 9 (5) (2016) 51-55 Research Article Design and Implementation of an Open Image Processing System based on NIOS II and Altera DE2-70 Board L. Pyrgas,
More informationFIR Filter Design on Chip Using VHDL
FIR Filter Design on Chip Using VHDL Mrs.Vidya H. Deshmukh, Dr.Abhilasha Mishra, Prof.Dr.Mrs.A.S.Bhalchandra MIT College of Engineering, Aurangabad ABSTRACT This paper describes the design and implementation
More informationCHAPTER 5 IMPLEMENTATION OF MULTIPLIERS USING VEDIC MATHEMATICS
49 CHAPTER 5 IMPLEMENTATION OF MULTIPLIERS USING VEDIC MATHEMATICS 5.1 INTRODUCTION TO VHDL VHDL stands for VHSIC (Very High Speed Integrated Circuits) Hardware Description Language. The other widely used
More informationHigh Performance DSP Solutions for Ultrasound
High Performance DSP Solutions for Ultrasound By Hong-Swee Lim Senior Manager, DSP/Embedded Marketing Hong-Swee.Lim@xilinx.com 12 May 2008 DSP Performance Gap Performance (Algorithmic and Processor Forecast)
More informationOn Current Strategies for Hardware Acceleration of Digital Image Restoration Filters
On Current Strategies for Hardware Acceleration of Digital Image Restoration Filters ERIC GRANGER Laboratoire d imagerie, de vision et d intelligence artificielle Dépt. de génie de la production automatisée
More informationAerial Photographic System Using an Unmanned Aerial Vehicle
Aerial Photographic System Using an Unmanned Aerial Vehicle Second Prize Aerial Photographic System Using an Unmanned Aerial Vehicle Institution: Participants: Instructor: Chungbuk National University
More informationAN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER
AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER K. RAMAMOORTHY 1 T. CHELLADURAI 2 V. MANIKANDAN 3 1 Department of Electronics and Communication
More informationPE713 FPGA Based System Design
PE713 FPGA Based System Design Why VLSI? Dept. of EEE, Amrita School of Engineering Why ICs? Dept. of EEE, Amrita School of Engineering IC Classification ANALOG (OR LINEAR) ICs produce, amplify, or respond
More information23270: AUGMENTED REALITY FOR NAVIGATION AND INFORMATIONAL ADAS. Sergii Bykov Technical Lead Machine Learning 12 Oct 2017
23270: AUGMENTED REALITY FOR NAVIGATION AND INFORMATIONAL ADAS Sergii Bykov Technical Lead Machine Learning 12 Oct 2017 Product Vision Company Introduction Apostera GmbH with headquarter in Munich, was
More informationIJMIE Volume 2, Issue 5 ISSN:
Systematic Design of High-Speed and Low- Power Digit-Serial Multipliers VLSI Based Ms.P.J.Tayade* Dr. Prof. A.A.Gurjar** Abstract: Terms of both latency and power Digit-serial implementation styles are
More informationLow Power VLSI Circuit Synthesis: Introduction and Course Outline
Low Power VLSI Circuit Synthesis: Introduction and Course Outline Ajit Pal Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302 Agenda Why Low
More informationDesign and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse 1 K.Bala. 2
IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 07, 2015 ISSN (online): 2321-0613 Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse
More informationA Review on Different Multiplier Techniques
A Review on Different Multiplier Techniques B.Sudharani Research Scholar, Department of ECE S.V.U.College of Engineering Sri Venkateswara University Tirupati, Andhra Pradesh, India Dr.G.Sreenivasulu Professor
More informationMultiplier Design and Performance Estimation with Distributed Arithmetic Algorithm
Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm M. Suhasini, K. Prabhu Kumar & P. Srinivas Department of Electronics & Comm. Engineering, Nimra College of Engineering
More informationMeeting the Challenges of Formal Verification
Meeting the Challenges of Formal Verification Doug Fisher Synopsys Jean-Marc Forey - Synopsys 23rd May 2013 Synopsys 2013 1 In the next 30 minutes... Benefits and Challenges of Formal Verification Meeting
More informationRapid FPGA Modem Design Techniques For SDRs Using Altera DSP Builder
Rapid FPGA Modem Design Techniques For SDRs Using Altera DSP Builder Steven W. Cox Joel A. Seely General Dynamics C4 Systems Altera Corporation 820 E. McDowell Road, MDR25 0 Innovation Dr Scottsdale, Arizona
More informationModel-Based Design for Sensor Systems
2009 The MathWorks, Inc. Model-Based Design for Sensor Systems Stephanie Kwan Applications Engineer Agenda Sensor Systems Overview System Level Design Challenges Components of Sensor Systems Sensor Characterization
More informationEE25266 ASIC/FPGA Chip Design. Designing a FIR Filter, FPGA in the Loop, Ethernet
EE25266 ASIC/FPGA Chip Design Mahdi Shabany Electrical Engineering Department Sharif University of Technology Assignment #8 Designing a FIR Filter, FPGA in the Loop, Ethernet Introduction In this lab,
More informationMulti-Channel FIR Filters
Chapter 7 Multi-Channel FIR Filters This chapter illustrates the use of the advanced Virtex -4 DSP features when implementing a widely used DSP function known as multi-channel FIR filtering. Multi-channel
More informationVLSI IMPLEMENTATION OF MODIFIED DISTRIBUTED ARITHMETIC BASED LOW POWER AND HIGH PERFORMANCE DIGITAL FIR FILTER Dr. S.Satheeskumaran 1 K.
VLSI IMPLEMENTATION OF MODIFIED DISTRIBUTED ARITHMETIC BASED LOW POWER AND HIGH PERFORMANCE DIGITAL FIR FILTER Dr. S.Satheeskumaran 1 K. Sasikala 2 1 Professor, Department of Electronics and Communication
More informationBen Baker. Sponsored by:
Ben Baker Sponsored by: Background Agenda GPU Computing Digital Image Processing at FamilySearch Potential GPU based solutions Performance Testing Results Conclusions and Future Work 2 CPU vs. GPU Architecture
More informationOverview of Design Methodology. A Few Points Before We Start 11/4/2012. All About Handling The Complexity. Lecture 1. Put things into perspective
Overview of Design Methodology Lecture 1 Put things into perspective ECE 156A 1 A Few Points Before We Start ECE 156A 2 All About Handling The Complexity Design and manufacturing of semiconductor products
More information