High Performance Computing for Engineers
David Thomas / Room
HPCE / dt10/ 2015 / 0.1

High Performance Computing for Engineers
- Research
  - Testing communication protocols
  - Evaluating signal-processing filters
  - Simulating analogue and digital designs
- Tools
  - CAD tools: synthesis, place-and-route, verification
  - Libraries/toolboxes: filter design, compressive sensing
- Products
  - Oil exploration and discovery
  - Mobile-phone apps
  - Financial computing

High Performance Computing for Engineers
- Types of performance metrics
  - Throughput
  - Latency
  - Power
  - Design-time
  - Capital and running costs
- Required versus desired performance
  - Subject to a throughput of X, minimise average power
  - Subject to a budget of Y, maximise energy efficiency
  - Subject to Z development days, maximise throughput
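
To make "required versus desired" concrete, here is a minimal C++17 sketch of the first formulation: the required throughput acts as a hard filter, and average power is the quantity optimised among the candidates that pass. The `Design` struct, its fields and both function names are hypothetical, introduced only for illustration; the slide does not prescribe a procedure, and any numbers used with it are made up.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one candidate implementation.
struct Design {
    std::string name;
    double throughput;  // e.g. results per second
    double power;       // average power in watts
};

// "Subject to a throughput of X, minimise average power":
// reject candidates that miss the requirement, then keep the
// lowest-power candidate among those that remain.
std::optional<Design> min_power_meeting(const std::vector<Design>& designs,
                                        double required_throughput) {
    std::optional<Design> best;
    for (const Design& d : designs) {
        if (d.throughput < required_throughput) continue;  // fails the requirement
        if (!best || d.power < best->power) best = d;      // better for the objective
    }
    return best;  // empty if no candidate meets the requirement
}

// Energy efficiency as work per joule: (results/s) / (J/s) = results/J.
double energy_efficiency(const Design& d) { return d.throughput / d.power; }
```

The other two formulations follow the same pattern with the roles swapped, for example filtering on cost and then maximising `energy_efficiency`.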

What is available to you
- Types of compute device
  - Multi-core CPUs
  - GPUs (Graphics Processing Units)
  - MPPAs (Massively Parallel Processor Arrays)
  - FPGAs (Field Programmable Gate Arrays)
- Types of compute system
  - Embedded systems
  - Mobile phones
  - Tablets
  - Laptops
  - Grid computing
  - Cloud computing

HTC Droid DNA
- Snapdragon S4 Pro
  - CPU: Quad-core Krait (ARM derivative)
  - GPU: Adreno 320 (OpenCL compatible)
(Images copyright HTC and Qualcomm)

Lenovo ThinkPad Edge E525
- AMD Fusion A8-3500M
  - CPU: Quad-core 2.4GHz Phenom-II
  - GPU: HD 6620G 400MHz (320 cores)

Imperial HPC Cluster cx2 - SGI Altix ICE 8200 EX
- Racks and racks of high-performance PCs
  - x64 cores running at 3GHz
- Available to researchers and undergrads (if they ask nicely)
- Grid-management system
  - Run a program on 1000 PCs with one command

Performance and Efficiency Relative to CPU
[Figure: two panels, "Performance" and "Power Efficiency", each comparing MPPA, FPGA and GPU relative to a CPU baseline for Uniform, Gaussian and Exponential workloads, plus their geometric mean]
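
The "Mean (Geo)" label suggests that the summary bar is a geometric mean of the per-workload ratios. Assuming that, the calculation is sketched below in C++17; the function name is hypothetical and no values from the figure are reproduced.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-workload ratios (e.g. speed-up relative to a CPU).
// Averaging logarithms means a 2x gain and a 2x loss combine to 1x,
// which an arithmetic mean of the ratios would not give.
// Preconditions: ratios is non-empty and every ratio is > 0.
double geometric_mean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```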

Design tradeoffs
[Figure: performance (log scale) against design-time (1 hour, 1 day, 1 week, 1 month), built up over several slides; first Sequential SW, then Thread-based SW]

Design tradeoffs
- Task-based parallelism vs threads
  - Easy to program (less time coding)
  - Easy to get right (less time testing)
- Many implementations and APIs
  - Intel Threading Building Blocks (TBB)
  - Microsoft .NET Task Parallel Library
  - OpenCL
[Figure: the performance versus design-time plot, now showing Sequential SW, Task-based SW and Thread-based SW]
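
The "easy to program, easy to get right" argument can be sketched in standard C++17. Below, `std::thread` stands for the thread-based style and `std::async` stands in for a task library; it is not TBB or the .NET Task Parallel Library, which add a scheduler and thread pool that `std::async` does not promise. The function names and the `grain` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Thread-based: the programmer splits the work, owns every thread,
// stores each partial result and must remember to join.
// Precondition: num_threads > 0.
long long sum_with_threads(const std::vector<int>& v, unsigned num_threads) {
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        workers.emplace_back([&v, &partial, t, begin, end] {
            long long s = 0;
            for (std::size_t i = begin; i < end; ++i) s += v[i];
            partial[t] = s;  // each thread writes only its own slot
        });
    }
    // Destroying a still-joinable std::thread calls std::terminate.
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// Task-based: describe independent pieces of work and collect their
// results through futures; the runtime decides how to run them.
// Precondition: grain > 0.
long long sum_with_tasks(const std::vector<int>& v, std::size_t grain) {
    std::vector<std::future<long long>> tasks;
    for (std::size_t begin = 0; begin < v.size(); begin += grain) {
        const std::size_t end = std::min(v.size(), begin + grain);
        tasks.push_back(std::async([&v, begin, end] {
            long long s = 0;
            for (std::size_t i = begin; i < end; ++i) s += v[i];
            return s;
        }));
    }
    long long total = 0;
    for (auto& f : tasks) total += f.get();  // waits for each task in turn
    return total;
}
```

The task version has no thread bookkeeping to get wrong, which is the point the slide makes; a real task library would also reuse worker threads across tasks.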

Design tradeoffs
[Figure: the same performance versus design-time plot, extended first with GPU (source: NVIDIA CUDA Compute Unified Device Architecture Programming Guide) and then with FPGA]

What you will learn
- Systems: what high-performance systems are available
- Methods: how these systems can be programmed
- Practice: concrete experience with multi-core CPUs and GPUs
- Analysis: knowing what to use and when
- Tools: making better use of your time

Developer productivity is also part of performance

Re: XKCD - My Professional Context
- Undergraduate degree and PhD from Computing
  - If pushed, I self-identify as a programmer
- Research focuses on hardware acceleration
  - Both academic and industrial applications
- My motivation for this course
  - Supervising final year project students
  - Working with PhD students
  - Talking to industry people

Why are you here?

Course Assessment
- 40%: Four short courseworks to build skills
  - Get familiar with environments and how to do common tasks
  - Structured and quite linear; should not be taxing
  - Force people to do work earlier in term
- 40%: Two larger tasks to apply skills to real problems
  - Allow demonstration of knowledge and skills
  - Unstructured; open-ended; competitive; hard
- 20%: Oral assessment; individual
  - 30 minutes each, held at the start of the summer term
  - Tests your ability to communicate about your code and solutions
  - (Checks that you did the work)

Skills needed
- Basic programming
  - If you can't program in _any_ language, then worry
- Intel TBB uses C++ rather than C
  - Some weird C++ stuff, but not scary: explained in lectures
  - Setup and basics covered in the third coursework
- GPU programming uses OpenCL (C-like)
  - Lets you use whatever graphics card you happen to have
  - Working examples, explained in lectures
  - Language and compiler setup covered in the fourth coursework
- You are not expected to become a guru, just to make it faster

Course admin
- Slides on the course homepage
  - Zip/tarball of slides+code, updated after each lecture
  - The Blackboard site is only really for (some) coursework
  - (Why? Because I can do "make publish" for a website. No clicks.)
- Other sites we will be using (introduced in detail later)
  - GitHub for various forms of code distribution
  - AWS (Amazon Web Services) for multi-core and GPUs later on
- Bring a device to lectures (laptop, tablet, charged phone)

Key Focus: Engineering
- How does this apply to you?
- Examples from Elec. Eng. problems
  - Mathematical analysis
  - Simulation of digital circuits
  - VLSI circuit layout
  - Communication channel evaluation
- Tools and languages used in EE
  - C / C++
  - MATLAB

How do you do well in this course?

Simple example: Totient function
- Euler's totient function: totient(n)
  - Number of integers in the range 1..n which are relatively prime to n
  - Integers i and j are relatively prime if gcd(i,j)=1
- Totient is not included in MATLAB
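
For example, the integers in 1..12 that share no factor with 12 are 1, 5, 7 and 11, so totient(12) = 4. The definition translates directly into a sequential C++17 function; this is a sketch, the name `totient` is assumed, and `std::gcd` comes from `<numeric>`.

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Count the i in 1..n with gcd(i, n) == 1.
// n gcd calls of O(log n) each: the baseline the faster versions try to beat.
std::uint64_t totient(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        if (std::gcd(i, n) == 1) ++count;  // i and n are relatively prime
    }
    return count;
}
```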

Version 0: Simple loop

    function [res]=totient_v0(n)
    res=0;
    for i=1:n              % Loop over all numbers in 1..n
        if gcd(i,n)==1     % Check if relatively prime
            res=res+1;     % Count the ones that are
        end
    end

Version 1: Vectorising
- Convert loops into vector operations
  - Standard MATLAB optimisation
  - Actually a way of making parallelism explicit

    function [res]=totient_v1(n)
    numbers=1:n;                    % Generate all numbers in 1..n
    rel_prime=(gcd(numbers,n)==1);  % Perform GCD on all numbers at once
    res=sum(rel_prime);             % Count all relatively prime numbers
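
C++ has no direct counterpart to MATLAB's implicit vector operations, but a whole-range algorithm plays a similar role: it states the per-element work and the reduction without fixing a loop order. A minimal C++17 sketch with the serial overload of `std::transform_reduce`; an overload taking an execution policy such as `std::execution::par` also exists, but with GCC's libstdc++ its parallel backend typically relies on TBB, so it is left out here. The function name is hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

std::uint64_t totient_vectorised(std::uint64_t n) {
    std::vector<std::uint64_t> numbers(n);
    std::iota(numbers.begin(), numbers.end(), std::uint64_t{1});  // 1..n, like numbers=1:n
    // Map each element to 1 if relatively prime to n, else 0, then sum.
    return std::transform_reduce(
        numbers.begin(), numbers.end(), std::uint64_t{0}, std::plus<>{},
        [n](std::uint64_t i) -> std::uint64_t { return std::gcd(i, n) == 1 ? 1 : 0; });
}
```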

Version 2: Parallel for loop
- MATLAB supports a parfor command
  - Loop iterations may be executed in parallel
  - Can operate on multiple cores, and even multiple machines

    function [res]=totient_v2(n)
    res=0;
    parfor i=1:n           % Loop over all numbers in 1..n
        if gcd(i,n)==1     % Check if relatively prime
            res=res+1;     % Count the ones that are (res is a reduction variable)
        end
    end
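
C++ has no `parfor`, so a hand-written equivalent shows what the loop does for you, in particular the treatment of `res` as a reduction variable. Below is a minimal `std::thread` sketch; the cyclic split of iterations over a fixed number of threads is an assumption for illustration, not a description of how MATLAB schedules `parfor`, and the function name is hypothetical.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Each worker keeps a private count and the partial counts are combined
// at the end. Incrementing one shared counter from every thread instead
// would be a data race.
// Precondition: num_threads > 0.
std::uint64_t totient_parallel_for(std::uint64_t n, unsigned num_threads) {
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([n, num_threads, t, &partial] {
            std::uint64_t local = 0;
            // Cyclic split: worker t handles i = t+1, t+1+T, t+1+2T, ...
            for (std::uint64_t i = t + 1; i <= n; i += num_threads) {
                if (std::gcd(i, n) == 1) ++local;
            }
            partial[t] = local;  // each worker writes only its own slot
        });
    }
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}
```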

Version 3: Agglomeration
- Too much overhead with the current parallel loop
  - Each parallel iteration has a cost due to scheduling
- Process the iteration space in chunks, each handled as a smaller vector

    function [res]=totient_v3(n, step)
    if nargin<2
        step=1000;         % How large each chunk should be
    end
    res=0;
    % Loop over each chunk; ceil() so that a final partial chunk is not skipped
    parfor i=1:ceil(n/step)
        % Then process each chunk as a vector
        numbers=(i-1)*step+1:min(i*step,n);
        rel_prime=(gcd(numbers,n)==1);
        res=res+sum(rel_prime);
    end
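
The same agglomeration idea as a C++17 sketch: one `std::async` task per chunk of `step` numbers, with the last chunk allowed to be short, mirroring `min(i*step,n)` in the MATLAB code. The name `totient_chunked` is hypothetical. `std::launch::async` runs each task as if on a new thread rather than on a shared pool, so the chunk size matters even more here than with a task library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <numeric>
#include <vector>

// One task per chunk of `step` consecutive numbers, so the per-task
// overhead is paid ceil(n/step) times instead of n times.
std::uint64_t totient_chunked(std::uint64_t n, std::uint64_t step = 1000) {
    assert(step > 0);
    std::vector<std::future<std::uint64_t>> chunks;
    for (std::uint64_t first = 1; first <= n; first += step) {
        const std::uint64_t last = std::min(first + step - 1, n);  // final chunk may be short
        chunks.push_back(std::async(std::launch::async, [first, last, n] {
            std::uint64_t count = 0;
            for (std::uint64_t i = first; i <= last; ++i) {
                if (std::gcd(i, n) == 1) ++count;
            }
            return count;
        }));
    }
    std::uint64_t res = 0;
    for (auto& c : chunks) res += c.get();  // combine per-chunk counts
    return res;
}
```

For example, `totient_chunked(2500, 1000)` runs three tasks covering 1..1000, 1001..2000 and 2001..2500; stopping the loop at `floor(n/step)` would drop the last of these.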

Results from my 4-core desktop
[Figure: comparison of v0: For Loop, v1: Vectorised, v2: ParFor Loop, v3: ParFor Chunked and v4: Algorithm X]
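
The slides do not say what "Algorithm X" is. As one illustration of an algorithmic change rather than a parallelisation, here is a sketch of the textbook product formula totient(n) = n * prod over distinct primes p dividing n of (1 - 1/p). This is a swapped-in technique, not necessarily the author's v4, and the function name is hypothetical.

```cpp
#include <cstdint>

// Product formula evaluated in integer arithmetic: multiplying by (1 - 1/p)
// is done as result -= result / p, which is exact because p still divides
// result at that point. Trial division takes O(sqrt(n)) steps instead of
// n gcd calls.
std::uint64_t totient_factor(std::uint64_t n) {
    std::uint64_t result = n;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        if (n % p == 0) {
            while (n % p == 0) n /= p;  // strip every factor of p
            result -= result / p;       // multiply by (1 - 1/p)
        }
    }
    if (n > 1) result -= result / n;    // at most one prime factor above sqrt remains
    return result;
}
```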
More informationLeading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005]
Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005] AMD s drive to 64-bit processors surprised everyone with its speed, even as detractors commented
More informationAI Application Processing Requirements
AI Application Processing Requirements 1 Low Medium High Sensor analysis Activity Recognition (motion sensors) Stress Analysis or Attention Analysis Audio & sound Speech Recognition Object detection Computer
More informationMonte Carlo integration and event generation on GPU and their application to particle physics
Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &
More informationSignal Processing on GPUs for Radio Telescopes
Signal Processing on GPUs for Radio Telescopes John W. Romein Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands 1 Overview radio telescopes motivation processing pipelines signal-processing
More informationImage Processing Architectures (and their future requirements)
Lecture 16: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Example SoC: Qualcomm Snapdragon Image credit: Qualcomm Apple A7 (iphone
More informationDetector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen
GIGA seminar 11.1.2010 Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen janne.janhunen@ee.oulu.fi 2 Outline Introduction Benefits and Challenges
More informationA Study of Optimal Spatial Partition Size and Field of View in Massively Multiplayer Online Game Server
A Study of Optimal Spatial Partition Size and Field of View in Massively Multiplayer Online Game Server Youngsik Kim * * Department of Game and Multimedia Engineering, Korea Polytechnic University, Republic
More informationImproving Loop-Gain Performance In Digital Power Supplies With Latest- Generation DSCs
ISSUE: March 2016 Improving Loop-Gain Performance In Digital Power Supplies With Latest- Generation DSCs by Alex Dumais, Microchip Technology, Chandler, Ariz. With the consistent push for higher-performance
More informationCMOS Technology for Computer Architects
CMOS Technology for Computer Architects Lecture 1: Introduction Iakovos Mavroidis Giorgos Passas Manolis Katevenis FORTH-ICS (University of Crete) Course Contents Implementation of high-performance digital
More informationVLSI System Testing. Outline
ECE 538 VLSI System Testing Krish Chakrabarty System-on-Chip (SOC) Testing ECE 538 Krish Chakrabarty 1 Outline Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test
More informationAirborne radar clutter simulation using GPU (CUDA)
Airborne radar clutter simulation using GPU (CUDA) 1 Priyanka A P, 2 Mr.Channabasappa Baligar 1 Department of VLSI and Embedded Systems, UTL technologies Ltd, Bangalore, India 2 Department of VLSI and
More information