High Performance Computing for Engineers

Size: px
Start display at page:

Download "High Performance Computing for Engineers"

Transcription

1 High Performance Computing for Engineers David Thomas / Room HPCE / dt10/ 2015 / 0.1

2 High Performance Computing for Engineers Research Testing communication protocols Evaluating signal-processing filters Simulating analogue and digital designs Tools CAD tools: synthesis, place-and-route, verification Libraries/toolboxes: filter design, compressive sensing Products Oil exploration and discovery Mobile-phone apps Financial computing HPCE / dt10/ 2015 / 0.2

3 High Performance Computing for Engineers Types of performance metrics Throughput Latency Power Design-time Capital and running costs Required versus desired performance Subject to a throughput of X, minimise average power Subject to a budget of Y, maximise energy efficiency Subject to Z development days, maximise throughput HPCE / dt10/ 2015 / 0.3

4 What is available to you Types of compute device Multi-core CPUs GPUs (Graphics Processing Units) MPPAs (Massively Parallel Processor Arrays) FPGAs (Field Programmable Gate Arrays) Types of compute system Embedded Systems Mobile Phones Tablets Laptops Grid computing Cloud computing HPCE / dt10/ 2015 / 0.4

5 HTC Droid DNA Snapdragon S4 Pro - CPU : Quad-core Krait (ARM derivative) - GPU : Adreno 320 GPU (OpenCL compatible) Images Copyright HTC and Qaulcomm HPCE / dt10/ 2015 / 0.5

6 Lenovo Thinkpad Edge E525 AMD Fusion A8-3500M - CPU : Quad-Core 2.4GHz Phenom-II - GPU : HD 6620G 400MHz (320 cores) Img: HPCE / dt10/ 2015 / 0.6

7 Imperial HPC Cluster cx2 - SGI Altix ICE 8200 EX Racks and racks of high-performance PCs x64 cores running at 3GHz Available to researchers and undergrads (if they ask nicely) Grid-management system Run program on 1000 PCs with one command HPCE / dt10/ 2015 / 0.7

8 Performance and Efficiency Relative to CPU Uniform Gaussian Exponential Mean (Geo) MPPA FPGA GPU Uniform Gaussian 345 Exponential Mean (Geo) FPGA GPU MPPA Performance Power Efficiency HPCE / dt10/ 2015 / 0.8

9 Design tradeoffs 1 Sequential SW 10 Performance hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.9

10 Design tradeoffs 1 10 Performance 100 Sequential SW Thread-based SW hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.10

11 Design tradeoffs 1 10 Performance 100 Sequential SW Thread-based SW hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.11

12 Design tradeoffs Task-based parallelism vs threads Easy to program (less time coding) 1 Easy to get right (less time testing) 10Many implementations and APIs Performance 100 Intel Threaded Building Blocks (TBB) Microsoft.NET Task Parallel Library 1000 OpenCL 1 hour 1 day 1 week 1 month Sequential SW Task-based SW Thread-based SW Design-time HPCE / dt10/ 2015 / 0.12

13 Design tradeoffs 1 10 Performance 100 Sequential SW Task-based SW Thread-based SW hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.13

14 Design tradeoffs 1 10 Performance 100 Sequential SW Task-based SW Thread-based SW GPU hour 1 day 1 week 1 month Design-time Src: NVIDIA CUDA Compute Unified Device Architecture, Programmers Guide HPCE / dt10/ 2015 / 0.14

15 Design tradeoffs 1 10 Performance 100 Sequential SW Task-based SW Thread-based SW GPU hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.15

16 Design tradeoffs 1 10 Performance Sequential SW Task-based SW Thread-based SW GPU FPGA 1 hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.16

17 Design tradeoffs 1 10 Performance Sequential SW Task-based SW Thread-based SW GPU FPGA 1 hour 1 day 1 week 1 month Design-time HPCE / dt10/ 2015 / 0.17

18 What you will learn Systems: what high-performance systems are available Methods: how these systems can be programmed Practise: concrete experience with multi-core and GPUs Analysis: knowing what to use and when Tools: making better use of your time HPCE / dt10/ 2015 / 0.18

19 Developer productivity is also part of performance HPCE / dt10/ 2015 / 0.19

20 Re: XKCD - My Professional Context Undergraduate degree and PhD from Computing If pushed, I self-identify as a programmer Research focuses on hardware acceleration Both academic and industrial applications My motivation for this course Supervising final year project students Working with PhD students Talking to industry people HPCE / dt10/ 2015 / 0.20

21 Why are you here? HPCE / dt10/ 2015 / 0.21

22 Course Assessment 40% : Four short course-works to build skills Get familiar with environments and how to do common tasks Structured and quite linear should not be taxing Force people to do work earlier in term 40% : Two larger tasks to apply skills to real problems Allow demonstration of knowledge and skills Unstructured; open-ended; competitive; hard 20% : Oral assessment; individual 30 minutes each, will happen at the start of summer term Test ability to communicate about your code and solutions (Check that you did the work) HPCE / dt10/ 2015 / 0.22

23 Skills needed Basic programming If you can t program in _any_ language then worry Intel TBB uses C++ rather than C Some weird C++ stuff, but not scary: explained in lectures Setup and basics covered in third coursework GPU programming uses OpenCL (C-like) Let s you use whatever graphics card you happen to have Working examples, explained in lectures Language and compiler setup covered in fourth coursework Not expected to become a guru, just make it faster HPCE / dt10/ 2015 / 0.23

24 Course admin Slides on the course homepage Zip/tarball of slides+code which updates after each lecture Blackboard site is only really for (some) coursework (Why? Because I can do make publish for a website. No clicks) Other sites we will be using (introduced in detail later) github for various forms of code distribution AWS (Amazon Web Services) for multi-core and GPUs later on Bring a device to lectures (laptop, tablet, charged phone) HPCE / dt10/ 2015 / 0.24

25 Key Focus: Engineering How does this apply to you? Examples from Elec. Eng. problems Mathematical analysis Simulation of digital circuits VLSI circuit layout Communication channel evaluation Tools and languages used in EE C / C++ MATLAB HPCE / dt10/ 2015 / 0.25

26 How do you do well in this course? HPCE / dt10/ 2015 / 0.26

27 Simple example : Totient function Eulers totient function: totient(n) Number of integers in range 1..n which are relatively prime to n Integers i and j are relatively prime if gcd(i,j)=1 Totient not included in MATLAB HPCE / dt10/ 2015 / 0.27

28 Version 0 : Simple loop Eulers totient function: totient(n) Number of integers in range 1..n which are relatively prime to n Integers i and j are relatively prime if gcd(i,j)=1 Not included in MATLAB function [res]=totient_v0(n) res=0; for i=1:n % Loop over all numbers in 1..n if gcd(i,n)==1 % Check if relatively prime res=res+1; % Count any that are end end HPCE / dt10/ 2015 / 0.28

29 Version 1 : Vectorising Convert loops into vector operations Standard MATLAB optimisation Actually a way of making parallelism explicit function [res]=totient_v1(n) numbers=1:n; % Generate all numbers in 1..n gcd_res= (gcd(numbers,n)==1); % Perform GCD on all numbers res=sum(gcd_res==1); % Count all relatively prime numbers HPCE / dt10/ 2015 / 0.29

30 Version 2 : Parallel for loop MATLAB supports a parfor command Each loop iteration is/may be executed in parallel Can operate on multiple cores, and even multiple machines HPCE / dt10/ 2015 / 0.30

31 Version 2 : Parallel for loop MATLAB supports a parfor command Each loop iteration is/may be executed in parallel Can operate on multiple cores, and even multiple machines function [res]=totient_v2(n) res=0; parfor i=1:n % Loop over all numbers in 1..n if gcd(i,n)==1 % Check if relatively prime res=res+1; % Count any that are end end HPCE / dt10/ 2015 / 0.31

32 Version 3 : Agglomeration Too much overhead with current parallel loop Each parallel iteration has a cost due to scheduling Process space in chunks, using smaller vectors function [res]=totient_v3(n, step) if nargin<2 % How large each chunk should be step=1000; end res=0; % Loop over each chunk parfor i=1:floor(n/step) % Then process each chunk as a vector numbers=(i-1)*step+1:min(i*step,n); rel_prime= (gcd(numbers,n)==1); res=res+sum(rel_prime); end HPCE / dt10/ 2015 / 0.32

33 Results from my 4-core desktop v0: For Loop v1: Vectorised v2: ParFor Loop v3: ParFor Chunked v4: Algorithm X x 10 4 HPCE / dt10/ 2015 / 0.33

34 Results from my 4-core desktop v0: For Loop v1: Vectorised v2: ParFor Loop v3: ParFor Chunked v4: Algorithm X HPCE / dt10/ 2015 / 0.34

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs 5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs

More information

GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links

GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links DLR.de Chart 1 GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links Chen Tang chen.tang@dlr.de Institute of Communication and Navigation German Aerospace Center DLR.de Chart

More information

Digital Systems Design

Digital Systems Design Digital Systems Design Digital Systems Design and Test Dr. D. J. Jackson Lecture 1-1 Introduction Traditional digital design Manual process of designing and capturing circuits Schematic entry System-level

More information

Challenges in Transition

Challenges in Transition Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org

More information

Table of Contents HOL ADV

Table of Contents HOL ADV Table of Contents Lab Overview - - Horizon 7.1: Graphics Acceleartion for 3D Workloads and vgpu... 2 Lab Guidance... 3 Module 1-3D Options in Horizon 7 (15 minutes - Basic)... 5 Introduction... 6 3D Desktop

More information

Real-Time Software Receiver Using Massively Parallel

Real-Time Software Receiver Using Massively Parallel Real-Time Software Receiver Using Massively Parallel Processors for GPS Adaptive Antenna Array Processing Jiwon Seo, David De Lorenzo, Sherman Lo, Per Enge, Stanford University Yu-Hsuan Chen, National

More information

Lecture 1: Introduction to Digital System Design & Co-Design

Lecture 1: Introduction to Digital System Design & Co-Design Design & Co-design of Embedded Systems Lecture 1: Introduction to Digital System Design & Co-Design Computer Engineering Dept. Sharif University of Technology Winter-Spring 2008 Mehdi Modarressi Topics

More information

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Early Adopter : Multiprocessor Programming in the Undergraduate Program NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Narsingh Deo Damian Dechev Mahadevan Vasudevan Department

More information

Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL

Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL Dmitri Yudanov (Advanced Micro Devices, USA) Leon Reznik (Rochester Institute of Technology, USA) WCCI 2012, IJCNN, June

More information

Architecting Systems of the Future, page 1

Architecting Systems of the Future, page 1 Architecting Systems of the Future featuring Eric Werner interviewed by Suzanne Miller ---------------------------------------------------------------------------------------------Suzanne Miller: Welcome

More information

Perspective platforms for BOINC distributed computing network

Perspective platforms for BOINC distributed computing network Perspective platforms for BOINC distributed computing network Vitalii Koshura Lohika Odessa, Ukraine lestat.de.lionkur@gmail.com Profile page: https://www.linkedin.com/in/aenbleidd/ Abstract This paper

More information

Ben Baker. Sponsored by:

Ben Baker. Sponsored by: Ben Baker Sponsored by: Background Agenda GPU Computing Digital Image Processing at FamilySearch Potential GPU based solutions Performance Testing Results Conclusions and Future Work 2 CPU vs. GPU Architecture

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

Synthetic Aperture Beamformation using the GPU

Synthetic Aperture Beamformation using the GPU Paper presented at the IEEE International Ultrasonics Symposium, Orlando, Florida, 211: Synthetic Aperture Beamformation using the GPU Jens Munk Hansen, Dana Schaa and Jørgen Arendt Jensen Center for Fast

More information

Hardware-Software Co-Design Cosynthesis and Partitioning

Hardware-Software Co-Design Cosynthesis and Partitioning Hardware-Software Co-Design Cosynthesis and Partitioning EE8205: Embedded Computer Systems http://www.ee.ryerson.ca/~courses/ee8205/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

Performance Metrics, Amdahl s Law

Performance Metrics, Amdahl s Law ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned

More information

CORRECTED VISION. Here be underscores THE ROLE OF CAMERA AND LENS PARAMETERS IN REAL-WORLD MEASUREMENT

CORRECTED VISION. Here be underscores THE ROLE OF CAMERA AND LENS PARAMETERS IN REAL-WORLD MEASUREMENT Here be underscores CORRECTED VISION THE ROLE OF CAMERA AND LENS PARAMETERS IN REAL-WORLD MEASUREMENT JOSEPH HOWSE, NUMMIST MEDIA CIG-GANS WORKSHOP: 3-D COLLECTION, ANALYSIS AND VISUALIZATION LAWRENCETOWN,

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 2015 The MathWorks, Inc. 1 What s Behind 5G Wireless Communications? 서기환과장 2015 The MathWorks, Inc. 2 Agenda 5G goals and requirements Modeling and simulating key 5G technologies Release 15: Enhanced Mobile

More information

HIGH PERFORMANCE COMPUTING USING GPGPU FOR RADAR APPLICATIONS

HIGH PERFORMANCE COMPUTING USING GPGPU FOR RADAR APPLICATIONS HIGH PERFORMANCE COMPUTING USING GPGPU FOR RADAR APPLICATIONS Viswam Gampala 1 (visgam@yahoo.co.in), Akshay BM 1, A Vengadarajan 1, PS Avadhani 2 1. Electronics & Radar Development Establishment, DRDO,

More information

Low Power Embedded Systems in Bioimplants

Low Power Embedded Systems in Bioimplants Low Power Embedded Systems in Bioimplants Steven Bingler Eduardo Moreno 1/32 Why is it important? Lower limbs amputation is a major impairment. Prosthetic legs are passive devices, they do not do well

More information

Hardware Implementation of Automatic Control Systems using FPGAs

Hardware Implementation of Automatic Control Systems using FPGAs Hardware Implementation of Automatic Control Systems using FPGAs Lecturer PhD Eng. Ionel BOSTAN Lecturer PhD Eng. Florin-Marian BÎRLEANU Romania Disclaimer: This presentation tries to show the current

More information

Document downloaded from:

Document downloaded from: Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th

More information

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Dorothea vom Bruch for the Mu3e Collaboration GPU Computing in High Energy Physics, Pisa September 11th, 2014 Physikalisches Institut Heidelberg

More information

Lecture 3, Handouts Page 1. Introduction. EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Simulation Techniques.

Lecture 3, Handouts Page 1. Introduction. EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Simulation Techniques. Introduction EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Techniques Cristian Grecu grecuc@ece.ubc.ca Course web site: http://courses.ece.ubc.ca/353/ What have you learned so far?

More information

CUDA-Accelerated Satellite Communication Demodulation

CUDA-Accelerated Satellite Communication Demodulation CUDA-Accelerated Satellite Communication Demodulation Renliang Zhao, Ying Liu, Liheng Jian, Zhongya Wang School of Computer and Control University of Chinese Academy of Sciences Outline Motivation Related

More information

REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.

REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND. December 3-6, 2018 Santa Clara Convention Center CA, USA REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND. https://tmt.knect365.com/risc-v-summit @risc_v ACCELERATING INFERENCING ON THE EDGE WITH RISC-V

More information

Accelerated Impulse Response Calculation for Indoor Optical Communication Channels

Accelerated Impulse Response Calculation for Indoor Optical Communication Channels Accelerated Impulse Response Calculation for Indoor Optical Communication Channels M. Rahaim, J. Carruthers, and T.D.C. Little Department of Electrical and Computer Engineering Boston University, Boston,

More information

Real-Time Testing Made Easy with Simulink Real-Time

Real-Time Testing Made Easy with Simulink Real-Time Real-Time Testing Made Easy with Simulink Real-Time Andreas Uschold Application Engineer MathWorks Martin Rosser Technical Sales Engineer Speedgoat 2015 The MathWorks, Inc. 1 Model-Based Design Continuous

More information

TOOLS AND PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Spring 2017 Computer Vision Developer Survey

TOOLS AND PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Spring 2017 Computer Vision Developer Survey TOOLS AND PROCESSORS FOR COMPUTER VISION Selected Results from the Embedded Vision Alliance s Spring 2017 Computer Vision Developer Survey 1 EXECUTIVE SUMMARY Since 2015, the Embedded Vision Alliance has

More information

Multi-core Platforms for

Multi-core Platforms for 20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio

More information

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS 6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS Editor: Publisher: Prof. Pece Mitrevski, PhD Faculty of Information and Communication

More information

MOBILE VIRTUAL 3D MODEL OF A MEDIEVAL TOWN

MOBILE VIRTUAL 3D MODEL OF A MEDIEVAL TOWN International Journal on Information Technologies & Security, 4, 2013 13 MOBILE VIRTUAL 3D MODEL OF A MEDIEVAL TOWN Stanislav Dimchev Kostadinov, Tzvetomir Ivanov Vassilev Department of Informatics and

More information

Hardware-Software Codesign. 0. Organization

Hardware-Software Codesign. 0. Organization Hardware-Software Codesign 0. Organization Lothar Thiele 0-1 Overview Introduction and motivation Course synopsis Administrativa 0-2 What is HW-SW Codesign?... integrated design of systems that consist

More information

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare

More information

22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays.

22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays. 22 Marzo 2012 IFEMA, Madrid spain.ni.com/nidays www.infoplc.net The Art of Benchmarking Speed PXI Versus Rack-and-Stack Test Equipment Filippo Persia Systems Engineer Automated Test Mediterranean Region

More information

QUIZ. What do these bits represent?

QUIZ. What do these bits represent? QUIZ What do these bits represent? 1001 0110 1 QUIZ What do these bits represent? Unsigned integer: 1101 1110 Signed integer (2 s complement): Fraction: IBM 437 character: Latin-1 character: Huffman-compressed

More information

EECS 427 Lecture 21: Design for Test (DFT) Reminders

EECS 427 Lecture 21: Design for Test (DFT) Reminders EECS 427 Lecture 21: Design for Test (DFT) Readings: Insert H.3, CBF Ch 25 EECS 427 F09 Lecture 21 1 Reminders One more deadline Finish your project by Dec. 14 Schematic, layout, simulations, and final

More information

Best Instruction Per Cycle Formula >>>CLICK HERE<<<

Best Instruction Per Cycle Formula >>>CLICK HERE<<< Best Instruction Per Cycle Formula 6 Performance tuning, 7 Perceived performance, 8 Performance Equation, 9 See also is the average instructions per cycle (IPC) for this benchmark. Even. Click Card to

More information

Instructional Demos, In-Class Projects, & Hands-On Homework: Active Learning for Electrical Engineering using the Analog Discovery

Instructional Demos, In-Class Projects, & Hands-On Homework: Active Learning for Electrical Engineering using the Analog Discovery Instructional Demos, In-Class Projects, & Hands-On Homework: Active Learning for Electrical Engineering using the Analog Discovery by Dr. Gregory J. Mazzaro Dr. Ronald J. Hayne THE CITADEL, THE MILITARY

More information

Image Processing Architectures (and their future requirements)

Image Processing Architectures (and their future requirements) Lecture 17: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Qualcomm snapdragon Image credit: Qualcomm Apple A7 (iphone 5s) Chipworks

More information

Threading libraries performance when applied to image acquisition and processing in a forensic application

Threading libraries performance when applied to image acquisition and processing in a forensic application Threading libraries performance when applied to image acquisition and processing in a forensic application Carlos Bermúdez MSc. in Photonics, Universitat Politècnica de Catalunya, Barcelona, Spain Student

More information

Model-Based Design for Sensor Systems

Model-Based Design for Sensor Systems 2009 The MathWorks, Inc. Model-Based Design for Sensor Systems Stephanie Kwan Applications Engineer Agenda Sensor Systems Overview System Level Design Challenges Components of Sensor Systems Sensor Characterization

More information

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Boot Camp Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel

More information

Creating Intelligence at the Edge

Creating Intelligence at the Edge Creating Intelligence at the Edge Vladimir Stojanović E3S Retreat September 8, 2017 The growing importance of machine learning Page 2 Applications exploding in the cloud Huge interest to move to the edge

More information

Computer & Information Science & Engineering What s All This?

Computer & Information Science & Engineering What s All This? Computer & Information Science & Engineering What s All This? Marc Snir Department of Computer Science Time s man of the year, 1982 A New World Dawns Steven Jobs was 27 The IBM PC was a few months away

More information

What s Behind 5G Wireless Communications?

What s Behind 5G Wireless Communications? What s Behind 5G Wireless Communications? Marc Barberis 2015 The MathWorks, Inc. 1 Agenda 5G goals and requirements Modeling and simulating key 5G technologies Release 15: Enhanced Mobile Broadband IoT

More information

TOOLS & PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Fall 2017 Computer Vision Developer Survey

TOOLS & PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Fall 2017 Computer Vision Developer Survey TOOLS & PROCESSORS FOR COMPUTER VISION Selected Results from the Embedded Vision Alliance s Fall 2017 Computer Vision Developer Survey ABOUT THE EMBEDDED VISION ALLIANCE EXECUTIVE SUMMA Y Since 2015, the

More information

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood Supporting x86-64 Address Translation for 100s of GPU s Jason Power, Mark D. Hill, David A. Wood Summary Challenges: CPU&GPUs physically integrated, but logically separate; This reduces theoretical bandwidth,

More information

Low Power VLSI Circuit Synthesis: Introduction and Course Outline

Low Power VLSI Circuit Synthesis: Introduction and Course Outline Low Power VLSI Circuit Synthesis: Introduction and Course Outline Ajit Pal Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302 Agenda Why Low

More information

Sno Projects List IEEE. High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations

Sno Projects List IEEE. High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations Sno Projects List IEEE 1 High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations 2 A Generalized Algorithm And Reconfigurable Architecture For Efficient And Scalable

More information

CS 6135 VLSI Physical Design Automation Fall 2003

CS 6135 VLSI Physical Design Automation Fall 2003 CS 6135 VLSI Physical Design Automation Fall 2003 1 Course Information Class time: R789 Location: EECS 224 Instructor: Ting-Chi Wang ( ) EECS 643, (03) 5742963 tcwang@cs.nthu.edu.tw Office hours: M56R5

More information

Technology Timeline. Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs. FPGAs. The Design Warrior s Guide to.

Technology Timeline. Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs. FPGAs. The Design Warrior s Guide to. FPGAs 1 CMPE 415 Technology Timeline 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs FPGAs The Design Warrior s Guide

More information

Parallelism Across the Curriculum

Parallelism Across the Curriculum Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu

More information

GPU ACCELERATED DEEP LEARNING WITH CUDNN

GPU ACCELERATED DEEP LEARNING WITH CUDNN GPU ACCELERATED DEEP LEARNING WITH CUDNN Larry Brown Ph.D. March 2015 AGENDA 1 Introducing cudnn and GPUs 2 Deep Learning Context 3 cudnn V2 4 Using cudnn 2 Introducing cudnn and GPUs 3 HOW GPU ACCELERATION

More information

- Software Engineer con Laurea Magistrale in Informatica, Telecomunicazioni o Elettronica

- Software Engineer con Laurea Magistrale in Informatica, Telecomunicazioni o Elettronica Elettronica spa cerca: - Software Engineer con Laurea Magistrale in Informatica, Telecomunicazioni o Elettronica - Machine Learning Engineer con Laurea Magistrale in Informatica, Elettronica o Telecomunicazioni

More information

(Theory-Practice-Lab) Credit BBM 1511 Introduction to Computer Engineering - 1 (2-0-0) 2

(Theory-Practice-Lab) Credit BBM 1511 Introduction to Computer Engineering - 1 (2-0-0) 2 ARAS Brief Course Descriptions (Theory-Practice-Lab) Credit BBM 1511 Introduction to Computer Engineering - 1 (2-0-0) 2 Basic Concepts in Computer Science / Computer Systems and Peripherals / Introduction

More information

GPU-based data analysis for Synthetic Aperture Microwave Imaging

GPU-based data analysis for Synthetic Aperture Microwave Imaging GPU-based data analysis for Synthetic Aperture Microwave Imaging 1 st IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis 1 st -3 rd June 2015 J.C. Chorley 1, K.J. Brunner 1, N.A.

More information

Analog Custom Layout Engineer

Analog Custom Layout Engineer Analog Custom Layout Engineer Huawei Canada s rapid growth has created an excellent opportunity to build and grow your career and make a big impact to everyone s life. The IC Lab is currently looking to

More information

New Paradigm in Testing Heads & Media for HDD. Dr. Lutz Henckels September 2010

New Paradigm in Testing Heads & Media for HDD. Dr. Lutz Henckels September 2010 New Paradigm in Testing Heads & Media for HDD Dr. Lutz Henckels September 2010 1 WOW an amazing industry 40%+ per year aerial density growth Source: Coughlin Associates 2010 2 WOW an amazing industry Aerial

More information

Modeling and Simulating Large Phased Array Systems

Modeling and Simulating Large Phased Array Systems Modeling and Simulating Large Phased Array Systems Tabrez Khan Senior Application Engineer Application Engineering Group 2015 The MathWorks, Inc. 1 Challenges with Large Array Systems Design & simulation

More information

Overview of current developments in haptic APIs

Overview of current developments in haptic APIs Central European Seminar on Computer Graphics for students, 2011 AUTHOR: Petr Kadleček SUPERVISOR: Petr Kmoch Overview of current developments in haptic APIs Presentation Haptics Haptic programming Haptic

More information

EE 434 ASIC & Digital Systems

EE 434 ASIC & Digital Systems EE 434 ASIC & Digital Systems Dae Hyun Kim EECS Washington State University Spring 2017 Course Website http://eecs.wsu.edu/~ee434 Themes Study how to design, analyze, and test a complex applicationspecific

More information

Administrative Issues

Administrative Issues dministrative Issues Text book ($56.69 in mazon.com) Scanned problem set Email list Homework 1 announced, due 01/13/10 Quiz, 01/15/10 Graduate students meeting Relevant chapters in textbook? Technology

More information

EE25266 ASIC/FPGA Chip Design. Designing a FIR Filter, FPGA in the Loop, Ethernet

EE25266 ASIC/FPGA Chip Design. Designing a FIR Filter, FPGA in the Loop, Ethernet EE25266 ASIC/FPGA Chip Design Mahdi Shabany Electrical Engineering Department Sharif University of Technology Assignment #8 Designing a FIR Filter, FPGA in the Loop, Ethernet Introduction In this lab,

More information

Dr. D. M. Akbar Hussain

Dr. D. M. Akbar Hussain Course Objectives: To enable the students to learn some more practical facts about DSP architectures. Objective is that they can apply this knowledge to map any digital filtering algorithm and related

More information

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science

More information

EM Simulation of Automotive Radar Mounted in Vehicle Bumper

EM Simulation of Automotive Radar Mounted in Vehicle Bumper EM Simulation of Automotive Radar Mounted in Vehicle Bumper Abstract Trends in automotive safety are pushing radar systems to higher levels of accuracy and reliable target identification for blind spot

More information

ni.com The NI PXIe-5644R Vector Signal Transceiver World s First Software-Designed Instrument

ni.com The NI PXIe-5644R Vector Signal Transceiver World s First Software-Designed Instrument The NI PXIe-5644R Vector Signal Transceiver World s First Software-Designed Instrument Agenda Hardware Overview Tenets of a Software-Designed Instrument NI PXIe-5644R Software Example Modifications Available

More information

DESIGN OF A MEASUREMENT PLATFORM FOR COMMUNICATIONS SYSTEMS

DESIGN OF A MEASUREMENT PLATFORM FOR COMMUNICATIONS SYSTEMS DESIGN OF A MEASUREMENT PLATFORM FOR COMMUNICATIONS SYSTEMS P. Th. Savvopoulos. PhD., A. Apostolopoulos 2, L. Dimitrov 3 Department of Electrical and Computer Engineering, University of Patras, 265 Patras,

More information

Towards Real-Time Volunteer Distributed Computing

Towards Real-Time Volunteer Distributed Computing Towards Real-Time Volunteer Distributed Computing Sangho Yi 1, Emmanuel Jeannot 2, Derrick Kondo 1, David P. Anderson 3 1 INRIA MESCAL, 2 RUNTIME, France 3 UC Berkeley, USA Motivation Push towards large-scale,

More information

Developing and Prototyping Next-Generation Communications Systems

Developing and Prototyping Next-Generation Communications Systems Developing and Prototyping Next-Generation Communications Systems Dr. Amod Anandkumar Team Lead Signal Processing and Communications Application Engineering Group 2015 The MathWorks, Inc. 1 Proliferation

More information

Real-time Volt/Var Optimization Scheme for Distribution Systems with PV Integration

Real-time Volt/Var Optimization Scheme for Distribution Systems with PV Integration Grid-connected Advanced Power Electronic Systems Real-time Volt/Var Optimization Scheme for Distribution Systems with PV Integration 02-15-2017 Presenter Name: Yan Chen (On behalf of Dr. Benigni) Outline

More information

TOOLS & PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Computer Vision Developer Survey

TOOLS & PROCESSORS FOR COMPUTER VISION. Selected Results from the Embedded Vision Alliance s Computer Vision Developer Survey TOOLS & PROCESSORS FOR COMPUTER VISION Selected Results from the Embedded Vision Alliance s Computer Vision Developer Survey JANUARY 2019 EXECUTIVE SUMMA Y Since 2015, the Embedded Vision Alliance has

More information

Introduction to co-simulation. What is HW-SW co-simulation?

Introduction to co-simulation. What is HW-SW co-simulation? Introduction to co-simulation CPSC489-501 Hardware-Software Codesign of Embedded Systems Mahapatra-TexasA&M-Fall 00 1 What is HW-SW co-simulation? A basic definition: Manipulating simulated hardware with

More information

UNLV ME 425/625 Robotics. Introduction and Course Philosophy

UNLV ME 425/625 Robotics. Introduction and Course Philosophy UNLV ME 425/625 Robotics Introduction and Course Philosophy Paul Oh: Background 5+ years industry before Drexel ME Professor since 2000 Advisor: ASME 10-years Advisor: 8+ SD teams 3 SD awards (COE, ASME,

More information

Automatic Kernel Code Generation for Focal-plane Sensor-Processor Devices

Automatic Kernel Code Generation for Focal-plane Sensor-Processor Devices Automatic Kernel Code Generation for Focal-plane Sensor-Processor Devices Thomas Debrunner - MSc Student Imperial College London Paul Kelly - Software Performance Optimisation Group Lead, Imperial College

More information

Bricken Technologies Corporation Presentations: Bricken Technologies Corporation Corporate: Bricken Technologies Corporation Marketing:

Bricken Technologies Corporation Presentations: Bricken Technologies Corporation Corporate: Bricken Technologies Corporation Marketing: TECHNICAL REPORTS William Bricken compiled 2004 Bricken Technologies Corporation Presentations: 2004: Synthesis Applications of Boundary Logic 2004: BTC Board of Directors Technical Review (quarterly)

More information

Oculus Rift Getting Started Guide

Oculus Rift Getting Started Guide Oculus Rift Getting Started Guide Version 1.23 2 Introduction Oculus Rift Copyrights and Trademarks 2017 Oculus VR, LLC. All Rights Reserved. OCULUS VR, OCULUS, and RIFT are trademarks of Oculus VR, LLC.

More information

CSE502: Computer Architecture Welcome to CSE 502

CSE502: Computer Architecture Welcome to CSE 502 Welcome to CSE 502 Introduction & Review Today s Lecture Course Overview Course Topics Grading Logistics Academic Integrity Policy Homework Quiz Key basic concepts for Computer Architecture Course Overview

More information

Software Computer Vision - Driver Assistance

Software Computer Vision - Driver Assistance Software Computer Vision - Driver Assistance Work @Bosch for developing desktop, web or embedded software and algorithms / computer vision / artificial intelligence for Driver Assistance Systems and Automated

More information

Audio Sample Rate Conversion in FPGAs

Audio Sample Rate Conversion in FPGAs Audio Sample Rate Conversion in FPGAs An efficient implementation of audio algorithms in programmable logic. by Philipp Jacobsohn Field Applications Engineer Synplicity eutschland GmbH philipp@synplicity.com

More information

CHAPTER 4 GALS ARCHITECTURE

CHAPTER 4 GALS ARCHITECTURE 64 CHAPTER 4 GALS ARCHITECTURE The aim of this chapter is to implement an application on GALS architecture. The synchronous and asynchronous implementations are compared in FFT design. The power consumption

More information

CS Computer Architecture Spring Lecture 04: Understanding Performance

CS Computer Architecture Spring Lecture 04: Understanding Performance CS 35101 Computer Architecture Spring 2008 Lecture 04: Understanding Performance Taken from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [Adapted from Computer Organization and Design, Patterson

More information

Fast and Accurate RF component characterization enabled by FPGA technology

Fast and Accurate RF component characterization enabled by FPGA technology Fast and Accurate RF component characterization enabled by FPGA technology Guillaume Pailloncy Senior Systems Engineer Agenda RF Application Challenges What are FPGAs and why are they useful? FPGA-based

More information

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,

More information

Parallel Simulation of Social Agents using Cilk and OpenCL

Parallel Simulation of Social Agents using Cilk and OpenCL D. Moser, A. Riener, K. Zia, A. Ferscha Department for Pervasive Computing, JKU Linz/Austria Parallel Simulation of Social Agents using Cilk and OpenCL DS-RT 2011 15th International Symposium on Distributed

More information

Proposers Day Workshop

Proposers Day Workshop Proposers Day Workshop Monday, January 23, 2017 @srcjump, #JUMPpdw Cognitive Computing Vertical Research Center Mandy Pant Academic Research Director Intel Corporation Center Motivation Today s deep learning

More information

Pramod Kumar Naik Senior Application Engineer MathWorks Products

Pramod Kumar Naik Senior Application Engineer MathWorks Products MATLAB & SIMULINK Pramod Kumar Naik Senior Application Engineer MathWorks Products 2 Enabling Excellence Through Innovation System Engineering Intellectual Property (IP) EDA & Semiconductor University

More information

Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005]

Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005] Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005] AMD s drive to 64-bit processors surprised everyone with its speed, even as detractors commented

More information

AI Application Processing Requirements

AI Application Processing Requirements AI Application Processing Requirements 1 Low Medium High Sensor analysis Activity Recognition (motion sensors) Stress Analysis or Attention Analysis Audio & sound Speech Recognition Object detection Computer

More information

Monte Carlo integration and event generation on GPU and their application to particle physics

Monte Carlo integration and event generation on GPU and their application to particle physics Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &

More information

Signal Processing on GPUs for Radio Telescopes

Signal Processing on GPUs for Radio Telescopes Signal Processing on GPUs for Radio Telescopes John W. Romein Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands 1 Overview radio telescopes motivation processing pipelines signal-processing

More information

Image Processing Architectures (and their future requirements)

Image Processing Architectures (and their future requirements) Lecture 16: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Example SoC: Qualcomm Snapdragon Image credit: Qualcomm Apple A7 (iphone

More information

Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen

Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen GIGA seminar 11.1.2010 Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen janne.janhunen@ee.oulu.fi 2 Outline Introduction Benefits and Challenges

More information

A Study of Optimal Spatial Partition Size and Field of View in Massively Multiplayer Online Game Server

A Study of Optimal Spatial Partition Size and Field of View in Massively Multiplayer Online Game Server A Study of Optimal Spatial Partition Size and Field of View in Massively Multiplayer Online Game Server Youngsik Kim * * Department of Game and Multimedia Engineering, Korea Polytechnic University, Republic

More information

Improving Loop-Gain Performance In Digital Power Supplies With Latest- Generation DSCs

Improving Loop-Gain Performance In Digital Power Supplies With Latest- Generation DSCs ISSUE: March 2016 Improving Loop-Gain Performance In Digital Power Supplies With Latest- Generation DSCs by Alex Dumais, Microchip Technology, Chandler, Ariz. With the consistent push for higher-performance

More information

CMOS Technology for Computer Architects

CMOS Technology for Computer Architects CMOS Technology for Computer Architects Lecture 1: Introduction Iakovos Mavroidis Giorgos Passas Manolis Katevenis FORTH-ICS (University of Crete) Course Contents Implementation of high-performance digital

More information

VLSI System Testing. Outline

VLSI System Testing. Outline ECE 538 VLSI System Testing Krish Chakrabarty System-on-Chip (SOC) Testing ECE 538 Krish Chakrabarty 1 Outline Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test

More information

Airborne radar clutter simulation using GPU (CUDA)

Airborne radar clutter simulation using GPU (CUDA) Airborne radar clutter simulation using GPU (CUDA) 1 Priyanka A P, 2 Mr.Channabasappa Baligar 1 Department of VLSI and Embedded Systems, UTL technologies Ltd, Bangalore, India 2 Department of VLSI and

More information