Early Adopter: Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida


Narsingh Deo, Damian Dechev, Mahadevan Vasudevan
Department of Electrical Engineering and Computer Science
University of Central Florida, Orlando, FL 32816
May 16, 2011
NSF TCPP (2011) Early Adoption at UCF May 16, 2011

Parallel Programming Education at UCF
COP 4520: Concepts in Parallel and Distributed Computing
  Elective senior computer programming class
  Prerequisites
    COP 3503: Sequential Algorithms and Data Structures
    COP 3402: Systems Software
  Carries 3 semester hours, 45 in-class instruction hours
  Spring 2011: 33 enrolled students
  Scope
    parallel graph algorithms
    principles of distributed computing
    programming models
    frameworks for parallel processing

TCPP Topics and Integration Plan
COP 4520: place for highly motivated and ambitious undergraduate students
Prior to Spring 2011
  Fundamentals (3 weeks)
    taxonomy and architectures
    data vs. control parallel approach
    algorithm analysis, running time, speedup, cost/work, efficiency
    Brent's scheduling
    Amdahl's law, Gustafson's law
  Cluster Computing with MPI (5 weeks)
    overview of cluster architecture, granularity constraints
    message passing, frequently used functions
    sample applications, using the UCF cluster
    parallel software development
    team project involving substantial parallel programming
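
As a worked illustration of the speedup, efficiency, Amdahl's law, and Gustafson's law items in the Fundamentals unit, here is a minimal C++ sketch. The function names and the parameter convention (f is the parallelizable fraction, p is the number of processors) are assumptions made for this example and are not taken from the course materials.

```cpp
#include <cassert>

// Amdahl's law: speedup on p processors for a fixed problem size,
// where f is the fraction of the sequential running time that can
// be parallelized (0 <= f <= 1).
double amdahl_speedup(double f, int p) {
    assert(f >= 0.0 && f <= 1.0 && p >= 1);
    return 1.0 / ((1.0 - f) + f / p);
}

// Gustafson's law: scaled speedup when the problem grows with p.
// Here f is the fraction of the parallel run's time spent in the
// parallel part (0 <= f <= 1).
double gustafson_speedup(double f, int p) {
    assert(f >= 0.0 && f <= 1.0 && p >= 1);
    return (1.0 - f) + f * p;
}

// Efficiency: speedup divided by the number of processors used.
double efficiency(double speedup, int p) {
    assert(p >= 1);
    return speedup / p;
}
```

With f = 0.9 and p = 10, Amdahl's law gives a speedup of about 5.26 (efficiency about 0.53), while Gustafson's law gives a scaled speedup of 9.1, which shows how strongly the fixed-size and scaled-size views differ.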

Prior to Spring 2011
  Designing, Implementing, and Evaluating Parallel Algorithms (remainder of the class)
    prefix sums / list ranking, finding the max of a set, sorting
    matrix problems: matrix partitioning, matrix multiplication, Gaussian elimination
    graph problems: all-pairs shortest paths (Dijkstra's, Warshall-Floyd), minimum spanning tree
    performance comparisons
  Overview (last week of classes)
    recent advances: programming models, architectures
    P-completeness: a glimpse of the problems that resist parallelization
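
The prefix-sums topic can be sketched as follows. This is a round-based (Hillis-Steele style) formulation, chosen here for illustration; the slides do not say which prefix-sum algorithm the course used, and the function name is hypothetical. Within each round every element depends only on the previous round's values, so the inner loop could be divided among processors; this sketch runs it sequentially to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Inclusive prefix sums computed in about log2(n) rounds.
// In the round with a given stride, element i adds the value that
// sat `stride` positions to its left in the previous round.
// Each round reads only `in` and writes only `out`, so all elements
// of one round are independent of each other.
std::vector<long> prefix_sums_by_rounds(std::vector<long> in) {
    std::vector<long> out(in.size());
    for (std::size_t stride = 1; stride < in.size(); stride *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = (i >= stride) ? in[i] + in[i - stride] : in[i];
        }
        std::swap(in, out);  // this round's output feeds the next round
    }
    return in;
}
```

For the input 1, 2, 3, 4, 5 the rounds with strides 1, 2, and 4 produce 1, 3, 6, 10, 15. The formulation does O(n log n) additions in total, more than the sequential O(n), which is the kind of cost/work trade-off the Fundamentals unit covers.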

COP 4520: the paradigm shift in our core computing architecture requires a fundamental change in how we program
Spring 2011 Core Topics I
  introduction to multi-threading and multiprocessor synchronization
  design of highly concurrent data structures and algorithms
  lock-free synchronization
  software transactional memory (STM) models
  programming tools and techniques for parallel computing
  program analysis tools (such as Intel Pin, Valgrind, ROSE Compiler)
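
To make the lock-free synchronization topic concrete, here is a minimal sketch of the compare-and-swap retry loop that most lock-free updates are built on. The function name atomic_store_max is hypothetical and the example is not from the course; it uses only the standard C++ std::atomic interface.

```cpp
#include <atomic>
#include <cassert>

// Lock-free "keep the maximum": install `value` into `target` unless
// the stored value is already at least as large. No thread ever holds
// a lock; a thread that loses a race simply retries.
void atomic_store_max(std::atomic<int>& target, int value) {
    int seen = target.load();
    // compare_exchange_weak succeeds only if target still equals `seen`.
    // On failure it writes the current value of target into `seen`,
    // so the loop condition is re-evaluated against fresh data.
    while (seen < value && !target.compare_exchange_weak(seen, value)) {
        // retry
    }
}
```

The loop ends either because this thread's value was installed or because another thread has already stored something at least as large, so some thread always makes progress regardless of how the others are scheduled.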

Spring 2011 Core Topics II
  emerging parallel programming models (Intel TBB, Intel Ct, STM)
  recent advances and future trends in concurrent programming
  validation and verification of parallel processes
  industrial applications
  heterogeneous platforms (CPUs, GPUs, FPGAs)
  hardware-software co-design
  advanced simulation tools

Spring 2011 Lectures I
  mutual exclusion
  concurrent objects, consistency and semantics
  shared memory data structures
  synchronization primitives, transactional memory
  spin-locks, read-write locks, contention
  nonblocking data structures: linked lists, queues, vectors, hash tables
  hazardous concurrency bugs: ABA problem, race conditions
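
The mutual exclusion and spin-lock items can be illustrated with a test-and-set lock. This is a minimal sketch under stated assumptions: the class SpinLock and the helper count_with_spinlock are hypothetical names introduced for this example, and the code is not taken from the course or its textbook.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-set spin lock: lock() keeps setting the flag until it
// observes that the flag was previously clear.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // back off while another thread holds the lock
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Each thread increments a plain (non-atomic) counter under the lock.
// Without the lock the increments would race and updates could be lost.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Because every increment happens inside the critical section, count_with_spinlock(4, 10000) returns 40000. All waiting threads spin on the same flag, which is the source of the contention listed alongside spin-locks on this slide.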

Spring 2011 Lectures II
  progress guarantees, linearizability
  validation and verification of multi-processor algorithms
  scheduling and work distribution
  real-time systems, HPC applications, advanced simulations
  programming language support for concurrency: new languages and language standards
  the application of static and dynamic program analysis
  new programming models for multi-core computing

Spring 2011
The Art of Multiprocessor Programming
Nir Shavit and Maurice Herlihy
Morgan Kaufmann, 2008, ISBN 9780123705914

COP 4520: rapidly expanding set of important topics in parallel computing and our desire to provide our students with a dynamic curriculum
After Spring 2011
  introduce multiprocessor programming earlier in the curriculum
  create a sophomore class on parallel programming in C++
  offer a sequel elective junior class in Parallel Graph Algorithms and Design Patterns
  offer a class on Parallel Computer Organization and Architectures
  split COP 4520 into two advanced classes:
    a class on multiprocessor synchronization and lock-free programming
    a course on the more traditional distributed computing and MPI programming models

Evaluation Plan
Serves two purposes
  Collect meaningful feedback about current state-of-the-art parallel programming techniques
  Estimate the level of preparedness of our students for applying their skills to modern industrial projects
Main method: a carefully crafted survey with multiple-choice and short-answer questions, sent to graduates 1-3 years after graduation
Establish the relevance of the current set of topics in our curriculum

Thank You!