Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Labs CDT 102

Abstract: Colfax Developer Training (CDT) is an in-depth, intensive course on efficient parallel programming of Intel Xeon family processors and Intel Xeon Phi coprocessors. The one-day labs course (CDT 102) features hands-on exercises on the available programming models and best optimization practices for the Intel many-core platform, and on the usage of the Intel software development and diagnostic tools. The prerequisite for this class is the one-day seminar CDT 101.

9 am to 4 pm: Hands-on session.
Offload and Native: Hello World to complex; using MPI.
Performance Analysis: VTune.
Case Study: all aspects of tuning in the N-body calculation.
Optimization I: strip-mining for vectorization, parallel reduction.
Optimization II: loop tiling, thread affinity.

Intel Xeon Phi coprocessors, featuring the Intel Many Integrated Core (MIC) architecture, are novel many-core computing accelerators for highly parallel applications, capable of delivering greater performance per system and per watt than general-purpose CPUs. Unlike GPGPUs, they support traditional HPC programming frameworks, including OpenMP and MPI, and require the same software optimization methods as multi-core CPUs.

Schedule

9:00-9:30 Remote Access Configuration, Lab Orientation

9:30-10:30 Programming with Explicit Offload
    Offload pragmas and object markup
    Diagnostics and control with environment variables
    Data persistence and memory retention
    Multiple coprocessors
    Overlapping communication with computation

10:30-11:00 Native Programming
    Cross-compilation
    Running a native application with ssh, micnativeloadex
    Using native applications in MPI

11:00-12:00 Performance Analysis Using Intel VTune Amplifier

Lunch break

1:00-2:00 Comprehensive optimization: N-body calculation
    All areas of optimization in one exercise

2:00-3:00 Partnering vectors and cores: histogram example
    Strip-mining for vectorization
    Eliminating synchronization through parallel reduction
    First-touch allocation impact on Xeon

3:00-4:00 Boosting memory and cache traffic: transposition example
    Loop tiling for cached data re-use
    Compiler hints for vectorization
    Thread affinity control
    Regularizing vectorization pattern
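To give a feel for the loop tiling topic named in the transposition session, here is a minimal sketch of an in-place tiled transpose. It is not taken from the course materials: the function name `transpose_tiled`, the `tile` parameter and the list-of-lists storage are assumptions made for illustration. The labs target compiled code, so this Python version shows only the index pattern and will not reproduce the cache behavior.

```python
def transpose_tiled(a, n, tile=4):
    """Transpose an n x n matrix (list of lists) in place, one tile at a time.

    Hypothetical illustration of loop tiling; names and layout are assumptions.
    """
    # Outer loops walk over tiles on and above the diagonal.
    for ii in range(0, n, tile):
        for jj in range(ii, n, tile):
            # Inner loops stay within one tile, so in compiled code the
            # touched rows and columns can remain in cache while reused.
            for i in range(ii, min(ii + tile, n)):
                # On diagonal tiles, start above the diagonal so that
                # each (i, j) pair is swapped exactly once.
                j_start = i + 1 if ii == jj else jj
                for j in range(j_start, min(jj + tile, n)):
                    a[i][j], a[j][i] = a[j][i], a[i][j]
    return a


if __name__ == "__main__":
    n = 5
    matrix = [[i * n + j for j in range(n)] for i in range(n)]
    transpose_tiled(matrix, n, tile=2)
    for row in matrix:
        print(row)
```

The `min(...)` bounds let the sketch handle matrix sizes that are not a multiple of the tile size. In a compiled implementation the tile size would normally be chosen to match the cache and vector width of the target processor.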

Instructor: Vadim Karpusenko, Ph.D., is Principal HPC Research Engineer at Colfax International involved in training and consultancy projects on data mining, software development and statistical analysis of complex systems. His research interests are in the area of physical modeling with HPC clusters, highly parallel architectures, and code optimization. Vadim holds a PhD from North Carolina State University for his computational biophysics research on the free energy and stability of helical secondary structures of proteins. He is a co-author of the book Parallel Programming and Optimization with Intel Xeon Phi Coprocessors [1], and a regular contributor to the online resource Colfax Research [2].

Instructor: Andrey Vladimirov, Ph.D., is Head of HPC Research at Colfax International. His primary interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, A. Vladimirov was involved in computational astrophysics research at Stanford University, North Carolina State University, and the Ioffe Institute (Russia), where he studied cosmic rays, collisionless plasmas and the interstellar medium using computer simulations. He is a co-author of the book Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, a regular contributor to the online resource Colfax Research, and an author or co-author of over 10 peer-reviewed publications in the fields of theoretical astrophysics and scientific computing.

Instructor: Ryo Asai is a Researcher at Colfax International. Ryo holds a B.A. degree in Physics from University of California, Berkeley. He develops optimization methods for scientific applications targeting emerging parallel computing platforms, computing accelerators and interconnect technologies. Having joined Colfax's research team early on, Ryo has acquired deep domain expertise in programming the Intel MIC architecture. He has committed a great deal of work to the Colfax Developer Training materials, and his peer-reviewed work is among the most widely read publications of Colfax Research.

[1] March 2013, ISBN-10: 0-9885234-1-8, ISBN-13: 978-0-9885234-1-8, more details available at http://www.colfax-intl.com/nd/xeonphi/book.aspx
[2] http://research.colfaxinternational.com/

Notes

Presentations: Video and audio recording and still photography during Colfax Developer Training (CDT) are permitted only for private or institutional use by the attendees and their direct collaborators. No recorded materials shall be publicly disseminated without explicit written authorization from Colfax International.

Materials: The slides of all presentations will be made available to all attendees in electronic form. Attendees are free to use these materials privately and share them with direct collaborators. However, no materials shall be publicly disseminated without explicit written authorization from Colfax International. The book on which the CDT is based, Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, is available in electronic format and as a hard copy at http://www.colfax-intl.com/nd/xeonphi/book.aspx. An electronic copy of the book and the enclosed exercise codes are included in the training price.

Contacts and Resources: The instructors of this CDT can be contacted via email at vadim@colfax-intl.com, andrey@colfax-intl.com and ryo@colfax-intl.com. You may also find our online resource research.colfaxinternational.com useful; it hosts explanatory and research publications. General inquiries regarding Colfax's business can be sent to phi@colfax-intl.com. Colfax's business Web site www.colfax-intl.com contains information about the company's hardware solutions, education and consulting offerings.