NRC Workshop on NASA's Modeling, Simulation, and Information Systems and Processing Technology

Bronson Messer
Director of Science, National Center for Computational Sciences & Senior R&D Staff
Oak Ridge National Laboratory

Outline

- What are the top technical challenges in the area of your presentation topic?
- What are technology gaps that the roadmap did not cover?
- What are some of the high-priority technology areas that NASA should pursue?
- Do the high-priority areas align well with NASA's expertise, capabilities, facilities, and the nature of NASA's role in developing the specified technology?
- In your opinion, how well is NASA's proposed technology development effort competitively placed?
- What specific technology can we call a Game Changing Technology? Is there a technology component near the tipping point? (Tipping point: a large advance in technology readiness is possible with a relatively small additional investment.)
- In your opinion, what is the time horizon for the technology to be ready for insertion (5-30 years)?
- Provide a sense of value in terms of payoffs, risk, technical barriers, and chance of success.

Today, ORNL is the world's most powerful computing facility

Jaguar (#2) - the Department of Energy's most powerful computer
  Peak performance: 2.33 PF/s; Memory: 300 TB; Disk bandwidth: >240 GB/s; Square feet: 5,000; Power: 7 MW

Kraken (#8) - the National Science Foundation's most powerful computer
  Peak performance: 1.03 PF/s; Memory: 132 TB; Disk bandwidth: >50 GB/s; Square feet: 2,300; Power: 3 MW

NOAA Gaea (#32) - the National Oceanic and Atmospheric Administration's most powerful computer
  Peak performance: 1.1 PF/s; Memory: 248 TB; Disk bandwidth: 104 GB/s; Square feet: 1,600; Power: 2.2 MW

Our science requires that we advance computational capability 1000x over the next decade

Mission: Deploy and operate the computational resources required to tackle global challenges
- Deliver transforming discoveries in climate, materials, biology, energy technologies, etc.
- Ability to investigate otherwise inaccessible systems, from regional climate impacts to energy grid dynamics

Vision: Maximize scientific productivity and progress on the largest-scale computational problems
- Provide world-class computational resources and specialized services for the most computationally intensive problems
- Provide a stable hardware/software path of increasing scale to maximize productive applications development

Roadmap:
- 2009: Cray XT5, 2+ PF leadership system for science
- 2012: OLCF-3, 10-20 PF leadership system with some HPCS technology
- 2015: OLCF-4, 100-250 PF, based on DARPA HPCS technology
- 2018: OLCF-5, 1 EF

OLCF-3 node description

- New node for the Cray XE infrastructure with the Gemini interconnect
- 1 AMD socket G34 (Opteron) processor and 1 NVIDIA GPU per node
  - Interlagos uses AMD socket G34 and the new Bulldozer core
  - DDR3-1600 memory, HyperTransport version 3
- NVIDIA Kepler accelerator, successor to Fermi

                       Jaguar's XT5 node   OLCF-3 node
  Opteron sockets      2                   1
  Opteron memory (GB)  16                  32
  Interconnect         SeaStar2            Gemini
  Node peak GFLOPS     110                 >1500

What will the exascale look like?

"Node architectures are expected to change dramatically in the next decade, becoming more hierarchical and heterogeneous... computer companies are dramatically increasing on-chip parallelism to improve performance. The traditional doubling of clock speeds every 18 to 24 months is being replaced by a doubling of cores or other parallelism mechanisms. Systems will consist of one hundred thousand to one million nodes and perhaps as many as a billion cores."

Architectures and Technology for Extreme Scale Computing, Workshop Report, 2009;
http://www.er.doe.gov/ascr/programdocuments/docs/arch-techgrandchallengesreport.pdf

Systems                     | 2009      | 2015 (+1/-0)                                       | 2018 (+1/-0)
System peak                 | 2 Peta    | 100-200 Peta                                       | 1 Exa
Power                       | 6 MW      | ~15 MW                                             | ~20 MW
System memory               | 0.3 PB    | 5 PB                                               | 64 PB (+)
Node performance            | 125 GF    | 0.5 TF or 7 TF                                     | 1, 2, or 15 TF
Node memory BW              | 25 GB/s   | 1-2 TB/s                                           | 2-4 TB/s
Node concurrency            | 12        | O(100)                                             | O(1k) or 10k
Total node interconnect BW  | 3.5 GB/s  | 100-200 GB/s (10:1 vs memory BW; 2:1 alternative)  | 200-400 GB/s (1:4 or 1:8 of memory BW)
System size (nodes)         | 18,700    | 50,000 or 500,000                                  | O(100,000) or O(1M)
Total concurrency           | 225,000   | O(100,000,000), *O(10)-O(50) to hide latency       | O(billion), *O(10)-O(100) for latency hiding
Storage                     | 15 PB     | 150 PB                                             | 500-1000 PB (>10x system memory is the minimum)
I/O                         | 0.2 TB/s  | 10 TB/s                                            | 60 TB/s (how long to drain the machine?)
MTTI                        | days      | O(1 day)                                           | O(0.1 day)
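As a back-of-envelope reading of the 2018 column (an illustration, not part of the original table), the total-concurrency entry follows from the node count and node concurrency rows:

    O(1M) nodes x O(1k) cores per node                        ~ O(10^9) cores
    O(10^9) cores x O(10)-O(100) threads for latency hiding   ~ O(10^10)-O(10^11) operations in flight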

What does this say about the programming model?

"The principal programming environment challenges will be on the exascale node: concurrency, hierarchy and heterogeneity. An exascale node will also be the workgroup/departmental-scale computing resource... more than a billion-way parallelism to fully utilize an exascale system... Portability will be a significant concern... In order to improve productivity, a programming model that abstracts some of the architectural details from software developers is highly desirable."

Architectures and Technology for Extreme Scale Computing, Workshop Report, 2009;
http://www.er.doe.gov/ascr/programdocuments/docs/arch-techgrandchallengesreport.pdf

OLCF-3 Applications

Analysis was informed by two requirements surveys.

Project application requirements:
- Elicited, analyzed, and validated using a new comprehensive requirements questionnaire
- Covering project overview, science motivation & impact, application models, algorithms, parallelization strategy, software, development process, SQA, V&V, usage workflow, and performance
- Results, analysis, and conclusions documented in the 2009 OLCF application requirements document

OLCF-3 baseline plan developed in consultation with 50+ leading scientists in many domains:
- What are the science goals, and does OLCF-3 enable them?
- What might the impact be if the improved science result occurs?
- Why does it matter if this result is delivered in the 2012 timeframe?

PF Survey Findings

- Algorithm development is evolutionary
  - No algorithm "sweet spots," but algorithm footprints share characteristics
- No one is clamoring for new languages
  - MPI until the water gets too hot (the frog analogy)
- Application lifetimes are >3-5x machine lifetimes; refactoring is already a way of life
- Fault tolerance via defensive checkpointing is the de facto standard (see the sketch after this list)
  - Won't this eventually bite us? It artificially drives I/O demands
- Weak or strong scaling, or both (no winner)
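To make "defensive checkpointing" concrete, here is a minimal sketch of the pattern the survey refers to: periodically write the full simulation state to disk so a failed run can resume from the last checkpoint. This is not OLCF code; the file name, checkpoint interval, and state layout are assumptions for illustration.

    /* Minimal defensive checkpoint/restart sketch (illustrative only).
     * File name, interval, and state layout are assumed, not OLCF code. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000
    #define CHECKPOINT_INTERVAL 100          /* steps between checkpoints (assumed) */

    static int write_checkpoint(const double *u, int step) {
        FILE *f = fopen("checkpoint.dat", "wb");
        if (!f) return -1;
        fwrite(&step, sizeof step, 1, f);    /* save the step so we can resume */
        fwrite(u, sizeof *u, N, f);          /* save the full solution state   */
        fclose(f);
        return 0;
    }

    static int read_checkpoint(double *u, int *step) {
        FILE *f = fopen("checkpoint.dat", "rb");
        if (!f) return -1;                   /* no checkpoint: start from scratch */
        if (fread(step, sizeof *step, 1, f) != 1 ||
            fread(u, sizeof *u, N, f) != (size_t)N) { fclose(f); return -1; }
        fclose(f);
        return 0;
    }

    int main(void) {
        double *u = malloc(N * sizeof *u);
        int step = 0;
        if (read_checkpoint(u, &step) != 0)            /* restart if possible   */
            for (int i = 0; i < N; i++) u[i] = 0.0;    /* otherwise initialize  */

        for (; step < 10000; step++) {
            /* ... advance the simulation one step ... */
            if (step % CHECKPOINT_INTERVAL == 0)
                write_checkpoint(u, step);             /* the "artificial" I/O cost is paid here */
        }
        free(u);
        return 0;
    }

The I/O concern on the slide is visible directly: every checkpoint writes the entire state, so the checkpoint interval, not the science, sets much of the I/O demand.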

What kind of software infrastructure do we want?

- The inter-node layer is straightforward: MPI, SHMEM, Global Arrays, Co-Array Fortran, maybe UPC
- An intra-node layer that allows us to easily move identified kernels to the accelerator
  - It should be as facile as OpenMP: directive-based, with bounded accelerator regions (see the sketch after this list)
  - It should work with C/C++/Fortran
  - A single compiler handles all aspects of the system's intra-node architecture
  - Integrated libraries for BLAS/FFT/LAPACK
- Where do HPCS languages (e.g., Chapel) sit?
  - The original view might have been at the inter-node layer
  - Incremental, evolutionary introduction might demand the intra-node level
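A minimal sketch of what a "bounded accelerator region" could look like in C, using OpenACC-style directives as one possible realization (the slide does not prescribe a specific directive scheme); the saxpy kernel and sizes are assumptions for the example.

    /* Illustrative directive-bounded accelerator region (OpenACC-style shown
     * as one option). Without an accelerator compiler the pragma is ignored
     * and the code still runs on the host. */
    #include <stdio.h>
    #include <stdlib.h>

    void saxpy(int n, float a, const float *restrict x, float *restrict y) {
        /* Everything inside this bounded region is offload work;
         * outside it, the code is ordinary host C. */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        int n = 1 << 20;
        float *x = malloc(n * sizeof *x);
        float *y = malloc(n * sizeof *y);
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(n, 2.0f, x, y);
        printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
        free(x); free(y);
        return 0;
    }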

What should the programming model look like?

1. MPI or Global Address Space languages across nodes
2. Within the very powerful nodes, use OpenMP or another threads package to exploit the large number of cores
3. In each thread, use directives to invoke vector, SIMD, or SSE-style instructions in the processor or accelerator to maximize performance
4. Explicitly manage data movement to minimize power
5. Describe the parallelism in the high-level language in a portable way, then let the compiler and libraries generate the best code for the architecture

We are implementing this programming model for Titan, but it also works on current and future systems. (A sketch of the first three layers follows.)
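As an illustration only (not Titan production code), a minimal hybrid sketch of layers 1-3: MPI across nodes, OpenMP threads within a node, and a SIMD directive inside each thread. The dot-product kernel, chunk size, and problem size are assumptions; the OpenMP 4.0-style simd directive stands in for whatever vector/accelerator directive the compiler provides.

    /* Hybrid MPI + OpenMP + SIMD sketch of the layered programming model. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                 /* layer 1: MPI across nodes */

        const int n = 1 << 20;                  /* local problem size (assumed) */
        double *x = malloc(n * sizeof *x);
        double *y = malloc(n * sizeof *y);
        for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

        double local = 0.0;
        /* layer 2: OpenMP threads exploit the many cores within the node */
        #pragma omp parallel for reduction(+:local)
        for (int b = 0; b < n; b += 1024) {
            int end = (b + 1024 < n) ? b + 1024 : n;
            double chunk = 0.0;
            /* layer 3: a directive requests vector/SIMD execution inside this thread */
            #pragma omp simd reduction(+:chunk)
            for (int i = b; i < end; i++)
                chunk += x[i] * y[i];
            local += chunk;
        }

        double global = 0.0;                    /* combine node-local results */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        free(x); free(y);
        MPI_Finalize();
        return 0;
    }

Layers 4 and 5 are where the model still depends on maturing tools: explicit data-movement control and a portable high-level expression of the parallelism that the compiler and libraries can map to each architecture.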

Tools can enable more effective application development

- Pre-processing technology to manage complexity
  - ROSE (http://rosecompiler.org/)
  - Performance hints, including opportunities for buffering
- Frameworks that generate code
  - MADNESS
  - Tensor Contraction Engine (TCE), http://www.csc.lsu.edu/~gb/tce/
  - MAGMA (Atlas+ for GPUs; a minimal library-call sketch follows the list)
- Build application-centric functionality into the compiler/tools chain
  - Encapsulate appropriate prescribed tasks for accelerator work
  - Similar to the evolution of vectorizing or OpenMP compilers & technologies
- IPORT SciDAC Institute: proposed to build an integrated, production-level, user-friendly refactoring toolchain from extant and new tools (PIs: R. Graham and B. Messer)
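To illustrate the library route named above, a short sketch of delegating a prescribed task to a tuned library instead of hand-written loops; it assumes a CBLAS implementation is installed, and a GPU BLAS such as MAGMA would be the analogous drop-in at the node level. Matrix sizes are assumptions for the example.

    /* Illustrative use of an integrated BLAS library: a tuned dgemm does the
     * work rather than explicit triple loops. */
    #include <cblas.h>
    #include <stdlib.h>

    int main(void) {
        const int n = 512;
        double *A = calloc((size_t)n * n, sizeof *A);
        double *B = calloc((size_t)n * n, sizeof *B);
        double *C = calloc((size_t)n * n, sizeof *C);

        /* C = 1.0 * A * B + 0.0 * C, row-major, no transposes */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);

        free(A); free(B); free(C);
        return 0;
    }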