Early Science on Theta

DEPARTMENT: Leadership Computing

Timothy J. Williams, Argonne National Laboratory

Editors: James J. Hack, jhack@ornl.gov; Michael E. Papka, papka@anl.gov

Supercomputers are essential peers to the most powerful scientific instruments used today. These machines run high-performance codes, tuned and optimized over decades, that push the boundaries of what we can observe and help unravel the secrets of the physical world at all of its scales. The scientific supercomputers run by the US Department of Energy's (DOE's) Office of Science are an elite class of machines, among the most powerful in the world. They represent a collaboration between different research communities and the nation's leading computing manufacturers, chip makers, and software developers to produce breakthrough scientific discoveries. Research teams come from far and wide, from corporate R&D labs, government research facilities, and university faculties, to exploit the massive power of these systems.

Launching a new supercomputer is no small feat: clearing design reviews and preparing the physical site can take up to several years. All the system components, literally tons of hardware, are then unpacked, tested, and assembled in a matter of weeks, followed by an intensive period of troubleshooting and benchmarking.

At the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility located at Argonne National Laboratory, the process for standing up a new system is paired with the Early Science Program, a separate effort that begins long before the first crate is opened. This competitive program prepares a select group of application science teams to make productive use of the new machine as soon as it is available by providing them with staff support and early access to preproduction system hardware and software. These science teams, each assisted by a dedicated postdoc with training in a relevant scientific domain, are the first to learn the features of the new architecture and to run actual science campaigns.

USING REAL SCIENCE TO SOLVE TECHNICAL CHALLENGES

While memory and speed are perhaps the most prominent features of supercomputers, a machine's ability to support the science is the most critical. Scientists themselves are the best troubleshooters: they understand what kinds of problems can arise from the codes they use, and they can recognize when results are wrong, or at least different from what they expected.

The ALCF ran its first competitive Early Science Program in 2012 to ready its 10-Pflops IBM Blue Gene/Q system, Mira. Since then, the program has been replicated at several other DOE computing facilities, including ALCF's sister facility at Oak Ridge National Laboratory and the National Energy Research Scientific Computing Center (NERSC), a user facility operated by Lawrence Berkeley National Laboratory and the Office of Science.

The idea behind Early Science is to maximize use of the critical period between acceptance of the completed system and the day it transitions to production mode. On that day, the new machine shifts to supporting projects awarded by one or more of DOE's major allocation programs, INCITE (Innovative and Novel Computational Impact on Theory and Experiment) and ALCC (ASCR Leadership Computing Challenge), as well as a number of in-house discretionary projects focused on developing application scaling and performance toward leadership computing. The Early Science teams, whose projects cover key scientific areas and numerical methods, have by then worked with ALCF and technical vendors to adapt their codes to the architecture and scale of the new machine. Because these projects collectively represent a typical system workload at the ALCF, vendors and staff can use the teams' run campaigns to validate and debug the libraries and infrastructure on which future production users will rely.

DIFFERENT PROGRAMS FOR DIFFERENT ARCHITECTURES

In general, proposed Early Science problems must have visible scientific impact and be appropriate for the new machine. In addition, the code must be very likely to run in production mode immediately upon machine availability. The ALCF's evolving computational workload also plays a role in the type of awards that are made.

Mira's program supported 16 traditional simulation science projects, and all 16 projects were enabling productive science on day one. The projects yielded numerous insights on tuning and scaling for an architecture vastly more powerful than its predecessor. In addition, all 16 applications were enhanced to model more complex phenomena, with higher fidelity, collectively consuming nearly 2 billion core-hours in a matter of months. In 2013, the year Mira was accepted, the machine took on a portion of the INCITE projects, typically the largest and most challenging projects at the facility. By 1 January 2014, Mira was supporting all of them.

The program for ALCF's latest system, Theta, now an 11.69-Pflops Cray XC system that went into production on 1 July 2017, supported six Tier 1 science applications and six Tier 2 code development projects. The Tier 1 teams received preproduction allocations to run scientific calculations and a postdoc to help meet their science goals. Tier 2 teams were able to conduct substantial science runs whenever the machine's nodes were otherwise idle. Most projects reported significant science results, and many teams are already building on their Early Science work to run new or next-phase research campaigns.

Theta, too, began producing science at full capacity on day one. In its first six months in production, Theta supported a combination of ALCC projects, Early Science projects, and discretionary projects, including ALCF Data Science Program projects. As of 1 January 2018, Theta added INCITE projects to its workload.

THETA EARLY SCIENCE SUCCESSES

Here we describe three Theta Early Science projects: two science applications and one code development effort. All three investigators are longtime users of leadership-class machines. Katrin Heitmann's project, "Next-Generation Cosmology Simulations with HACC: Conquering the Baryon Problem," used Theta to run a large-scale simulation containing dark matter and baryons with hydrodynamics that will facilitate comparisons with upcoming observations. Kenneth E. Jansen's code project, "Extreme Scale Unstructured Adaptive CFD: From Multiphase Flow to Aerodynamic Flow Control," applied the computational fluid dynamics analysis package PHASTA to the problem of active flow control for aerodynamics, producing strong results that include comparisons with experiment. Benoît Roux's project, "Free Energy Landscapes of Membrane Transport Proteins," developed and implemented a scalable algorithm for constant-pH simulation in NAMD, a molecular dynamics code for biological simulation.

Next-Generation Cosmology Simulations with HACC: Conquering the Baryon Problem

The next generation of cosmological surveys covering the microwave, optical, and X-ray bands will start taking data soon, and a major challenge in interpreting these new observations will be to disentangle astrophysical effects from fundamental physics. Argonne physicist and computational scientist Katrin Heitmann worked with researchers from Argonne, the University of Chicago, and the University of Rennes to build new capabilities into the extreme-scale Hardware/Hybrid Accelerated Cosmology Code (HACC) to allow simultaneous modeling of observations in all of these wavebands at high fidelity.

Figure 1. Density slice of dark matter (left) and baryons (right) centered on a large galaxy cluster. Baryonic thermal pressure resists gravitational collapse and induces suppression in the small-scale clustering of matter. Hydrodynamic simulations are required to disentangle the effects of baryons from those associated with dark matter and dark energy. Image: J.D. Emberson, Argonne National Laboratory.

As part of this work, the team developed and tested a new algorithm called Conservative Reproducing Kernel Smoothed Particle Hydrodynamics (CRK-SPH), which addresses some of the shortcomings of traditional SPH methods relative to adaptive mesh refinement (AMR) methods. The CRK-SPH scheme showed significant improvement in hydrodynamics problems involving shearing and mixing, in addition to shocks, while preserving the performance and scalability advantages of particle-based methods. Before the start of this project, the HACC framework considered only gravitational forces; it now includes a hydrodynamics solver and a first set of subgrid models, creating the CRK-HACC framework.

The team extensively tested the CRK-SPH implementation on Theta's architecture and verified the correctness of the combined gravity and CRK-SPH hydrodynamics implementation by running the Santa Barbara cluster comparison benchmark for hydrodynamic cosmology codes. The results were compared to AMR methods, and very good agreement was found.

HACC (and therefore CRK-HACC) is designed to run on all available high-performance computing architectures. The team completed the optimization of the short-range solver for Theta's architecture and demonstrated excellent performance and scaling results on the full machine. The team also carried out two simulations: one with HACC and one with CRK-HACC using identical initial conditions and including baryonic physics (see Figure 1). The CRK-HACC run evolved a total of roughly 25 billion particles. These two simulations now allow the team to carry out detailed studies of the impact of baryons on the matter distribution in the universe by comparing the runs side by side (in reality, the simulations are far too large to view side by side, so what is compared are summary statistics). The CRK-HACC run on Theta marks a major milestone on the team's path toward next-generation machines and the physics questions they will open up.
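The reproducing-kernel idea at the heart of CRK-SPH can be illustrated with a minimal sketch. The code below is not the CRK-HACC implementation (which runs on large parallel machines, carries the correction to linear order, and is constructed to be conservative); it shows only a zeroth-order, Shepard-type correction, in which the standard SPH kernel sum is renormalized so that constant fields are reproduced exactly even on irregular particle distributions. The particle positions, volumes, and field values here are made up for the example.

```python
import numpy as np

def cubic_spline_w(r, h):
    """Standard 3D cubic-spline SPH kernel with support radius 2h."""
    q = r / h
    sigma = 1.0 / (np.pi * h**3)
    return sigma * np.where(q < 1.0, 1.0 - 1.5*q**2 + 0.75*q**3,
                   np.where(q < 2.0, 0.25*(2.0 - q)**3, 0.0))

def shepard_interpolate(x_eval, x, f, vol, h):
    """Interpolate a field f (sampled at particle positions x with volumes vol)
    at points x_eval using a zeroth-order reproducing-kernel (Shepard)
    correction: the kernel sum is renormalized so constant fields are
    reproduced exactly, which plain SPH sums do not guarantee on irregular
    samplings. Brute-force O(N*M) sums; a production code uses neighbor
    lists or trees and higher-order corrections."""
    out = np.empty(len(x_eval))
    for k, xe in enumerate(x_eval):
        r = np.linalg.norm(x - xe, axis=1)
        w = cubic_spline_w(r, h) * vol       # volume-weighted kernel values
        out[k] = np.dot(w, f) / np.sum(w)    # Shepard normalization
    return out

# Tiny check: a constant field is recovered exactly on a ragged particle set.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(500, 3))
vol = np.full(500, 1.0 / 500)
f_const = np.full(500, 3.7)
print(shepard_interpolate(x[:5], x, f_const, vol, h=0.15))  # prints 3.7 each
```

The same renormalization applied to an uncorrected kernel sum would drift away from 3.7 near under-sampled regions, which is the kind of error the reproducing-kernel construction is designed to remove.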

Extreme Scale Unstructured Adaptive CFD: From Multiphase Flow to Aerodynamic Flow Control

Kenneth E. Jansen's Early Science code development project explored the impact of applying the open-source, massively parallel computational fluid dynamics (CFD) analysis package PHASTA on Theta to advance computational modeling of turbulent flows around aerodynamic bodies, such as airplanes and wind turbines. Jansen, a professor in the University of Colorado Boulder's Department of Aerospace Engineering Sciences, worked with researchers from the University of Colorado Boulder, Rensselaer Polytechnic Institute, Argonne, and North Carolina State University.

With previous allocations at the ALCF, Jansen's team performed highly resolved detached eddy simulations of a vertical tail-rudder assembly with 12 synthetic jet actuators, small speaker-like cavities that take in and expel air to alter and control the flow across the assembly in flight. These simulations showed excellent agreement with wind tunnel experiments, not only for integrated quantities like total force but also for the phase-averaged flow structures issued from the synthetic jets, yielding, for the first time, clear insight into the fundamental mechanisms of flow control. This work included using 2,048 Theta nodes to run fully implicit PHASTA calculations with 5 billion finite elements on an unstructured adaptive mesh. Using Theta, the researchers set out to extend these experiment-scale simulations to an 8X higher Reynolds number, which brings them substantially closer to flight scale.

Achieving the highest possible portable performance on new architectures has been a major focus of PHASTA's development since its inception, and flexibility has been built into the code to make it highly adaptable to hardware and software advances. As part of the team's Early Science work, the researchers used Intel's VTune performance profiling tool on Theta to confirm that a very high percentage of PHASTA's computationally intensive kernels were already highly vectorized. The team's runs on Theta showed a roughly 5X improvement in per-core performance compared to Mira.

Adaptive, implicit, unstructured-grid CFD relies on very complicated algorithms with formidable scaling challenges. In their studies on Theta, the researchers found that, with some effort, these algorithms can be made not only scalable but highly efficient in terms of science delivered per CPU hour. Through careful design, the more complex algorithms can provide great reductions in computational cost relative to simpler methods that are easier to map efficiently onto hardware. By making complex features like adaptivity and implicit methods as efficient and scalable as possible, the team confirmed that realistic aircraft components, like a vertical tail/rudder assembly complete with active flow control, can be simulated accurately at wind tunnel scale. These simulations are on the path to flight scale with next-generation exascale systems.
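Cross-machine comparisons like the 5X per-core figure above (and the per-node NAMD comparison later in this article) normalize measured rates by the resources used: Mira's Blue Gene/Q nodes have 16 compute cores, while Theta's Knights Landing nodes have 64. A minimal sketch of that arithmetic, using made-up timings rather than the actual PHASTA or NAMD measurements:

```python
MIRA_CORES_PER_NODE = 16   # IBM Blue Gene/Q
THETA_CORES_PER_NODE = 64  # Intel Xeon Phi (Knights Landing)

def speedups(t_mira, nodes_mira, t_theta, nodes_theta):
    """Return (per-node, per-core) speedup of Theta over Mira for runs of the
    same problem, normalizing each measured rate by the resources used.
    Times are wall-clock seconds per simulation step."""
    rate_mira, rate_theta = 1.0 / t_mira, 1.0 / t_theta
    per_node = (rate_theta / nodes_theta) / (rate_mira / nodes_mira)
    per_core = (rate_theta / (nodes_theta * THETA_CORES_PER_NODE)) / \
               (rate_mira / (nodes_mira * MIRA_CORES_PER_NODE))
    return per_node, per_core

# Hypothetical timings at equal node counts, for illustration only.
per_node, per_core = speedups(t_mira=4.0, nodes_mira=1024,
                              t_theta=0.33, nodes_theta=1024)
print(f"per-node speedup {per_node:.1f}x, per-core speedup {per_core:.1f}x")
```

With these made-up numbers, a roughly 12X per-node advantage corresponds to only about a 3X per-core advantage, because each Theta node carries four times as many cores as a Mira node.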
Free Energy Landscapes of Membrane Transport Proteins

Molecular dynamics (MD) is widely used to investigate biochemical and biomolecular systems, thanks in part to advances in computational hardware over the past few decades. Conventional MD can investigate the behavior of a single protonation state, but it cannot address how to link the network of states together in a physically meaningful way. The vast majority of simulations today employ only rudimentary algorithmic approaches (so-called brute-force MD), which give access to only a small fraction of what the approach has to offer.

The Nanoscale Molecular Dynamics (NAMD) code is used to simulate large systems (millions of atoms) and is designed primarily for all-atom and coarse-grained MD with a focus on biomolecular applications. For more than a decade, Argonne senior scientist and University of Chicago professor Benoît Roux has been using leadership systems to carry out MD simulations that provide detailed visualizations of the large conformational changes of membrane transport proteins and quantitative predictions of the energetics of these processes.

For this Early Science project, Roux and his team developed and implemented a constant-pH MD algorithm in the NAMD simulation engine. They achieved, for the first time, constant-pH simulations of a membrane transport protein and used the results to analyze its free energy landscape for ion selectivity. (The potential of hydrogen, a measure of acidity commonly known as pH, plays a significant role in macromolecular interactions, and it also plays an important part in drug optimization and pH-dependent drug molecule aggregation.)
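The essential Monte Carlo step behind a constant-pH approach can be sketched in a few lines. The toy code below is not the NAMD implementation; it samples the protonation state of a single titratable site with a Metropolis criterion whose pH-dependent term comes from the model-compound pKa, and the hypothetical du_env_kT parameter stands in for the environment-dependent work that a production engine would compute from the full solvated system.

```python
import numpy as np

def constant_ph_mc(pH, pKa_model, du_env_kT=0.0, n_steps=200_000, seed=0):
    """Metropolis sampling of the protonation state of one titratable site
    at fixed pH. s = 1 means protonated, s = 0 deprotonated. du_env_kT is a
    hypothetical environment-dependent penalty (in units of kT) for the
    protonated form. Returns the sampled protonated fraction."""
    rng = np.random.default_rng(seed)
    ln10 = np.log(10.0)
    s, n_prot = 1, 0
    for _ in range(n_steps):
        s_new = 1 - s  # propose flipping the protonation state
        # pH/pKa reference term plus environmental contribution, in kT
        dG = (s - s_new) * ln10 * (pKa_model - pH) + (s_new - s) * du_env_kT
        if dG <= 0.0 or rng.random() < np.exp(-dG):
            s = s_new
        n_prot += s
    return n_prot / n_steps

# With du_env_kT = 0 the sampled curve reproduces Henderson-Hasselbalch.
for pH in (3.0, 4.0, 5.0):
    sampled = constant_ph_mc(pH, pKa_model=4.0)
    analytic = 1.0 / (1.0 + 10.0 ** (pH - 4.0))
    print(f"pH {pH}: sampled {sampled:.3f}  analytic {analytic:.3f}")
```

Recovering the Henderson-Hasselbalch titration curve in the limit of no environmental term is a standard sanity check for this kind of protonation-state move; the scientific content lies in how the environmental work is evaluated for a large, explicitly solvated protein.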

Computational biophysicists from the University of Chicago and the ALCF, in collaboration with the NAMD development team at the University of Illinois at Urbana-Champaign, assisted with the algorithm's design and implementation. In addition, the team explored the structural response of a small drug molecule, called a mini-peptide, under various acidity/alkalinity conditions.

The team plotted NAMD strong scaling on Theta and other machines versus nodes, sockets, and cores for 21-million-atom and 224-million-atom benchmarks. Compared to Mira, NAMD on Theta is a factor of 12 faster per node. As a whole, these advances provide a new tool for the broader MD simulation community.

CONCLUSION

The Early Science Program for ALCF's next supercomputer, Aurora, has identified five simulation science projects and will soon identify five data science projects and five machine learning projects. Aurora, expected in 2021, will be a capable exascale platform equally suited for large-scale simulation, deep learning, and data analysis applications.

ACKNOWLEDGMENTS

This research used resources of the ALCF, a US Department of Energy Office of Science User Facility supported under contract DE-AC02-06CH11357. The author gratefully acknowledges Laura Wolf of Argonne National Laboratory for her assistance in the production of this article.

ABOUT THE AUTHOR

Timothy J. Williams is deputy director of the Computational Science Division at Argonne National Laboratory, where he also manages the Argonne Leadership Computing Facility's Early Science Program. His research interests include plasma physics, particle-in-cell simulation of tokamak plasmas in particular, and wide-ranging applications of large-scale supercomputing in science and applied math. Williams has a PhD in physics from William & Mary. Contact him at tjwilliams@anl.gov.