INCITE Proposal Writing Webinar April 24, 2012 Judith Hill OLCF Scientific Computing Group Oak Ridge National Laboratory Charles Bacon ALCF Catalyst Team Argonne National Laboratory and Julia C. White, INCITE Manager
Overview Allocation programs [3-5] INCITE mission and stats [6-10] Titan and Mira [11-16] Tips for applicants [17-35] Common oversights Requesting a startup account Benchmarking data Q&A [36, open discussion] Conclusions [37-44] Submittal, review, and awards decisions Contact links 2
INCITE: Innovative and Novel Computational Impact on Theory and Experiment INCITE promotes transformational advances in science and technology through large allocations of computer time, supporting resources, and data storage at the Argonne and Oak Ridge Leadership Computing Facilities (LCFs) for computationally intensive, large-scale research projects. 3
Provide access to LCFs: more than 2.7 billion core-hours awarded in 2012. INCITE (60%): 1.7 billion core-hours; ALCC (30%): up to 810 million core-hours; Director's Discretionary (10%): 270 million core-hours. 4
Allocation Programs at the LCFs
INCITE (60%)
  Mission: High-risk, high-payoff science that requires LCF-scale resources*
  Call: 1x/year (closes June)
  Duration: 1-3 years, yearly renewal
  Typical size: 30-40 projects, 10M - 100M core-hours/yr.
  Review process: Scientific peer review; computational readiness
  Managed by: INCITE management committee (ALCF & OLCF)
ALCC (30%)
  Mission: High-risk, high-payoff science aligned with DOE mission
  Call: 1x/year (closes February)
  Duration: 1 year
  Typical size: 5-10 projects, 1M - 75M core-hours/yr.
  Review process: Scientific peer review; computational readiness
  Managed by: DOE Office of Science
Director's Discretionary (10%)
  Mission: Strategic LCF goals
  Call: Rolling
  Duration: 3 months, 6 months, or 1 year
  Typical size: 100s of projects, 10K - 1M core-hours
  Review process: Strategic impact and feasibility
  Managed by: LCF management
Availability: Open to all scientific researchers and organizations
Capability: >20% of cores
5
Is INCITE right for me? INCITE seeks computationally intensive, large-scale research projects with the potential to significantly advance key areas in science and engineering. 1 Impact criterion: high-impact science and engineering. 2 Computational leadership criterion: computationally intensive runs that cannot be done anywhere else. 3 Eligibility criterion: INCITE grants allocations regardless of funding source (e.g., DOE, NSF, private); non-US-based researchers are welcome to apply. Call for 2013 proposals opens in Spring 2012. The INCITE program seeks proposals for high-impact science and technology research challenges that require the power of the leadership-class systems. 6
Twofold review process: peer review by INCITE panels and computational readiness review by the LCF centers, followed by award decisions
New proposal assessment
  Peer review: scientific and/or technical merit; appropriateness of proposal method, milestones given; team qualifications; reasonableness of requested resources
  Computational readiness review: technical readiness; appropriateness for requested resources
Renewal assessment
  Peer review: change in scope; met milestones; on track to meet future milestones; scientific and/or technical merit
  Computational readiness review: met technical/computational milestones; on track to meet future milestones
Award decisions: INCITE Awards Committee composed of LCF directors, INCITE program manager, LCF directors of science, and senior management
7
2012 INCITE allocations 28 new projects, 32 renewals Acceptance rate: 33% for new proposals, 91% for renewals Distribution of INCITE time by science domain Awarded 1.67 billion hours for CY 2012 8
2012 INCITE award demographics [Figures: PI Affiliations, 2012 INCITE Awards; PI Demographics, 2012 INCITE Awards] 9
Diversity of INCITE science Simulating a flow of healthy (red) and diseased (blue) blood cells with a Dissipative Particle Dynamics method. - George Karniadakis, Brown University Provide new insights into the dynamics of turbulent combustion processes in internal-combustion engines. -Jacqueline Chen and Joseph Oefelein, Sandia National Laboratories Demonstration of high-fidelity capture of airfoil boundary layer, an example of how this modeling capability can transform product development. - Umesh Paliath, GE Global Research Calculating an improved probabilistic seismic hazard forecast for California. - Thomas Jordan, University of Southern California Modeling charge carriers in metals and semiconductors to understand the nature of these ubiquitous electronic devices. - Richard Needs, University of Cambridge, UK High-fidelity simulation of complex suspension flow for practical rheometry. - William George, National Institute of Standards and Technology Other INCITE research topics Glimpse into dark matter Supernovae ignition Protein structure Creation of biofuels Replicating enzyme functions Global climate Regional earthquakes Carbon sequestration Turbulent flow Propulsor systems Membrane channels Protein folding Chemical catalyst design Combustion Algorithm development Nano-devices Batteries Solar cells Reactor design Nuclear structure 10
INCITE resources: Mira at ALCF Mira - Blue Gene/Q System 48K nodes / 768K cores 786 TB of memory Peak flop rate: 10 PF Storage ~35 PB capacity, 240GB/s bandwidth (GPFS) Disk storage upgrade planned in 2015 Double capacity and bandwidth New Visualization Systems Initial system in 2012 Advanced visualization system in 2014 State-of-the-art server cluster with latest GPU accelerators Provisioned with the best available parallel analysis and visualization software 11
Overview of Blue Gene/Q
Design Parameter                                   BG/P         BG/Q         Improvement
Cores / Node                                       4            16           4x
Clock Speed (GHz)                                  0.85         1.6          1.9x
Flop / Clock / Core                                4            8            2x
Nodes / Rack                                       1,024        1,024        --
RAM / core (GB)                                    0.5          1            2x
Flops / Node (GF)                                  13.6         204.8        15x
Mem. BW / Node (GB/sec)                            13.6         42.6         3x
Latency (MPI zero-length, nearest-neighbor node)   2.6 µs       2.2 µs       ~15% less
Bisection BW                                       1.39 TB/s    13.1 TB/s    9.42x
Network Interconnect                               3D torus     5D torus     Smaller diameter
Concurrency / Rack                                 4,096        65,536       16x
GFlops/Watt                                        0.77         2.10         3x
12
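The "Flops / Node" row follows from the rows above it: cores per node times clock speed times flops per clock per core. The short Python sketch below reproduces that arithmetic and scales it to Mira's quoted peak. It assumes that "48K nodes" on the Mira slide means 48 x 1,024 nodes; the function and variable names are illustrative only.

```python
def peak_gflops_per_node(cores_per_node, clock_ghz, flops_per_clock_per_core):
    """Peak node rate in GFlops: cores x clock (GHz) x flops per clock per core."""
    return cores_per_node * clock_ghz * flops_per_clock_per_core


# Values taken from the BG/P vs. BG/Q comparison table.
bgp_gf = peak_gflops_per_node(4, 0.85, 4)
bgq_gf = peak_gflops_per_node(16, 1.6, 8)

# Assumption: "48K nodes" is read as 48 * 1024 nodes.
mira_nodes = 48 * 1024
mira_pf = mira_nodes * bgq_gf / 1e6  # convert GFlops to PFlops

print(f"BG/P node peak: {bgp_gf:.1f} GF")
print(f"BG/Q node peak: {bgq_gf:.1f} GF ({bgq_gf / bgp_gf:.1f}x BG/P)")
print(f"Mira system peak: {mira_pf:.2f} PF")
```

Under these assumptions the result agrees with the table's 13.6 GF and 204.8 GF per node and with the roughly 10 PF peak quoted for Mira.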
ALCF workshop: Leap to Petascale May 22-25, 2012 at Argonne National Laboratory Scaling workshop tailored for current INCITE and discretionary projects that have already run on multiple racks of Intrepid. Participants will be given special access to our Blue Gene/Q test and development rack throughout the event. Hands-on tuning of applications with one-on-one assistance from our performance engineers and computational scientists. http://www.alcf.anl.gov/workshops/l2p-workshop 13
Titan at OLCF
Upgrade of existing Jaguar Cray XT5
  Cray Linux Environment operating system
  Gemini interconnect: 3-D torus; globally addressable memory; advanced synchronization features
  AMD Opteron 6200 processor (Interlagos)
  New accelerated node design using NVIDIA multi-core accelerators
    2011: 960 NVIDIA M2090 Fermi GPUs
    2012: 10-20 PF NVIDIA Kepler GPUs
  10-20 PFlops peak performance (performance based on available funds)
  600 TB DDR3 memory (2x that of Jaguar)
Titan Specs
  Compute nodes: 18,688
  Login & I/O nodes: 512
  Memory per node: 32 GB + 6 GB
  NVIDIA Fermi (2011): 665 GFlops
  # of Fermi chips: 960
  NVIDIA Kepler (2012): >1 TFlops
  Opteron: 2.2 GHz
  Opteron performance: 141 GFlops
  Total Opteron Flops: 2.6 PFlops
  Disk bandwidth: ~1 TB/s
14
Cray XK6 compute node XK6 Compute Node Characteristics AMD Opteron 6200 Interlagos 16 core processor @ 2.2GHz Tesla M2090 Fermi @ 665 GF with 6GB GDDR5 memory Host Memory 32GB 1600 MHz DDR3 Gemini High Speed Interconnect Upgradeable to NVIDIA's next generation Kepler processor in 2012 Four compute nodes per XK6 blade. 24 blades per rack 15
INCITE resources to request ALCF Intrepid will still be in production mode in 2013 732M core-hours available Mira is expected to be in production by early CY 2013 Committed to 768M core-hours, for October start Up to 2-3B core-hours, depending on start date OLCF The Cray XK6 will be in production in CY 2012 Projects may apply for production time on Titan in CY 2013 16
Key questions to ask yourself Are both the scale of the runs and the time demands of the problem of LCF scale? Yes, I can't get the amount of time I need anywhere else. Yes, my simulations are too large to run on other systems. Do you need specific LCF hardware? Yes, the memory and I/O available here are necessary for my work. TIPS Do answer these questions in the proposal. This is especially helpful for the computational readiness reviewers. 17
Key questions to ask yourself (cont.) Do you have the people ready to do this work? No, I'm waiting to hire a postdoc. Yes, I have commitments from the major participants. Do you have a post-processing strategy? Do you have a workflow? Do you use ensemble runs and need LCF resources? My ensembles can run under the direction of a large MPI job, with I/O scaling on a parallel file system. -> possibly yes My ensemble expects to run millions of serial jobs on nodes with local disk available. -> probably no Some of these characteristics are negotiable, so make sure to discuss atypical requirements with the centers. 18
More about our ensemble policy or, Can I meet the computationally intensive criterion by loosely coupling my jobs? Possibly yes, If you require large numbers of discrete or loosely coupled simulations where time-to-solution is an untenable pacing issue, and If a software workflow solution (e.g., pre- and post-processing scripts that automate run management and analysis) is provided to facilitate this volume of work. Probably no, If by decoupling the simulations the work could be effectively carried out on a smaller resource within a reasonable time-to-solution. TIPS Do examine the Frequently Asked Questions for these and other topics at http://hpc.science.doe.gov/allocations/incite/faq.do 19
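One way to picture an ensemble that runs "under the direction of a large MPI job" is a single job whose ranks are divided into equal groups, one group per ensemble member. The sketch below is plain Python with no MPI calls; it only computes the (color, key) pair that each rank could pass to a communicator split such as MPI_Comm_split. The function name, the equal-sized groups, and the example sizes are assumptions for illustration, not a description of any particular INCITE code.

```python
def ensemble_layout(world_size, ranks_per_member):
    """Map each global rank to (member_index, rank_within_member).

    member_index plays the role of the 'color' in a communicator split,
    and rank_within_member plays the role of the 'key'.
    """
    if ranks_per_member <= 0 or world_size % ranks_per_member != 0:
        raise ValueError("world_size must be a positive multiple of ranks_per_member")
    return [
        (rank // ranks_per_member, rank % ranks_per_member)
        for rank in range(world_size)
    ]


# Hypothetical example: 8 ranks split into 2 ensemble members of 4 ranks each.
layout = ensemble_layout(world_size=8, ranks_per_member=4)
for rank, (member, local_rank) in enumerate(layout):
    print(f"global rank {rank}: ensemble member {member}, local rank {local_rank}")
```

A layout like this keeps the whole ensemble inside one scheduled job, which is the property the policy above asks for; whether it suits a given workflow is worth discussing with the centers.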
Some limitations on what can be done Laws regulate what can be done on these systems LCF systems have cyber security plans that bound the types of data that can be used and stored on them Some kinds of information we cannot have Personally Identifiable Information (PII) Classified Information or National Security Information Unclassified Controlled Nuclear Information (UCNI) Naval Nuclear Propulsion Information (NNPI) Information about development of nuclear, biological or chemical weapons, or weapons of mass destruction Inquire if you are unsure or have questions 20
Proposal form: Outline 1 Principal investigator and co-principal investigators 2 Project title 3 Research category 4 Project summary (50 words) 5 Computational resources requested 6 Funding sources 7 Other high-performance computing support for this project 8 Project narrative, other materials (A) executive summary (1 page) (B) project narrative, including computational readiness, job characterization (15 pages) (C) personnel justification (1 page) 9 Application packages 10 Proprietary and sensitive information 11 Export control 12 Monitor information 21
Project narrative: Impact of the work Audience Computational-science-savvy senior scientists/engineers drawn from national labs, universities, and industry around the world TIPS Don't assume that your audience is familiar with your work through other review programs (e.g., funding agencies). INCITE is very broad in scope and you may be competing against a diverse set of proposals. 22
Project narrative: Elements Story elements to be sure to include What the problem is, and its significance Impact of a successful computational campaign the big picture Reasons this work needs to be done now, on the resources requested TIPS Do give a compelling picture of the impact of this work, both in the context of your field and, where appropriate, beyond. Do explain why this work cannot be done elsewhere. Reviewers scrutinize whether another allocation program may be a better fit. 23
Project narrative: Elements Story elements to be sure to include (cont.) Key objectives, key simulations/computations, project milestones Approach to solving the problem, its challenging aspects, preliminary results TIPS Do clearly articulate your project's milestones for each year. Reviewers have downgraded proposals that don't show that the PI has a well-thought-out plan for using the allocation. Do bear in mind that the average INCITE award of time for a single project is equivalent to several million dollars. Spend your time on the proposal accordingly. 24
Project narrative: Proposal teams Experience and credibility List the scientific and technical members and their experience as related to the proposed scientific or technical goals Successful proposal teams demonstrate a clear understanding of petascale computing and can optimally use these resources to accomplish the stated scientific/technical goals TIPS Do include in Personnel Justification a brief description of the role of each team member. Although not a requirement, proposals with application developers or clear connections to development teams are favorably viewed by readiness reviewers. 25
Project Narrative: Next-generation systems LCF resources in CY2013 and CY2014 will have new and highly parallel architectures Local memory per flop will dramatically decrease Relative cost of data movement to/from accelerators will become a potential bottleneck TIPS Do provide a development plan articulating a strategy for maximizing node-level parallelism. 26
Project Narrative: Next-generation systems Strategies to consider Hybridization utilizing OpenMP or Pthreads to expose thread-level or SMP-like parallelism Exposing vector or streaming parallelism through, e.g. CUDA, OpenCL, compiler directives, etc. Algorithmic improvements or design to maximize data locality and memory hierarchy usage 27
Are you ready to apply now? Port your code before submitting the proposal Check to see if someone else has already ported it Request a startup account if needed (see next slide) Provide compelling benchmark data Prove application scalability in your proposal Run example cases at full scale If you cannot show proof of runs at full scale, then provide a very tight story about how you will succeed TIPS Do make the benchmark examples as similar to your production runs as possible, or, make it clear why another benchmark example is valid for your proposed work. 28
Request a startup account Director's Discretionary Proposals considered year-round Award up to millions of hours Allocated by LCF center directors Director's Discretionary (DD) requests can be submitted anytime DD may be used for porting, tuning, scaling in preparation for an INCITE submittal Submit applications at least 2 months before INCITE Call for Proposals closes Argonne DD Program: www.alcf.anl.gov/resource-guides/getting-started-directors-discretionary Oak Ridge DD Program: www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application/ 29
Project narrative: Computational campaign Describe the kind of runs you plan with your allocation L exploratory runs using M nodes for N hours X big runs using Y nodes for Z hours P analysis runs using Q nodes for R hours Big runs often have big output Show you can deal with it and understand the bottlenecks Understand the size of results, where you will analyze them, and how you will get the data there TIPS Do clearly emphasize the relationship between the proposed runs and the major milestones of the proposal. This helps the Awards Committee preserve as many of your milestones as possible if they can't grant the full award requested. 30
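The "L runs using M nodes for N hours" breakdown above translates directly into a core-hour request: runs x nodes x cores per node x hours, summed over run classes. The Python sketch below shows one way to tabulate it. Every number in it is hypothetical, and the 16 cores per node is an assumption borrowed from the Blue Gene/Q table; substitute the figures for the system you are requesting.

```python
def core_hours(count, nodes, hours, cores_per_node):
    """Core-hours for one class of runs: runs x nodes x cores/node x wall-clock hours."""
    return count * nodes * cores_per_node * hours


# Hypothetical campaign: (run class, number of runs, nodes per run, hours per run).
campaign = [
    ("exploratory", 20, 512, 2.0),
    ("production", 5, 8192, 12.0),
    ("analysis", 10, 256, 1.0),
]

CORES_PER_NODE = 16  # assumption: Blue Gene/Q node

budget = {
    name: core_hours(count, nodes, hours, CORES_PER_NODE)
    for name, count, nodes, hours in campaign
}
total = sum(budget.values())

for name, hours_used in budget.items():
    print(f"{name:12s} {hours_used:12,.0f} core-hours ({100 * hours_used / total:.1f}%)")
print(f"{'total':12s} {total:12,.0f} core-hours")
```

Listing the request per run class, rather than as one total, makes it easier to tie each class of runs to a milestone.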
Project narrative: Computational approach Programming languages, libraries and tools used Check that what you need is available on the system Description of underlying formulation Don't assume reviewers know all the codes Do show that the code you plan to employ is the correct tool for your research plan Do explain the differences if you plan to use a private version of a well-known code 31
A sample of codes with local expertise available at Argonne and Oak Ridge Application Field ALCF OLCF FLASH Astrophysics MILC,CPS LQCD Nek5000 Nuclear energy Rosetta Protein structure DCA++ Materials science ANGFMC Nuclear structure NUCCOR Nuclear structure Qbox Chemistry LAMMPS Molecular dynamics NWChem Chemistry GAMESS Chemistry MADNESS Chemistry CHARMM Molecular dynamics NAMD Molecular dynamics Application Field ALCF OLCF AVBP Combustion GTC Fusion Allstar Life science CPMD, CP2K Molecular dynamics CCSM3 Climate HOMME Climate WRF Climate Amber Molecular dynamics enzo Astrophysics Falkon Computer science/htc s3d Combustion DENOVO Nuclear energy LSMS Materials science GPAW Materials science 32
Parallel performance: Direct evidence WEAK SCALING DATA: Increase problem size as resources are increased. STRONG SCALING DATA: Increase resources (nodes) while doing the same computation. Pick the approach(es) relevant to your work and show results. [Figures: Weak Scaling Example and Strong Scaling Example, each plotting time to solution (actual vs. ideal) against number of processors, from 2,400 to 76,800] 33
Parallel performance: Direct evidence Performance data should support the required scale Use similar problems to what you will be running Show that you can get to the range of processors required Best to run on the same machine, but similar size runs on other machines can be useful Describe how you will address any scaling deficiencies Be aware of scaling data from other groups and literature TIPS Do provide performance data in the requested format. Do provide performance of the scaling baseline, not just scaling efficiency 34
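Scaling efficiency is usually derived from measured times to solution, which is why the baseline time matters as much as the efficiency itself. The Python sketch below uses the common definitions: strong-scaling efficiency is (baseline time x baseline processors) / (time x processors), and weak-scaling efficiency is baseline time / time. The timings are invented for illustration and are not taken from the example plots.

```python
def strong_scaling_efficiency(procs, times):
    """Efficiency relative to the first entry, for a fixed total problem size."""
    base_p, base_t = procs[0], times[0]
    return [(base_t * base_p) / (t * p) for p, t in zip(procs, times)]


def weak_scaling_efficiency(times):
    """Efficiency relative to the first entry, for a fixed problem size per processor."""
    return [times[0] / t for t in times]


# Hypothetical measurements (seconds); replace with your own benchmark data.
procs = [2400, 4800, 9600, 19200]
strong_times = [4800.0, 2500.0, 1300.0, 700.0]
weak_times = [10.0, 10.2, 10.5, 10.9]

strong_eff = strong_scaling_efficiency(procs, strong_times)
weak_eff = weak_scaling_efficiency(weak_times)

print("procs   strong time (s)  strong eff   weak time (s)  weak eff")
for p, st, se, wt, we in zip(procs, strong_times, strong_eff, weak_times, weak_eff):
    print(f"{p:6d}  {st:15.1f}  {se:10.3f}  {wt:14.1f}  {we:8.3f}")
```

Reporting the raw times next to the efficiencies, as the table printed here does, lets reviewers judge the baseline as well as the trend.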
I/O Requirements Restart I/O - Application initiated program restart data I/O technique used, e.g., MPI I/O, HDF5, raw Number of processors doing I/O, number of files Sizes of files and overall dump Periodicity of the checkpoint process Analysis I/O - Application written files for later analysis I/O technique used, e.g., pnetcdf, phdf5 Number of processors doing I/O, number of files Sizes of files and overall dump Archival I/O - Data archived for later use/reference Number and sizes of files Retention length If archived remotely, the transport tool used, e.g., GridFTP 35
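The restart I/O items above (size of the overall dump, periodicity, number of writers) can be combined into a rough estimate of data volume and time spent writing. The Python sketch below is a back-of-the-envelope calculation only: the per-rank size, rank count, interval, and sustained bandwidth are hypothetical, and real applications rarely sustain a file system's quoted peak bandwidth.

```python
def checkpoint_estimate(bytes_per_rank, ranks, interval_hours, run_hours, bandwidth_gb_s):
    """Rough restart-I/O estimate for one run.

    Returns dump size (GB), number of dumps, seconds per dump,
    total volume (GB), and fraction of the run spent writing.
    """
    dump_gb = bytes_per_rank * ranks / 1e9
    n_dumps = int(run_hours // interval_hours)
    seconds_per_dump = dump_gb / bandwidth_gb_s
    return {
        "dump_gb": dump_gb,
        "n_dumps": n_dumps,
        "seconds_per_dump": seconds_per_dump,
        "total_gb": dump_gb * n_dumps,
        "io_fraction": n_dumps * seconds_per_dump / (run_hours * 3600.0),
    }


# Hypothetical run: 512 MB per rank, 16,384 ranks, a dump every 2 hours of a
# 12-hour run, and an assumed sustained bandwidth of 100 GB/s.
est = checkpoint_estimate(
    bytes_per_rank=512_000_000,
    ranks=16_384,
    interval_hours=2.0,
    run_hours=12.0,
    bandwidth_gb_s=100.0,
)

for key, value in est.items():
    print(f"{key:18s} {value:,.3f}")
```

An estimate like this gives reviewers a concrete sense of whether the checkpoint frequency and file sizes are compatible with the requested system.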
Q&A Open discussion on what authors should include in their proposal 36
Final checks You may save your proposal at any time without having the entire form complete Your Co-PIs may also log in and edit your proposal Required fields must be completed for the form to be successfully submitted An incomplete form may be saved for later revisions After submitting your proposal, you will not be able to edit it Submit 37
INCITE awards committee decisions The INCITE Awards Committee is composed of the LCF center directors, INCITE program manager, LCF directors of science, and senior management. The committee identifies the top-ranked proposals by a) peer-review panel ratings, rankings, and reports; and b) additional considerations, such as the desire to promote use of HPC resources by underrepresented communities. Computational readiness review is used to identify whether the top-ranked proposals are ready for the requested system. 38
INCITE awards committee decisions (cont.) A balance is struck to ensure that each awarded project has sufficient allocation to enable all or part of the proposed scientific or technical achievements, and that there is a robust support model for each INCITE project. When the centers are oversubscribed, each potential project is assessed to determine the amount of time that may be awarded to allow the researchers to accomplish significant scientific goals. Requests for appeals can be submitted to the INCITE manager or LCF center directors. If an error has occurred in the decision-making process (e.g., procedural, clerical), consideration is given by the INCITE management and an award may be granted. 39
2013 INCITE award announcements Notice comes from INCITE Manager, Julia White, in November 2012 Welcome and startup information comes from centers Agreements to sign: Start this process as soon as possible! Getting started materials: Work closely with the center Centers provide expert-to-expert assistance to help you get the most from your allocation Scientific Liaisons and Catalysts (OLCF / ALCF) 40
PI obligations Let us know your achievements and challenges Provide quarterly status updates (on supplied template) Milestone reports Publications, awards, journal covers, presentations, etc., related to the work Provide highlights on significant science/engineering accomplishments as they occur Submit annual renewal request Complete annual surveys Encourage your team to be good citizens on the computers Use the resources for the proposed work 41
It is a small world INCITE program and center resources will continue to grow as researchers around the world require larger systems for high-impact results Let the science agency that funds your work know how significant the INCITE program and the Leadership Computing Facilities will be to your work Be sure to include the appropriate acknowledgements Contact us if you have questions: we want to hear from you 42
Relevant links INCITE Program www.doeleadershipcomputing.org/ Argonne Discretionary Program www.alcf.anl.gov/resource-guides/getting-started-directors-discretionary Oak Ridge Discretionary Program www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application Contact the center if you'd like to request Discretionary time for benchmarking 43
Contacts For details about the INCITE program: www.doeleadershipcomputing.org INCITE@DOEleadershipcomputing.org For details about the centers: www.olcf.ornl.gov help@nccs.gov, 865-241-6536 www.alcf.anl.gov support@alcf.anl.gov, 866-508-9181 44