Bridging Campuses to National Cyberinfrastructure: Experience and Perspective from the NSF. Dr. Jennifer M. Schopf, National Science Foundation, Office of CyberInfrastructure. May 3, 2009.
Outline: OCI mission and the CF21 vision; how NSF supports campus work (Sustain, Advance, Experiment); bridging from campus to national and back.
Framing the Question: Science has been revolutionized by CI. Modern science is data- and compute-intensive, integrative, and multiscale. Collaborations add complexity: individuals, groups, teams, communities. NSF must transition its CI approach to address these issues.
NSF Vision for Cyberinfrastructure: a national-level, integrated system of hardware, software, and data resources and services... to enable new paradigms of science. Components: Virtual Organizations for Distributed Communities; High Performance Computing; Data & Visualization/Interaction; Learning & Workforce Needs & Opportunities. Source: http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
Office of Cyberinfrastructure (OCI): supports collaborative computational and data science, including research and development of a comprehensive CI and the application of CI to solve complex problems in science, engineering, behavioral science, economics, and education; provides stewardship for computational science at NSF, in strong collaboration with other offices, directorates, and agencies; supports the preparation and training of current and future generations of researchers and educators to use cyberinfrastructure to further research and education goals.
Cyberinfrastructure Framework for the 21st Century (CF21): high-end computation, data, and visualization for transformative science; facilities and centers as hubs of innovation; MREFCs and collaborations, including large-scale NSF collaborative facilities and international partners; software, tools, science applications, and VOs critical to science, integrally connected to instruments; campuses fundamentally linked end-to-end through grids, clouds, loosely coupled campus services, and policy to support them; people: a comprehensive approach to workforce development for 21st-century science and engineering.
What Is Needed? An ecosystem, not components: an NSF-wide CI Framework for 21st Century Science & Engineering. People, sustainability, innovation, integration.
CyberInfrastructure Ecosystem: Expertise (research and scholarship; education; learning and workforce development; interoperability and operations; cyberscience). Computational Resources (supercomputers; clouds, grids, clusters; visualization; compute services; data centers). Software (applications, middleware; software development and support; cybersecurity: access, authorization, authentication). Organizations (universities, schools; government labs, agencies; research and medical centers; libraries, museums; virtual organizations; communities). Discovery, Collaboration, Education. Networking (campus, national, and international networks; research and experimental networks; end-to-end throughput; cybersecurity). Scientific Instruments (large facilities, MREFCs, telescopes; colliders, shake tables; sensor arrays: ocean, environment, weather, buildings, climate, etc.). Data (databases, data repositories, collections and libraries; data access, storage, navigation, management, mining tools, curation). Sustain, Advance, Experiment.
What Does Sustainability Mean? The ability to maintain a certain process or state. In a biological context, resources must be used at a rate at which they can be replenished. In a CI context, it means creating CI that can be used in broad contexts (reuse) and adopting approaches to funding that encourage long-term support (beyond normal NSF grants).
We Should Fund and View CyberInfrastructure as Infrastructure. At the national level: fund it the same way as telescopes, colliders, and shake tables, as line items in the directorate budgets that remain constant or grow reliably over time, with maintenance and replacement factored in. NSF supports the science that a campus can't fund at a sustainable level. At the campus level: the campus should fund CI the same way it funds other infrastructure, such as libraries and the phone system (clean rooms?), constant or growing reliably over time.
Note: The Answer Is Not More Money from NSF. More money, even if we had it (which we don't), won't address the fundamental problems. We need to spend the money we have more wisely. We need to understand cost models and return on investment. What are the best practices we're missing? How can we leverage existing support? Where could a small investment of funds have the most significant impact?
ACCI Task Forces: Campus Bridging (Craig Stewart); Data (Viz) (Shenda Baker, Tony Hey); Software (David Keyes); Education/Workforce (Alex Ramirez); Computing (Clouds, Grids) (Thomas Zacharia); Grand Challenge VOs (Tinsley Oden). Timeline: 12-18 months. The task forces advise NSF through workshop(s) and recommendations; their input to NSF informs CF21 programs and the 2011-2 CI Vision Plan.
Campus Bridging Task Force: The goal is virtual proximity, as though you are one with your resources (including people): collapse the barrier of distance, remove geographic location as an issue, and make all resources virtually present, accessible, and secure. Campus bridging leverages, informs, and depends upon the whole suite of CI elements (HPC, visualization, data, software, expertise, VOs, etc.). It provides end-to-end connectivity through deployment of leading-edge networking infrastructure and cybersecurity to support CF21.
Driving Forces: the need to support the efficient pursuit of S&E that is multi-domain, multidisciplinary, and multi-location, with leading-edge CI network capabilities and seamless integration; and the need to connect researcher to resource: access to major scientific resources and instruments; CI resource availability at speed and in real time (HPC, MREFCs, data centers, visualization centers, clouds, etc.); the campus environment, including intra-campus; and state, regional, national, and international network and infrastructure transparency.
The Shift Towards Data: Implications. All science is becoming data-dominated: experiment, computation, theory. This calls for totally new methodologies (algorithms, mathematics) and affects all disciplines, from science and engineering to the arts and humanities. End-to-end networking becomes a critical part of the CI ecosystem. Campuses, please note! How do we train data-intensive scientists? Data policy becomes critical!
Preliminary Task Force (TF) Results. Computing TF workshop interim report, recommendation: address sustainability, people, innovation. Software TF interim report, recommendation: address sustainability; create a long-term, multi-directorate, multi-level software program. GCC/VO TF interim report, recommendation: address sustainability; OCI should nurture computational science across NSF units. Software Sustainability Workshop (Campus Bridging), recommendation: open source, use of software engineering practices, reproducibility.
Innovation vs. Sustainability: There is tension between bleeding edge and tried-and-true, between novel and dependable, between a way that might be new and a method that always works. We need a spectrum of approaches that allow broad-scale innovation and continue to advance approaches, yet sustain scientific disciplines.
Overarching Approach for Upcoming Programs. Sustain: large-scale, institute-style projects to promote long-term approaches; long term (5+ years), many PIs and institutions; highly multidisciplinary, perhaps multi-agency. Advance: medium-scale collaborative teams to harden and expand successful experiments; multi-year (3-5 years). Experiment: smaller-scale trials of new approaches.
CF21 Software Infrastructure for Sustained Innovation (SI2): a significant multiscale, long-term software program, perhaps $200-300M over a decade: $10M identified in FY10 ($4M OCI / $6M directorates), $14M annually in OCI in future years, and significant funds catalyzed from the directorates. Sustain: connected institutes, teams, and investigators, integrated into the CF21 framework with the directorates; 3-6 centers, 5+5 years, for critical mass and sustainability. Advance: numerous teams of scientists and computational and computer scientists with longer-term grants. Experiment: many individuals with short-term grants, funded by OCI and directorates.
Software, Continued: There are ongoing discussions to build this program across NSF. Some of the institutes will be discipline-specific; some may be algorithm- or tool-themed (e.g., data, provenance, visualization). All should be fundamental to other programs (e.g., SEES). Education, science applications, and industrial partners should be linked deeply. MREFCs and other large facilities need to participate (iPlant, NEON, LSST, etc.).
Scientific Software Innovation Institutes: Call for Exploratory Workshop Proposals. The scale and complexity are beyond community experience, with many unknowns: models, modes, scales, domain- and community-specific aspects, crosscutting aspects, and many links. Institutes must be grown bottom-up in a coordinated way, with smaller groups evolving into community-wide teams and institutes. They must leverage existing investments and expertise; collaborations across communities, disciplines, and directorates are critical. Exploratory activities will take place during the summer. Call for Exploratory Workshop Proposals: http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp?org=nsf
Goals of S2I2 Workshops: inform NSF in its writing of the solicitation; inform the community as it responds to the solicitation in FY11; provide a forum for discussion of the SI2 vision and of S2I2 models and structures within and across communities.
Software Infrastructure for Sustained Innovation (SI2): Metrics of Success (Beyond Lines of Code): buy-in from the broader community; demonstrated leverage and reuse; emergence of successful, empirically validated models, processes, architectures, and metrics for S&E software; emergence of models and mechanisms for community sustainability of software institutes; a research agenda accepted by the academic community.
Open Source: a requirement for all current OCI programs, and for many others across NSF; it strongly encourages reuse. Some people think simply being open source is enough; it's not. Open source is necessary but not sufficient for sustainable software.
Open Source Software Is Free: free as in speech, or free as in beer?
Open Source Software Is Like a Free Puppy: it seems like a great bargain, it is easy to access, and it can catch your eye at a weak moment, but it is sometimes more than you expected.
Long-Term Costs: needs love and attention; may lose its charm after growing up; occasional clean-ups required; many are left abandoned by their owners; may not be quite what you think.
Data Programs: DataNet is OCI's flagship data program, focused on data-level interoperability and data preservation. Sustain: 5 centers, $20M, 5 years (+5). Advance: e.g., SDCI awards, ~3-4 years, $1-2M, supporting data tools for a broad set of applications and disciplines. Experiment: e.g., InterOp awards, smaller-scale, innovative use of data for new communities.
2008 DataNet Awards. DataNet Observation Network for Earth (PI: Michener): facilitates research on climate change and biodiversity by integrating earth-observing networks; emphasizes user community engagement and promotes data deposition and reuse. Science question: What are the relationships among population density, atmospheric nitrogen, CO2, energy consumption, and global temperatures? Data Conservancy (PI: Choudhury): integrates observational data to enable scientists to identify causal and critical relationships in physical, biological, ecological, and social systems; uses a user-centered design paradigm and ethnographic studies. Science question: How do land and energy use in megacities impact the carbon cycle and climate change?
Planned CF21 HPC Program. Sustain: petascale-to-exascale; 1-2 large-scale sustainable facilities; likely NSF-DOE cooperation; 10 years (5+5). (UIUC petascale facility: $60M building!) Advance: 4-5 hubs of excellence/innovation, people, and expertise; a mixture of data- and compute-intensive centers, supporting a broader array of services. Experiment: explore new architectures, coupled with application/software development.
HPC Will Also Need: discipline-specific connections (MRI, divisional, and directorate programs can be aligned to connect into this NSF-wide structure); recommended common software, identity management, and policy; data and software sharing. How does eXtreme Digital (XD), TeraGrid Phase 3, fit in? The competition is underway now; XD is the foundation on which to build broader CF21 services at the national level in the future.
Outside of Software, Data, and HPC. Postdoc program: CITracs, with an emphasis on helping computational scientists learn about CI, or vice versa: http://www.nsf.gov/pubs/2010/nsf10553/nsf10553.htm CI-TEAM: Training, Education, Advancement, and Mentoring for Our 21st Century Workforce: prepare current and future generations of scientists, engineers, and educators; design, develop, adopt, and deploy cyber-based tools and environments for research and learning, both formal and informal: http://www.nsf.gov/pubs/2010/nsf10532/nsf10532.pdf
(Figure) Software, Data, and HPC interacting across directorates: Track 2, SDCI, DataNet, PetaApps, MRI.
CF21 Strategy: driven by science and engineering; intense coupling of data, sensors, satellites, computing, visualization, grids, software, and VOs across the entire CI ecosystem; better campus integration; CI planning for major facilities; task forces and the research community provide guidance and input; all NSF directorates involved; Sustain, Advance, Experiment.
ARRA Catalyzed OCI Transition: FY09 Budget (Before ARRA) vs. Recovery Act Funds
Category                 FY09 Budget (Before ARRA)   Recovery Act Funds
HPC                      77.21%                      21.25%
Software                 6.31%                       51.68%
Workforce Development    4.06%                       14.38%
Networking               3.97%
Data                     3.45%
Other                    1.79%
Virtual Organizations    1.5%                        5.01%
Budget Initiatives       1.5%                        7.69%
Chart annotations: "Includes Viz"; "Includes PetaApps"; "Includes GRF, CAREER".
ARRA Catalyzed OCI Transition: FY09 Budget Before and After ARRA
Category                 Before ARRA   After ARRA
HPC                      77.21%        61.19%
Software                 6.31%         19.07%
Workforce Development    4.06%         4.55%
Data                     3.45%         4.03%
Networking               3.97%         2.84%
Other                    1.79%         2.37%
Virtual Organizations    1.5%          2.45%
Budget Initiatives       1.5%          3.50%
OCI Budget Breakdown
Underestimations (and Education): Support costs are often underestimated. Grad student support is cheap (except when it isn't). Remember the space-cooling-power triumvirate. People forget about data, networking, and software (software licensing). Weigh duplication of services against the need for special architectures.
Branscomb Pyramid: National to Campus (slide from Gary Crane, http://sura.org/programs/docs/ci_white_paper_final.pdf ). Annotated version: OCI focus; MRI & others.
Beyond Branscomb, Sept. 2006
Broaden Awareness through CI Days: work with campuses to develop leadership in promoting CI to accelerate scientific discovery; catalyze campus-wide and regional discussions and planning; a collaboration of Open Science Grid, Internet2, National LambdaRail, EDUCAUSE, the Minority Serving Institution Cyberinfrastructure Empowerment Coalition, TeraGrid, and local and regional organizations; identify Campus Champions. https://wiki.internet2.edu/confluence/display/cidays
TeraGrid (TG) Campus Champions Program: a Campus Champion is a source of local, regional, and national high-performance computing and cyberinfrastructure information at the home campus; a source of information about TeraGrid resources and services that will benefit their campus; a source of startup accounts to quickly get researchers and educators using their allocation of time on TeraGrid resources; and a point of direct access to TeraGrid staff. https://www.teragrid.org/web/eot/campus_champions
An Idea from EPSCoR: statewide CI plans; statewide CI proposals and funding.
How to Measure Return on Investment? We must measure to improve, and we must measure to justify additional funds at all levels. Would love to hear suggestions!
Conclusions: Campus HPC is more than just machines. Posit: better central computing attracts more grants (and researchers). Treat CI as infrastructure: NSF continues to fund national-scale CI, and campus-scale CI should be part of the campus strategic plan. Take an ecosystem approach: Sustain, Advance, Experiment. Bridging is an urgent need.
More Information: Jennifer M. Schopf, jschopf@nsf.gov, jms@nsf.gov. Dear Colleague Letter for CF21: http://www.nsf.gov/pubs/2010/nsf10015/nsf10015.jsp Software Infrastructure for Sustained Innovation: http://www.nsf.gov/pubs/2010/nsf10551/nsf10551.pdf S2I2 workshop DCL: http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp