The Five R's for Developing Trusted Software Frameworks to increase confidence in, and maximise reuse of, Open Source Software

Ryan Fraser 1, Lutz Gross 2, Lesley Wyborn 3, Ben Evans 3 and Jens Klump 1
1 CSIRO, 2 University of Queensland, 3 NCI, Australian National University
@NCInews

The NCI Integrated Data-Intensive Science Platform
Data services: THREDDS
Server-side analysis and visualization
VDI: cloud-scale user desktops on data
10 PB+ research data
Web-time analytics software

A deluge of 10 PB of Shared Science Data
CMIP5: 3 PB
Earth Observation: 2 PB
Marine Videos: 10 TB
Astronomy (Optical): 200 TB
Atmosphere: 2.4 PB
Water, Ocean: 1.5 PB
Weather: 340 TB
Bathymetry, DEM: 100 TB
Geophysics: 300 TB
Sources: BOM, GA, CSIRO, ANU, other national, international
Mirrored from major science agencies and other sources

A Tsunami of CPUs
Raijin: 57,472 cores (Intel Xeon Sandy Bridge technology, 2.6 GHz) in 3592 compute nodes; 160 TBytes (approx.) of main memory; Infiniband FDR interconnect; and 7 PBytes (approx.) of usable fast filesystem (for short-term scratch space). 1.5 MW power; 100 tonnes of water in cooling.
Partner Cloud: same generation of technology as Raijin (Intel Xeon Sandy Bridge technology, 2.6 GHz) but only 1500 cores; Infiniband FDR interconnect; collaborative platform for services and the platform for hosting non-batch services.
NCI Nectar Cloud: same generation as the Partner Cloud; non-managed environment; weak integration.
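As a quick sanity check on the Raijin figures quoted above, the per-node numbers can be derived directly. This is only a sketch: it assumes memory is spread evenly across nodes and that "TBytes" means decimal terabytes, neither of which the slide states.

```python
# Figures quoted on the slide for Raijin.
cores = 57_472
nodes = 3_592
memory_tb = 160  # approximate; decimal terabytes assumed

# Cores per node (the remainder shows whether the split is exact).
cores_per_node, remainder = divmod(cores, nodes)

# Average memory per node and per core in GB, assuming an even spread.
memory_per_node_gb = memory_tb * 1000 / nodes
memory_per_core_gb = memory_tb * 1000 / cores

print(f"cores per node: {cores_per_node} (remainder {remainder})")
print(f"average memory per node: {memory_per_node_gb:.1f} GB")
print(f"average memory per core: {memory_per_core_gb:.2f} GB")
```

The core count divides exactly into 16 cores per node. The memory figures are averages only, since the slide gives an approximate total.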

There is a famine of trusted software: the free lunch is over
Herb Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software", https://www.cs.utexas.edu/~lin/cs380p/free_lunch.pdf

Stages and Key Events in Software Evolution
Stages:
1950-1985: Write your own
1986-2010: Commercial: Buy Buy Buy
2010-2015: Rise of Open Source: scalability becomes king
Key events:
1983: GNU available
1991: Linux released
2006: The Free Lunch Ended

OK, so what is the issue?
How do I find the software I need?
Problem 1: Understand flood risk in Perth. Solution: Simulate tsunami hazard(s), develop risk map.
Problem 2: I want to model earthquake risk. Solution: Simulate fault movement, develop risk map.
And when I find the code(s), how do I:
appreciate the dependencies and understand the environment required?
know if the code is scientifically robust (trust)?
know if I will crash the facility I chose to run it on (trust)?

Software sites are springing up

How do I find the code for the problem I want to solve?

2015 Data-Intensive Science Climate Report!!!
Data Deluge
Tsunami of CPUs
Famine of Software
Megadrought of funding
Source: http://jordanrussiacenter.org/event/feast-famine/

The approach to software development is changing
[Figure: panels labelled Conceptual Section (Fault, Intrusive, Sandstone), Mathematical Model (example: f(x) = x + sin(x)) and Numerics Section (FEM, Particle Code, Generic Algorithm), joined by a Conceptual-Mathematical Interface and a Mathematical-Numerics Interface.]

Software in the era of Data Intensive Science
EVOLUTION
Source: http://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf
Software needs to transition from a set of individual research projects to a production infrastructure via a trusted software framework.

Introducing the 5 R's of a Trusted Software Framework
Number. Component: Goal
1. Register: Find the required software
2. Review: Can I trust it?
3. Reference: Who else was game enough to use it?
4. Run: Get cracking
5. Repeat: Provide on-line exemplars

Component 2: Peer Review of software

Component 3: Reference - Who else has used it?
Figshare
Impactstory: Impactstory is an open-source, web-based tool that helps scientists explore and share the diverse impacts of all their research products, from traditional ones like journal articles to emerging products like blog posts, datasets, and software.

Component 4: Run

Component 5: Reference implementations (1)
A job's console log can be inspected.
All of a job's outputs are also accessible.
Each job has a lifecycle that can be managed.
Source: Carina Kemp - The Virtual Geophysics Laboratory 2015 and beyond

Component 5: Reference implementations (2)
Each provenance record tracks all inputs, outputs, processing scripts and other metadata, allowing repeatability and transparency.
Successful jobs can have their entire process captured in an ISO 19115 provenance record.
Source: Carina Kemp - The Virtual Geophysics Laboratory 2015 and beyond
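To make the idea of a provenance record concrete, here is a minimal sketch of one that tracks a job's inputs, outputs, processing script and other metadata. It is not the Virtual Geophysics Laboratory's implementation and it does not produce ISO 19115 encoding; the function names (`build_provenance_record`, `same_run`) and the dictionary layout are assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(content: bytes) -> str:
    """Fingerprint a piece of content so later runs can be compared."""
    return hashlib.sha256(content).hexdigest()


def build_provenance_record(job_id, script, inputs, outputs, metadata=None):
    """Build a hypothetical provenance record for one job.

    `script` is the processing script as text; `inputs` and `outputs`
    map artefact names to their raw bytes.
    """
    return {
        "job_id": job_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "script_sha256": sha256_of(script.encode("utf-8")),
        "inputs": {name: sha256_of(data) for name, data in sorted(inputs.items())},
        "outputs": {name: sha256_of(data) for name, data in sorted(outputs.items())},
        "metadata": dict(metadata or {}),
    }


def same_run(record_a, record_b):
    """True when two records share the same script, inputs and outputs."""
    keys = ("script_sha256", "inputs", "outputs")
    return all(record_a[key] == record_b[key] for key in keys)


# Illustrative data only: a first run, a faithful repeat, and a divergent run.
script_text = "run_model --grid coarse"
first = build_provenance_record(
    "job-1", script_text, {"grid.csv": b"1,2,3"}, {"result.csv": b"6"},
    metadata={"operator": "example"},
)
repeat = build_provenance_record(
    "job-2", script_text, {"grid.csv": b"1,2,3"}, {"result.csv": b"6"},
)
divergent = build_provenance_record(
    "job-3", script_text, {"grid.csv": b"1,2,3"}, {"result.csv": b"7"},
)

print("repeat reproduces first run:", same_run(first, repeat))
print("divergent reproduces first run:", same_run(first, divergent))
```

Storing content hashes rather than the artefacts themselves keeps the record small while still letting a later run be checked against the original.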

Reviewing the 5 R's of building trusted software
Number. Component (Goal): What
1. Register (Find the required software): Finding relevant software from multiple open source code repositories.
2. Review (Can I trust it?): Verifying the software through peer review forums (Mozilla Science Lab/Journals) to help users know which codes to trust.
3. Reference (Who else used it?): Linking the software to Figshare or ImpactStory, which help disseminate and measure the impact of scientific research, including program code.
4. Run (Get cracking): Draws on information supplied in the registration process and on benchmark cases described in the review to instantiate the scientific code.
5. Repeat (Provide on-line exemplars): Provenance workflow engines that capture information relating to a run of that software, its input and output artefacts, and transactions.

Building into a Science Software Solution Centre
Components: Hazards Virtual Laboratory (VL); Scientific Software Solution Centre (SSSC: Registry + Governance)
1. User asks: "What solutions do you have for inundation modelling?"
2. SSSC offers solutions: x, y and z.
3. User chooses a solution.
4. User requests solution details from the SSSC (science code, dependencies, template).
5. Selected solution details are returned to the VL.
6. Solution is instantiated and ready for the user in the VL.
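The exchange between the user, the virtual laboratory and the solution centre can be sketched as follows. Everything here is hypothetical: the class names (`SolutionCentre`, `VirtualLaboratory`), their methods and the sample entries are illustrative stand-ins, not an existing SSSC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    """Hypothetical registry entry: science code, dependencies, template."""
    name: str
    keywords: frozenset
    science_code: str
    dependencies: tuple
    template: str


class SolutionCentre:
    """Stand-in for the SSSC registry; governance is not modelled here."""

    def __init__(self):
        self._solutions = {}

    def register(self, solution):
        self._solutions[solution.name] = solution

    def search(self, keyword):
        """Offer the names of solutions tagged with the keyword."""
        keyword = keyword.lower()
        return sorted(
            name for name, sol in self._solutions.items() if keyword in sol.keywords
        )

    def details(self, name):
        """Return the full solution details for a chosen solution."""
        return self._solutions[name]


class VirtualLaboratory:
    """Stand-in for a virtual laboratory that instantiates a chosen solution."""

    def __init__(self, centre):
        self.centre = centre

    def instantiate(self, name):
        solution = self.centre.details(name)
        return {
            "solution": solution.name,
            "code": solution.science_code,
            "dependencies": list(solution.dependencies),
            "template": solution.template,
            "status": "ready",
        }


# Illustrative registry contents (placeholders named after the slide's x, y, z).
centre = SolutionCentre()
for label in ("x", "y", "z"):
    centre.register(Solution(
        name=label,
        keywords=frozenset({"inundation"}),
        science_code=f"repo/{label}",
        dependencies=("numpy",),
        template=f"{label}-job-template",
    ))

lab = VirtualLaboratory(centre)
offered = centre.search("inundation")   # SSSC offers solutions
chosen = offered[0]                     # user chooses one
job = lab.instantiate(chosen)           # details returned and instantiated in the VL
print(offered, job["status"])
```

Keeping the registry separate from the laboratory mirrors the slide's split between the SSSC, which holds the solution details, and the VL, which instantiates them.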

How much of this stuff do we actually have?
1. Register. What is missing: metadata profile, DOI frameworks. How: develop a metadata standard which includes licensing, hardware environments, testing procedures, critical dependencies, core scientific algorithms and numerical methods.
2. Review: Geoscientific Model Development Journal, Mozilla Science
3. Reference: Figshare, Impact Story
4. Run: GitHub, SourceForge
5. Repeat: Provenance workflow engines
For 1: We need to harness expertise in the data community to move forward.
All components need to be linked and have persistent identifiers.
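A registration record covering the metadata elements listed for the Register component might look like the sketch below. No such standard exists yet according to the slide, so the class name, field names and sample values are all assumptions; the identifier is a placeholder, not a real DOI.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SoftwareRegistration:
    """Hypothetical registration record; field names are illustrative only."""
    name: str
    identifier: str  # persistent identifier; placeholder value used below
    licence: str
    hardware_environments: list = field(default_factory=list)
    testing_procedures: list = field(default_factory=list)
    critical_dependencies: list = field(default_factory=list)
    core_algorithms: list = field(default_factory=list)
    numerical_methods: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that have not been filled in yet."""
        return [key for key, value in asdict(self).items() if not value]


# Illustrative entry; every value here is made up for the example.
record = SoftwareRegistration(
    name="example-inundation-model",
    identifier="doi:10.xxxx/placeholder",
    licence="Apache-2.0",
    hardware_environments=["x86_64 Linux cluster"],
    critical_dependencies=["numpy"],
    numerical_methods=["finite element method"],
)

print(json.dumps(asdict(record), indent=2))
print("Still to document:", record.missing_fields())
```

A completeness check of this kind is one way a register could flag entries that are not yet ready for review.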

Conclusions: R U now aware?
A trusted software framework:
1. Is critical to creating Data-Intensive Science Platforms
2. Will enable Researchers to: Rapidly access Reliable code, Reduce the time to deploy it, greatly facilitate Reuse and Reinstallation of code, and then Rejoice
3. Will provide operational Robustness around our Science
4. But we need to get started on Rolling-out the Register

R U Ready with Questions?
Ryan Fraser: ryan.fraser@csiro.au
Lutz Gross: l.gross@uq.edu.au
Lesley Wyborn: lesley.wyborn@anu.edu.au
Ben Evans: ben.evans@anu.edu.au
Jens Klump: jens.klump@anu.edu.au
http://forum.en.grepolis.com/showthread.php?55048-it-s-a-fairy-tale
2015 Science ICT Network Conference