Reproducibility Interest Group

Reproducibility Interest Group
Co-chairs: Bernard Schutz, Victoria Stodden
Research Data Alliance, Denver, CO, September 16, 2016

Agenda
- Introductory comments
- Presentations: Andi Rauber, others?
- Conclusions and next steps

RIG Goals
1. Theme: Where does reproducibility fit in the RDA structure? Can we leverage the work of other IGs and WGs?
2. What are tools that support reproducibility? Can we collate a list? Find gaps?
3. Use cases for reproducibility research. Exemplars.
(4.) Can we match tools and use cases?

Update and Recap
Previous meetings:
- RDA-4: first meeting; lots of interest and lively discussion
- RDA-5: joint session with the Provenance WG
- RDA-6: Google doc: http://bit.ly/2cce2q1 (or https://docs.google.com/document/d/18ptKKJQJLOC4B71Mcd9mATyYxOTQYmza0w2vpg-sp7e/edit)

Parsing Reproducibility
- Empirical Reproducibility
- Statistical Reproducibility
- Computational Reproducibility
V. Stodden, IMS Bulletin (2013)

Computational Reproducibility
Traditionally two branches to the scientific method:
- Branch 1 (deductive): mathematics, formal logic.
- Branch 2 (empirical): statistical analysis of controlled experiments.
Now, new branches due to technological changes?
- Branches 3 and 4? (computational): large-scale simulations / data-driven computational science.
CLAIM: computation presents only a potential third/fourth branch of the scientific method (Donoho et al. 2009).

Infrastructure Responses
Tools and software to enhance reproducibility and disseminate the scholarly record:
- Dissemination platforms: ResearchCompendia.org, IPOL, Madagascar, MLOSS.org, thedatahub.org, nanohub.org, Open Science Framework, RunMyCode.org
- Workflow tracking and research environments: VisTrails, Kepler, CDE, Jupyter, torch.ch, Galaxy, GenePattern, Sumatra, Taverna, DataCenterHub, Pegasus, Kurator, RCloud
- Embedded publishing: Verifiable Computational Research, SOLE, knitr, Collage Authoring Environment, SHARE, Sweave

Three Principles for CI
1. Supporting scientific norms: not only should CI enable new discoveries, it should also permit others to reproduce computational findings, reuse and combine digital outputs such as datasets and code, and facilitate validation and comparison with previous findings.
2. Supporting best practices in science: CI in support of science should embed and encourage best practices in scientific research and discovery.
3. Taking a holistic approach to CI: the complete end-to-end research pipeline should be considered to ensure interoperability and the effective implementation of 1 and 2.
Changes are embedded in a social and political environment. Exceptions: privacy, HIPAA, FERPA, and other constraints on sharing.
See Stodden, Miguez, Seiler, "ResearchCompendia.org: Cyberinfrastructure for Reproducibility and Collaboration in Computational Science," CiSE 2015.

Community Responses
Declarations and documents:
- Yale Declaration 2009
- ICERM 2012
- XSEDE 2014

Really Reproducible Research
"Really Reproducible Research" (1992), inspired by Stanford Professor Jon Claerbout. The idea is: "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete... set of instructions [and data] which generated the figures." David Donoho, 1998
Note the difference between:
- reproducing the computational steps, and
- replicating the experiments independently, including data collection and software implementation.
(Both are required.)
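The Claerbout/Donoho idea, that the scholarship is the complete set of instructions and data, can be sketched minimally: a published claim travels with the exact data and code that regenerate it, plus a fingerprint a reader can check after re-running it. The data values and the hashing step below are illustrative assumptions, not anything prescribed by the slides.

```python
import hashlib
import json

# A toy "compendium": the archived data and the exact instructions that
# produced the published result. The numbers are illustrative only.
data = [1.2, 3.4, 2.2, 5.0]
result = {"mean": sum(data) / len(data)}

# A fingerprint of the result lets a reader confirm an exact reproduction
# after re-running the instructions on the archived data.
blob = json.dumps(result, sort_keys=True).encode()
digest = hashlib.sha256(blob).hexdigest()

print("result:", result)
print("sha256:", digest[:12])
```

Anyone who re-runs the script on the same data should obtain the same digest; a mismatch signals that either the data or the computational steps differ from what was published.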

Querying the Scholarly Record
- Show a table of effect sizes and p-values in all phase-3 clinical trials for melanoma published after 1994.
- Name all of the image-denoising algorithms ever used to remove white noise from the famous Barbara image, with citations.
- List all of the classifiers applied to the famous acute lymphoblastic leukemia dataset, along with their type-1 and type-2 error rates.
- Create a unified dataset containing all published whole-genome sequences identified with a mutation in the gene BRCA1.
- Randomly reassign treatment and control labels to cases in published clinical trial X and calculate the effect size. Repeat many times and create a histogram of the effect sizes. Perform this for every clinical trial published in the year 2003 and list the trial name and histogram side by side.
Courtesy of Donoho and Gavish 2012
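No such computable record exists today, which is the point of the slide. As a thought experiment, if trial results were published in structured form, the first query above would reduce to a few lines. The schema, table name, and rows below are entirely hypothetical.

```python
import sqlite3

# Hypothetical sliver of a computable scholarly record.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE trials (
    name TEXT, condition TEXT, phase INTEGER,
    year INTEGER, effect_size REAL, p_value REAL)""")
con.executemany("INSERT INTO trials VALUES (?, ?, ?, ?, ?, ?)", [
    ("Trial-A", "melanoma", 3, 1996, 0.42, 0.03),   # made-up rows
    ("Trial-B", "melanoma", 2, 1999, 0.10, 0.40),
    ("Trial-C", "melanoma", 3, 1993, 0.55, 0.01),
])

# "Effect sizes and p-values in all phase-3 melanoma trials after 1994"
rows = con.execute("""
    SELECT name, effect_size, p_value FROM trials
    WHERE condition = 'melanoma' AND phase = 3 AND year > 1994
""").fetchall()
print(rows)
```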

Government Mandates
- OSTP 2013: Open Data and Open Access Executive Memorandum; Executive Order.
- NSF: Public Access to Results of NSF-Funded Research.
- NOAA: Data Management Plan, Data Sharing Plan.
- NIST: Common Access Platform.

Federal Agencies

Journal Requirements
- Science: code and data sharing since 2011.
- Nature: data sharing.
- AER: data and code access.
- ... and others.
See also Stodden V, Guo P, Ma Z (2013) "Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals." PLoS ONE 8(6): e67111. doi:10.1371/journal.pone.0067111

The Larger Community
1. Production: crowdsourcing and public engagement in science. Primarily data collection/donation today, but open up the pipeline:
- access to coherent digital scholarly objects,
- mechanisms for ingesting/evaluating new findings,
- addressing legal issues (use, re-use, privacy, ...).
2. Use: evidence-based {policy, medicine, ...}; decision making.

Open Questions
- Incentivizing changes toward the production and dissemination of reproducible research.
- Who funds and supports cyberinfrastructure? Who controls access and gateways?
- Who owns data, code, and research outputs?
- Working around and within blocks such as privacy and legal barriers.
- What are community standards around documentation, citation, and best practices? Who enforces them?

Empirical Reproducibility

Statistical Reproducibility
- False discovery, p-hacking (Simonsohn 2012), the file-drawer problem, overuse and misuse of p-values, lack of multiple-testing adjustments.
- Low power, poor experimental design, nonrandom sampling.
- Data preparation, treatment of outliers, re-combination of datasets, insufficient reporting/tracking practices.
- Inappropriate tests or models, model misspecification.
- Model robustness to parameter changes and data perturbations.
- Investigator bias toward previous findings; conflicts of interest.
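The multiple-testing point is easy to see by simulation: with 20 true-null hypotheses each tested at alpha = 0.05, the chance of at least one spurious "discovery" is about 1 - 0.95^20, roughly 0.64, without any adjustment. A minimal sketch, with illustrative parameters; the Bonferroni adjustment shown is one standard fix:

```python
import random

random.seed(0)
M = 20          # hypotheses per study, all truly null
ALPHA = 0.05
TRIALS = 10_000

# Under the null hypothesis, a p-value is Uniform(0, 1).
uncorrected = 0
bonferroni = 0
for _ in range(TRIALS):
    pvals = [random.random() for _ in range(M)]
    if min(pvals) < ALPHA:          # any "significant" result, no adjustment
        uncorrected += 1
    if min(pvals) < ALPHA / M:      # Bonferroni-adjusted threshold
        bonferroni += 1

fwer_raw = uncorrected / TRIALS
fwer_bonf = bonferroni / TRIALS
print(f"family-wise error rate, uncorrected: {fwer_raw:.3f}")   # ~0.64
print(f"family-wise error rate, Bonferroni:  {fwer_bonf:.3f}")  # ~0.05
```

The uncorrected rate matches the analytic value 1 - (1 - 0.05)^20; the adjusted rate falls back to roughly the nominal 5%.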

Background: Open Source Innovation: Open Licensing
Software with licenses that communicate alternative terms of use to code developers, rather than the copyright default.
Hundreds of open source software licenses:
- GNU General Public License (GPL)
- (Modified) BSD License
- MIT License
- Apache 2.0 License
- ...
See http://www.opensource.org/licenses/alphabetical

The Reproducible Research Standard
The Reproducible Research Standard (RRS) (Stodden, 2009): a suite of license recommendations for computational science:
- Release media components (text, figures) under CC BY,
- Release code components under Modified BSD or similar,
- Release data to the public domain or attach an attribution license.
Goals: remove copyright's barrier to reproducible research, and realign the IP framework with longstanding scientific norms.

Research Compendia
Pilot project to improve understanding of reproducible computational science:
- trace sources of error,
- link data/code to published claims,
- support re-use; a guide for empirical researchers,
- certify results,
- large-scale validation of findings; stability and sensitivity checks.