Reproducible Research in Computational Science
IPOL, a Research Journal for Image Processing Algorithms and Software
Facultad de Ingeniería, Universidad de la República, Montevideo, UY, April 11th, 2013
Nicolas Limare, CMLA, ENS Cachan, FR
Image Processing On Line (IPOL), http://www.ipol.im/
REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE 1
A Researcher's Story
Let's do research on... maté!
(Image credits: Alejo2083@wikipedia, ZooFari@wikipedia)
Road to HPMDS (High-Performance Maté Dynamics Simulation)
Review past research and state-of-the-art theories and methods.
Create new measurement tools, models and simulation software.
Compare with existing work.
Present and publish.
Drink high-performance maté.
Error 404
How do you compute this?
(Image credit: Marcin Wichary)
Ask the author?
Code not available: secret, lost.
Code not usable: binary-only, not for your OS, obsolete.
Code not compilable: you won't debug 2000 obscure lines.
Code not meant to be read by others.
Not the exact version used for the article.
Rewrite?
It might take some time: days, weeks, months...
You won't get much credit for this work.
Not everything is explained in the article.
There is no way to verify that the implementation is correct.
"[...] software is the specification for how the software is supposed to work. Anything less [...] doesn't really tell you anything about how it's ultimately going to behave. And that just makes software really, really hard." (Douglas Crockford)
No Software
Cannot verify, cannot reuse, cannot reproduce, cannot extend, cannot compare.
Cannot do science.
Beyond Maté Simulations
Sometimes the problem is more than missing code:
misleading performance reports;
manipulated figures;
2000 retractions in biomedical research, 43% of them for fraud;
clinical trials based on wrong assumptions;
Climategate;
public policies based on wrong expectations.
References:
David Bailey, "Twelve ways to fool the masses when giving performance results on parallel computers", Supercomputing Review (1991).
Nicholas Wade, "It May Look Authentic; Here's How to Tell It Isn't", The New York Times (2006).
Ferric C. Fang, R. Grant Steen, and Arturo Casadevall, "Misconduct accounts for the majority of retracted scientific publications", PNAS (2012). http://dx.doi.org/10.1073/pnas.1212247109
Kevin R. Coombes, Jing Wang, and Keith A. Baggerly, "Irreproducibility of NCI60 Predictors of Chemotherapy". http://bioinformatics.mdanderson.org/supplements/reprorsch-chemo/
Bill Chameides, "Climategate Redux", Scientific American (2010).
But...
We can do better science. We are trying with IPOL.
Scientific Method
Scientific Method
c. 1200 to 1800: Roger Bacon, Francis Bacon, Galileo Galilei, Robert Boyle, René Descartes, ...
Science needs to be reproduced.
Reproducible Research?
Research is reproducible if other researchers can independently obtain the same results from the published material.
Theoretical scientists share proofs.
Experimental scientists share procedures.
Computational scientists (usually) share no software, no full description, no data: Ø.
cf. Claerbout 1992, Donoho 1995, Stodden 201X, Vandewalle 201X
(Image credits: Sfoster83@wikipedia, Madprime@wikipedia)
Reproducible (Computational) Research
Since c. 1990: Jon Claerbout, David Donoho, Serguei Fomel, Randy LeVeque, David Bailey, Victoria Stodden, Juliana Freire, ...
The science is in the software, the data and the process.
"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." (D. Donoho)
Computation Everywhere
Particle physics, fluid dynamics, econometrics, signal processing, quantum chemistry, LIDAR archeology, MRI analysis, climate and weather, geophysics.
(Image credits: CERN, rreis@flickr, rafael grompone, info-nftk@flickr, mohapj@flickr, mario stefanutti, argonne@flickr)
ScienceCodeManifesto.org
Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright: The copyright ownership and license of any released source code must be clearly stated.
Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications.
Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.
Why Not Share the Code?
Code not ready for public view: no time or motivation to clean up, simplify and document it.
Prevent incorrect use: again, a matter of documentation and explanations.
Keep a competitive advantage: then why publish at all?
Revisit the Objectives of Publishing Articles
KEY: lure researchers into sharing their code (vs. the Impact Factor).
(Image: a rare picture of a utopian community in the act of sharing their research code. Credit: freeclipartnow.com)
Revisit the Objectives of Publishing Articles
[Diagram: two cycles, one for traditional research articles and one for source code. In each, the researcher publishes (an article, or code) to the community, and the community cites the researcher.]
Step 1: make the code a publication by itself.
Step 2: guide the community to use and cite the code.
IPOL
IPOL
IPOL is a research journal of image processing and image analysis. Each article contains a text describing an algorithm and its source code, with an online demonstration facility and an archive of online experiments. The text and source code are peer-reviewed, and the demonstration is controlled. IPOL follows the Open Access and Reproducible Research models.
article = manuscript + software (+ demo + archive)
Publishing Software
IPOL aims to provide reference implementations of image processing algorithms. For every article, the implementation:
is reviewed and published under a GPL/BSD license;
can be tested online, in real time, on free data.
Everything is online, free and reusable.
http://ipol.im/
Reviewing Software
Software is reviewed like a manuscript: manually, by selected reviewers. It must match the description of the algorithm and follow the editorial guidelines for correctness, portability, documentation and style.
This is already a lot to ask of image processing researchers.
IPOL
Not a prototype: publishing since 2010.
A self-published research journal: ISSN, DOI, editorial policy and international board, indexed.
Partnership with a SIAM journal for dual articles.
IPOL publishes algorithms, not software; the code is there to provide every detail needed to study the algorithm.
IPOL exists because we needed it and no other journal was doing it.
Reproducible Research Initiatives: Journals
Science requires that all data and code be available to any reader.
Mathematical Programming Computation requires the code.
Biostatistics stamps reproducible articles.
JMLR publishes software.
Geophysics has some software guidelines.
Source Code for Biology and Medicine publishes software; the Journal of Open Research Software will too.
Computing in Science and Engineering reviews software.
Metajournals publish articles about software and data.
More Reproducible Research Initiatives
Publishers:
SIAM updated its supplementary material policies to include software.
ACM reformed its supplementary material copyright policy.
Elsevier experiments with executable papers and post-PDF articles.
Tools and services:
RunMyCode hosts executable research software.
FLOSShub and mloss/mldata host software.
DataDryad and Figshare host data.
Conferences and workshops:
ICERM Workshop, Dec. 2012
SINTEF Winter School, Jan. 2013
SIAM CSE13 conference track, Feb. 2013
NYU Workshops, May 2013
IPOL Article: Manuscript + Software + Demo + Archive
Manuscript: description and study of an algorithm.
Software: complete and documented implementation.
Demo: universal web interface, to test and explore.
Archive: shared test data.
Activity
40 articles published with code and demo since 2011.
25 articles under review, 10+ public preprints.
100+ citations (cf. Google Scholar).
In 2012: 125,000 visits; 13,000 code/data downloads; 50,000 demo runs, 30,000 of them on original data (archived).
Results
Reference implementations of algorithms.
Verifiable claims on performance and results.
Algorithms described and analyzed.
Algorithms improved by mass testing.
Implementations improved by review.
More than reproducible: reusable and open.
Challenges
Still the work of a small community: join and spread the word; spin off to other research areas (next: audio).
Competition from less stringent journals and conferences: they can evolve under peer-review pressure.
Reusable is more complex than reproducible: a software project derived from IPOL.
Conservative community habits: researchers must learn to cite software, not only the article PDF.
Substantial effort to prepare good code: put computation at the center of computational science curricula; templates? other ideas?
Collaboration
This work is funded by and carried out in collaboration with [partner logos].
New participants are welcome!
Follow-up
http://ipol.im/
edit@ipol.im
discuss@list.ipol.im
@IPOL_journal
and also:
http://stodden.net/
http://reproducibleresearch.net/
http://www.runmycode.org/
We are interested in more authors, editors, reviewers, readers and users; productive relations with new researchers; and assisting and collaborating with new, similar projects.