Open Data and the Future of Science. Geoffrey Boulton, CODATA. ICSU-ROLAC: The Open Science Imperative, San Salvador, August 2016
Knowledge and understanding are the engines of material & social progress. They depend on technologies that enable the accumulation and communication of information (1454; 2002). All societies and their economies need to adapt to these changes if they are not merely to be derivative, depending on inspiration from elsewhere. BUT WHY OPEN?
Openness: the bedrock of science in the modern era. Henry Oldenburg
Some fundamentals. Not validation but negation: "No amount of experimentation can prove me right. A single experiment can prove me wrong." (Albert Einstein) Scientific self-correction & its consequences: "The progress of science is strewn, like an ancient desert trail, with the bleached skeletons of discarded theories that once seemed to possess eternal life." (Arthur Koestler) Facts (data) are sacrosanct & must therefore be open to scrutiny: "False facts are highly injurious to the progress of science, for they often long endure. But false views do little harm, as everyone takes a salutary pleasure in proving them wrong." (Charles Darwin)
The digital revolution: storage, analysis, communication. Global information storage capacity, in optimally compressed bytes (timeline 1986, 1993, 2000, 2007): by 2007, analogue storage 19 exabytes and digital storage 280 exabytes; 2014: 4000 exabytes; 2016: 20,000 exabytes. Explosion of the digital revolution. The technological bases for open science, if we choose to use them! Based on: http://www.martinhilbert.net/worldonfocapacity.html (1 exabyte = 10^18 bytes)
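To put the storage figures on this slide in perspective, here is a minimal arithmetic sketch of the yearly growth factor implied by the 2014 and 2016 values (4000 and 20,000 exabytes). The function name annual_growth_factor is a hypothetical helper, and the assumption of a constant growth rate between the two years is a simplification for illustration, not a claim made by the slide.

```python
def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that takes `start` to `end` over `years`.

    Assumes steady compound growth, which is a simplification.
    """
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years)


EXABYTE = 10**18  # bytes, as defined on the slide

# Figures taken from the slide (in exabytes).
capacity_eb = {2014: 4000, 2016: 20000}

factor = annual_growth_factor(capacity_eb[2014], capacity_eb[2016], 2016 - 2014)
print(f"Implied growth factor per year, 2014 to 2016: {factor:.2f}")
print(f"2016 capacity in bytes: {capacity_eb[2016] * EXABYTE:.1e}")
```

A fivefold rise over two years corresponds to a yearly multiplier equal to the square root of five, that is, capacity more than doubling each year under this assumption.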
Information: how much is crystallised into knowledge?
A crisis of reproducibility and credibility? Pre-clinical oncology: 89% not reproducible. Why? Misconduct/fraud; invalid reasoning; absent or inadequate data and/or metadata. The data providing the evidence for a published concept MUST be concurrently published, together with the metadata. To do otherwise is scientific MALPRACTICE.
Analytic overload of human cognition. E.g. Global Earth Observation System of Systems. A disconnect between machine analysis & human cognition? What is the human role? Can we analyse & scrutinise what is in the black box, and who owns the box? What does it mean to be a researcher in a data-intensive age?
The opportunities: four key drivers of change for science: big data; semantically-linked data; open data; cost reduction. Image labels: micro-satellite; looking at clouds; ozone levels.
From simple to complex systems: from uncoupled to highly coupled behaviour. Uncoupled systems; highly coupled systems.
Complexity: system state & dynamic evolution. Simulating system dynamics: emergent behaviour of a specific 6-component coupled system. Mapping a complex state: image of brain cells in a rat.
The opportunity: data-modelling, iterative integration. Satellite observation; surface monitoring; initial conditions; model forecast; model-data iteration (forecast correction).
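The model-data iteration named on this slide can be sketched as a loop that alternates a model forecast with a correction toward each new observation. The example below is a toy illustration only: the growth model, the gain parameter, the function names and the observation values are all hypothetical, and real systems use far more elaborate data assimilation schemes.

```python
def forecast(state, growth=1.05):
    """Hypothetical toy model: the state grows by a fixed factor per step."""
    return state * growth


def correct(predicted, observed, gain=0.5):
    """Move the forecast toward the observation by a fraction `gain` (0 to 1)."""
    return predicted + gain * (observed - predicted)


def assimilate(initial_state, observations, growth=1.05, gain=0.5):
    """Alternate forecast and correction; return (forecast, obs, corrected) triples."""
    state = initial_state
    history = []
    for obs in observations:
        predicted = forecast(state, growth)     # model forecast from current state
        state = correct(predicted, obs, gain)   # forecast correction using new data
        history.append((predicted, obs, state))
    return history


# Made-up observations, standing in for satellite or surface monitoring data.
observations = [10.0, 11.0, 12.5, 13.0]
history = assimilate(9.0, observations)
for predicted, obs, corrected in history:
    print(f"forecast={predicted:.2f}  observed={obs:.2f}  corrected={corrected:.2f}")
```

With a gain of 0 the observations are ignored and the model runs freely; with a gain of 1 each forecast is replaced by the observation. Values in between blend the two, which is the basic idea behind correcting forecasts with incoming data.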
The opportunity: new modes of technology-enabled creativity. Tim Gowers: crowd-sourced mathematics (mathematics-related discussions). An unsolved problem posed on his blog: 32 days, 27 people, 800 substantive contributions. Emerging contributions rapidly developed or discarded. Problem solved! "It's like driving a car whilst normal research is like pushing it." What inhibits such processes? The criteria for credit and promotion. ALTMETRICS THE ANSWER?
The semantic opportunity: deepening data integration. Scientific opportunity: 4500 variables, e.g. Annual Precipitation; Annual Temperature; Anthropogenic impacts on Marine Ecosystems - Nutrient Pollution (Fertilizer); Aquaculture Production - Inland Waters; Aquaculture Production - Marine; Aquaculture Production - Total; Arable Land; Arable and Permanent Crops; Arsenic in Groundwater - Probability of. Commercial opportunity: purchases for $930 million, in order to predict agricultural yields "to ascend to the next level of agricultural evaluation": historic rainfall & infiltration data; soil properties & quality.
Business is taking up the opportunity
Scientific exploitation of the digital revolution: patterns not hitherto seen; unsuspected relationships; states and dynamics of complex systems.
The Open Data Iceberg. A National Infrastructure. Layers: Technology; Processes & Organisation; People. Challenges: the Technical Challenge; the Consent Challenge; the Ecosystem Challenge; the Funding Challenge; the Support Challenge; the Skills Challenge; the Incentives Challenge; the Mindset Challenge (motivation and ethos). Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015).
National/regional open data systems: system components and responsibilities. Government: policy priorities; open data/science policies. Research funders: open data as the cost of doing research. Research bodies (universities, institutes) and researchers: incentives for open data; management support; data science support; education; concurrent publication of concept & data; mindset change. Publishers: requirement for concurrent data publication.
Science International Accord: principles of open data (www.icsu.org/science-international). 12 principles: Responsibility; Boundaries of openness; Enabling practices. Responsibilities of scientists: Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and repurposed.
An Open Data Platform for Africa. Principles: the rules of the game (the Accord). Planning: coordination; hardware procurement. Policies: governmental priorities. Processes: stakeholders applying the rules to support open data. Support structures; management; incentives; skill development. Practices (tools & standards): implementing open data. DATA APPLICATIONS. Open Access Repositories.
Regional Platforms for Open Science: shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards. Possible platforms (some marked "?" on the slide): Latin American Platform; African Platform; Asian Platform; Australian Platform.
Open Data: maintains the rigour of self-correction; maximises public investment; minimises waste; enables data integration. Open knowledge is creative & productive. Open Science: opening laboratory & library doors; engaging with other societal stakeholders; joint production of actionable knowledge.
CODATA Strategic Priorities: Principles, Policies and Practice; Frontiers of Data Science; Capacity Building. Data Science Journal. SciDataCon 2016, 11-13 Sept, Denver, CO.