Big Data Analytics in Science and Research: New Drivers for Growth and Global Challenges
Richard A. Johnson, CEO, Global Helix LLC, and BLS, National Academy of Sciences
ICCP Foresight Forum: Big Data Analytics and Policies, 22 October 2012
johnsri@alum.mit.edu
Session 3: Four Questions for Discussion
Q1. How important are data openness and interoperability for science and research, especially in biomedicine and health?
Q2. Are current IPR regimes suited to data-intensive scientific discovery?
Q3. Do we still need scientific methods (and traditional domain scientists) in an era of big data analytics?
Q4. How, and why, does this matter for policy?
Convergence of Biology with the Physical Sciences and Engineering through Data and Data Analytics = the "New Biology", or Third Revolution in the Life Sciences
A foundational trend in STI for the next 20 years (NAS 2010; MIT 2011)
Genomic Data Is Increasing Faster than Computing Power
Convergence of three key DATA DRIVERS with RESEARCH and ECONOMIC VALUE: (1) Sequencing + (2) Synthesis + (3) Reading AND Writing DNA
Data Tools in the Life Sciences: Moore's Law on Steroids
Figure: Gene Expression Data Sets (Nature 2012)
Life Sciences and Biomedical Research as an Information Science: Quantitative, Data-driven, Simulation-oriented, Predictive Science
Data and Convergence Driving the Future: Data Analytic Tools, Platforms, and Measurement for New Sources of Growth
Technology Convergence, Data Analytics and Metrology as Interdependent Drivers (Agilent 2012):
- Energy and the Environment
- Advancing High Growth Economies
- Portable, Mobile and Out-of-Lab
- Nanotechnology
- Food Safety
- Personalized Medicine
- Single Cells and Microbiome
- Synthetic Biology
Beyond Interoperability, the Power of Interconvertibility: FROM PHYSICAL LIVING MATERIAL/DNA to DIGITAL DATA, and back
- 1s and 0s ↔ A, C, T, Gs: "IT from Bits" (Poste 2012)
- Programming: increasing ability to both Read and Write DNA
- Tools to Edit and Write Genomes: MAGE + CAGE (Church/Isaacs 2011, 2012)
- DNA Construction (analog to Read/Write; manipulation of 1s and 0s): Genetic Expression Operating Systems; scaled DNA construction engineering
- Data enables Decoupling: of biological processes from evolution-based descent and replication, and of design from fabrication
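The interconvertibility of bits and bases can be made concrete with a toy encoder. This is a minimal sketch assuming a simple, hypothetical mapping of two bits per base (00→A, 01→C, 10→G, 11→T); it is not the encoding used in the work cited above, and it ignores practical constraints such as error correction and synthesis limits. The function names are illustrative only.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only;
# real DNA-storage schemes use other encodings plus error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base sequence produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Hi"
    strand = bytes_to_dna(message)  # digital data -> "DNA"
    print(strand)
    assert dna_to_bytes(strand) == message  # "DNA" -> digital data, and back
```

Two bits per base is the most a four-letter alphabet can carry (log2 4 = 2), which is why each byte maps to exactly four bases in this sketch.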
Big Data and Data Analytics Drive New 21st Century Infrastructures and Knowledge Networks and Markets (KNMs), and Create Opportunities for New Research, Better Health Outcomes, and Value Creation (Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease, NAS 2011)
The Creative Destruction of Medicine (Topol 2012)
Data Sharing, Disease Modeling and Biomarkers to Accelerate the Development
Big Data and Engineering Biology as the Transformative New Normal in the Life Sciences, Driving New Sources of Growth
Synthetic Biology: Standardization, Abstraction and Modularity
- Predictive Platforms for Engineering Biology and Predictable Integration of New Genetic Designs, built on Massive Data
- An Engineering METHODOLOGY to construct complex systems and novel properties based on biological components (EU-US Task Force, June 2010)
Data-driven and Engineering Biology Value Proposition Increasingly Drives Science, New Sources of Growth, and Our Ability to Meet Societal Grand Challenges (NAS 2011)
Neuroscience: a 21st Century Frontier for Human Understanding and Grand Challenges
- Traversing the scales at all levels in understanding the brain, from molecular and cellular to systems: neurons (100 billion), synapses (150 trillion), and neural signaling
- Human Connectome Project = mapping neural networks with over a million times more connections than the genome has letters of DNA, and linking all this to other life-experience data sets
ENCODE: the Encyclopedia of DNA Elements
Big Data, Data Analytics, and Big Science are increasingly changing how we do science (Sept. 2012)
The Plasticity of IPR/Open Science Meanings, and Much Rethinking across Domains about IPR, Openness and Scientific Research
IPR and Competing Visions of Openness:
- Open Science (public domain; BioBricks library/BBF) v. Open Source (IPR-driven; GPL, BSD, CC) v. Open Standards v. Open Development v. Open Access (including reuse and sharing of publicly funded data) v. Open Innovation (depends on a strong, well-functioning IPR system)
Innovative New Thinking:
- e.g., Semi-commons as a new lens to view Data: interacting common and private uses over the same resources, which are dynamic/scalable and can adjust through contracting and other mechanisms
- Knowledge Networks and Markets (KNMs) and Knowledge-based Capital (KBC): major ongoing OECD initiatives
Growing Counter-intuitive View that IPR Is Increasingly Important as a Tool to Promote Openness, Transparency, and Diffusion, e.g., of Algorithms, Data Exchanges, Tools and Re-use
Growing Linkage of Data-intensive Science, IPR, and New Models of Innovation: Big Data Analytics Intersect with Open Innovation, Multi-directional S&T, University-Industry Partnering, New Business Models, Forward-looking IPR, and New Public-Private Collaborative Mechanisms to Enable Cutting-edge Research and Innovation
The Fourth Paradigm, the Internet of Things, Automated Data Extraction Methods, and Big Data Analytics: the Need for a New Generation of Scientific Computing Tools and Platforms to Manage, Visualize and Analyze Big Data for Research (Gray 2009)
Wide Range of New Data Analytic Convergence Challenges with Policy Implications (Gray 2009)
Risks to Scientific Research from (Bad) Data Analytics?
- Jeopardize reproducibility
- Retard the pace of research
- Produce poorly written code/bad algorithms on which science relies
- Create serious errors in scientific outcomes, and in their interpretation
New Day-to-day Research Implications of Big Data: Data Analytics Challenges
- Which data to keep, in what format, and for how long?
- What about emergent properties resulting from elaborate networks of interactions and data patterns?
- How to deal with data distributed across many locations, formats, scales, etc., and merge them?
- How to model large, complex data and derive valuable knowledge from analytics/models?
- How to infuse data into complex computations to enable simulations of predictive value?
- How to deal with different kinds of big data (temporal, spatial, dimensional, heterogeneous): massive data, high-dimensional data, multi-modal data, real-time and streaming data?
In a data-driven science era, should we still fund, incentivize and value Empirical, Theoretical, and Model-based Approaches to Scientific Discovery? Is Popper's scientific method paradigm outdated?
- "I believe that math is trumping science. What I mean by that is you don't really have to know why, you just have to know that if a and b happen, c will happen." (Vivek Ranadivé, entrepreneur and CEO, financial data software company TIBCO, 2011)
- "With enough numbers, the data speak for themselves." (Chris Anderson, Editor-in-Chief, Wired, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", 2008)
- "All models are wrong, and increasingly you can succeed without them." (Peter Norvig, Director of Research, Google)
- "The numbers have no way of speaking for themselves. … Data-driven predictions can succeed, and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves." (Nate Silver, The Signal and the Noise: Why So Many Predictions Fail, but Some Don't, 2012)
- "The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning." (Stephen Jay Gould, American evolutionary biologist, 1981)
Thank you!
Contact Information: Richard A. Johnson, CEO, Global Helix LLC, richard.johnson@globalhelix.net; MIT: johnsri@alum.mit.edu