How can Physics Inform Deep Learning Methods in Scientific Problems:


How Can Physics Inform Deep Learning Methods in Scientific Problems: Recent Progress and Future Prospects
Anuj Karpatne, Post-Doctoral Associate, University of Minnesota
karpa009@umn.edu
http://www.cs.umn.edu/~anuj

Outline
- Why Does Deep Learning Need Physics?
- Theory-guided Data Science
- Recent Progress
- Future Prospects

Big Data in Physical and Life Sciences
- Domains: Earth Science, Genomics, Material Science
- Data sources: Satellite Data, In-situ Sensors, Model Simulations, Experimental Data, Survey Reports

Age of Data Science
[Figure: deep learning model mapping Input to Output]
- Black-box models learn patterns and models solely from data, without relying on scientific knowledge
- Hugely successful in commercial applications

Promise of Data Science in Transforming Scientific Discovery
"Unlike earlier attempts [AI systems] can see patterns and spot anomalies in data sets far larger and messier than human beings can cope with." (July 7, 2017 issue)

Promise of Data Science in Transforming Scientific Discovery
Will the rapidly growing area of black-box data science models make existing theory-based models obsolete? (Wired Magazine, 2008)

Limits of Black-box Data Science Methods
- Predicted flu using Google search queries
  - Overestimated by a factor of two in later years
- Climate Science: [figure not transcribed]

Why Do Black-box Methods Fail? (1/2)
- Scientific problems are often under-constrained
  - Complex, dynamic, and non-stationary relationships
  - Large number of variables, small number of samples
- Standard methods for evaluating ML models (e.g., cross-validation) break down
  - Easy to learn spurious relationships that look deceptively good on training and test sets
  - But these lead to poor generalization outside the available data
- A huge number of samples is critical to the success of methods such as deep learning
12/8/17
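The point about spurious relationships can be made concrete with a small sketch. It is not from the talk: the function names, sample sizes, and seed are all illustrative assumptions. Every candidate feature is pure noise, yet with few samples and many variables the best one still correlates strongly with the target on the available data (training and test together); on fresh data the relationship disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spurious_feature_demo(n_samples=10, n_features=2000, n_fresh=1000, seed=0):
    """Pick the best of many noise features on a small sample, then
    measure the same kind of noise feature against fresh data."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Every candidate feature is noise, so none is truly related to the target.
    best = max(
        abs(pearson([rng.gauss(0, 1) for _ in range(n_samples)], target))
        for _ in range(n_features)
    )
    # Outside the available data, the "selected" feature is just new noise.
    fresh_x = [rng.gauss(0, 1) for _ in range(n_fresh)]
    fresh_y = [rng.gauss(0, 1) for _ in range(n_fresh)]
    return best, abs(pearson(fresh_x, fresh_y))


in_sample_corr, fresh_corr = spurious_feature_demo()
print(f"best in-sample |r| = {in_sample_corr:.2f}, fresh-data |r| = {fresh_corr:.2f}")
```

Raising `n_samples` while holding `n_features` fixed shrinks the best in-sample correlation, which is one way to see why large sample counts matter for purely data-driven methods.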

Why Do Black-box Methods Fail? (2/2)
- Interpretability is an important end-goal, especially in scientific problems (Castelvecchi 2016)
- Need to explain or discover mechanisms of underlying processes to:
  - Form a basis for scientific advancements
  - Safeguard against the learning of non-generalizable patterns

Theory-based vs. Data Science Models
- Theory-based Models
  - Examples: Gravitational Law; Conservation of Mass, Momentum, Energy; Navier-Stokes Equation; Schrödinger's Equation
  - Contain knowledge gaps in describing certain processes (turbulence, groundwater flow)

Theory-based vs. Data Science Models
- Theory-based Models: contain knowledge gaps in describing certain processes (turbulence, groundwater flow)
- Data Science Models: require a large number of representative samples
- Theory-guided Data Science Models (TGDS) [1]: take full advantage of data science methods without ignoring the treasure of accumulated knowledge in scientific theories
[1] Karpatne et al., Theory-guided data science: A new paradigm for scientific discovery, TKDE 2017

Theory-guided Data Science: Emerging Applications
- Earth Science:
  - Karpatne et al., Physics-guided Neural Networks: Application in Lake Temperature Modeling, SDM 2018 (in review).
  - Faghmous et al., Theory-guided data science for climate change, IEEE Computer, 2014.
  - Faghmous and Kumar, A big data guide to understanding climate change: The case for theory-guided data science, Big Data, 2014.
- Fluid Dynamics:
  - Singh et al., Machine learning-augmented predictive modeling of turbulent separated flows over airfoils, arXiv, 2016.
- Material Science:
  - Curtarolo et al., The high-throughput highway to computational materials design, Nature Materials, 2013.
- Computational Chemistry:
  - Li et al., Understanding machine-learned density functionals, International Journal of Quantum Chemistry, 2015.
- Other areas: Neuroscience, Biomedicine, Particle Physics
- Related events and groups: Workshop on Deep Learning for Physical Sciences 2017; AI for Scientific Progress, 2016; Symposium by Los Alamos National Laboratory, 2016, 2018; Physical Analytics Research Division

An Overarching Objective of TGDS: Learning Physically Consistent Models
- Traditionally, simpler models are preferred for generalizability
  - Basis of several statistical principles such as the bias-variance trade-off
- M1 (less complex model): high bias, low variance
- M3 (more complex model): low bias, high variance
- Generalization Performance: Accuracy + Simplicity
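The bias-variance trade-off named on this slide can be estimated numerically. The sketch below is an illustration under stated assumptions, not the models from the talk: the sine target, the noise level, and the two predictors standing in for M1 and M3 are all hypothetical. It retrains each predictor on many random training sets and measures the squared bias and variance of its prediction at one point.

```python
import math
import random


def true_fn(x):
    return math.sin(x)


def sample_training_set(rng, n=20, noise=0.3):
    """Draw n noisy observations of true_fn on [0, 2*pi]."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def predict_mean(xs, ys, x0):
    """M1 stand-in (less complex): ignores x0, predicts the training mean."""
    return sum(ys) / len(ys)


def predict_nearest(xs, ys, x0):
    """M3 stand-in (more complex): copies the label of the nearest point."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x0))
    return ys[i]


def bias2_and_variance(predict, x0, trials=500, seed=0):
    """Squared bias and variance of predict(x0) across training sets."""
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias2 = (mean_pred - true_fn(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias2, variance


x0 = math.pi / 2
m1_bias2, m1_var = bias2_and_variance(predict_mean, x0)
m3_bias2, m3_var = bias2_and_variance(predict_nearest, x0)
print(f"M1: bias^2={m1_bias2:.3f}, variance={m1_var:.3f}")
print(f"M3: bias^2={m3_bias2:.3f}, variance={m3_var:.3f}")
```

The simple predictor is stable across training sets but systematically wrong at `x0`; the flexible one is close on average but inherits the noise of a single observation.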

An Overarching Objective of TGDS: Learning Physically Consistent Models
- Traditionally, simpler models are preferred for generalizability
  - Basis of several statistical principles such as the bias-variance trade-off
- M1 (less complex model): high bias, low variance
- M3 (more complex model): low bias, high variance
- In scientific problems, physical consistency can be used as another measure of generalizability
  - Can help in pruning large spaces of inconsistent solutions
  - Results in generalizable and physically meaningful results
- Generalization Performance: Accuracy + Simplicity + Consistency

Physics-Guided Neural Networks (PGNN)
A Framework for Learning Physically Consistent Deep Learning Models
- Scientific Knowledge (Physics) is used to guide the selection of model architecture, activation functions, loss functions, etc.
Karpatne et al., Physics-guided neural networks (PGNN): Application in Lake Temperature Modeling, SDM 2018 (in review; arXiv:1710.11431).

Case Study: Lake Temperature Modeling
- Input Drivers: Short-wave Radiation, Long-wave Radiation, Air Temperature, Relative Humidity, Wind Speed, Rain
- Target Output: Temperature of water at every depth
- Physics-based Approach: General Lake Model (GLM) [1]
  - Captures physical processes responsible for energy balance
  - Requires lake-specific calibration using large amounts of data and computational resources
- RMSE for Lake Mille Lacs in Minnesota: 2.57 (uncalibrated model), 1.26 (calibrated model)
[1] Hipsey et al., 2014

PGNN 1: Use GLM Output as Input in Neural Network
- Deep learning can augment physics-based models by modeling their errors
- Part of a broader research theme on creating hybrid-physics-data models
- Network input: Input Drivers + Output of GLM (Uncalibrated)
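A minimal sketch of the "modeling their errors" idea follows. It makes several simplifications that are not in the talk: a closed-form linear fit replaces the neural network, the data are synthetic, there is a single driver, and the "physics model" is a made-up biased formula rather than GLM. It also uses the residual form (learn the physics model's error, then add it back) rather than feeding the physics output in as a network input.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def rmse(preds, obs):
    return (sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs)) ** 0.5


rng = random.Random(0)
# Hypothetical synthetic data: one input driver (air temperature, deg C).
air_temp = [rng.uniform(0, 30) for _ in range(200)]
# Made-up "observations" and a made-up, biased (uncalibrated) physics model.
observed = [0.8 * t + 2.0 + rng.gauss(0, 0.3) for t in air_temp]
physics_out = [0.6 * t + 4.0 for t in air_temp]

train, test = slice(0, 150), slice(150, 200)
# Learn the physics model's error as a function of the driver.
errors = [o - p for o, p in zip(observed[train], physics_out[train])]
a, b = fit_line(air_temp[train], errors)
# Hybrid prediction: physics output plus the learned error correction.
hybrid_out = [p + a + b * t for p, t in zip(physics_out[test], air_temp[test])]

physics_rmse = rmse(physics_out[test], observed[test])
hybrid_rmse = rmse(hybrid_out, observed[test])
print(f"physics only: {physics_rmse:.2f}, hybrid: {hybrid_rmse:.2f}")
```

In this toy setting the correction removes the systematic error of the physics model, leaving roughly the observation noise. A real application would learn the correction with a neural network over all drivers and the GLM output.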

PGNN 2: Use Physics-based Loss Functions
- Temperature estimates need to be consistent with physical relationships between temperature, density, and depth
- Physical Constraint: Denser water is at greater depth
- Does not require labels!
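One way such a loss term could look is sketched below. The slide does not give a formula, so two things here are assumptions: the temperature-to-density relation is recalled from the cited PGNN paper and should be checked against it, and the penalty (mean positive density drop between consecutive depths) is one plausible way to encode the constraint. The function names are hypothetical.

```python
def water_density(temp_c):
    """Density of fresh water (kg/m^3) at temp_c degrees Celsius.

    Assumption: empirical relation recalled from the cited PGNN paper;
    verify the constants against arXiv:1710.11431 before relying on them.
    """
    return 1000.0 * (
        1.0
        - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
        / (508929.2 * (temp_c + 68.12963))
    )


def physics_loss(temps_by_depth):
    """Mean violation of 'denser water is deeper' for one profile.

    temps_by_depth is ordered from the surface downward. Each pair of
    consecutive depths contributes max(0, rho_shallow - rho_deep), so a
    physically consistent profile scores 0. Only predicted temperatures
    are used; no observed labels are needed.
    """
    rho = [water_density(t) for t in temps_by_depth]
    violations = [max(0.0, rho[i] - rho[i + 1]) for i in range(len(rho) - 1)]
    return sum(violations) / len(violations)


consistent = [20.0, 18.0, 15.0, 10.0, 6.0]    # cools with depth
inconsistent = [6.0, 10.0, 15.0, 18.0, 20.0]  # warms with depth
print(physics_loss(consistent), physics_loss(inconsistent))
```

Because the term depends only on predictions, it could be evaluated on unlabeled inputs and added to a supervised loss with a weighting hyperparameter, whose value would need tuning.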

Physical Consistency Ensures Generalizability
RMSE (in °C):
  GLM (Uncalibrated)          2.57
  Black-box Neural Network    1.77
  PGNN                        1.16
  GLM (Calibrated)            1.26

Future Prospects: Theory-guided Data Science
- Theory-guided Design [Turbulence Modeling, Neuroscience]
  - Choice of Response/Loss Functions
  - Design of Model Architecture
- Theory-guided Learning [Limnology, Chemistry, Biomedicine, Climate, Genomics]
  - Choice of Loss Function
  - Constrained Optimization Methods
  - Probabilistic Models
- Theory-guided Refinement [Remote Sensing, Material Science]
  - Post-processing
  - Pruning
- Creating Hybrid Models of Theory and Data Science [Hydrology, Turbulence Modeling]
  - Residual Modeling
  - Predicting Intermediate Quantities
- Augmenting Theory-based Models using Data [Hydrology, Climate Science, Fluid Dynamics]
  - Calibrating Model Parameters
  - Data Assimilation

Concluding Remarks
- Black-box deep learning methods are not sufficient for knowledge discovery in scientific domains
- Physics can be combined with deep learning in a variety of ways under the paradigm of theory-guided data science
  - Use of physical knowledge ensures physical consistency as well as generalizability
- Theory-guided data science is already starting to gain attention in several disciplines:
  - Climate science and hydrology
  - Turbulence modeling
  - Bio-medical science
  - Bio-marker discovery
  - Material discovery
  - Computational chemistry

Thank You!
- Karpatne, A., Atluri, G., Faghmous, J.H., Steinbach, M., Banerjee, A., Ganguly, A., Shekhar, S., Samatova, N. and Kumar, V., Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data. IEEE Transactions on Knowledge and Data Engineering, 29(10), pp. 2318-2331, 2017.
- Karpatne, A., Watkins, W., Read, J. and Kumar, V., Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. SIAM International Conference on Data Mining 2018 (in review; arXiv:1710.11431).
Contact: karpa009@umn.edu