Data and Knowledge as Infrastructure. Chaitan Baru Senior Advisor for Data Science CISE Directorate National Science Foundation

Motivation: Easy access to data
The "Hello World" problem (courtesy: R.V. Guha)
- Access a 1PB (or 100TB, or 10TB?) dataset
- Create a subset of 10TB
- Perform an operation (statistical computation)
- Print the result
- Do this as a homework problem by the next class session
- In a class of 500 students
Dataset size is not important; the problem could equally be about accessing multiple, heterogeneous data sources
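The four steps of the "Hello World" problem (access, subset, compute, print) can be sketched at toy scale as follows. This is only an illustration under stated assumptions: `read_records` is a hypothetical stand-in that generates synthetic values instead of reading a real petabyte-scale dataset, and the subset rule and the choice of the mean as the statistic are arbitrary.

```python
import random
import statistics
from typing import Iterable, Iterator, List, Tuple


def read_records(n: int, seed: int = 0) -> Iterator[Tuple[int, float]]:
    """Hypothetical data source: streams (index, value) pairs one at a time.

    A real solution would read from remote storage; synthetic Gaussian
    values are generated here so the sketch is self-contained.
    """
    rng = random.Random(seed)
    for i in range(n):
        yield i, rng.gauss(0.0, 1.0)


def make_subset(records: Iterable[Tuple[int, float]], keep_every: int) -> Iterator[float]:
    """Create a subset: keep one record out of every `keep_every`."""
    return (value for idx, value in records if idx % keep_every == 0)


def hello_world(n: int = 100_000, keep_every: int = 10) -> Tuple[int, float]:
    """Access the data, subset it, and compute a statistic (the mean)."""
    values: List[float] = list(make_subset(read_records(n), keep_every))
    return len(values), statistics.fmean(values)


if __name__ == "__main__":
    count, mean = hello_world()
    print(f"subset size={count}, mean={mean:.4f}")
```

Even at this scale the sketch shows where the difficulty lies: every function body is trivial, while the part replaced by a stand-in (getting at the data) is the part the slide says should be easy.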

Motivation: Better access to data
- Why can't I talk to my data?
  - Natural (natural language) interfaces to data
  - And talk to my data about other data?
- Story telling
  - Need to be able to tell stories about your data
  - Milind Kamkolkar, CDO, Sanofi, hired journalists as his first hires as a CDO (from the MIT CDOIQ meeting, July 12-14, 2017)
  - Want to tell stories with data

Motivation: Data in an interlinked world
- NITRD Big Data Interagency Working Group Workshop on Metrics for Digital Data Repositories, July 2017
- An observation: one of the evaluation criteria for data repositories should be how well they are networked to other data

NSF Big Ideas
RESEARCH IDEAS
- Work at the Human-Technology Frontier: Shaping the Future
- Windows on the Universe: The Era of Multimessenger Astrophysics
- The Quantum Leap: Leading the Next Quantum Revolution
- Harnessing Data for 21st Century Science and Engineering
- Navigating the New Arctic
- Understanding the Rules of Life: Predicting Phenotype
PROCESS IDEAS
- Mid-scale Research Infrastructure
- NSF 2050: Seeding Innovation
- Growing Convergent Research at NSF
- NSF-INCLUDES: Enhancing Science and Engineering through Diversity
2017 MIT CDOIQ Symposium, Jul 12-14, 2017

Harnessing the Data Revolution: five themes
Research across all NSF Directorates
- Foundations: theoretical foundations in mathematics, statistics, computer & computational science
- Systems, algorithms: data-centric algorithms, systems
- Science domains: data-intensive research in all areas of science and engineering
- Cyberinfrastructure: advanced cyberinfrastructure, accelerating data-intensive research
- Education, workforce: educational pathways; innovations grounded in an education-research-based framework
NSF AC-ERE Meeting, Oct 31, 2017

Motivation for Knowledge Infrastructure
- Foster research on a class of new applications leveraging data, context, and inferences from data
- Support integrative analysis and interpretation of multimodal data
- Develop advanced applications, e.g.:
  - Question/answer interfaces
  - Dialog-based interactions
  - Explanatory/story-telling interfaces

Past/Current Related NSF Efforts
Research on:
- creation of knowledge bases (representation, performance)
- creation of ontologies
- knowledge extraction
- knowledge aggregation
- reasoning

Example NSF projects - 1
- Knowledge Graph Mining for Financial Risk Analytics, PI: Mohammed Zaki, 2017
  - A "financial risk" knowledge graph built from textual and semantic features mined from the publicly available annual and quarterly reports filed with the SEC, and from textual data in news articles and credit assessment reports
- Developing the Next Generation of Community Financial CyberInfrastructure for Monitoring and Modeling Financial Eco-Systems and for Managing Systemic Risk, PI: Louiqa Raschid, 2013
  - Financial entity identification data challenges, 2016 and 2017, in collaboration with NIST and OFR, https://ir.nist.gov/dsfin
  - Creation of multiple open source graph datasets using SEC filings, in collaboration with IBM Almaden

Example NSF projects - 2
- From Data to Knowledge: Extracting and Utilizing Concept Graphs in Online Environments, PI: Cornelia Caragea, 2016
  - Explore construction of scholarly knowledge graphs by combining evidence from multiple resources, in an open information extraction framework
  - Design and develop novel algorithms for detection and analysis of interesting and previously unknown connections between concepts, to enforce knowledge discovery on the Scholarly Web
  - Investigate the utility of scholarly knowledge graphs in a question answering system

Example NSF projects - 3
- Scalable Probabilistic Inference for Large Knowledge Bases, PI: Dan Suciu, 2016
  - Use of database technology to support construction of knowledge bases/graphs
- Efficient Query Processing over Large Probabilistic Knowledge Bases, PI: Daisy Zhe Wang, 2015
  - Infer missing knowledge from large-scale knowledge bases
- Fusion of Heterogeneous Networks for Synergistic Knowledge Discovery, PI: Philip Yu, 2015
  - Effective transfer of relevant knowledge across partially aligned networks depends upon the relatedness of the different networks, and also on the target applications/uses

Example NSF projects - 4
- Constructing Knowledge Bases by Extracting Entity-Relations and Meanings from Natural Language via "Universal Schema", PI: Andrew McCallum, 2015
  - Automated knowledge base (KB) construction from natural language
- Knowledge Graph Query Processing and Benchmarking, PI: Xifeng Yan, 1528175
  - Provide a standardized way to fairly and comprehensively evaluate different knowledge graph query algorithms
  - Improve understanding of existing query engines
  - Advance the area by providing a common benchmarking framework

Example NSF projects - 5
- Using Knowledge Resources to Improve Information Retrieval, PI: Jamie Callan, 2014
  - Examines how to use knowledge bases to improve IR tasks such as ad hoc search
  - Some of the work was performed in conjunction with the Allen Institute for Artificial Intelligence's Semantic Scholar search engine
  - Link documents and queries to the KB through entities, which improves the representation of the query and document, leading to more accurate ranking
- KG4IR: The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis, in conjunction with ACM SIGIR 2017, Tokyo, Japan, August 11, 2017
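The idea of linking queries and documents to a knowledge base through entities can be illustrated with a toy ranking sketch. This is not the method used in the project above; the knowledge base `TOY_KB`, the alias matching, and the additive scoring with `entity_weight` are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy knowledge base: surface form (alias) -> entity identifier.
TOY_KB: Dict[str, str] = {
    "new york city": "E1",
    "nyc": "E1",
    "big apple": "E1",
    "boston": "E2",
}


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_entities(text: str, kb: Dict[str, str] = TOY_KB) -> Set[str]:
    """Return the KB entities whose alias occurs as a whole phrase in the text."""
    padded = " " + " ".join(tokens(text)) + " "
    return {entity for alias, entity in kb.items() if f" {alias} " in padded}


def score(query: str, doc: str, entity_weight: float = 2.0) -> float:
    """Word overlap plus a weighted count of shared linked entities."""
    word_overlap = len(set(tokens(query)) & set(tokens(doc)))
    entity_overlap = len(link_entities(query) & link_entities(doc))
    return word_overlap + entity_weight * entity_overlap


def rank(query: str, docs: List[str], entity_weight: float = 2.0) -> List[str]:
    """Sort documents by descending score; ties keep their input order."""
    return sorted(docs, key=lambda d: score(query, d, entity_weight), reverse=True)


if __name__ == "__main__":
    docs = ["A bakery in York", "Subway delays in NYC this weekend"]
    for doc in rank("New York City transit", docs):
        print(doc)
```

With word overlap alone (`entity_weight=0.0`), the bakery document ranks first because it shares the word "york" with the query. Once both texts are linked to the same entity, the document that mentions "NYC" moves to the top even though it shares no words with the query.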

Science and Ontologies
- Many efforts across the sciences, especially biomedical, biology, and ecology, in developing and using ontologies
- Some significant effort in other domains, e.g. astronomy, hydrology, some areas of engineering
- More recent efforts in other domains, e.g. materials science, social science, education research

Recent related meetings
Community and inter-agency meetings
- Entities, Facts, Questions, Answers: Building the Foundations for Semantic Information Processing, July 2016, Washington, DC
- TOKeN: The Open Knowledge Network, February 27th, Sunnyvale, CA
- Workshop on Creating an Open Knowledge Network, October 4-5, 2017, National Library of Medicine, Bethesda, MD
  - Attendees from academia, industry, and government
  - Participation by NSF, NIH, DARPA, NIST, NASA