COMPUTATIONAL SOCIAL SCIENCE AND ADVANCED COMPUTING INFRASTRUCTURE: CHALLENGES AND OPPORTUNITIES

Myron Gutmann
Directorate for the Social, Behavioral and Economic Sciences
March, 2012

Images: Portrait of Herman Hollerith, courtesy of the Computer History Museum (www.computerhistory.org). Hollerith Electric Tabulator, US Census Bureau, 1908; photograph by Waldon Fawcett, Library of Congress, LC-USZ62-45687; image courtesy of the Early Office Museum (www.earlyofficemuseum.com).

TAKING THE CENSUS, 1870
Illustration in Harper's Weekly, November 19, 1870, p. 749. Photo credit: U.S. Census Bureau, Public Information Office. Digital ID: cph 3b39850. Source: b&w film copy neg. Reproduction Number: LC-USZ62-93675 (b&w film copy neg.) [LC]

SBE RESEARCH INFRASTRUCTURE
- NHGIS
- General Social Survey

COMMUNITY STRUCTURE OF POLITICAL BLOGS (2004)
Shown using a GEM layout in the GUESS visualization and analysis tool. Colors reflect political orientation: red for conservative, blue for liberal. Orange links go from liberal to conservative blogs, and purple links from conservative to liberal. The size of each blog reflects the number of other blogs that link to it.
From "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog" by Lada Adamic and Natalie Glance
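The two quantities encoded in the blog-network figure (node size as the number of blogs linking in, and link color as the direction of cross-orientation links) are simple counts over a directed edge list. A minimal sketch, assuming a toy network: the blog names, their orientations, and the links below are hypothetical, not data from Adamic and Glance.

```python
from collections import Counter

# Hypothetical toy data: blog -> political orientation.
orientation = {
    "blogA": "liberal",
    "blogB": "liberal",
    "blogC": "conservative",
    "blogD": "conservative",
}

# Hypothetical directed hyperlinks: (source blog, target blog).
links = [
    ("blogA", "blogB"),
    ("blogB", "blogA"),
    ("blogA", "blogC"),
    ("blogC", "blogD"),
    ("blogD", "blogC"),
    ("blogD", "blogB"),
    ("blogB", "blogC"),
]


def in_degree(edges):
    """Number of distinct other blogs linking to each blog (what node size encodes)."""
    # set() drops duplicate links; self-links are ignored.
    return Counter(target for source, target in set(edges) if source != target)


def link_types(edges, labels):
    """Count links by (source orientation, target orientation), e.g. liberal -> conservative."""
    return Counter((labels[source], labels[target]) for source, target in edges)


if __name__ == "__main__":
    print(in_degree(links))
    print(link_types(links, orientation))
```

Comparing the off-diagonal counts, such as ("liberal", "conservative"), with the within-group counts is one rough way to see how strongly a network is divided.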

FROM "THE COLLECTIVE DYNAMICS OF SMOKING IN A LARGE SOCIAL NETWORK"
By Nicholas A. Christakis and James H. Fowler, New England Journal of Medicine 358:21 (May 22, 2008)

New York Times, March 17, 2010

SUMMARY SO FAR
- Long tradition of using computational technology in SBE research
- Shared traditions grew out of large-scale surveys and traditions of archiving, sharing, and reuse
- Newest research infrastructure is solidly cyber and largely sustainable
- Next generation: research questions at new scales, while preserving confidentiality and privacy

SBE 2020: WHAT WE LEARNED
Goals:
- Identify decadal-scale research through a community-based process
- Understand the programmatic implications for the directorate
Process: 252 white papers, several campus visits, and attendance at meetings of professional organizations to solicit input and ideas
Report: Rebuilding the Mosaic (http://www.nsf.gov/sbe/sbe_2020)
Vision: Future research will be collaborative, multidisciplinary, and data intensive, and will address societal problems and fundamental scientific questions

FUTURE SBE RESEARCH: TECHNOLOGY AND DATA DRIVERS
- Scale: more data from more sources (environmental, sensor, administrative, survey, commercial, usage, and so on)
- Density: merging, overlapping, and georectifying sources
- Tools: statistics, GIS, network analysis, modeling, scenarios
- Granularity: fMRI, administrative, commercial, and behavioral-level data
- Greater access to, and demand for, high-performance computational resources

SBE DEMAND: TWO MAJOR DIMENSIONS
Platform for analysis
- Access to data (discovery)
- Access to related tools and software
- Access to compute cycles
- Access to assistance, training, and relevant expertise
Infrastructure
- Maintain, archive, store, and preserve data in a usable form (understanding that some datasets can be very large, e.g., fMRI)
- Make available a core set of tools to enable comparable results

BIGGER, FASTER, SMARTER. BUT HOW BIG IS BIG? HOW FAST IS FAST?
- Big data: "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" (McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity, May 2011)
- Fast: faster than can be achieved on (1) the desktop; (2) the local cluster maintained by the department; (3) the computational resources I am accustomed to using
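Both definitions are relative to the tools at hand. One common response when a dataset outgrows memory is to process it as a stream, keeping only running summaries. The sketch below uses Welford's online algorithm for the mean and variance; it is a generic illustration, not something the presentation prescribes, and the chunk size and the generator standing in for a large file are assumptions.

```python
import math
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield lists of at most chunk_size values (a stand-in for reading a file piece by piece)."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class RunningStats:
    """Welford's online algorithm: mean and variance without holding all records in memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    # A generator simulates a record stream; only one chunk is held in memory at a time.
    stream = (float(i % 10) for i in range(200_000))
    for chunk in read_in_chunks(stream, chunk_size=50_000):
        for value in chunk:
            stats.update(value)
    print(stats.n, stats.mean, math.sqrt(stats.variance))
```

Memory use depends on the chunk size, not on the number of records, which is what lets the same code run on a desktop or a cluster node.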

WHERE ARE WE SEEING APPLICATIONS... AND FRUSTRATIONS?
What do investigators want to do?
- Text analysis at scale
- Network analysis
- Some statistical techniques (e.g., Bayesian analysis)
- Simulation
- Visualization
And what fails?
- The task exceeds the capabilities of the software package or the operating system
- The task exceeds the computational power of the resource ("too slow")
- The task requires skills the investigator or team doesn't have ("hire a programmer")

DATA ANALYSIS, PROCESSING, MERGING, AND REGISTRATION
- Text analysis and language processing techniques at scale to understand dimensions of innovation, firm behavior, and productivity
- Patent, award, citation, and text databases
- Record linking (individuals, firms) and geocoding
- Name/entity disambiguation
- Unstructured or semi-structured text
- Large-scale data (IPUMS now has more than 800 million records)
- Machine-learning techniques on images (fMRI) to enable encoding
- Merging data from multiple streams (sensor, brain, administrative)
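Record linking and name disambiguation come down to deciding when two differently written names refer to the same entity. A minimal sketch, assuming firm names from a patent file and a firm register: the suffix list, the 0.85 threshold, and the use of Python's difflib SequenceMatcher as the similarity measure are illustrative choices, and all records are hypothetical. Production linkage usually adds blocking (comparing only plausible pairs) and a probabilistic match model such as Fellegi-Sunter.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes ignored when comparing firm names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """For each (id, name) in left, return its best match in right if the score meets threshold."""
    links = []
    for lid, lname in left:
        best_id, best_score = None, 0.0
        for rid, rname in right:  # all-pairs comparison; real systems block first
            score = similarity(lname, rname)
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= threshold:
            links.append((lid, best_id, round(best_score, 3)))
    return links


if __name__ == "__main__":
    patents = [("p1", "International Business Machines Corp."), ("p2", "Acme Widgets, Inc.")]
    firms = [("f1", "INTERNATIONAL BUSINESS MACHINES"), ("f2", "Acme Widget Co"), ("f3", "Zenith Labs LLC")]
    print(link_records(patents, firms))
```

The threshold trades false links against missed links; in practice it is tuned on a hand-labeled sample.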

NETWORK ANALYSIS
Social
- Exploit social media, such as Twitter, to understand social networks and the transmission of information (e.g., financial)
- Extensive application in commercial research
Neuroscience/brain function
- Interactions of neurons in large-scale neural systems
Figure: from "Entropy of Dialogues Creates Coherent Structures in E-mail Traffic" by Jean-Pierre Eckmann, Elisha Moses, and Danilo Sergi, Proceedings of the National Academy of Sciences 101(40): 14333-14337
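A basic question about information transmission on a social network is how far a message can travel in a few steps. A minimal sketch, assuming a follower graph where each account maps to the accounts that see its posts; the accounts are hypothetical, and breadth-first search is a generic technique, not the method of the Eckmann, Moses, and Sergi paper cited above.

```python
from collections import deque
from typing import Dict, List

# Hypothetical follower graph: account -> accounts that see its posts.
followers: Dict[str, List[str]] = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "eve": [],
    "fay": ["ann"],
}


def reach_within(graph: Dict[str, List[str]], seed: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: hop distance from seed to every account reachable in <= max_hops."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:  # first visit is the shortest path in an unweighted graph
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    reached = reach_within(followers, "ann", max_hops=2)
    print(f"{len(reached) - 1} accounts reached within 2 hops: {sorted(reached)}")
```

On graphs with millions of accounts, the same traversal is where the memory and compute limits described on the earlier slides start to bite.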

SIMULATION AND MODELING
Processes:
- Are inherently compute intensive
- Must be repeated (Monte Carlo) to average out noise
- Require comparisons of models
A hardware/software issue: not all problems can be subdivided into independent pieces and distributed; some must be run as tightly coupled parallel computations
Applications in:
- Decision making and global climate change
- Brain function and specialization
- Learning
Image credits: Matthew K. Leonard, University of California, San Diego; Avian Flu Timeseries, www.nature.com
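Why Monte Carlo runs must be repeated can be seen with a toy stochastic model. The Reed-Frost chain-binomial epidemic below is a stand-in chosen for brevity; it is not the model behind the avian-flu figure, and the population size, transmission probability, and random seed are arbitrary assumptions. Any single run is noisy; the average over many runs is stable, and its standard error shrinks roughly as 1/sqrt(runs).

```python
import random
import statistics
from typing import Tuple


def reed_frost_final_size(n: int, p: float, initial_infected: int, rng: random.Random) -> int:
    """One stochastic Reed-Frost run; returns the total number ever infected."""
    susceptible = n - initial_infected
    infected = initial_infected
    total = initial_infected
    while infected > 0 and susceptible > 0:
        # A susceptible is infected unless it escapes every current infective (prob. 1 - p each).
        p_infect = 1.0 - (1.0 - p) ** infected
        new_cases = sum(1 for _ in range(susceptible) if rng.random() < p_infect)
        susceptible -= new_cases
        infected = new_cases
        total += new_cases
    return total


def monte_carlo(runs: int, seed: int = 42, **model_args) -> Tuple[float, float]:
    """Repeat the model `runs` times (runs >= 2); return mean final size and its standard error."""
    rng = random.Random(seed)
    results = [reed_frost_final_size(rng=rng, **model_args) for _ in range(runs)]
    mean = statistics.mean(results)
    std_err = statistics.stdev(results) / len(results) ** 0.5
    return mean, std_err


if __name__ == "__main__":
    for runs in (10, 100, 1000):
        mean, se = monte_carlo(runs, n=200, p=0.01, initial_infected=1)
        print(f"{runs:>5} runs: mean final size {mean:.1f} (standard error {se:.2f})")
```

The runs here are independent, so they could be farmed out across machines. Models whose parts interact at every time step cannot be split this way, which is the hardware/software issue the slide raises.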

WHERE ARE THE CHALLENGES?
- "Dirty data": using commercial, administrative, and usage data will require new solutions; disambiguation solutions will also be useful for cleaning non-traditional data sources.
- Commercial, administrative, and usage data may be restricted for subsequent use or may be covered by competing regulatory regimes.
- Streamed data requires extensive post-processing.
- Large and linked datasets may be exploited to identify individuals.
- Notions of consent affect the use of legacy and future datasets.
- Public perceptions of the research may affect how data might be used.
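One standard way to quantify the re-identification risk of large and linked datasets is k-anonymity: every combination of quasi-identifiers (attributes such as ZIP code, birth year, and sex that could be matched against outside data) should be shared by at least k records. A minimal sketch, assuming hypothetical field names and records; the presentation does not prescribe this method, and k-anonymity is only a first check that does not guard against every linkage attack.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical microdata: income is the sensitive value; the rest are quasi-identifiers.
records: List[Record] = [
    {"zip": "48104", "birth_year": "1970", "sex": "F", "income": "52000"},
    {"zip": "48104", "birth_year": "1970", "sex": "F", "income": "61000"},
    {"zip": "48104", "birth_year": "1982", "sex": "M", "income": "47000"},
    {"zip": "48105", "birth_year": "1982", "sex": "M", "income": "39000"},
    {"zip": "48105", "birth_year": "1982", "sex": "M", "income": "75000"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def group_sizes(rows: List[Record], qis: Sequence[str]) -> Counter:
    """Number of records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qis) for r in rows)


def k_anonymity(rows: List[Record], qis: Sequence[str]) -> int:
    """Smallest group size (rows must be non-empty); k = 1 means someone is unique."""
    return min(group_sizes(rows, qis).values())


def generalize_zip(rows: List[Record], keep_digits: int = 3) -> List[Record]:
    """Coarsen ZIP codes (e.g. 48104 -> 481**) so that groups become larger."""
    return [{**r, "zip": r["zip"][:keep_digits] + "*" * (len(r["zip"]) - keep_digits)} for r in rows]


if __name__ == "__main__":
    print("k before:", k_anonymity(records, QUASI_IDENTIFIERS))
    print("k after generalizing ZIP:", k_anonymity(generalize_zip(records), QUASI_IDENTIFIERS))
```

Generalization raises k at the cost of detail, which is the tension between new research scales and confidentiality noted in the summary slide.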

WHERE IS THE SCIENCE?
- Decision making, in the context of climate change but also in many other areas
- Effective policy making, for science and more generally
- Networks (social, information, neural), either as the object of study (e.g., Twitter) or as a mechanism for understanding relationships (firm behavior, innovation)
- Neuroscience
- Learning
- What else?

NSF'S ROLE: CIF21 ADVANCED COMPUTING INFRASTRUCTURE
- Foundational research in computation
- Partnerships with the scientific domains
- Building, testing, and deploying innovative and sustainable resources in collaborative ecosystems
- Education and workshop programs in relevant scientific and technical areas
- Development and evaluation of transformational and grand challenge programs

SBE'S ROLE
- Continue investments in the existing data and computational infrastructure and in upgrades to it
- Release a new FY2012 solicitation: Building Community and Capacity for Data-Intensive Research in the Social, Behavioral, and Economic Sciences and in Education and Human Resources
- Seek ways to provide relevant training
- Continue programmatic support for infrastructure activities such as data collection, management, archiving, and storage

QUESTIONS FOR YOU
- If social and behavioral scientists self-censor, defining experiments in terms of what they know, or believe they know, how to do, what should we do?
- What is the real capacity need? Cycles? Software? New data? Services?
- What is the role of NSF, other public bodies, universities, and researchers?
- How should we provide those services at the campus, regional, and national levels?

THANK YOU!