

The Stewardship Gap
Myron Gutmann, University of Colorado Boulder
Jeremy York, University of Colorado Boulder
Francine Berman, Rensselaer Polytechnic Institute
http://bit.ly/stewardshipgap
Coalition for Networked Information, April 3-4, 2016, Austin, Texas
Stewardship Gap @ CNI 2016

INTRODUCTION

Stewardship Gap Problem
Research data → innovation.
Research is increasingly expected to be available to the broader research community and the general public, now and in the future.
Preservation and stewardship of research data are often ad hoc, with much of the data at risk.
How much is sustainable? What data is at risk? What should we do about it?
Lack of understanding about the sustainable stewardship gap hampers evidence-based discussion, prioritization and potential strategic investments.
[Diagram: (Valuable) Sponsored Research Data, divided into At Risk and Sustainable. Sustainable Stewardship Gap?]

Is there a Stewardship Gap?
NIH estimates* for 2011 PubMed Central publications:
12% of publication data sets were deposited in recognized repositories; 88% of the data sets were invisible.
An estimated 200,000-235,000 invisible data sets were generated from NIH work published in 2011.
87% of the invisible data sets are new; 13% reflect data reuse.
More than 50% of the data sets are based on live human/animal subjects.
Lack of comprehensive understanding about the broader sustainable stewardship gap hampers evidence-based discussion, prioritization and potential strategic investments.
* From PLOS ONE http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132735
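As a back-of-the-envelope check on the NIH figures above, the following sketch derives the totals they imply. It assumes (the slide does not say so) that the 200,000-235,000 invisible data sets correspond to the 88% invisible share and that the 87%/13% split applies to that same range. The function and variable names are illustrative only.

```python
def implied_counts(invisible_low, invisible_high,
                   invisible_share=0.88, new_share=0.87):
    """Derive rough totals implied by the slide's percentages.

    Assumption of this sketch: all percentages describe one population.
    """
    results = {}
    for label, invisible in (("low", invisible_low), ("high", invisible_high)):
        total = invisible / invisible_share           # visible + invisible data sets
        deposited = total - invisible                 # the ~12% in recognized repositories
        new_invisible = invisible * new_share         # invisible data sets that are new
        reused_invisible = invisible - new_invisible  # invisible data sets reflecting reuse
        results[label] = {
            "total": round(total),
            "deposited": round(deposited),
            "new_invisible": round(new_invisible),
            "reused_invisible": round(reused_invisible),
        }
    return results


estimates = implied_counts(200_000, 235_000)
for bound, counts in estimates.items():
    print(bound, counts)
```

Under these assumptions the implied total is roughly 227,000-267,000 data sets, of which roughly 27,000-32,000 would have been deposited in recognized repositories.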

How would knowing the size and nature of the Stewardship Gap help?
"Funders, and particularly public funders, are under great pressure to show how their funding contributes to broad economic growth, how it addresses the needs of society, and to demonstrate that the requirements that they impose on the work they fund makes discovery ever more rapid, extensive, and cost-effective. From this perspective, they are not interested in data preservation or even data sharing other than as a necessary precondition to data reuse; they are interested in conformance to their data management and sharing policies because it is the only way they can create the preconditions for data reuse. They are hungry for examples of how data reuse has improved the processes of scholarship and discovery, or contributed to economic growth, job creation, control of health care costs, or public policy."
Clifford Lynch, The Next Generation of Challenges in the Curation of Scholarly Data, in Research Data Management: Practical Strategies for Information Professionals, edited by Joyce M. Ray. West Lafayette, IN: Purdue University Press, 2013.
IDC reports on the Digital Universe, http://www.emc.com/leadership/digital-universe/index.htm#archive
AMPAS report on the Digital Dilemma, http://www.scribd.com/doc/55498058/The-Digital-Dilemma

The Stewardship Gap Project
Understand the gap between valuable digital data and the amount responsibly stewarded.
Address the question: So what if there is a stewardship gap?

Who's Involved? [Planning Group]
Myron Gutmann, U. of Colorado (PI, co-lead)
Fran Berman, RPI (co-lead)
Jeremy York (Project Manager)
George Alter, ICPSR
Chris Borgman, UCLA
Phil Bourne, NIH
Vint Cerf, Google
Sayeed Choudhury, Johns Hopkins University
Elizabeth Cohen, Stanford University
Trisha Cruse, DataONE
Peter Fox, RPI
John Gantz, IDC
Margaret Hedstrom, U. of Michigan
Brian Lavoie, OCLC
Cliff Lynch, CNI
Andy Maltz, Science and Technology Council, Academy of Motion Picture Arts and Sciences
Guha Ramanathan, Google

Specific Tasks
Identify a sampling frame and strategic case studies.
Develop a robust evaluation instrument.
Produce a set of actionable recommendations and summary reports that can help guide strategic decisions about the stewardship gap.
[Diagram: Understand Universe → Perform Evaluation → Make Recommendations → Provoke Action]

Not One Gap But Many
Many kinds of gaps.
Different gaps require different measurements.
Need to connect future policy and strategies (investment and otherwise) to the measurable gaps.

Method
Read literature: the stewardship literature identifies many kinds of gaps, which we explore in this research.
Interview members of the community to learn what's being done and how they perceive the stewardship of their data.

The Stewardship literature is extensive
See our bibliography at: http://bit.ly/1pd9vvo
Seven important themes: Culture, Knowledge, Resources, Actions, Responsibility, Commitment, and Value (which is inside Culture but overarching in its importance).
This tree diagram takes the literature we've explored and shows the important topics scaled to their prevalence in the literature, divided into six themes.
[Tree diagram; theme labels recovered from the transcript: Culture, Knowledge, Actions, Resources, Responsibility]

Six Stewardship Gaps
Culture: Gaps arising from differences in community attitudes, norms, and goals that affect data stewardship.
Knowledge: Gap between the knowledge needed to effectively steward data and what is currently known.
Responsibility: Gap between who has responsibility for stewardship and who is best placed to steward data over time.
Commitment: Gap between the commitments that exist for valuable data and those necessary to ensure long-term stewardship.
Resources: Gap between the people, money, infrastructure, and tools needed to steward data and what is now available.
Actions: Gap between the actions taken to facilitate stewardship of data and the actions needed.
Value (of the data): shown as an overarching theme above the six gaps.

The Critical Importance of Value
Value is an overarching theme.
Articulated or not, the value of data should determine the extent of stewardship.
Value is measured in multiple ways: to the original researcher and to others, in one field of study as opposed to others, now and in the future.
The hardest question to answer is the tradeoff between value and investment. What value of data is worth what amount of stewardship investment?

PHASE 1: PRELIMINARY INVESTIGATION
What to measure and how?

What to Measure
Is there a gap?
  What is the value of data, and for how long will they be valuable? (Value)
  What is the extent of stewardship commitment on data? (Commitment)
Who can act to address the gap? (Responsibility)
How much data, and what kind, is at risk? (Amount and Characteristics)

What to Measure
Scope of data interest: Data resulting from sponsored research or creative work in the US, whether publicly or privately funded (we have focused on research outputs, primarily federally funded).
Unit of analysis: Project. A body of work that has a defined scope and resources and a distinct beginning and end (not necessarily a single grant).

How to Measure
Interviews.
Whom to ask: those responsible for project data, that is, Principal Investigators and staff involved in data production and management.

What to ask
Project Context: Purpose, domains of science, collaborators, funders, size and characteristics of data. (Responsibility, Knowledge)
Commitment: For how much of the data is there
  1) a commitment to preserve
  2) an intention to preserve
  3) no intention to preserve (no intention to delete)
  4) the data are temporary (and will be deleted)
Stewardship: Who is stewarding the data, what is being done to take care of them, concerns about stewardship, prospects when the current commitment has ended. (Culture, Responsibility, Commitment, Resources, Actions)
Value: Why are the data valuable and for how long, how does the valuation affect stewardship decisions, is it worthwhile to reassess the value in the future? (Culture, Activities)

PROJECT CONTEXT

Respondents
17 respondents in 16 disciplines from 13 institutions (31 contacts).
Data sets ranged from tiny to 50 TB.
Researcher disciplines: Geography, History, Archaeology, Economics, Political science, Psychology, Public administration, Information, Education, Environmental studies, Physical performance & recreation, Neuroscience, Astronomy, Computer sciences, Physics, Statistics.
The resulting research data represent 32 domains.

Data Description
17 projects, 39 datasets.
[Chart: number of projects by data size; bins: < .1 GB, < 5 GB, < 100 GB, < 500 GB, < 20 TB, < 50 TB]
Video, audio, text
Digital image streams
Data from interviews, questionnaires, surveys
Chat files
Field samples of vegetation and soils
Housing prices
Simulation models of land use
Voltage measurements
Software
Topic models
Tag clouds
Behavioral action logs
GIS information
Plant and animal diversity data
Maps, on-site images
Database graphs
Service and configuration data
Business transaction information

Project Years
[Chart: number of projects by project year]
Multi-year projects are represented in each project year.

Project Funding
Institute of Educational Studies
Society for Research and Development
Sloan Foundation
Department of Energy
NSF
NEH
NIH

Limitations
Small number of respondents, but the observations are revelatory.
Weak on biological science and medicine.
Our next set of sample cases will add 50 more observations by late spring.

COMMITMENT AND VALUE

Type of Commitment and Term of Commitment
[Chart: number of datasets by type (Commitment, Intention, No Intention, Temporary, Unsure) and by term (Indefinite, 10s of years, 10 years, 5 years, < 2 years, Unsure)]
*One project reported two commitment levels on the same data.
Researchers want to keep data for a long time, but the desire is not matched by commitment.
3/5 of datasets have an intention to preserve.
For 3/4 of these, the intention is 10+ years.
1/10 of 10+ year datasets have a commitment.
Do intentions translate into preserved data?
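To make the fractions on this slide concrete, here is a rough sketch that converts them to approximate dataset counts. It assumes that the fractions apply to the 39 datasets reported in the data description and that each fraction nests within the previous one; neither is stated on the slide. The slide's fractions are rounded, so the counts are only indicative, and the variable names are illustrative.

```python
from fractions import Fraction

TOTAL_DATASETS = 39  # from the data description slide (17 projects, 39 datasets)

# Fractions as stated on the slide; nesting them is an assumption of this sketch.
with_intention = TOTAL_DATASETS * Fraction(3, 5)  # datasets with an intention to preserve
long_term = with_intention * Fraction(3, 4)       # of those, intention of 10+ years
committed = long_term * Fraction(1, 10)           # of those, backed by a commitment

for label, value in [("intention to preserve", with_intention),
                     ("10+ year intention", long_term),
                     ("10+ year with commitment", committed)]:
    print(f"{label}: about {round(value)} datasets ({float(value):.2f})")
```

Exact fractions are used rather than floats so that the intermediate values (23.4, 17.55, 1.755) carry no floating-point noise before rounding.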

Type of Commitment and Term of Value
[Chart: number of datasets by type of commitment (Commitment, Intention, No Intention, Temporary, Unsure) and by term of value (Indefinite, 100s of years, 10s of years, <= 10 years, < 2 years, Life of Project)]
Researchers believe their data have long-term value.
For datasets with > 10 years of value:
  2 out of 34 have a matching commitment.
  About 1/3 have no explicit intention to preserve.

Type of Value and Term of Value
[Chart: number of datasets by type of value (Own research, Costly to reproduce, Reuse by others, Impact) and by term of value (Indefinite, < 100 years, <= 10 years, Life of project)]
Most common reasons for data value:
  Their own research use
  Data costly to reproduce
  Reuse by others
  Demonstrated or potential impact

Reasons for Value with Greatest Impact on Preservation Commitments
[Chart: number of datasets by reason for value, in transcript order: Demand in Community, #3 Longitudinal Value, Uniqueness of Data, Steward's Mission to Preserve, #2 Difficult to Reproduce, #1 Own Research. The numbered markers indicate the most common reasons for value.]
There is a mismatch between the value researchers believe their data to have and the value researchers believe drives preservation commitments.
Some types of value had the greatest impact on preservation decisions:
  Community demand
  Unique data
  Data preservation mission
  Data hard to reproduce
  Value for the researcher's own work

Confidence in Stewardship
[Chart: number of datasets by type of stewardship (Personal, Institutional, Multi-institutional or public) and by confidence level (Very confident, Reasonably confident, Confident in short-term with concerns in long-term, Somewhat concerned, Opinion not obtained)]
In 13 out of 20 stewardship locations, researchers felt very (5) or reasonably (8) confident in the ability of the data steward to fulfill the preservation commitment on the data.
How well-founded is this confidence?

Prospects for stewardship when the existing commitment/intention is over
[Chart: number of projects by type of stewardship (Personal, Within institution, Multi-institutional or public) and by plans (No specific plans, Tentative plans, Definite plans)]
Few researchers had specific plans for stewardship; many assumed that their institution would take on that role.

Progress on Objectives (1)
1. To get a good sense of the sponsored research data universe by identifying a sampling frame and strategic case studies that provide an accurate and meaningful view of research data stewardship on a broader scale.
   → Working on in Phase 2
2. To assess the stewardship gap by developing a robust evaluation instrument, flexible to multiple levels on which research data is created and maintained, and capable of providing useful information for data stewards, research administrators, and other stakeholders to underlie strategic decision-making about research data stewardship.
   → Developed in Phase 1 and refined for Phase 2

Progress on Objectives (2)
3. To produce a set of actionable recommendations and summary reports that can help guide strategic decisions about the stewardship gap, research data stewardship landscape, and needed efforts to ensure sustainable long-term access to valuable sponsored research data.
   → Pending

Next Steps
50 more interviews with a more structured sample in the next couple of months.
Added questions about:
  Are data collected to share or to test a specific hypothesis?
  Use of secondary data (previously implicit)
  Was the primary goal of transferring responsibility to share with others or to preserve data?
  Expectations about stewardship of project data
Make a decision about a future, more comprehensive study.

What have we learned so far?
There's a lot of diversity in research data stewardship, which makes our task challenging but exciting.
One of the challenges is a need to improve knowledge translation about data between researchers, data scientists, and data stewards.
Researchers want to have their data well stewarded, but don't always get the commitments that would ensure long-term stewardship.

From Gaps to Policy: Possible Examples
Commitment: If researchers don't always get the commitments that would ensure long-term stewardship, find ways to give them and stewardship organizations incentives to do so.

From Gaps to Policy: Possible Examples (continued)
Knowledge: Data management plans have a lot to teach us, but they need to be more informative and more readily available. Find ways to improve DMPs and make them useful for data science research.
Value: Researchers distinguish degrees and durations of data value for different purposes. Provide policy structures to use information about value to inform stewardship.

Topics for Discussion
What do we need to do to make this relevant for you?
What additional information do we need for findings from our project to have policy implications?
What have we missed and what else should we be thinking about?
How do the limits of our methodology (a small number of detailed interviews) affect our results and future work?

Stewardship Gap Bibliography: http://bit.ly/1pd9vvo
Jeremy.York@colorado.edu
Myron.Gutmann@colorado.edu
[Tag cloud of bibliography topics, generated 1/10/2016 from https://www.jasondavies.com/]