Data-intensive environmental research: re-envisioning science, cyberinfrastructure, and institutions

Patricia Cruse, John Kunze
California Digital Library, University of California

Environmental research and global change
Global change presents a complex scientific and societal challenge.
Society needs good data in order to:
- build good science
- inform wise policy-making
- enable sustainable resource management decisions
Good data and data-intensive research need:
- solid technical infrastructure
- sound organization
- community engagement (you)

Data curation is hard
- Data sets encompass everything, including regular object types (documents, images, audio, video, etc.).
- There is tension between establishing standards and fostering innovation.
- Data is like software, but even more specialized:
  - heavy processing requirements make long-term migration or emulation of custom data and software tricky
  - provenance and snapshot-coherence requirements are heavy
- Instability: the value of some preserved data depends on ongoing change, in particular on researcher annotation.
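The provenance and snapshot-coherence requirements above are often met in practice with a fixity manifest: a checksum for every file in a snapshot, plus one digest over the whole manifest so the set can be verified as a unit. The sketch below is a minimal illustration assuming SHA-256 and in-memory file contents; the function names and file paths are hypothetical, not part of any DataONE or CDL tool.

```python
import hashlib
from typing import Dict, List


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file path in a snapshot to the SHA-256 digest of its contents."""
    return {path: file_digest(data) for path, data in sorted(files.items())}


def snapshot_digest(manifest: Dict[str, str]) -> str:
    """Digest the manifest itself so the snapshot can be checked as one unit.

    Entries are sorted by path, so the result does not depend on dict order.
    """
    lines = "".join(f"{digest}  {path}\n" for path, digest in sorted(manifest.items()))
    return hashlib.sha256(lines.encode("utf-8")).hexdigest()


def verify(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return paths that were changed, deleted, or added since the manifest was made."""
    current = build_manifest(files)
    changed_or_missing = [p for p in manifest if current.get(p) != manifest[p]]
    added = [p for p in current if p not in manifest]
    return sorted(changed_or_missing + added)


if __name__ == "__main__":
    # Hypothetical snapshot: one data file plus its metadata record.
    snapshot = {
        "data/observations.csv": b"site,temp_c\nA,12.5\n",
        "metadata/eml.xml": b"<eml/>",
    }
    manifest = build_manifest(snapshot)
    print(snapshot_digest(manifest))
    snapshot["data/observations.csv"] = b"site,temp_c\nA,13.0\n"  # silent edit
    print(verify(snapshot, manifest))  # → ['data/observations.csv']
```

Storing the snapshot digest alongside provenance records lets a later reader confirm that an analysis refers to exactly this set of files, not a partially updated one.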

The complexities of global change (figure; source: Smith, Knapp, Collins, in press)

Data challenge 1: dispersed sources ("finding the needle in the haystack")
Data are widely distributed:
- ecological field stations and research centers (100s)
- natural history museums and bio-collection facilities (100s)
- agency data collections (100s to 1,000s)
- individual scientists (1,000s to 10,000s to 100,000s)

Data challenge 2: diversity (the flood of increasingly heterogeneous data)
Data are heterogeneous in:
- syntax (format)
- schema (model)
- semantics (meaning)
Source: Jones et al. 2007
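To make the three layers of heterogeneity concrete: two stations might report the same measurement as CSV and as JSON (syntax), under different field names (schema), and in different units (semantics). The sketch below harmonizes both into one record shape. The station formats, field names, and the `harmonize_*` helpers are hypothetical examples chosen for illustration, not taken from Jones et al. 2007.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Common target schema: site identifier and air temperature in Celsius."""
    site: str
    temp_c: float


def harmonize_csv(text: str) -> List[Observation]:
    """Hypothetical station A: CSV with columns 'station' and 'temp_c', already in Celsius."""
    reader = csv.DictReader(io.StringIO(text))
    return [Observation(site=row["station"], temp_c=float(row["temp_c"])) for row in reader]


def harmonize_json(text: str) -> List[Observation]:
    """Hypothetical station B: JSON list using 'loc' and 'airTempF', in Fahrenheit."""
    records = json.loads(text)
    return [
        # Semantic fix: convert Fahrenheit to Celsius.
        Observation(site=r["loc"], temp_c=round((r["airTempF"] - 32) * 5 / 9, 2))
        for r in records
    ]


if __name__ == "__main__":
    station_a = "station,temp_c\nA1,12.5\n"
    station_b = '[{"loc": "B7", "airTempF": 50.0}]'
    merged = harmonize_csv(station_a) + harmonize_json(station_b)
    for obs in merged:
        print(obs)
```

Each source needs its own small mapping, which is why this cost grows with the number of distinct providers rather than with data volume.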

Data challenge 3: poor practice (data entropy)
Figure: information content vs. time (Michener et al. 1997), annotated with the time of publication, the loss of specific details and then general details, an accident, retirement or career change, and death.

Data challenge 4: loss
Sources of loss:
- natural disaster
- facilities infrastructure failure
- storage failure
- server hardware/software failure
- application software failure
- external dependencies (e.g., PKI failure)
- format obsolescence
- legal encumbrance
- human error
- malicious attack by human or automated agents
- loss of staffing competencies
- loss of institutional commitment
- loss of financial stability
- changes in user expectations and requirements
Source: S. Abrams, CDL
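Several threats in this list (storage failure, human error, malicious attack) can silently corrupt one copy of an object. One common mitigation, sketched below under the assumption that each object is held as several independent replicas, is to audit the replicas' checksums, treat the majority value as authoritative, and flag any disagreeing replica for repair. The function name and replica labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def audit_replicas(replicas: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Compare replicas of one object by SHA-256 digest.

    Returns (majority digest, sorted names of replicas that disagree with it).
    Raises ValueError if there are no replicas or no digest has a strict
    majority, because the audit then cannot tell which copy is correct.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no strict majority; manual review needed")
    damaged = sorted(name for name, d in digests.items() if d != winner)
    return winner, damaged


if __name__ == "__main__":
    copies = {
        "replica-1": b"stream temperature 2008",
        "replica-2": b"stream temperature 2008",
        "replica-3": b"stream temperaturf 2008",  # simulated bit flip
    }
    _, damaged = audit_replicas(copies)
    print(damaged)  # → ['replica-3']
```

With only two replicas a single disagreement cannot be settled by vote, which is one argument for keeping at least three copies in separate facilities. Note that this only addresses bit-level loss; threats such as format obsolescence or loss of institutional commitment need organizational answers.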

Data challenge 4: more loss
Figure: worldwide information available vs. available storage, 2005 to 2010 (petabytes, axis 0 to 1,000,000); the gap between the two curves is transient information or unfilled demand for storage. Source: John Gantz, IDC Corporation, "The Expanding Digital Universe".

Cumulative impact on data longevity

Study                       Resource type                            Resource half-life
Rumsey (2002)               Legal citations                          1.4 years
Harter and Kim (1996)       Scholarly article citations              1.5 years
Koehler (1999 and 2002)     Random web pages                         2.0 years
Spinellis (2003)            Computer science citations               4.0 years
Markwell and Brooks (2002)  Biological science education resources   4.6 years
Nelson and Allen (2002)     Digital library objects                  24.5 years

Source: Koehler, W. (2004). Information Research 9(2): 174.
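A half-life is easiest to read as the fraction of resources still reachable after a given time. Assuming simple exponential decay (which a single half-life value implies, though the cited studies may not have modeled their data exactly this way), the fraction remaining after t years with half-life h is 0.5^(t/h). The sketch applies this to the half-lives listed above; the dictionary and function names are illustrative.

```python
from typing import Dict

# Half-lives in years, as listed on the slide (Koehler 2004).
HALF_LIVES: Dict[str, float] = {
    "Legal citations": 1.4,
    "Scholarly article citations": 1.5,
    "Random web pages": 2.0,
    "Computer science citations": 4.0,
    "Biological science education resources": 4.6,
    "Digital library objects": 24.5,
}


def fraction_remaining(years: float, half_life: float) -> float:
    """Fraction still available after `years`, assuming exponential decay."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    for kind, h in HALF_LIVES.items():
        print(f"{kind:42s} {fraction_remaining(10, h):6.1%} left after 10 years")
```

For random web pages (h = 2.0), ten years is five half-lives, leaving 0.5^5, about 3% of the original links, while digital library objects lose less than a quarter over the same period.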

DataONE: Data Observation Network for Earth
The goal of DataONE is to enable new science through universal access to data about life on Earth by:
- engaging the scientist in the data preservation process
- supporting the full data life cycle
- encouraging data stewardship and sharing
- promoting best practices
- engaging citizens
DataONE and the Data Conservancy (JHU) are two DataNet awardees recommended for funding by the US National Science Foundation (NSF), and the two projects are expected to collaborate.

Initial data types
- Biological: genes to biomes
- Environmental: atmospheric, ecological, hydrological, oceanographic

Existing biological data archives
- ESA's Ecological Archive
- Distributed Active Archive Center
- National Biological Information Infrastructure
- Fire Research & Management Exchange System
- Long Term Ecological Research Network
- Knowledge Network for Biocomplexity

Existing cyberinfrastructure: tools

New distributed framework
Coordinating Nodes:
- retain complete metadata catalog
- perform basic indexing
- provide network-wide services
- ensure data availability (preservation)
- provide replication services
Member Nodes:
- diverse institutions
- serve local community
- subset of all data
- provide resources for managing their data
Result: a flexible, scalable, sustainable network
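The division of labor on this slide can be sketched as a toy model: member nodes each hold a subset of the data, while a coordinating node keeps the complete metadata catalog, answers keyword searches across it, and asks other member nodes to replicate an object until it reaches a target number of copies. Everything here (class and method names, the `target_copies` parameter, the substring search) is a hypothetical illustration, not the DataONE API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MemberNode:
    """A participating institution's repository, holding a subset of all data."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class CoordinatingNode:
    """Keeps the complete metadata catalog and tracks where every copy lives."""
    members: Dict[str, MemberNode] = field(default_factory=dict)
    catalog: Dict[str, str] = field(default_factory=dict)  # identifier -> description
    locations: Dict[str, Set[str]] = field(default_factory=dict)  # identifier -> node names

    def join(self, node: MemberNode) -> None:
        """Add a member node to the network."""
        self.members[node.name] = node

    def register(self, node_name: str, identifier: str, description: str, data: bytes) -> None:
        """A member node deposits an object; its metadata goes into the full catalog."""
        self.members[node_name].objects[identifier] = data
        self.catalog[identifier] = description
        self.locations.setdefault(identifier, set()).add(node_name)

    def search(self, keyword: str) -> List[str]:
        """Basic indexing: identifiers whose description contains the keyword."""
        kw = keyword.lower()
        return sorted(i for i, d in self.catalog.items() if kw in d.lower())

    def replicate(self, identifier: str, target_copies: int) -> None:
        """Copy an object to other members until target_copies exist or members run out."""
        holders = self.locations[identifier]
        data = self.members[next(iter(holders))].objects[identifier]
        for name in sorted(self.members):
            if len(holders) >= target_copies:
                break
            if name not in holders:
                self.members[name].objects[identifier] = data
                holders.add(name)


if __name__ == "__main__":
    cn = CoordinatingNode()
    for name in ("field-station", "museum", "archive"):
        cn.join(MemberNode(name))
    cn.register("field-station", "obj-1", "Stream temperature, 2008", b"...")
    cn.replicate("obj-1", target_copies=2)
    print(cn.search("temperature"))       # → ['obj-1']
    print(sorted(cn.locations["obj-1"]))  # → ['archive', 'field-station']
```

Keeping the catalog central while the bytes stay distributed is what lets small institutions join as member nodes without running network-wide search or preservation services themselves.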

CDL/UC3 roles in DataONE
- member of the overall DataONE leadership team
- chairing the Governance and Sustainability working group
- chairing the Preservation working group

DataONE management and partners
- William Michener, University of New Mexico
- Suzie Allard, University of Tennessee
- Bob Cook, Oak Ridge National Laboratory DAAC
- Patricia Cruse, California Digital Library
- Mike Frame, USGS, National Biological Information Infrastructure
- Matt Jones, University of California Santa Barbara
- Steve Kelling, Cornell Lab of Ornithology
- DataONE Partners, plus the Kepler-CORE and SEEK/KNB teams
We welcome your involvement!

Building global communities of practice and long-lived cyberinfrastructure
Community engagement:
- involve library and science educators
- build on existing programs
- involvement of cultural memory organizations brings centuries of preservation experience to datasets

Other data researcher support projects
- DataCITE: an initiative to encourage data publishing through global data citation support, including citation standards and persistent references to datasets held in regional archives
- Curation and hosting for existing museums and archives (e.g., UC Berkeley's Media Vault Project)
- Support for publishing extended data-description appendices. Journals typically omit these, which drops crucial details that support research conclusions as well as credit for data producers (reducing the incentive to share data).
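At minimum, data citation support means giving each dataset a persistent identifier and a citation string that can sit in a reference list like any article. The sketch formats one roughly along the lines of the pattern DataCite has suggested (Creator (PublicationYear): Title. Publisher. Identifier); the record fields, names, and the DOI shown are invented placeholders, and this is not a DataCite API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal metadata for citing a dataset (illustrative fields only)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # persistent identifier, without the resolver prefix


def format_citation(rec: DatasetRecord) -> str:
    """Render 'Creator (PublicationYear): Title. Publisher. Identifier'."""
    creators = "; ".join(rec.creators)
    return f"{creators} ({rec.year}): {rec.title}. {rec.publisher}. https://doi.org/{rec.doi}"


if __name__ == "__main__":
    rec = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2009,
        title="Stream temperature observations",
        publisher="Example Field Station",
        doi="10.1234/example",  # placeholder, not a registered DOI
    )
    print(format_citation(rec))
    # → Doe, J.; Roe, R. (2009): Stream temperature observations. Example Field Station. https://doi.org/10.1234/example
```

Citing through a persistent identifier rather than a repository URL is what lets the reference keep working if the dataset later moves between regional archives, and it gives data producers a citable unit of credit.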

Summary
- Libraries and cultural memory organizations play a vital role in meeting the global change challenge through data-intensive research.
- Getting started includes projects such as:
  - DataONE, to work directly with scientists
  - DataCITE, for data citation support
  - joining forces with museums and archives