
Project Title: Dash: an easy-to-use Data Publication service

Submitter: Marisa Strong, Application Development Manager, UC Curation Center, California Digital Library, University of California, Office of the President, 510-987-0228, marisa.strong@ucop.edu

Team consists of John Chodacki and Stephen Abrams (Principal Investigators); Daniella Lowenberg, Product Manager; Marisa Strong, Technical Development Manager; Scott Fisher, Lead Front-end Developer; David Moles, Lead Backend Developer; Bhavitavya Vedula, Developer; John Kratz, UI/UX Designer; Joel Hagedorn, Web Production Developer

Problem Statement

The integration of information technology and resources into all phases of scientific activity has led to the development of a new paradigm of data-intensive science [1]. However, this paradigm can only realize its full potential in the context of a scientific culture of widespread data curation, publication, sharing, and reuse. Unfortunately, the record to date is not encouraging: far too few datasets are appropriately documented, effectively managed and preserved, or made available for public discovery and retrieval [2]. There are many reasons for this lack of data stewardship, and the most commonly cited are:

1. A lack of education about good data management practices [3],
2. Poor incentives for researchers to describe and share their datasets [4], and
3. A dearth of easy-to-use tools for data curation.

The incentives problem is being addressed by increasing mandates for more proactive data management. Providing access to data is increasingly no longer optional: sharing is becoming a matter of institutional policy and disciplinary best practice, and a precondition for grant funding and publication (e.g., recent directives from the US Office of Science and Technology Policy [5]).
Although this means researchers have more incentives to participate in data stewardship, there is still a lack of easy-to-use tools, resulting in practices that may impede future access to datasets. As evidence, many researchers who do choose to archive are doing so in one of three ways, each potentially problematic:

1. Commercially owned systems (e.g., figshare, Dropbox, Amazon S3). Potential problem: these solutions are owned by groups who may not fully share the academic value of openness, and who may not have a primary goal of long-term data preservation.
2. Supplemental materials alongside the main journal article. Potential problem: these materials are not always preserved and accessible for the long term [6].
3. Personal website. Potential problem: personal websites are often poorly maintained and eventually abandoned. Both research and anecdotal evidence indicate the average lifespan of a webpage is between 44 and 100 days [7].

A better option for data archiving is community repositories, which are owned and operated by trusted organizations (i.e., institutional or disciplinary repositories). Although disciplinary repositories are often known and used by researchers in the relevant field, institutional repositories are less well known as a place to archive and share data.

Why aren't researchers using institutional repositories? First, the repositories are often not set up for self-service operation by individual researchers who wish to deposit a single dataset without assistance. Second, many (or perhaps most) institutional repositories were created with publications in mind [8], rather than datasets, which may in part account for their less than ideal functionality. Third, user interfaces for the repositories are often poorly designed and do not take into account the user's experience (or inexperience) and expectations. Because more of our activities are conducted on the Internet, we are exposed to many high-quality, commercial-grade user interfaces in the course of a workday. Correspondingly, researchers expect clean, simple interfaces that can be learned quickly, with minimal need for contacting repository administrators.

Solution

We are addressing the three issues above with Dash, a well-designed, user-friendly data publication platform that can be layered on top of existing community repositories. Rather than creating a new repository or rebuilding community repositories from the ground up, Dash provides a way for organizations to allow self-service deposit of datasets via a simple, intuitive interface that is designed with individual researchers in mind. Researchers can document, preserve, and publicly share their own data with minimal support from repository staff, and can find, retrieve, and reuse data made available by others.

Collaboration

Dash is very much a service that has involved collaboration across campuses, external organizations (DataONE and Orange County Data Portal), and CDL's UI/UX department. Campuses have provided, and will continue to provide, feedback via usability testing, which informs an iterative development model. While each campus has its own URL and landing page (for example, dash.berkeley.edu and datashare.ucsf.edu), Dash is a single-instance application hosted by CDL.

Deployment Timeline

After initial research into existing platforms and frameworks, Dash development began in earnest in Summer 2015. An agile development methodology was used to create user stories, which produced the feature set of the Minimum Viable Product (MVP) production release in Fall 2016. User feedback was obtained on the MVP version to assess and refine the features of the tool with continuing, iterative development. The project continues to provide releases to the service in 2-4 week increments. Development and release iterations can be tracked on the GitHub project page.
Technology

Dash uses a combination of technologies, many of them open source. The web application itself, hosted on Amazon Web Services cloud infrastructure (EC2 and RDS), is built on the Ruby on Rails framework. Dash supports both Shibboleth and Google authentication, and submits datasets to the Merritt institutional repository via the SWORD protocol; Merritt in turn exposes metadata for harvesting via the OAI-PMH protocol. The harvested metadata is indexed using Solr, and discovery of datasets and publications is provided by a GeoBlacklight portal. Persistent identifiers (DOIs) are assigned using the EZID API, another service designed and implemented at CDL. All of these technologies are implemented modularly to allow customization of campus and institutional branding, storage upload limits, and the time periods for time-released publication of datasets.
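To make the harvesting step more concrete, the following is a minimal sketch in Python of how an OAI-PMH ListRecords request might be formed and how a Dublin Core response might be turned into flat documents suitable for a search index. It is not Dash's or Merritt's actual code (the application itself is written in Ruby on Rails). The base URL, the record identifiers, the sample response, and the function names list_records_url and parse_records are all hypothetical; only the OAI-PMH verb, the oai_dc metadata prefix, and the two XML namespaces come from the published protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Turn a ListRecords response into flat dicts ready for indexing."""
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iterfind(".//oai:record", NS):
        docs.append({
            "id": record.findtext("oai:header/oai:identifier",
                                  default="", namespaces=NS),
            "title": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creator": [e.text for e in record.iterfind(".//dc:creator", NS)],
            "identifier": [e.text
                           for e in record.iterfind(".//dc:identifier", NS)],
        })
    return docs


# Invented sample response, used in place of a live network call.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2016-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>doi:10.5555/EXAMPLE</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai",
                           from_date="2016-10-01"))
    for doc in parse_records(SAMPLE_RESPONSE):
        print(doc)
```

In a real harvester the response would be fetched over HTTP and paged with resumption tokens, and the resulting dictionaries would be posted to the search index; both steps are left out here to keep the sketch self-contained.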

Measuring Project Success

For qualitative assessment, our product manager has been coordinating with each campus that uses Dash, capturing feedback from both researchers and libraries. Representatives from each campus make up a Dash User Group that meets regularly to advise on future releases and necessary improvements. Throughout the project we have captured usage metrics as indicators of Dash adoption and community uptake. In particular, we have monitored metrics on the use of Dash for data publication and access.

APPENDIX 2: BIBLIOGRAPHY

[1] Hey, T, S Tansley, and K Tolle (2009), The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. Available at http://fourthparadigm.org/
[2] Tenopir, C, S Allard, K Douglass, A Aydinoglu, L Wu, E Read, M Manoff, and M Frame (2011), Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6: e21101. http://dx.doi.org/10.1371/journal.pone.0021101
[3] Strasser, C and SE Hampton (2012), The Fractured Lab Notebook: Undergraduates and Ecological Data Management Training in the United States. Ecosphere 3: art116. doi:10.1890/es12-00139.1
[4] Borgman, C (2012), The conundrum of sharing research data. Journal of the American Society for Information Science 63(6): 1059-1078.
[5] Holdren, JP (2013), Memorandum for the Heads of the Executive Departments and Agencies: Increasing Access to the Results of Federally Funded Scientific Research. February 22, 2013 Memo from the White House Office of Science and Technology Policy. Available at http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
[6] Evangelou, E, T Trikalinos, and J Ioannidis (2005), Unavailability of online supplementary scientific information from articles published in major journals. FASEB Journal 19(14): 1943-1944.
[7] Taylor, N (2011), The average lifespan of a webpage. The Signal Digital Preservation Blog. Available at http://blogs.loc.gov/digitalpreservation/2011/11/the-average-lifespan-of-a-webpage/