Why does it cost so much?

Why does it cost so much? Decisions and choices in preservation of digital content
New England Archivists Fall 2008 Meeting, Boston, Massachusetts
Amy Friedlander, Ph.D., Council on Library and Information Resources
November 15, 2008

Council on Library and Information Resources: Introduction
- Not-for-profit organization that undertakes activities at the intersection of higher education, advanced research, and libraries
- Interests in preservation, digital archiving, and scholarship, and in the infrastructure, including libraries, that supports and fosters research and education
- Sponsorship from the Mellon Foundation and individual academic and research libraries and organizations
November 15, 2008 Friedlander/CLIR 2

This talk will:
- Describe some of what has been learned during the work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA)
- Outline several of CLIR's projects that complement and extend aspects of the work of the Task Force
- Not represent the work or consensus views of the Task Force
Acknowledgements: Charles Henry, Amy Lucko, Fran Berman, Sayeed Choudhury, Clifford Lynch, Brian Lavoie, Lorraine Eakin

Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA)
- Two-year effort engaging 19 experts from economics, computer science, and library and information science
- Addresses the data deluge in science as well as more generally (Gantz 2008)
- Support from NSF, Library of Congress, the Mellon Foundation, CLIR, JISC and others
- Deliverables:
  - Year 1 report that establishes the conceptual framework
  - Year 2 (final) report that describes the model(s)

BRTF-SDPA: Objectives
- General cost framework: key cost categories of digital preservation
- Set of economic models which provide alternative ways of addressing sustainable digital preservation
  - Pros, cons, costs, trade-offs of each
  - List real-world conditions for which each model is best suited
- Actionable recommendations: "If your digital preservation context is X, you should consider using model Y for sustainable digital access and preservation."
Source: F. Berman, Research and Data, ARL/CNI Workshop, October 2008. Used with permission.

Digital preservation is visible.
- Preservation was once a dimension of technical services in libraries.
- Digital preservation requires active management.
- Preservation v. curation: total process of management
  - Acquisition
  - Management of the content
  - Re-purposing and re-use of the material

What are costs?
- Acquisition v. total cost of ownership
  - Operations
  - Maintenance
  - Infrastructure, threshold investment
- Value proposition
  - We can quantify some of the costs.
  - We have trouble quantifying the value of the collections and services.
  - So the cost-benefit analysis is undefined.
- Costly relative to what?

Components of Costs in a Nutshell
- Labor, especially metadata creation
- Format
- Scale and heterogeneity
- Granularity
  - Collection or item
  - Resolution
  - Tagging/mark-up
- Environmental factors
  - Heating and cooling
  - Power consumption to operate
- Regulatory framework
- Time

Prior studies
- What are the assumptions?
- What are they measuring or estimating?
- How does the study map to your context?
Note: This is based on the excellent work by Lorraine Eakin; background paper to be posted to the BRTF-SDPA website in December 2008.

Roquade Project / Dekker et al. (2001): Published literature
- Personnel costs of assigning metadata: approximately 10 euros
- Processing SIPs: approximately 10 euros per information item
- 5,000 items per year added
- 6 PCs with a network card and AV facilities: 1,500 euros each, plus a professional server: 5,000 euros
- Total hardware costs: approximately 32,000 euros, depreciated over 4 years
- Software and licensing fees: 15,000 euros per year using proprietary software
- Maintenance support costs: 2,000 euros per year
- Technical support: 0.2 FTEs = 9,000 euros per year
- Data refresh every 5 years at 1 euro per MB; if DIPs are kept for 20 years and a DIP is about 500 kb, the cost is about 2 euros per information item, that is, 10,000 euros per year for all information items
- Total per information item costs: 29 euros per item
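The 29-euro total can be reproduced from the figures on this slide. The sketch below is a check of the arithmetic only, under stated assumptions: the metadata cost is taken to be per item, 500 kb is treated as 0.5 MB, and a 20-year retention with a refresh every 5 years is counted as four refreshes. The variable names are illustrative and do not come from the Roquade study.

```python
# Figures are taken from the slide; names and grouping are illustrative.
ITEMS_PER_YEAR = 5_000

# Costs quoted per information item (euros).
# Assumption: the 10-euro metadata figure is per item.
metadata_per_item = 10.0
sip_processing_per_item = 10.0

# Costs quoted per year (euros).
hardware_per_year = 32_000 / 4      # 32,000 euros depreciated over 4 years
software_per_year = 15_000
maintenance_per_year = 2_000
tech_support_per_year = 9_000       # 0.2 FTE

# Data refresh: 1 euro per MB every 5 years, DIP kept for 20 years.
# Assumptions: 500 kb = 0.5 MB, and 20 / 5 = 4 refreshes per item.
refreshes = 20 // 5
refresh_per_item = refreshes * 0.5 * 1.0

annual_overheads = (
    hardware_per_year
    + software_per_year
    + maintenance_per_year
    + tech_support_per_year
)
overhead_per_item = annual_overheads / ITEMS_PER_YEAR

total_per_item = (
    metadata_per_item
    + sip_processing_per_item
    + overhead_per_item
    + refresh_per_item
)

print(f"Overheads per item: {overhead_per_item:.2f} euros")
print(f"Refresh per item:   {refresh_per_item:.2f} euros")
print(f"Total per item:     {total_per_item:.2f} euros")
```

Under these assumptions the per-item total comes to about 28.8 euros, which rounds to the 29 euros quoted. Labor (metadata and SIP processing) accounts for 20 of those euros, which is consistent with the earlier point that labor, especially metadata creation, is a leading cost component.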

Chapman (2003): storage
- Excludes ingest and access
- Based on billable square feet
- $0.08 per 332-page (microfilm) volume per year in the standard vault
- $0.19 per 332-page (microfilm) volume per year in the film vault
- $0.31 per 332-page (book) volume in the standard vault

OCLC/Chapman (2003): cost/GB
- Excludes ingest and access
- Based on GB of data deposited
- $0.01-0.06 per 332-page ASCII text
- $0.47/$1.01/$1.89 per 332-page 600-dpi 1-bit page image (variable rate, based upon total amount of data deposited per account)

Digital Preservation Testbed / Testbed Digitale Bewaring, Nationaal Archief of the Netherlands (2005): e-mail per yr.
- Creation of a batch of 1000 records (assuming 50 kb per email, 100 kb per text document, 250 kb per spreadsheet, and 2 Mb per database): 333 euros
- "Repair" of a batch of 1000 records (same size assumptions): 10,000 euros
- Acquisition and input of metadata for "normal" email: 1.41 euros
- Acquisition and input of metadata for XML email: 0.06 euros

Riksarkivet/National Archives of Sweden / Palm (2006)
Looked at: cost per year per 1 Gb stored; total costs per year
- 1 Hierarchical Storage Management System (HSM) (2003 price + 3% interest per year): 449,694 euros over five years
- Storage medium for additional 40 Tb/year: 43,648 euros over five years
- Staff
  - Staff operations costs: 132,240 euros over five years (0.6 FTE)
  - Staff ongoing data input: 88,160 euros over five years (0.4 FTE)
- Total annual input cost: 131,808 euros over five years (staff and storage medium included)
- Facilities ("Premises") (100 square meters): 66,228 euros over five years
- Service/support: 138,300 euros over five years
- Digitization of paper materials (1-bit 600 dpi files in A4 format): 0.10 euro per file, with 5 million images scanned annually
- Scanning of large-format drawings and maps (8-bit grey-scale at 297 dpi, in manually fed scanners): 0.61 euro per file, with 1,321,000 image files created annually
- Production costs for 1 Gb 1-bit digitized information: approximately 17 euros per Gb
- Production costs for 1 Gb 8-bit digitized information: approximately 30 euros per Gb
- Production costs for audiovisual information: approximately 11 euros per Gb
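Several of the Riksarkivet figures can be cross-checked against one another, which helps when asking what a study is measuring. The sketch below uses only numbers from this slide. The dictionary keys are illustrative labels, and the five-year sum is derived here for orientation; it is not a figure reported on the slide.

```python
# Five-year cost lines from the slide (euros); keys are illustrative labels.
five_year_costs = {
    "hsm_system": 449_694,
    "storage_medium": 43_648,
    "staff_operations": 132_240,   # 0.6 FTE
    "staff_data_input": 88_160,    # 0.4 FTE
    "premises": 66_228,
    "service_support": 138_300,
}

# The slide's "input cost" line should equal data-input staff plus storage medium.
input_cost = five_year_costs["staff_data_input"] + five_year_costs["storage_medium"]

# Both staff lines should imply the same five-year cost for one full FTE.
fte_cost_from_operations = five_year_costs["staff_operations"] / 0.6
fte_cost_from_data_input = five_year_costs["staff_data_input"] / 0.4

# Derived here, not stated on the slide: sum of the individual cost lines.
five_year_total = sum(five_year_costs.values())

print(f"Input cost (staff + storage medium): {input_cost:,} euros")
print(f"Implied five-year cost per FTE:      {fte_cost_from_operations:,.0f} euros")
print(f"Sum of listed five-year lines:       {five_year_total:,} euros")
```

The input cost matches the 131,808 euros on the slide, and both staff lines imply the same cost per full-time post, so the staff figures appear to be a single salary assumption split 0.6/0.4 between operations and data input.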

Academy of Motion Picture Arts and Sciences / AMPAS Science and Technology Council (2007)
Annual storage costs for the archival master:
- "All film" production generating no digital assets: $1,059
- Film-captured, digitally finished production at 4K: $12,514
- Digitally captured, digitally finished production using HDCAM SR videotape as the capture medium at 1920 x 1080: $1,830
- Digitally captured, digitally finished production using an uncompressed digital data capture system at 2K: $1,955
- Digitally captured, digitally finished production using an uncompressed digital data capture system at 4K: $12,514

Time has several senses.
- Technology changes.
  - Migrate formats
  - Refresh data
  - Respond to changes in hardware and software
  - Learning curves that do not always show up in the numbers
  - Technology may help automatic capture of metadata elements.
- Preservation/curation has a life cycle.
- Perpetuity means open-ended.

LIFE 2: Life Cycle Model
- Lt: Life cycle
- C: Creation or purchase
- Aqt: Acquisition
- It: Ingest
- BPt: Bitstream Preservation
- CPt: Content Preservation
- Act: Access
Source: The LIFE2 Final Project Report (August 22, 2008), p. 16, Figures 3 and 4.
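Read as a formula, the legend above suggests that the life cycle cost over a period T is built up from the cost of each stage. A minimal sketch, assuming the stage costs are simply added (the slide lists only the symbols, so the additive form is an assumption here and should be checked against the cited figures in the LIFE2 report):

```latex
L_T = C + Aq_T + I_T + BP_T + CP_T + Ac_T
```

Creation or purchase (C) carries no time subscript in the legend, which suggests a one-off cost, while the other terms accumulate over the period T.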

LIFE 2 Estimates, total cost per year
Several different projects yielded ranges:
- Year 1: 15.00-31.50
- Year 5: 16.50-32.00
- Year 10: 16.70-32.20

What might minimize costs?
- Automation
  - Metadata capture
  - Cataloging
- Time
  - Initial processing reduces costs
  - Standards process
- Planning
- Collaboration

Hidden Collections
- Generous grant from the Andrew W. Mellon Foundation to run a competition to catalog unprocessed materials held in the special collections of libraries, archives, museums, and historical societies
- Two known problems:
  - Small, distributed collections of materials of potential value to scholars either individually or in concert with others
  - Labor-intensive cataloging of manuscripts
- First year with renewals for a total of five years

Hidden Collections: Eligibility
- Demonstrated value to scholars
- Web-accessible catalog with records that can be discoverable and hence compliant with current protocols and standards
- Long-term responsibility for maintaining the records (sustainability)
- Collections owned or held in the USA (Y1)
- Applicant a not-for-profit organization
- Digitization or format conversion not in scope

Hidden Collections: What will we learn?
- What constitutes an important collection? To whom? And how do individual collections relate to each other? (value proposition)
- How do organizations build a shared infrastructure?
- How can description and cataloging be streamlined?

What increases ambiguity?
- Unknowns: risks and liabilities
  - Risks
    - Random events
    - Natural disasters
  - Liabilities
    - Intellectual property
- Evolving expectations and perceptions
  - What is professionally appropriate?
  - How are research, confidentiality and personal privacy reconciled?
Note: These ideas owe much to Clifford Lynch.

Decisions
- Collection development and management
  - Legacy collections: Do you digitize? Can you digitize?
  - Native digital: Do you want to collect these materials?
- Context
  - Network of similar institutions
  - Infrastructure
  - Resources: staff, volunteers, training, budget
  - Users: Who are they? How do they work?
- Access
  - Nature of the materials
  - Expectations of users
- Not much that deviates from standard practice in archives, libraries and museums. It's all about mission.

And Choices
- Predominantly analog collections; digital catalogs and finding aids
- Hybrid collections, based on collecting policies
- Digital collections for purposes of access; is there a tipping point?
- All digital: what do you do with the originals?
- And what parts of your collections are managed according to which policies?
- Does conversion mean preservation? It depends.

Why does it cost so much?
- Does it cost so much? What is the value proposition?
- Because so much is unknown.
- We can reasonably expect:
  - Technology will become more stable.
  - Technological system solutions will appear, and these will be modular.
  - Costs of energy will rise, affecting heating and cooling as well as operations.
  - Organizational systems will offer alternatives. And will challenge institutional identity.
  - The learning curve will work with us as we simply become more accustomed to the medium and its challenges.

Surrogate for fear?
- Costs are necessarily open ended and hence unknown.
- Preservation is inherently an act of hope.
- Time will reduce some, if not many, sources of ambiguity.

Thank you.

References [1]
Ayris, P., R. McLeod, et al. (2006). Lifecycle Information for E-literature: Full Report from the LIFE Project. JISC. London, UK, University College London and the British Library.
Ayris, P., R. McLeod, et al. (2008). LIFE 2 Final Project Report. JISC. London, UK, University College London and the British Library.
Beagrie, N., J. Chruszcz, et al. (2008). Keeping Research Data Safe: A Cost Model and Guidance for UK Universities. London, JISC.
Berman, F. (October 2008). Research and Data. ARL/CNI Workshop, Arlington, VA.
Chapman, S. (2003). "Counting the Costs of Digital Preservation: Is Repository Storage Affordable?" Journal of Digital Information 4.2.
Gantz, J. (January 2008). The Exploding Digital Universe: Implications for the Enterprise and Data Preservation. Presentation for the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Washington, D.C.

References [2]
Dekker, R. E., Dürr, M Slabbertje, M. and K. van der Mee (2001). An Electronic Archive for Academic Communities. ICEIS 2001/NDDL Workshop. April, 2002.
Gantz, J. F. (2008). The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. International Data Corporation (IDC).
Palm, J. (2006). The Digital Black Hole. Stockholm, Sweden: Riksarkivet/National Archives.
Science and Technology Council (2007). The Digital Dilemma: Strategic Issues in Archiving and Accessing Digital Motion Picture Materials. Academy of Motion Picture Arts and Sciences (A.M.P.A.S.): 74.
Testbed Digitale Bewaring (2005). Costs of Digital Preservation. The Hague, Netherlands, Nationaal Archief of the Netherlands: 23.