How does one know which repository is worth its salt?

David Giaretta
STFC, Rutherford Appleton Lab., Didcot, Oxon, OX11 0QX, UK

Abstract

From the earliest discussions of concerns about the preservability of digital objects there have been calls for some way of judging the quality of digital repositories. In the past few years the NARA/RLG group has produced the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) document, and a number of other closed national groups have created related checklists. This paper outlines these efforts and then describes the international effort (the RAC Working Group, see http://wiki.digitalrepositoryauditandcertification.org) to produce a full ISO standard on which an accreditation and certification process can be built. It also describes the rationale for the approach taken, the relationship to other standards, and the way in which the accreditation and certification process may be carried out. If successful, this standard and its associated processes will give funders an independent evaluation of the effectiveness of the archives they support, and will give data producers a basis on which to decide which repository can be entrusted with their valuable data.

Background

The Need for Trusted Repositories

The Preserving Digital Information report of the Task Force on Archiving of Digital Information (Garrett & Waters, 1996) declared that "a critical component of digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating, and providing access to digital collections" and that "a process of certification for digital archives is needed to create an overall climate of trust about the prospects of preserving digital information."
The issue of certification, and of how to evaluate trust into the future (as opposed to a relatively temporary trust, which may be tested more simply), has been a recurring request, repeated in many subsequent studies and workshops.

The OAIS Reference Model

The Open Archival Information System (OAIS) Reference Model (OAIS, 2003) has now been adopted as the de facto standard for building digital archives (NSF, 2007). Section 1.5 of OAIS ("Road map for development of related standards") included an item for accreditation of archives, reflecting the longstanding demand for a standard against which repositories of digital information may be audited and on which an international accreditation and certification process may be based. It was agreed that RLG and NARA would take the lead on this follow-on standard. They did so, forming a closed panel which produced Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC, 2007). TRAC was based on two documents, namely the OAIS Reference Model (OAIS, 2003) and the Report on Trusted Digital Repositories: Attributes and Responsibilities (RLG-OCLC, 2002). The former lays out fundamental requirements for preservation, while the latter focuses on the administrative, financial and organisational requirements for the body undertaking the preservation activities.

Other, separate work includes the nestor Catalogue of Criteria for Trusted Digital Long-term Repositories (nestor, 2006), which is also based on OAIS. The next section explains the advantages of OAIS as a basis for the certification of digital repositories.

Testability and key OAIS concepts

Information

As a precursor to discussing its preservation, one may begin by asking what the definitions of information or data might be: how restrictive do we need to be? OAIS provides a very general definition of Information, namely: "Any type of knowledge that can be exchanged. In an exchange, it is represented by data."
Information clearly includes data as well as documents, and covers behaviour, performance, and explicit, implicit and tacit information. Data is defined as: "A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing."

Preservation

We first need some methodology by which to test the basic claim that someone is preserving some digitally encoded information; without such a test the claim is meaningless. OAIS introduces the quite reasonable test that the digital object must somehow be usable and understandable in the future. By itself, however, this is too broad: must we ensure that the digitally encoded designs of a battleship can be understood by everyone, including, for example, a six-year-old child? To make this a practical test, the obvious next refinement is to describe the type of person, and more particularly their background knowledge, by whom the information should be understandable. Thus OAIS introduces the concept of the Designated Community, defined as "an identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities." Note that a Designated Community is defined by the repository, and this definition may change and evolve over time.

Bringing these ideas together, we can then say, following OAIS, that preserving digitally encoded information means that we must ensure that the information to be preserved is Independently Understandable to (and usable by) the Designated Community.

We are clearly concerned about long-term preservation, but how long is that? OAIS defines Long Term as "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing Designated Community. Long Term may extend indefinitely."

OAIS contains a number of models, the most important of which is the Information Model, shown in Figure 1.

[Figure 1: OAIS Information Model]

The UML diagram in Figure 1 means that:
- an Information Object is made up of a Data Object and Representation Information;
- a Data Object can be either a Physical Object or a Digital Object (an example of the former is a piece of paper or a rock sample);
- a Digital Object is made up of one or more Bits;
- a Data Object is interpreted using Representation Information;
- Representation Information is itself interpreted using further Representation Information.

The figure shows that Representation Information may contain references to other Representation Information. When this is coupled with the fact that Representation Information is an Information Object that may have its own Digital Object and other Representation Information associated with understanding each Data Object, as shown in compact form by the "interpreted using" association loop, the resulting set of objects can be referred to as a Representation Network. A Representation Network should cover semantic and structural information, as well as recognising that there may be Other Representation Information such as software.

The recursion in Representation Information leads to the question of how and where this recursion ends. Given the definitions, one can see that the natural end of the recursion lies with what the Designated Community knows, i.e. the Knowledge Base of the Designated Community, defined as "a set of information, incorporated by a person or system, that allows that person or system to understand received information." Once again, experience shows that any such Knowledge Base changes over time, the changes ranging from the introduction of new theories to drift in vocabularies.

Definition of the Designated Community

An important clarification is needed here, namely that the definition of the Designated Community is left to the preserver. The same digital object held in different repositories could be preserved for different Designated Communities, each of which could consist of many disjoint communities. The quid pro quo is that those funding or entrusting digital objects to the repository can judge whether the definition of the Designated Community is appropriate for their needs.

OAIS Conformance

OAIS defines a number of responsibilities by which to judge conformance. These may be summarised as follows (these are the likely revised versions of the responsibilities); an OAIS must:
- Negotiate for and accept appropriate information from information Producers.
- Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.
- Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.
- Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
- Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the archive, ensuring that it is never deleted unless allowed as part of an approved strategy; there should be no ad hoc deletions.
- Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Object, with evidence supporting its Authenticity.

OAIS introduces a number of important concepts and conformance criteria; however, these alone are not enough on which to base a certification scheme. The next section describes some of the further factors which must be taken into consideration.

What can change?

We can consider some of the things that can change over time, and hence against which an archive must safeguard the digitally encoded information.

Hardware and Software Changes

Use of many digital objects relies on specific software and hardware, for example applications which run on specific versions of Microsoft Windows, which in turn runs on Intel processors. Experience shows that while it may be possible to keep hardware and software available for some time after it has become obsolete, this is not a practical proposition into the indefinite future. There are, however, several projects and proposals which aim to emulate hardware systems and hence run software systems.
Environment Changes

These include changes to licences or copyright and changes to organisations, affecting the usability of digital objects. External information vital to use and understandability, ranging from the DNS to DTDs and Schema, may also become unavailable.

Termination of the Archive

Without permanent funding, any archive will, at some time, end. It is therefore possible for the bits to be lost, and much else besides, including the knowledge of the curators of the information encoded in those bits. Experience shows that much essential knowledge, such as the linkage between holdings, the operation of specialised hardware and software, and the links of data files to events recorded in system logs, is held by such curators but not encoded for exchange or preservation. Bearing these things in mind, it is clear that any repository must be prepared to hand over its holdings, together with all these tacit pieces of information, to its successor(s).

Changes in what people know

As described earlier, the Knowledge Base of the Designated Community determines the amount of Representation Information which must be available, and this Knowledge Base changes over time.

Authenticity

Trustability of holdings involves being sure of the authenticity of those holdings. Much has been written about authenticity and its role in preservation, for example in the InterPARES project (http://www.interpares.org/). While it seems unreasonable to require all archives themselves to investigate the origins of their holdings, it is reasonable to require archives to be able to maintain authenticity. To do so, evidence must cover both technical aspects, using techniques such as digests and hashes which can prove that the bit sequences have not been changed unexpectedly, and social aspects, namely who has been entrusted with what, for example computer system administrators.

If the digital object is transformed, then the bit sequences will have been changed. In this case there must be some evidence that the information encoded in the digital object is unchanged. To provide it, a number of tests may be performed by appropriate and, one hopes, trustworthy people. For example, the data values of a scientific dataset before and after the transformation may be compared and verified as the same.
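The two kinds of technical evidence just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure prescribed by OAIS or TRAC: SHA-256 from the standard `hashlib` module stands in for "digests and hashes", the scientific dataset is three hypothetical floating-point values, and the transformation is a hypothetical migration from packed binary doubles to a JSON text array. The helper names are illustrative only.

```python
import hashlib
import json
import struct


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a bit sequence."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Technical evidence: the bits are unchanged if the recorded digest still matches."""
    return sha256_hex(data) == recorded_digest


def decode_binary(data: bytes) -> list:
    """Decode the hypothetical original encoding: little-endian 64-bit floats."""
    return list(struct.unpack(f"<{len(data) // 8}d", data))


def decode_json(data: bytes) -> list:
    """Decode the hypothetical migrated encoding: a UTF-8 JSON array of numbers."""
    return json.loads(data.decode("utf-8"))


# Hypothetical dataset, stored as packed binary when it is ingested.
values = [1.5, 2.25, -0.125]
original = struct.pack("<3d", *values)
ingest_digest = sha256_hex(original)  # recorded at ingest as fixity evidence

# Routine fixity check: the untouched bits verify; flipping one bit does not.
tampered = bytearray(original)
tampered[0] ^= 0x01
print(verify_fixity(original, ingest_digest))         # True
print(verify_fixity(bytes(tampered), ingest_digest))  # False

# Migration to a new format changes the bits, so the old digest no longer matches...
migrated = json.dumps(values).encode("utf-8")
print(verify_fixity(migrated, ingest_digest))         # False
# ...and the evidence must come from comparing the decoded data values instead.
print(decode_json(migrated) == decode_binary(original))  # True
```

The two checks answer different questions: the digest shows that nothing changed, while the value comparison shows that nothing that matters changed when the bits were deliberately altered. In practice one would presumably also record a fresh digest for the migrated bytes, so that later fixity checks apply to the new encoding.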
For digital objects which are normally rendered, for example PDF files or JPEG images, there might be other tests, often called Significant Properties, which can be evaluated and verified as unchanged after the transformation. A full discussion of this topic is outside the scope of this paper.

TRAC and related documents

A group was gathered by NARA and RLG (the latter subsequently incorporated into OCLC) to form the Task Force on Trusted Digital Repositories. This group produced the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC, 2007). The work combined concepts from OAIS and the Trusted Digital Repositories: Attributes and Responsibilities report (RLG-OCLC, 2002); the latter allowed the group to supplement OAIS with considerations of financial stability and the training of personnel. The document has a number of metrics grouped into three sections:
- Organisational Infrastructure
- Digital Object Management
- Technologies, Technical Infrastructure and Security

Accompanying each of the metrics is extensive additional explanatory text, together with examples of the types of evidence which might be used as proof of fulfilling the metrics. The document has been used as the basis for internal and test audits in a number of repositories; however, it is not part of a formal audit and certification process.

Other work in this area includes:
- the German preservation consortium nestor has produced a Catalogue of Criteria for Trusted Digital Repositories (nestor, 2006);
- in early 2007, representatives from the Digital Curation Centre (DCC, http://www.dcc.ac.uk), DigitalPreservationEurope (DPE, http://www.digitalpreservationeurope.eu/), nestor (Germany) and the Center for Research Libraries (North America) met and produced a list of 10 core criteria for digital preservation repositories, to guide further international efforts on auditing and certifying repositories (CRL, 2007); a comparison of this list with the OAIS responsibilities was produced (Giaretta, 2008);
- Ross et al. (2006) produced comments on the TRAC document;
- a cross-walk between the TRAC, nestor and Ross documents was produced (Dale, 2007);
- the DCC and DPE projects produced the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) toolkit, which is intended to facilitate internal audit by providing repository administrators with a means to assess their capabilities, identify their weaknesses, and recognise their strengths.
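To make the idea of a risk-based internal audit a little more concrete, here is a minimal Python sketch of a risk register. It is an illustration only: the `Risk` class, the scoring rule (severity as probability times impact), the 1 to 6 scales and the example scores are all assumptions, not taken from the DRAMBORA documentation. The three risks are drawn from the "What can change?" discussion earlier in this paper.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical repository risk register."""
    name: str
    probability: int  # assumed scale: 1 (very unlikely) to 6 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 6 (loss of holdings)

    @property
    def severity(self) -> int:
        # Generic likelihood-times-impact scoring; an assumption, not DRAMBORA's definition.
        return self.probability * self.impact


def rank_risks(register: list) -> list:
    """Return the risks ordered from most to least severe."""
    return sorted(register, key=lambda risk: risk.severity, reverse=True)


# Illustrative scores only; a real assessment would justify each one with evidence.
register = [
    Risk("Loss of funding ends the archive", probability=2, impact=6),
    Risk("File format becomes unreadable", probability=4, impact=4),
    Risk("Curator knowledge not documented", probability=5, impact=3),
]

for risk in rank_risks(register):
    print(f"{risk.severity:>2}  {risk.name}")
# 16  File format becomes unreadable
# 15  Curator knowledge not documented
# 12  Loss of funding ends the archive
```

Ranking by a single severity number is one simple way to show administrators which weaknesses to address first; a fuller assessment would also record the evidence and owner for each risk.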

All this work has been helpful in providing information and experience in assessing digital repositories, and some of it provides a local or project-backed certificate of quality. However, none of it provides an ISO-based accreditation and certification system of the kind available in other areas, such as the one for Information Security based on the ISO 27001 series. Without such a system we cannot expect to have a mark of quality and trustability for digital repositories which is recognised worldwide. Efforts to produce such a system are described next.

Development of an ISO Accreditation and Certification process

The development of OAIS was hosted by the Consultative Committee for Space Data Systems (CCSDS, http://www.ccsds.org), and OAIS was approved by ISO as ISO 14721. OAIS contained a roadmap listing a number of possible follow-on standards, some of which, e.g. the Producer-archive interface -- Methodology abstract standard (ISO 20652:2008), have already become ISO standards after development within CCSDS. The need for a standard for certification of archives was included in that list, and the RLG/NARA work which produced TRAC was the first step in that process.

The next step was to bring the output of the RLG/NARA working group back into CCSDS. This has been done, and the Digital Repository Audit and Certification (RAC) Working Group has been created. The CCSDS details are available from http://cwe.ccsds.org/moims/default.aspx#_moims-rac, while the working documents are available from http://wiki.digitalrepositoryauditandcertification.org. Both may be read by anybody but, to protect them against hackers, only authorised users may add to them. The openness of the development process is particularly important, and the latter site contains the notes from the weekly virtual meetings as well as the live working version of the draft standards.
Besides developing the metrics, which started from the TRAC document, the working group has also been working on the strategy for creating the accreditation and certification process. From a review of existing systems which have standard accreditation and certification processes, it became clear that two documents were needed:
1. Metrics for Audit and Certification of Digital Repositories
2. Requirements for Bodies Providing Audit and Certification of Digital Repositories

The first document lists the metrics against which a digital repository may be judged. It is anticipated that this list will be used for internal metrics or peer review of repositories, as well as for the formal ISO audit process. In addition, tools such as DRAMBORA could use these metrics as guidance for their risk assessments.

It must be recognised that the audit process cannot be specified in very fine, rigid detail; an audit process must depend upon the experience and expertise of the auditors. For this reason the second document sets out the system under which the audit process is carried out; in particular, it specifies the expertise of the auditors and the qualifications which they should have. In this way the document specifies how auditors are accredited and thereby helps to guarantee the consistency of the audit and certification process. This is why the RAC Working Group refers to accreditation and certification processes.

At the time of writing both documents are in an advanced state of preparation; the aim is to submit them to ISO in Spring 2009. While the reviews are under way, further preparations for the accreditation and certification processes will be undertaken. It should be noted that the OAIS Reference Model has also been undergoing revision, and the new version is expected to be submitted for ISO review, also in Spring 2009.
Because of the close links between the metrics and OAIS concepts and terminology, it is important that the two remain consistent, and cross-membership of the working groups will ensure this. In addition to the central accreditation body, there will eventually be a need for a network of local accreditation and certification bodies.

Conclusions

It has long been recognised that there is a need for a way to judge the extent to which an archive can be trusted to preserve digitally encoded information. On the one hand, funders of such archives need some formal certification process to provide assurance that their funding is well spent and that their important digital holdings will continue to be usable and understandable into the future. On the other hand, it is probably also true that many who manage such archives would want some less formal process. Considerable work has been carried out on the second of these aims, namely peer or informal certification. The RAC Working Group seems, at the time of writing, to be close to taking important steps towards the first aim (formal ISO certification). Difficult organisational issues still need to be addressed, but there is a clear roadmap for doing this. Even if all this is put in place, the take-up of the process and its impact on, for example, determining the funding of digital repositories is far from guaranteed. However, the RAC Working Group believes that, in order to make progress, the effort must be made.

References

CRL (2007). Retrieved from http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=92
Dale, R. (2007). Mapping of Audit & Certification Criteria for CRL Meeting (15-16 January 2007). Retrieved from http://wiki.digitalrepositoryauditandcertification.org/pub/main/referenceinputdocuments/trac-nestor-DCC-criteria_mapping.doc
Garrett, J. & Waters, D. (Eds.) (1996). Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, commissioned by The Commission on Preservation and Access and The Research Libraries Group. Retrieved from http://www.ifla.org/documents/libraries/net/tfadi-fr.pdf
Giaretta, D. (2008). Comparison of OAIS and the Chicago Meeting 10 points. Retrieved from http://wiki.digitalrepositoryauditandcertification.org/bin/view/main/comparisonoaisandchicago10points
National Science Foundation Cyberinfrastructure Council (NSF, 2007). Cyberinfrastructure Vision for 21st Century Discovery. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf
nestor Working Group Trusted Repositories Certification (2006). Catalogue of Criteria for Trusted Digital Repositories. English version retrieved from http://edoc.hu-berlin.de/series/nestormaterialien/8en/pdf/8en.pdf
Open Archival Information System (OAIS) Reference Model, ISO 14721:2003 (2003). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf
RLG-OCLC (2002). Report on Trusted Digital Repositories: Attributes and Responsibilities. Retrieved from http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf
Ross, S., Bütikofer, N. & McHugh, A. (2006). DCC Comments on RLG/NARA Audit and Certification Checklist. Retrieved from http://wiki.digitalrepositoryauditandcertification.org/pub/main/referenceinputdocuments/ross_mchugh_Buetikofer_comments_RLGNARA_AUDIT_ver2.pdf
TRAC (2007). Trustworthy Repositories Audit & Certification: Criteria and Checklist. Retrieved from http://www.crl.edu/pdf/trac.pdf

Acknowledgements

The entire RAC Working Group, in particular Mark Conrad of NARA, provided important ideas for this paper, and all deserve credit for the vision and effort which they are putting into this work.