Preservation of nuclear information and records

International Atomic Energy Agency
Nuclear information and records
Anatoli Tolstenkov
Workshop on Managing Nuclear Knowledge, Trieste, Italy, 8-12 November 2004

Nuclear Information and Records
- Main components of knowledge preservation
- Digital preservation (management issues)
- Review of main IAEA knowledge preservation projects

Goals of Preservation
- Select the most valuable information to convey to the future
- Ensure that it remains readable, accessible and understandable
- Manage technological change so that those objectives are met

Type of Information
- Text (book, journal article, brochure, listing)
- Image (photo, film, picture)
- Sound
- Data (numerical, formulas, graph)
- Interactive (rule-based, training, database)
- Multimedia
- Computer code
- Sample (physical object)
- Tacit knowledge

Main Components of Knowledge Preservation
- Select
- Capture
- Describe/classify
- Store
- Provide access
- Maintain (longevity)

Selection of Information for Preservation
- Why select?
  - Storage is not equal to preservation
  - High costs and limited budget
  - Maintenance mortgage
  - Legal issues
- Evaluation
- Prioritization by value, use and risk

Copyright Issues
- Copyright protects the actual expression of an idea, not the idea itself
- The absence of a copyright notice does not mean the absence of copyright protection
- Possession or ownership of a physical item does not mean that the possessor or owner owns the copyright
- Copyright does not apply to all works, and it does not last forever

Information Capture
- Purchasing
- Copy (the same media or different), digitize
- Interview (tacit knowledge)

Describe and Classify Information
- Create metadata
- Metadata is structured data about data
- Metadata is a summary of information about the form and content of a resource, created to facilitate identification and retrieval

Type of Metadata
- Administrative
- Descriptive
- Structural
- Semantic

Administrative Metadata
Management information needed to maintain, retrieve and display an object
- Rights and permissions
- File format, size, compression, etc.
- Hardware, software
- Physical location
- Etc.

Descriptive Metadata
Information that provides access to the subject of an object
- Author or creator
- Title
- Subject terms
- Classification

Structural Metadata
Information used to display and navigate an object
- Structural divisions of an object
- Sub-object relationships (internal links)

Semantic Metadata
- Subject descriptors (controlled, multilingual)
- Semantic links
- Information audience
- Related sources of information
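The four metadata types listed above can be pictured as one record attached to a preserved object. The following is only a minimal sketch: the class names, field names and sample values are hypothetical and chosen to mirror the slide bullets, not taken from any INIS schema.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Administrative:
    rights: str               # rights and permissions
    file_format: str
    size_bytes: int
    physical_location: str


@dataclass
class Descriptive:
    creator: str
    title: str
    subject_terms: List[str]
    classification: str


@dataclass
class Structural:
    divisions: List[str]      # structural divisions of the object


@dataclass
class Semantic:
    descriptors: List[str]    # controlled subject descriptors
    audience: str
    related_sources: List[str]


@dataclass
class MetadataRecord:
    administrative: Administrative
    descriptive: Descriptive
    structural: Structural
    semantic: Semantic


# Hypothetical sample values, for illustration only.
record = MetadataRecord(
    administrative=Administrative("public", "PDF", 1_051_875, "archive shelf A1"),
    descriptive=Descriptive("Example Author", "Example Report", ["reactors"], "S21"),
    structural=Structural(["front matter", "chapter 1", "appendix"]),
    semantic=Semantic(["fast reactors"], "researchers", []),
)

print(sorted(asdict(record)))
```

Keeping the four groups as separate types makes it easier to update administrative details (for example after a format migration) without touching the descriptive part.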

Store
- Environment
- Media
- Format
  - Text
  - Image
  - Text + Image
  - PDF (text + image, hypertext, sound, video, metadata)
  - XML

Provide Access
- On-line: Web, Z39.50
- Off-line: CD, DVD
- Full-text and/or metadata
- Portability
- Multilingual interface

Maintain. Ensure Longevity.
- Control
- Refreshing (media)
- Migration (format)
- Emulation (application software)

Type of Media
- Paper
- Film, photo materials
- Gramophone record/plate
- Magnetic tape
- Diskette, CD/DVD
- Hard disk, flash memory
- Magneto-optical
- Glass, metal (holography)
- Etc.

INIS Records Management, 1970 to Present
- 1970: first generation of the Bibliographic Database (paper-based INIS Atomindex)
- 1978: available on-line
- 1991: available on CD-ROM
- 1996: available on the Internet
- 1997: migration from magnetic tape to CD-ROM; migration from EBCDIC to ASCII; transition from microfiche to digital images
- 2002: migration of the archive from microfiche to digital images, OCR
- 2003: migration from tag-text format to XML; transition from TIFF image format to image+text PDF

Preservation. Analog versus Digital
Analog
- Simple climate-controlled environment
- Long life
- No special equipment needed
- Simple maintenance technology
- Readability even after partial damage
- Space
- Metadata search only
- Manual maintenance
- Not easy access
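The 1997 character-set migration in the INIS timeline (EBCDIC to ASCII) is an example of format migration. A minimal sketch of such a conversion is shown here. It assumes code page 037 as a stand-in, since the source does not say which EBCDIC variant was used, and the function name and sample text are hypothetical.

```python
# Assumption: code page 037 stands in for whichever EBCDIC variant was really used.
EBCDIC_CODEC = "cp037"


def ebcdic_to_ascii(raw: bytes) -> bytes:
    """Decode EBCDIC bytes and re-encode them as ASCII.

    Raises UnicodeEncodeError if a character has no ASCII equivalent,
    so that lossy conversions are noticed rather than silently accepted.
    """
    text = raw.decode(EBCDIC_CODEC)
    return text.encode("ascii")


# Hypothetical legacy record, encoded as it might sit on an old tape.
legacy = "INIS ATOMINDEX".encode(EBCDIC_CODEC)
migrated = ebcdic_to_ascii(legacy)

print(legacy.hex())   # EBCDIC byte values
print(migrated)       # the same text as ASCII bytes
```

Failing loudly on unmappable characters is a deliberate choice in this sketch: during a migration it is usually safer to stop and inspect a record than to substitute characters quietly.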

Preservation. Analog versus Digital
Digital
- Easy access and search
- Content and semantic search
- Automated maintenance
- Easy duplication and distribution
- Multilinguality
- High risk of damage
- Short life
- Special equipment and software needed
- Too many different formats
- Dependency on digital technology
- Non-stop maintenance
- Legal constraints

Preservation. Analog versus Digital
- The volume of information published in digital form is growing dramatically (doubling every 3 years)
- The younger generation prefers digital information
- New possibilities: electronic document analysis, translation and data mining

High Density Analog Storage Devices (extreme longevity)
- Developed by Los Alamos Laboratories and Norsam Technologies
- Analog images on a 3" nickel disk or on a 3" square plate, at densities of up to 350,000 pages per disk

Analog versus Digital
- About 65% of digital preservation projects failed

Part 2: Digital Preservation

Digital Preservation
- Organizational infrastructure: consistent, systematic management; comprehensive policy framework; co-operation
- Technological infrastructure: technology anticipates needs; open architecture; well-defined standards
- Resources: sustainable funding

Two Main Standards
- OAIS: Reference Model for an Open Archival Information System
- TDR: Trusted Digital Repositories: Attributes and Responsibilities

Open Archival Information System (OAIS)
- Initiated by NASA in June 1995
- To define an archive reference model and service categories for the intermediate and indefinite long-term storage of digital data obtained from, or used in conjunction with, space missions
- To provide a framework and common terminology that may be used by government and commercial sectors in the request and provision of archive services. This will also encourage commercial support for the provision of archive services that truly preserve valuable data, not only space-related data but all long-term data archives
- Became an ISO standard in June 1999

OAIS Functional Entities
[Diagram: the Producer submits a SIP to Ingest; Ingest passes an AIP to Archival Storage and DI to Data Management; Access receives requests from the Consumer, draws on the AIP from Archival Storage and the DI from Data Management, and returns a DIP and other information to the Consumer. Preservation Planning and Administration span the whole archive.]
- SIP = Submission Information Package
- AIP = Archival Information Package
- DIP = Dissemination Information Package
- DI = Descriptive Information

Trusted Digital Repositories
- Started in March 2000 to establish the attributes of a digital repository for research organizations, building on the international standard of the Reference Model for an Open Archival Information System (OAIS)
- A trusted digital repository is more than just an organization responsible for storing and managing digital files
- A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future

TDR: Attributes
- Compliance with the Reference Model for an Open Archival Information System (OAIS)
- Administrative responsibility (standards for physical environment, backup and recovery procedures, and security system)
- Organizational viability (commitment to the long-term retention and management of, and access to, digital assets on behalf of depositors and users)
- Financial sustainability
- Technological and procedural suitability (preservation strategies; hardware, software, storage, access; compliance with all relevant standards and best practices)
- System security (should be designed to assure the security of the digital assets; authentication systems, firewalls, backup system; policies and plans for disaster preparedness; data integrity)

Issues to Consider
- Clear mandate?
- Defined scope?
- Policy framework, procedures, standards?
- Multi-year plan?
- Relationship between various stakeholders within your organisation?
- Terms and conditions for access and use?
- Preservation planning?
- Appropriate technology?
- Designated, sustained resources?

Principles of Responsibility
- Everyone doesn't have to do everything
- Everything doesn't have to be done at once
- Someone must be willing to take a lead on almost all steps
- Small steps are usually better than no steps
- Preservation should not be postponed until a perfect solution appears
Source: Colin Webb, "Digital Preservation: A Many-Layered Thing"

Part 3: Review of Main IAEA Knowledge Preservation Projects

Main Preservation Activities
- INIS NCL production
- Digitization of INIS NCL microfiche
- Digitization of older IAEA and Member States information
- Preserving Web-based information resources (evaluation project initiated)

INIS NCL (Non-Conventional Literature) Full-Text Collection
- Contains knowledge about peaceful nuclear sciences and technologies (collected by Member States for over 30 years)
- Contains over 600,000 documents (many of them can't be found anywhere else!)

History
[Diagram: 1970-1996, microfiche technology: NCL hard copy, photo imaging, microfiche. 1997-2002, electronic technology: NCL hard copy, scanning, OCR, NCL on CD-ROM. INIS Bibliographic Data accompanies both.]

[Diagram: INIS Members and IAEA supply NCL hard copy and INIS Bibliographic Data; imaging produces microfiche and electronic images; OCR yields searchable full text; outputs are the NCL Archive, the NCL Database and the INIS Document Delivery Network.]

INIS NCL Collection
  Total NCL documents        791,642
  NCL available from INIS    614,971
  NCL in electronic form     183,298
  Pages (electronic)         > 4,000,000
  Total NCL pages            ~25,000,000

INIS NCL Collection: 63 Languages
- Western languages: 83% (English: 70%)
- Cyrillic and Slavic: 12% (Russian: 10%)
- Asian languages: 4.5% (Japanese: 3.5%)
- Arabic: 0.4%

INIS NCL Microfiche Archive Digitizing Project
- 2002: 12,000 documents
- 2003: 13,000 documents
- 2003: 45,000 documents
- Total NCL in electronic form: 183,298
- Format: PDF (image + hidden text)

Digitization of Older IAEA and Member State Documents
- Documents and Records of the IAEA Board of Governors (~5,000; 1957-1996)
- IAEA Technical Documents (~1,500)
- IAEA Fast Reactors Initiative (~400 docs +). Knowledge Package
- IAEA Technical Reports Series (~3,000)
- Nuclear Data Reports (~3,500)

Digitization of Older IAEA and Member State Documents (continued)
- IAEA Nuclear Safety Series
- IAEA Bulletin
- IAEA Conference Proceedings
- Legal Documents (~1,000)
- French CEA-R Collection (~4,500; 1946-1970)

Next Step
- Nuclear Knowledge Packages

Small steps are usually better than no steps!
Thanks for your attention!

Supplementary: Digital Preservation. Technical Primer
- Digitization is not preservation

Main Steps in the Digitization Process (Digitizing Workflow)
- Document benchmarking
- Scanning
- Quality control
- Image enhancement
- OCR
- Output formats and compression
- Archiving and longevity

Document Benchmarking
Document benchmarking is the first and a very important step in digitizing. Its results strongly affect the later steps (scanning, enhancement, format, etc.).
The purpose of document benchmarking is to define or clarify the following:
- Can the informational content of a document be adequately captured in digital form?
- Do the physical formats and condition of the material meet digitizing requirements?
- Document type
- Resolution
- Bit depth for colour and greyscale, and threshold for bitonal
- Output file format and compression

Document Types
- Printed text/simple line art: distinct edge-based representation with no tonal variation, such as a book containing text and simple line graphics
- Manuscripts: soft, edge-based representations produced by hand or machine that do not exhibit the distinct edges typical of machine processes, such as a letter or line drawing
- Halftones: reproductions of graphic or photographic materials represented by a grid of variably sized, regularly spaced dots or lines, often placed at an angle. Includes some graphic art as well, e.g. engravings
- Continuous tone: items such as photographs, watercolours and some finely inscribed line art that exhibit smoothly or subtly varying tones
- Mixed: documents containing two or more of the categories listed above, such as illustrated books

[Figure: Document Types]

[Figure: Resolution]

[Figure: Resolution, comparing images at 100 dpi and 50 dpi]

Resolution
- Resolution is determined by the number of pixels used to represent the image, expressed in dots per inch (dpi) or as pixel dimensions
- Increasing resolution enables the capture of finer detail
- At some point, however, added resolution will not result in an appreciable gain in image quality, only a larger file size
- The key is to determine the resolution necessary to capture all significant detail present in the source document
- Main approach to imaging: No More, No Less
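The relation between dots per inch and pixel dimensions mentioned above can be sketched as a one-line calculation. The function name and the sample page size (8.5 in by 11 in) are assumptions made for illustration.

```python
from typing import Tuple


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical 8.5 in x 11 in page at several scanning resolutions.
for dpi in (50, 100, 300):
    print(dpi, pixel_dimensions(8.5, 11, dpi))
```

Doubling the dpi doubles each pixel dimension and therefore quadruples the pixel count, which is why resolution beyond what the source detail requires mainly adds file size.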

[Figure: Resolution, with a scale from 100 to 500]

Resolution
- Electronic access and display: 50-150 dpi (screen resolution 800x600; 1024x768)
- Reproduction/printing: 300-400 dpi (8 bits for greyscale and 16/24 bits for colour)
- Preservation: 400 dpi for text, 600 dpi for photographs

Colour System (RGB)
- Red, Green, Blue

Colour System (CMYK)
- Cyan, Magenta, Yellow, Black

Colour Systems
- Save colour images as RGB files
- Avoid CMYK for master image files!

Colour/Greyscale/Bitonal
- Bit depth: the number of bits of data representing each pixel (dot) of the image
- Number of tones for colour and greyscale images = $2^{\text{bit depth}}$
  - 1 bit: $2^{1} = 2$ tones (black & white, bitonal)
  - 2 bits: $2^{2} = 4$ tones
  - 4 bits: $2^{4} = 16$ tones
  - 8 bits: $2^{8} = 256$ tones
  - 16 bits: $2^{16} = 65{,}536$ tones
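The bit-depth rule above (number of tones equals two raised to the bit depth) can be checked with a short calculation. This is only an illustrative sketch and the function name is hypothetical.

```python
def tones(bit_depth: int) -> int:
    """Number of distinct tones a pixel can take at the given bit depth."""
    return 2 ** bit_depth


for bits in (1, 2, 4, 8, 16):
    print(f"{bits:>2} bits -> {tones(bits):,} tones")
```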

Colour/Greyscale/Bitonal
[Figure: Bit depth. When a 24-bit image (left) is reduced to an 8-bit one (right), the colour reduction may result in quantization artifacts.]

Bitonal/Greyscale/Colour
[Figure: Bit depth. Left to right: 1-bit bitonal, 8-bit greyscale, and 16-bit colour images.]

Size of File with Scanned Image

$$\text{file size (bytes)} = \frac{H \times W \times \text{bit depth} \times \text{dpi}^{2}}{8}$$

- H: height of image (in inches)
- W: width of image (in inches)
- dpi: resolution (dots per inch)

Size of File with Scanned Image: Examples
The page is taken as 8.5 in × 11 in in both examples.
1. A4, 300 dpi, bitonal: file size = $8.5 \times 11 \times 1 \times 300^{2} / 8 = 1{,}051{,}875$ bytes, about 1.05 MB (uncompressed)
2. A4, 300 dpi, 256 tones: file size = $8.5 \times 11 \times 8 \times 300^{2} / 8 = 8{,}415{,}000$ bytes, about 8.4 MB (uncompressed)
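The file size formula above can be turned into a small helper to reproduce the two examples. This is a sketch only: the function name is hypothetical, and MB is taken as one million bytes, which matches the figures on the slide.

```python
def scan_size_bytes(height_in: float, width_in: float, bit_depth: int, dpi: int) -> float:
    """Uncompressed size in bytes: H * W * bit depth * dpi^2 / 8."""
    return height_in * width_in * bit_depth * dpi ** 2 / 8


bitonal = scan_size_bytes(11, 8.5, 1, 300)      # example 1
greyscale = scan_size_bytes(11, 8.5, 8, 300)    # example 2 (256 tones)

print(f"bitonal:   {bitonal / 1e6:.2f} MB")
print(f"greyscale: {greyscale / 1e6:.2f} MB")
```

Because the size grows with the square of the dpi, going from 300 dpi to 600 dpi multiplies the uncompressed size by four at the same bit depth.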

Image Enhancement
- Deskewing (100% of INIS documents)
- Despeckling
- Black border removal

OCR
- OCR (Optical Character Recognition)
  - No longer based on optical processing
  - OCR software algorithms process raster bit maps
- ICR (Intelligent Character Recognition)
  - Became synonymous with OCR
- 3D OCR
  - Uses greyscale/colour information to improve character recognition of low-resolution images (50-150 dpi)

Required OCR Accuracy
- For full-text searching: above 75%
- For republishing documents: above 99.9% (5 errors per 5,000 characters)

Output Formats and Data Compression
- To ensure the necessary level of quality
- To save space and time
- Lossless technology: the reconstructed file is identical to the original image
- Lossy technology: a certain amount of the original information is discarded during the imaging (compression) process
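The OCR accuracy thresholds above translate directly into expected error counts. The sketch below assumes character-level accuracy (correct characters divided by total characters); the function names are hypothetical.

```python
def expected_errors(total_chars: int, accuracy: float) -> int:
    """Expected number of wrong characters at a given character accuracy."""
    return round(total_chars * (1 - accuracy))


def accuracy_from_errors(total_chars: int, errors: int) -> float:
    """Character accuracy implied by an observed error count."""
    return 1 - errors / total_chars


print(expected_errors(5000, 0.999))   # republishing threshold
print(expected_errors(5000, 0.75))    # full-text search threshold
print(accuracy_from_errors(5000, 5))
```

The gap between the two thresholds is large: at 75% accuracy a 5,000-character page may contain over a thousand wrong characters, which can still be acceptable for search but not for republishing.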

Output Formats and Data Compression
- TIFF (Tag Image File Format)
  - Most common standard for archiving
  - TIFF G4 (Group 4 fax compression, lossless) for black & white images
- PDF (Portable Document Format)
  - Most common standard for electronic publishing
- JPEG (Joint Photographic Experts Group)
  - For colour images (allows a lossless option)
- JPEG2000
  - New wavelet technology
- Many others

PDF (Portable Document Format)
- Content: Text/Hypertext; Image; Image + Text
- Also shown: PDF/A, PDF/X, XML
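The difference between lossless and lossy handling of image data can be demonstrated on a few bytes. This sketch uses Python's zlib (DEFLATE) purely as a generic lossless codec and a crude bit-dropping step as a stand-in for lossy reduction; neither is the algorithm used by TIFF G4, JPEG or JPEG2000, and the sample pixel values are hypothetical.

```python
import zlib

# Hypothetical 8-bit greyscale scanline with a repeating pattern.
pixels = bytes([12, 13, 12, 200, 201, 199, 12, 12] * 32)

# Lossless: decompression reproduces the original bytes exactly.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_ok = restored == pixels

# Lossy stand-in: keep only the top 4 bits of each pixel (16 tones).
quantized = bytes(p & 0xF0 for p in pixels)
lossy_identical = quantized == pixels

print(len(pixels), len(packed), lossless_ok, lossy_identical)
```

For master files the lossless path is the safer default, since information discarded by a lossy step cannot be recovered later.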

[Figure: Output Formats and Data Compression]