Digital Libraries
Conversion to Digital Formats
Anne Kenney, Cornell University Library
What are Digital Images?
  - Electronic snapshots taken of a scene or scanned from documents
  - sampled and mapped as a grid of dots or picture elements (pixels)
  - each pixel assigned a tonal value (black, white, grays, colors), represented in binary code
  - code stored or reduced (compressed)
  - read and interpreted to create analog version
Four Scanning Methods
  - Bitonal
  - Grayscale
  - Color
  - Special Treatment
Digital Image Quality is Governed By:
  - resolution and threshold
  - bit depth
  - image enhancement
  - color management
  - compression
  - system performance
  - operator judgment and care
Resolution
  - determined by number of pixels used to represent the image
  - expressed in dots per inch (dpi); in effect, dots per square inch
  - increasing resolution increases level of detail captured and geometrically increases file size
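To make the "geometric" growth concrete, here is a minimal sketch that estimates uncompressed size for a scanned page. The 8 x 10 in. page, the 1-bit depth, and the function names are illustrative assumptions, and the estimate ignores file headers and row padding.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Approximate uncompressed size in bytes (no headers, no row padding)."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h * bits_per_pixel // 8


# Hypothetical 8 x 10 in. page scanned bitonally at three resolutions.
for dpi in (200, 300, 600):
    w, h = pixel_dimensions(8, 10, dpi)
    size = uncompressed_bytes(8, 10, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, {size:,} bytes")
```

Under these assumptions the page grows from 400,000 bytes at 200 dpi to 3,600,000 bytes at 600 dpi: three times the resolution, nine times the data.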
Effects of Resolution
  [Figure: sample images at 600 dpi, 300 dpi, and 200 dpi]
Threshold Setting in Bitonal Scanning
  - defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
Effects of Threshold
  [Figure: sample images at threshold = 60 and threshold = 100]
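The thresholding step can be sketched in a few lines. This is an illustration only: it assumes 0 = black and 255 = white, and that values below the threshold become black, a convention that real scanner software may define differently. The sample gray values are made up.

```python
def to_bitonal(gray_rows, threshold):
    """Map 8-bit gray values to 1-bit pixels.

    Assumed convention: 0 = black, 255 = white on input;
    values below the threshold become black (0), all others white (1).
    """
    return [[0 if value < threshold else 1 for value in row] for row in gray_rows]


sample = [[10, 70, 90, 200]]     # hypothetical gray values from one scan line
print(to_bitonal(sample, 60))    # [[0, 1, 1, 1]]
print(to_bitonal(sample, 100))   # [[0, 0, 0, 1]]
```

With this convention, raising the threshold from 60 to 100 turns more mid-gray pixels black, which thickens strokes and darkens the page.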
Bit Depth
  - number of bits used to represent each pixel, typically 8 bits or more per channel
  - representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color
  - example: 8-bit grayscale pixel
      00000000 = black
      11111111 = white
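The level counts follow directly from powers of two. A small sketch, with hypothetical function names:

```python
def tonal_levels(bits_per_channel, channels=1):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** (bits_per_channel * channels)


def gray_value(bit_string):
    """Interpret a string of bits as an unsigned gray value."""
    return int(bit_string, 2)


print(tonal_levels(8))         # 256 levels for 8-bit grayscale
print(tonal_levels(8, 3))      # 16777216 levels for 24-bit color
print(gray_value("00000000"))  # 0, black
print(gray_value("11111111"))  # 255, white
```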
Bit Depth
  - increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size
  - affects resolution requirements
Effects of Grayscale on Image Quality
  [Figure: sample images at 3-bit gray and 8-bit gray]
Image Enhancement
  - can be used to improve image capture
  - use raises concerns about fidelity and authenticity
Effects of Filters
  [Figure: sample images with no filters used and with maximum enhancement]
Image Editing
Compression
  - reduces file size for processing, storage, transmission, and display
  - image quality may be affected by the compression techniques used and the level of compression applied
Compression Variables
  - lossless vs. lossy compression
  - proprietary vs. open schemes
  - level of industry support
  - bitonal vs. gray/color
Common Compression Schemes
  - bitonal
      - ITU Group 4: lossless
      - JBIG (ISO 11544): lossless
      - CPC: lossy
      - DigiPaper
  - grayscale/color
      - LZW: lossless
      - JPEG: lossy
      - Kodak Image Pac: visually lossless
      - Fractal and Wavelet compression
Effects of JPEG Compression
  [Figure: 300 dpi, 8-bit grayscale image shown as uncompressed TIFF and as JPEG at 18.5:1 compression]
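To give the 18.5:1 ratio a sense of scale, here is a rough sketch. The slide does not state the page size, so the 8 x 10 in. page is an assumption for illustration, and the helper name is hypothetical.

```python
def compressed_bytes(uncompressed, ratio):
    """Size after compression at the given ratio (e.g. 18.5 for 18.5:1)."""
    return uncompressed / ratio


# Assumed 8 x 10 in. page at 300 dpi, 8-bit grayscale (1 byte per pixel).
uncompressed = (8 * 300) * (10 * 300) * 8 // 8
jpeg = compressed_bytes(uncompressed, 18.5)

print(f"uncompressed: {uncompressed:,} bytes")
print(f"JPEG at 18.5:1: about {round(jpeg):,} bytes")
print(f"space saved: {1 - jpeg / uncompressed:.1%}")
```

Under these assumptions a 7.2 MB uncompressed image shrinks to roughly 0.39 MB, a saving of about 95 percent, paid for with the lossy artifacts the slide illustrates.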
Compression Observations
  - the richer the file, the more efficient and sustainable the compression
  - the more complex the image, the poorer the compression
Equipment used and its performance over time
  - scanners offer wide range of capabilities to capture detail, dynamic range, and color
  - scanners with same stated functionality can produce different results
  - calibration, age of equipment, and environment affect quality
Equipment used and its performance over time
  - attributes and capabilities of monitor and/or printer are also factors
  - assess quality visually and computationally
  - use targets
  - control QC environment
  - increasing availability of software to assess resolution, tone, color, artifacts
Image Capture:
  Create digital objects rich enough to be useful over time in the most cost-effective manner.
How to determine what's good enough?
  - Connoisseurship of document attributes
  - Objective characterizations
  - Translation between analog and digital: measurement to scanning requirement to corresponding image metrics, e.g.,
      - detail size -> resolution -> MTF
      - tonal range -> bit depth -> signal-to-noise ratio
Case Study
  - Brittle Books: printed text, use of metal type, commercial publishers
  - objective measurement, use of Quality Index from micrographics
  - 600 dpi 1-bit capture adequately preserves informational content of text-based materials
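The slide does not give the Quality Index formula. The sketch below assumes the digital adaptation commonly cited from Cornell's benchmarking work, QI = (dpi x 0.039 x h) / 3 for bitonal scanning (divisor 2 for grayscale), where h is the height in mm of the smallest significant character and 0.039 converts mm to inches. The 1 mm character height and the target QI of 8 are likewise assumptions for illustration.

```python
MM_TO_INCH = 0.039  # rounded conversion factor used in the assumed formula


def quality_index(dpi, char_height_mm, bitonal=True):
    """Assumed QI formula: (dpi * 0.039 * h) / 3 for bitonal, / 2 for grayscale."""
    divisor = 3 if bitonal else 2
    return dpi * MM_TO_INCH * char_height_mm / divisor


def required_dpi(target_qi, char_height_mm, bitonal=True):
    """Invert the assumed formula to get the dpi needed for a target QI."""
    divisor = 3 if bitonal else 2
    return target_qi * divisor / (MM_TO_INCH * char_height_mm)


# Hypothetical smallest character of 1 mm, scanned bitonally at 600 dpi.
print(round(quality_index(600, 1.0), 2))   # 7.8
print(round(required_dpi(8, 1.0)))         # 615
```

Under these assumptions, 600 dpi bitonal capture of 1 mm type scores just under a QI of 8, which is consistent with the slide's conclusion that 600 dpi 1-bit capture is adequate for text-based materials.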
Ensuring Full Informational Capture: No More, No Less
  [Figure: chart relating image quality and utility to cost, marking the desired point of capture]
Create One Scan To Serve Multiple Uses
  - Derive alternative formats/approaches to meet current and future information needs
  - Base derivative requirements on document attributes, technical infrastructure, user requirements, and cost
  - Understand technical links affecting presentation and utility of derivatives
User Requirements
  - completeness
  - legibility
  - speed of delivery
  - cooked files
Derivatives from a Digital Master
  - the richer the image, the better the derivative
  - a derivative from a rich file is superior in quality to one from a poorer scan
  - the richer the image, the better the image processing
[Figure: an 8 x 10 in. document scanned at 200 dpi (1,600 x 2,000 pixels) compared with an 800 x 600 pixel monitor; the same document at 100 dpi is 800 x 1,000 pixels, and at 60 dpi is 480 x 600 pixels]
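The figures for the 8 x 10 in. document and the 800 x 600 monitor can be reproduced with a short sketch. The function names are hypothetical, and the code only does the arithmetic; it does not resample any image.

```python
def derivative_size(width_in, height_in, dpi):
    """Pixel dimensions of a derivative scaled to the given effective dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def fits(image_px, screen_px):
    """True if the whole image is visible on screen without scrolling."""
    return image_px[0] <= screen_px[0] and image_px[1] <= screen_px[1]


def max_dpi_to_fit(width_in, height_in, screen_px):
    """Highest effective dpi at which the full page still fits the screen."""
    return min(screen_px[0] / width_in, screen_px[1] / height_in)


monitor = (800, 600)
for dpi in (200, 100, 60):
    size = derivative_size(8, 10, dpi)
    print(dpi, size, fits(size, monitor))

print(max_dpi_to_fit(8, 10, monitor))  # 60.0
```

At 100 dpi the page fills the screen width but needs vertical scrolling; only at about 60 dpi does the whole page fit, which is the trade-off between completeness and legibility that derivative design has to resolve.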
Compression/File Format Comparison for Derivative Files
  [Figure: sample derivative images as uncompressed TIFF, compressed JPEG, and compressed GIF; compression ratios shown include 6:1 (NARA) and 20:1 (LC)]
Alternatives for Displaying Oversize Images
  - File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix
  - User tools for representing scale (Blake Project ImageSizer, Java applet), and improving image quality
Recommendations Coalescing
  - Intent of conversion drives decisions
  - issues of access considered at conversion
  - notion of long-term utility and cross-institutional resources gaining ground
  - Access images will change with:
      - changing user needs and capabilities
      - changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines