Starting a Digitization Project: Basic Requirements

Item Type: Book
Authors: Deka, Dipen
Citation: Starting a Digitization Project: Basic Requirements 2008-11
Publisher: Assam College Librarians' Association
Download date: 11/05/2018 14:01:54
Link to Item: http://hdl.handle.net/10150/105599

Dipen Deka
JRF, DLISc, Gauhati University

Abstract: Digitization of the resources of libraries and information centres has become essential for preservation as well as for better access to those resources. This paper gives an overview of the basic requirements for starting a digitization project: the hardware, the software and some of the important technical issues.

Keywords: Digitization, Scanner, Digital Camera, PDF, compression, resolution.

1. Introduction:

The application of computers in libraries has encouraged the creation of digital surrogates of printed documents. At the outset these resources were mainly held on magnetic tape or CD-ROM, but with the arrival of web technologies the form and access mode of digital resources have changed rapidly. Resources that can be processed only with computers are called digital resources; they are stored in the form of 0s and 1s. The process of creating a digital resource is called digitization.

What advantages have encouraged digitization? The aim of expressing an object in numbers is that it can be stored and manipulated by computers. Computers are number crunchers, performing millions of calculations per second. By digitizing an original and placing a digital copy of it on a computer, the file can be manipulated, transferred, and stored with ease. (Wentzel, Larry, 2006, p. 11)

Digital resources can be divided into the following types:

- Born-digital resources, for which there is no printed counterpart.
- Resources that are available in both digital and printed form.
- Resources that are primarily in printed form and are converted into digital form. These may be simply digitized images, or images that are converted to text by Optical Character Recognition (OCR).

The process of digitization involves two major sets of activities: (1) the process of digital conversion, whereby source materials are converted into digital form, and (2) the processing of the digitized information, which involves several activities related to the storage, organization, processing and retrieval of digitized information. (Choudhury & Choudhury, 2007, p.104)

This paper discusses the third type of digital resource and the first set of activities in detail.

2. Hardware requirements:

For scanning we need the following components, chosen according to our requirements:

- Microcomputer
- Scanner or digital camera

2.1 Microcomputer:

While selecting the model we can look for the following components:

- Processor: Pentium 4 or higher
- Memory: minimum 1 GB
- Hard drive: minimum 160 GB
- Monitor: 17 inch or larger
- Video card: minimum 128 MB
- Optical drive: CD-RW and/or DVD-R

2.2 Scanner:

Selecting the right scanner is more difficult than selecting the right computer. Scanners are used to capture images of resources in printed form or on microfilm. There are two types of image scanner: vector scanners and raster scanners. A vector scanner interprets the image as a set of x, y coordinates. A raster scanner captures the image by passing light down the page and digitally encoding it row by row. The most commonly used scanners are:

- Flatbed scanner
- Overhead scanner
- Sheet-fed scanner
- Microfilm scanner

2.2.1 Flatbed scanner:

It works like a photocopier: a lamp moves slowly across the face of the original, and the reflected light is focused through a series of mirrors and lenses onto the recording medium. Here the recording medium is a compact light sensor, either a Charge-Coupled Device (CCD) or a Contact Image Sensor (CIS), each of which is composed of hundreds or thousands of elements. When light strikes an element, the intensity of the light is assigned a number. The numeric readings of light intensity and element position are recorded in sequence into a file, which forms the digital version of the original.

2.2.2 Overhead scanner:

This type of scanner is considerably more expensive than a flatbed scanner, but it is the right answer when extremely fragile materials have to be captured. Overhead scanners that scan only in black and white should be avoided.

2.2.3 Sheet-fed scanner:

With this type of scanner, sheets of paper have to be slid through the scanner. It is not suitable for capturing images of loose manuscripts, photographs, fragile materials and the like.

2.2.4 Microfilm scanner:

It is a good choice for microfilm, photographs, slides and negatives, but it is limited in the size of material it can scan.

The following points should be considered while choosing the right scanner for a digitization project:

Scan area: The scan area is the largest area the scanner is capable of scanning. Scan areas are specified in inches and/or media sizes, such as:

- 8 ½ x 11 inch (standard letter)
- 8 ½ x 14 inch (legal)
- 11 x 17 inch (ledger)

Resolution: Resolution describes how finely the original image is sampled when it is scanned. The image is segmented into a grid pattern, and the resolution figure equals the number of pixels captured per inch, abbreviated as dpi (dots per inch) or ppi (pixels per inch). The higher the resolution, the finer the grid used to segment the image, and the larger the resulting file. There are two types of resolution, distinguished by how they are generated: optical and interpolated. Optical resolution is the maximum number of pixels per inch the scanner hardware is capable of capturing. Interpolated resolution is generated artificially by the scanning software: the software takes the pixels captured by the scanner, expands the grid pattern, and estimates the pixels that lie between those that were actually captured.

Colour depth: Colour images are created through the combination of three colours: red, green and blue. The light reflected from the scanned image is separated into three colour channels, and each channel is described by a number of bits. If each channel is expressed in 8 bits (24 bits per pixel), about 16.7 million colours can be represented; if each channel is expressed in 16 bits (48 bits per pixel), about 281 trillion colours can be represented.

2.3 Digital Camera:

A digital camera is a good choice not only for digitizing the valuable documents of an organization but also for other purposes, such as taking pictures of the organization, its different sections and its staff, which can then be uploaded to the organization's website. When damaged materials that cannot be moved have to be digitized, and their image has to be captured without disturbing their position, investing in a digital camera is the better choice.

3. Software requirements:

The following software is required to create digital publications:

- HTML editor
- XML editor
- Text editor
- Image editor
- Scanning software
- OCR software
- FTP software
- Page layout and design software
- PDF software
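
To put numbers on the resolution and colour depth points made for scanners in Section 2.2, the following is a minimal sketch in Python. The function names and the choice of a letter-size page are illustrative assumptions, not part of the original paper; the sizes are for uncompressed images and ignore any file-format overhead.

```python
def colour_count(bits_per_channel, channels=3):
    """Number of distinct colours for a given bit depth per channel (R, G, B)."""
    return 2 ** (bits_per_channel * channels)


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_channel, channels=3):
    """Approximate uncompressed image size for a page scanned at a given resolution."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    total_bits = width_px * height_px * bits_per_channel * channels
    return total_bits // 8


print(colour_count(8))   # 16777216, i.e. about 16.7 million colours
print(colour_count(16))  # 281474976710656, i.e. about 281 trillion colours

# Hypothetical example: a letter-size page (8.5 x 11 inch) in 24-bit colour.
for dpi in (150, 300, 600):
    size = uncompressed_size_bytes(8.5, 11, dpi, 8)
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB")
```

Under these assumptions a letter-size colour page at 300 dpi occupies roughly 25 MB before compression, and doubling the resolution quadruples the size, which is why resolution and compression have to be chosen together.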

3.1 HTML editor:

There are two types of HTML editor: text-based editors and What You See Is What You Get (WYSIWYG) editors. Microsoft Notepad and Microsoft Word can be used as text-based HTML editors. Microsoft FrontPage and Netscape Composer are examples of WYSIWYG HTML editors.

3.2 XML editor:

XML editors help in the markup process and show tags in a graphical and hierarchical display. XMetal and XML Spy are XML editors, but they are very costly. If the budget does not allow the purchase of costly software, NoteTab Pro can be used at minimal cost.

3.3 Text editor:

Text editing software is used together with OCR software to create text files. Microsoft's Notepad and Word and Corel WordPerfect are some popular text editors.

3.4 Image editor:

An image editor is a must for archiving and online publishing of images. With it we can resize and crop images, create images for a website and save them in multiple formats. Adobe Photoshop and PaintShop Pro are powerful image editors that we can use.

3.5 Scanning software:

For the scanner to operate properly we have to install the driver and the scanning software for that particular scanner. Document imaging and management packages such as Docs Open, Documentum and FileNet are also used at this stage.

3.6 OCR software:

The function of OCR software is to convert the captured images into text, which can then be edited in a word processor. OCR software saves time when the text files are edited. OmniPage Pro and Prime Recognition are two widely used OCR packages.

3.7 FTP software:

File Transfer Protocol (FTP) software is used for uploading our files to the internet. The FTP client connects to the server and allows us to create directories and move files into the appropriate folder, from where they are visible on the internet. FileZilla, SmartFTP and WS_FTP are some FTP clients we can use.

3.8 Page layout and design software:

Page layout software offers more page design facilities than word processors. It is necessary when we need to create in-house publications or to digitize publications that are in a page layout form. Adobe PageMaker is an example of page layout and design software.

3.9 PDF software:

Portable Document Format (PDF) is a very popular file format for storing and disseminating information. In the digitization process PDF provides a quick and easy solution for viewing and downloading files. Adobe Acrobat is the leading software for creating, converting and otherwise working with PDF files. The companion Adobe Reader, which is used for viewing PDF files, is free and anyone can download and use it.

4. Technical Issues:

4.1 Compression:

Compression is the process of reducing the size of a data file or an image by abbreviating repetitive information. It helps in economical storage, processing and transmission over a network. Data compression algorithms are of two types: lossless and lossy.

Lossless compression uses algorithms that encode repeating elements or patterns within an image. For example, in the scheme known as run-length encoding, if the same colour is present in several adjacent pixels, two bytes are used to store the information: the first byte holds the colour and the second holds the number of adjacent pixels. When the file is decompressed, the original image is restored exactly.
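
The two-byte scheme described above (one byte for the colour, one for the number of adjacent pixels) can be sketched as a simple run-length encoder. This is a minimal illustration in Python under stated assumptions: each pixel is a single byte, runs are capped at 255 so the count fits in one byte, and the function names are hypothetical. Real image formats use more elaborate variants.

```python
def rle_encode(pixels: bytes) -> bytes:
    """Encode a row of one-byte pixels as (colour, run length) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Extend the run while the next pixel has the same colour (max 255 per pair).
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        out += bytes((value, run))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Rebuild the original pixels from (colour, run length) byte pairs."""
    out = bytearray()
    for k in range(0, len(data), 2):
        value, run = data[k], data[k + 1]
        out += bytes([value]) * run
    return bytes(out)


# Hypothetical scan line: 10 white pixels, 3 black pixels, 5 white pixels.
row = bytes([255] * 10 + [0] * 3 + [255] * 5)
packed = rle_encode(row)
print(len(row), len(packed))      # 18 6
print(rle_decode(packed) == row)  # True: the original is restored exactly
```

The saving depends entirely on the content: long runs of one colour, as in scanned text pages, compress well, whereas an image in which every pixel differs from its neighbour would double in size under this scheme.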

In lossy compression the compression ratio is much higher than in lossless compression, but the quality of the image degrades. Some of the commonly used compression protocols are:

ITU-G4: ITU-T Group 4, developed by the International Telecommunication Union (ITU), is a popular standard protocol for black and white images. It is lossless.

JPEG: Joint Photographic Experts Group (JPEG) is the ISO/IEC 10918-1 compression protocol. It represents an area that has the same tone, shade, colour or other characteristics by a code. In its common form it is lossy.

LZW: Lempel-Ziv-Welch (LZW) is a lossless method that uses a table-based lookup algorithm invented by Abraham Lempel, Jacob Ziv and Terry Welch. Two commonly used file formats in which LZW compression is used are the Graphics Interchange Format (GIF) and the Tagged Image File Format (TIFF). (Arora, Jagdish, 2001, p. 19)

Fractal and wavelet compression: These lossy compression methods offer advantages for providing access to digital images of oversized materials on the web. They convert the image into mathematical models instead of an array of pixels and thus save storage space.

4.2 File Format:

The file format used for storage, dissemination and preservation of digital resources is one of the most important technical issues to be taken into consideration. A file format stores information such as size, resolution and compression protocol alongside the content. A scanned image can be stored in different file formats for easy storage and retrieval. PDF, SGML, TIFF, MPEG and WAVE are some popular file formats used for storing digital resources.

5. Conclusion:

A digitization project can be started with the minimum hardware and software discussed above. No single combination of hardware and software can be recommended for every digitization project; the librarian has to take the final decision in consultation with experts. One of the main problems facing the librarian or the person in charge of a digitization project is rapidly changing technology. Data, images and other digital resources should be migrated continuously to new media and new formats so that, after a great deal of money, manpower and resources has been spent on them, they do not become obsolete for their users.

6. References:

1. Arora, Jagdish (2001). Building Digital Libraries: Data Capture. https://drtc.isibang.ac.in/bitstream/handle/1849/129/arora2.pdf?sequence=2
2. Choudhury, G G & Choudhury, Sudatta (2007). Introduction to Digital Libraries. London: Facet.
3. Wentzel, Larry (2006). Scanning for Digitization Projects [Electronic version]. Library Hi Tech News, 4, 11.