Technical Aspects of Digitization


Chapter 4 Technical Aspects of Digitization

4.1 Digitization

Digitization is the process of converting printed resources into digital form. What advantages have encouraged the digitization of vast amounts of analogue resources? The aim of expressing an object in numbers is that it can then be stored and manipulated by computers. Computers are number crunchers, performing millions of calculations per second. Once an original has been digitized and a digital copy placed on a computer, the file can be manipulated, transferred and stored with ease (Wentzel, 2006, p. 11).

4.2 Process of Digitization

Digitization is the process of converting something into a numerical form that can be processed by computers. The main objectives of digitization are the storage and manipulation of, and access to, resources. The need for less storage space, and for access to resources by a large number of users without wear and tear on the originals, has encouraged the digitization of printed resources. The process of digitization involves two major sets of activities:

(i) the process of digital conversion, whereby source materials are converted into digital form; and
(ii) the processing of the digitized information, which involves several activities related to the storage, organization, processing and retrieval of digitized information (Choudhury & Choudhury, 2007, p. 104).

The stages involved in the digitization process are scanning, indexing, storing and retrieval, each of which is discussed below.
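To make the idea of "expressing an object in numbers" concrete, the following minimal Python sketch stores a tiny black and white patch as numbers. It assumes 1 = black and 0 = white, and packs eight pixels into each byte, padding short rows with white; the function name and these conventions are illustrative assumptions, not the layout of any particular file format.

```python
def pack_bitonal_rows(rows):
    """Pack rows of 0/1 pixels into bytes, eight pixels per byte (first pixel in the high bit).

    Assumed conventions for illustration: 1 = black, 0 = white, and each row is
    padded with white (0) bits up to a whole byte before the next row starts.
    """
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for pixel in chunk:
                byte = (byte << 1) | (1 if pixel else 0)
            byte <<= 8 - len(chunk)  # pad a short final chunk with white bits
            packed.append(byte)
    return bytes(packed)


# A 5 x 5 patch showing the letter "T".
letter_t = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(pack_bitonal_rows(letter_t).hex())  # f820202020
```

Here 25 pixels fit into five bytes. Once the patch exists only as numbers, copying, inverting or transmitting it becomes simple arithmetic and data transfer for the computer.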

Scanning

Using an electronic image scanner or a digital camera, the source document in printed form is converted into an electronic image. In this process, the source document is scanned at a predefined resolution and bit depth. The image is stored in a file that records the binary digits (bits) for each pixel; this is called a bit-mapped page image. Scanning software is also used for formatting, tagging, storing and retrieving the scanned images.

Indexing

In this step, the scanned image files are indexed by linking the database of scanned images to a text database. The text database links each set of images to keywords and to the location of the images in the image database. Some scanning software requires the indexing terms to be keyed in manually for each image file, while other software allows indexing terms to be selected from the image files themselves.

Storing

The scanned image file is saved for further processing. The size of these files depends on several factors, such as the resolution used in scanning, the scan area, the compression technique and the file format used for the scanned image. Scanned images are stored on offline storage media such as CD-ROM or DVD-ROM, external hard disks, snap servers, etc.

Retrieval

Retrieval is an essential part of scanning; without it, scanning would be of no use. When a document is scanned, it is stored in the machine as two files. The first file holds the image along with a key to the second file, in which the location of the document is stored. To retrieve a scanned document, the system follows the key linking the two files to locate the document (Arora, 2001).
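The indexing and retrieval steps above can be sketched as a small in-memory index in Python: one mapping plays the role of the image database (image key to storage location) and another plays the text database (index term to image keys). This is a minimal sketch under stated assumptions; the class name, image keys and file paths are hypothetical, and real scanning software would keep these links in a proper database.

```python
from collections import defaultdict


class ScanIndex:
    """Hypothetical index linking index terms (text database) to scanned images (image database)."""

    def __init__(self):
        self.locations = {}            # image key -> storage location of the image file
        self.terms = defaultdict(set)  # index term -> keys of the images it describes

    def add_image(self, key, location, index_terms):
        """Store the image's location and link each index term to its key."""
        self.locations[key] = location
        for term in index_terms:
            self.terms[term.lower()].add(key)

    def retrieve(self, term):
        """Return the storage locations of all images indexed under the term (case-insensitive)."""
        keys = sorted(self.terms.get(term.lower(), ()))
        return [self.locations[k] for k in keys]


index = ScanIndex()
index.add_image("img-0001", "dvd-03/minutes_1998_p1.tif", ["minutes", "senate"])
index.add_image("img-0002", "dvd-03/minutes_1998_p2.tif", ["minutes", "budget"])
print(index.retrieve("Minutes"))  # ['dvd-03/minutes_1998_p1.tif', 'dvd-03/minutes_1998_p2.tif']
print(index.retrieve("budget"))   # ['dvd-03/minutes_1998_p2.tif']
```

Keeping the term index separate from the location table mirrors the two linked files described under Retrieval: if the images move to new storage media, only the location table has to be updated.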

4.3 Technological Background of Digitization

Digital images are represented by a set of pixels, each stored as bits. Such bit-mapped images cannot be searched like an ASCII text file, but by applying Optical Character Recognition (OCR) technology, the text in a bit-mapped image can be converted into an ASCII file. Some technical specifications that control the quality of the scanned image are discussed below.

Bit Depth

Bit is the abbreviated form of binary digit; a bit has one of two values, 0 or 1. Bits are used to describe the range of shades between pure black and pure white. Black and white files are called 1-bit files, as they have only two shades, black and white.

The bit depth, or colour depth, of a scanner indicates the range of colours the scanner can capture. It does not define the limits of the colour range readable by the device, but simply specifies the number of distinct colours. A higher figure equates to a more accurate description of the colours available to the scanner, but does not necessarily mean that they are available to the user at the end of the process. Scanners often capture at a greater bit depth and then save or export the image in standard 24-bit RGB (Red Green Blue) colour. This extended colour depth is used internally by the scanner to produce the best possible original image data but is not normally available to the user. Recently, however, there has been a move towards scanners that allow the full high-bit version of the file to be saved and edited as a 48-bit TIFF (Tagged Image File Format) or PNG (Portable Network Graphics) file. Colour depth in itself does not provide much evidence of the quality of a scanner; however, it does give some guidance as to how capable the scanner might be if it can use all the colour data it produces.
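The relationship between bit depth and the number of shades can be sketched in a few lines of Python: n bits give 2^n distinct values, so 1 bit gives black and white, 8 bits give 256 grey levels, and 24-bit RGB gives 16,777,216 colours. The second function illustrates, under a simplifying assumption, how a scanner that captures 16 bits per channel internally might export standard 24-bit RGB. Both function names are hypothetical.

```python
def distinct_values(bit_depth):
    """Number of distinct shades or colours a pixel of the given bit depth can record."""
    return 2 ** bit_depth


def to_8bit(channel_value_16bit):
    """Reduce a 16-bit channel value (0-65535) to 8 bits (0-255) by dropping the low byte.

    A simplifying assumption: real scanner software may round or apply tone curves instead.
    """
    return channel_value_16bit >> 8


for depth in (1, 8, 24, 48):
    print(f"{depth}-bit: {distinct_values(depth):,} distinct values")

# One 48-bit pixel (16 bits per R, G, B channel) exported as a 24-bit pixel.
hi_bit_pixel = (65535, 32896, 257)
print(tuple(to_8bit(v) for v in hi_bit_pixel))  # (255, 128, 1)
```

The export step discards the low-order bits of each channel, which is why the extra tonal detail captured internally is usually not available to the user unless the scanner can save a 48-bit file.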
Resolution

Before a document is scanned, the resolution must be decided. It is expressed in dpi (dots per inch) or ppi (pixels per inch) and indicates the quality of the scanned document: a higher resolution means a higher dpi/ppi value (Wentzel, 2006). The appropriate level varies from document to document. The higher the resolution, the finer the grid used to segment the image. However, a higher resolution also produces a larger file: because the number of pixels grows in both width and height, doubling the resolution roughly quadruples the size of an uncompressed image.

Optical and interpolated resolution are the two types of resolution, distinguished by how they are generated. Optical resolution is the maximum resolution a scanner is physically capable of capturing. Interpolated resolution is generated artificially: the software takes the pixels captured by the scanner, expands the grid, and estimates the values of the additional pixels that were not actually captured.

Many recommendations have been put forward for selecting the proper resolution to achieve good-quality scans of different types of documents. Wentzel (2006), in Scanning for Digitization Projects, makes the following recommendations:

Normal web image: 72 dpi GIF/JPEG
Minimum grey/colour print setting: 150 dpi JPEG
Optimal colour print setting: 300 dpi TIFF
Optimal setting for running pages of text through OCR: 300 dpi TIFF
Best black and white print setting: 600 dpi TIFF
Archival setting (all colours): … dpi TIFF

The Digital Library Federation (DLF) has also recommended 300 dpi 24-bit colour TIFF for images and 600 dpi 1-bit bitonal TIFF for pages of text. The resolution may be adjusted according to the facilities available and the type of documents to be scanned.

Threshold

Bitonal scanning is used for pages containing text or line drawings. It is also known as binary or black and white scanning, in which each pixel is represented by one bit. Greyscale scanning is used for black and white photographs that contain intermediate or continuous tones, and colour scanning is used for colour photographs. Bitonal scanning has the fastest processing. Greyscale, on the other hand, provides more accurate results, especially for degraded documents or documents with shaded backgrounds. Colour scanning retains the colour information and/or colour graphics of the source document. In bitonal scanning, the threshold setting defines the point on a scale, usually ranging from 0 to 255, at which grey values are interpreted as black or white pixels (Arora, 2001, p. 17). The threshold setting therefore determines the image quality in bitonal scanning.

Compression

A source document scanned at high resolution produces a very large image file. To make files manageable for both the computer system and the user, it is necessary to reduce, or compress, the file size. Compression is the process of reducing the size of a data file or image by abbreviating repetitive information, for example by reducing one or more rows of white bits to a single code (Arora, 2001, p. 17). It enables economical storage, processing and transmission over a network. Data compression algorithms are of two types: lossless and lossy.

a) Lossless compression: Lossless algorithms encode repeating elements or patterns within an image. For example, if the same colour is present in several adjacent pixels, the run can be stored in two bytes: the first for the colour and the second for the number of adjacent pixels. When the file is decompressed, the original image is restored exactly.

b) Lossy compression: Lossy compression achieves a much higher compression ratio than lossless compression, but the quality of the image degrades.

Some of the commonly used compression protocols are:

i) ITU-G4: Developed by the International Telecommunication Union (ITU), this is a popular standard protocol for black and white images.

ii) JPEG: JPEG, named after the Joint Photographic Experts Group, is an ISO compression standard. It represents an area that has the same tone, shade, colour or other characteristics by a code.
iii) LZW: Lempel-Ziv-Welch (LZW) compression uses a table-based lookup algorithm invented by Abraham Lempel, Jacob Ziv and Terry Welch. Two commonly used file formats that use LZW compression are the Graphics Interchange Format (GIF) and the Tagged Image File Format (TIFF) (Arora, 2001, p. 19).
iv) Fractal and wavelet compression: These lossy compression methods offer advantages for providing web access to digital images of oversized materials. They convert the image into mathematical models instead of an array of pixels and thus save storage space.

Enhancement

The image enhancement process can improve the quality of an image captured with a scanning device. Image editing software helps in this process and is a must for archiving images and publishing them online. With it, images can be resized, cropped, prepared for websites and saved in multiple formats (Deka, 2008, p. 171). According to Arora (2001), the scan area can be decomposed into small areas, each of which can be treated separately to further improve image quality. Many image editing applications, such as Adobe Photoshop and PaintShop Pro, can be used for image enhancement.

4.4 File Formats

The file format used for the storage, dissemination and preservation of digital resources is one of the most important technical issues to consider. One of the key components in ensuring resource longevity is the choice of file and media formats used to create, store and deliver digital content, and the strategies employed to manage these in the long term (Williamson, 2005, p. 508). File formats store different kinds of information, such as size, resolution and compression protocol. Scanned images can be stored in different file formats for easy storage and retrieval. PDF, SGML, TIFF, MPEG and WAVE are some popular file formats used for storing digitized resources. There are two main types of file formats, as follows.

Open File Format

An open file format is freely available for use, free from patent or licence restrictions, and can be used by anyone in any proprietary, free or open-source software. An open standard approach brings a wide range of benefits (Williamson, 2006):

(i) resources are freed from dependence on a single application or a particular hardware platform; and
(ii) resources can be preserved and accessed over the long term.

OpenDocument, Office Open XML, PNG, JPEG 2000 and ZIP are some examples of open file formats.

Proprietary File Format

A proprietary file format is owned by an individual or an organization, which protects it from unauthorized use through patents or licences. Such formats are owned by an organization or group (e.g. Microsoft); they may sometimes be accepted as de facto standards through sheer ubiquity, and might even be referred to as standards, but they cannot be regarded as open, since the owner could theoretically change the format or the conditions of use at any time (Williamson, 2005, p. 509). A list of file formats for different media types, along with their creators, dates of creation, media types and formats, is given in Table 4.1.
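The bitonal threshold and the two-byte run-length scheme described under Threshold and Compression above can be sketched together in Python. This is a minimal illustration assuming 8-bit grey values (0 = black, 255 = white) and 1 = black after thresholding; it implements only the simple run-length idea the text describes, not the ITU-G4 or LZW algorithms, and all names are hypothetical.

```python
def threshold_row(grey_values, threshold=128):
    """Bitonal conversion: grey values below the threshold become black (1), the rest white (0)."""
    return [1 if value < threshold else 0 for value in grey_values]


def rle_encode(pixels):
    """Lossless run-length encoding: each run becomes two bytes, (pixel value, run length).

    Runs longer than 255 pixels are split, because a run length must fit in one byte.
    """
    encoded = bytearray()
    i = 0
    while i < len(pixels):
        value, run = pixels[i], 1
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        encoded += bytes((value, run))
        i += run
    return bytes(encoded)


def rle_decode(encoded):
    """Expand (value, run length) byte pairs back into the original pixel list."""
    pixels = []
    for j in range(0, len(encoded), 2):
        pixels.extend([encoded[j]] * encoded[j + 1])
    return pixels


scan_line = [250, 248, 251, 30, 25, 240, 245, 247]  # light paper crossed by a dark stroke
bits = threshold_row(scan_line)
print(bits)                                    # [0, 0, 0, 1, 1, 0, 0, 0]
print(list(rle_encode(bits)))                  # [0, 3, 1, 2, 0, 3]
print(rle_decode(rle_encode(bits)) == bits)    # True
print(threshold_row(scan_line, threshold=20))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

A long white margin on a text page becomes a handful of byte pairs, which is why run-based coding suits bitonal pages. The last line shows why the threshold matters: set too low, it turns the dark stroke white and the text is lost.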

Table 4.1 List of File Formats

Sl. No. | File Name | File Extension | Creator | Creation Date | Media Type | Format
1 | Advanced Audio Coding | .aac | Collaboration between corporations, approved by MPEG | 1997 | Sound | Lossy compression
2 | Advanced Authoring Format | .aaf | Advanced Media Workflow Association | 2000 | Moving | Uncompressed
3 | Apple QuickTime | .mov | Apple Computer, Inc. | | Moving | Container
4 | Audio Interchange File Format | .aiff | Electronic Arts and Apple Computer, Inc. | | Sound | Uncompressed
5 | Audio Video Interleave | .avi | Microsoft | 1992 | Moving | Container
6 | Bitmap | .bmp | IBM and Microsoft | 1988 | Still | Compressed or uncompressed
7 | Broadcast Wave File | .bwav | IBM and Microsoft | 1997 | Sound | Uncompressed
8 | Digital Video File | .dv or .dif | Sony | 1994 | Video | Uncompressed
9 | Extensible Music Format (XMF) | .xmf | The MIDI Manufacturers Association, XMF Working Group | 2001 | Moving | Container
10 | Final Cut Pro | .fcp | Final Cut Pro/Apple Computer, Inc. | 1999 | Moving | Uncompressed
11 | Flash Video | .swf (or .flv) | Adobe/Macromedia | 1997 | Moving | Moving/Dynamic
12 | Graphics Interchange Format | .gif | CompuServe | 1987 | Still | Lossless compression
13 | JPEG | .jpg | Joint Photographic Experts Group | 1990 | Still | Lossy compression
14 | Keynote | .key | Apple Computer, Inc. | | Presentation | Container
15 | Material Exchange Format | .mxf | Pro-MPEG Forum | 2004 | Moving | Container
16 | MPEG-1 or MPEG-2 | .mpg | Moving Picture Experts Group | 1988 | Moving | Container
17 | MPEG-1/2 Audio Layer 3 | .mp3 | Moving Picture Experts Group | 1991 | Sound | Lossy compression
18 | MPEG-4 | .mp4 | Moving Picture Experts Group | 1998 | Moving | Container
19 | Ogg Vorbis Compressed Video | .ogm | Ogg Vorbis | 2003 | Moving | Container
20 | OpenOffice Impress | .odp | Sun Microsystems | 2000 | Presentation | Container
21 | Photoshop Document | .psd | Adobe | 1990 | Still | Uncompressed
22 | Portable Network Graphics | .png | The Portable Network Graphics Development Group of the World Wide Web Consortium | 1996 | Still | Lossless compression
23 | PowerPoint Document | .ppt | Microsoft | 2003 | Presentation | Container
24 | Raw File | .dng, .cr2, .nef, .arw and .srf | Depends on equipment manufacturer | 2000 | Still | Uncompressed
25 | RealAudio File Format | .ra | RealMedia | 1995 | Sound | Compressed
26 | Scalable Vector Graphics | .svg | The World Wide Web Consortium | 1999 | Still or Moving | Uncompressed
27 | Tagged Image File Format | .tiff | Aldus | 1985 | Still | Container or uncompressed
28 | WAVE Form Audio Format | .wav | IBM and Microsoft | 1992 | Sound | Uncompressed

(Source: …)

4.5 Hardware Used for Digitization

Devices are needed to capture the image of the source document. A scanner is generally used to capture images from textual documents, pictures or other sources. The hardware used in the digitization process is discussed below.

Scanner

A scanner works much like a photocopier. In a flatbed scanner, a moving lamp shines light onto the object to be digitized, and the reflected light is focused through a series of mirrors and lenses onto the recording medium. In a flatbed scanner, the recording medium is a compact light sensor, either a CCD (Charge-Coupled Device) or a CIS (Contact Image Sensor), each of which is composed of hundreds or thousands of elements. When light strikes each element, the intensity of the light is assigned a number. The numeric readings of light intensity and the element positions are recorded in sequence into a file, which forms the digital version of the original. The following features should be analysed first when selecting a scanner.

a) Driver of scanner

A driver is software that operates the scanner and transfers the digitized file to the hard drive or to other software. The scan driver may be standalone or a plug-in, i.e. a specialized version of the driver that is accessible through Photoshop, Word or another program. A standalone driver runs the scanner without involving other software and saves the file to the hard drive. Plug-ins are opened within Photoshop or Word, and after scanning the files can be used immediately in that program. Scan drivers fall into two groups: native and third-party. Flatbed scanner manufacturers provide their own native drivers for their scanners and provide driver updates through their websites. For specialized scanners, such as overhead book scanners or digital cameras, the native driver is the only driver available. Third-party scan drivers offer better control over the scanner and the scanned image than native drivers. These drivers have to be purchased unless they are supplied with the scanner as an incentive. Windows Image Acquisition (WIA) is a third-party scan driver provided with Microsoft Windows XP. It offers the features most commonly used by flatbed scanners; however, specialized scanners cannot be operated with WIA.

b) Scanning speed

Scanning times vary depending on the type of scanner used. Within a busy workflow, scanning speed can often be a deciding factor in scanner choice and should always be researched before a choice is made. Many scanners offer a choice of scan qualities, depending on the number of passes and/or the speed of the CCD: the more passes the CCD makes, the higher the quality and the slower the scan. Some early scanners were unable to capture red, green and blue data in a single pass (one-pass) and had to make three separate scans (three-pass). This did not normally affect quality but was very slow. Some scanners offer functions such as dust and noise reduction; however, these also slow down the process significantly.

c) Scan area

The scan area is the area the scanner is capable of scanning. Scan areas are specified in inches and/or as media sizes, such as:

8½ × 11 inches (standard letter)
8½ × 14 inches (legal)
11 × 17 inches (ledger)

Most flatbed scanners have a nominal size of A4 but can scan an area of about 8.5 × 12-14 inches. A3 scanners are available, but they take up a considerable amount of space. They are, of course, essential when it becomes necessary to capture works larger than A4, although if the objects are very large or difficult to handle, a digital camera might offer a more pragmatic alternative. High-end A3 flatbed scanners are very popular for commercial digitization, as they can be set up to scan a number of images in one go. This greatly increases efficiency and throughput, but these machines are very costly. Some flatbed scanners offer dual optics, in which the optical system can be switched to scan a "sweet zone": a smaller scan area at a greatly increased resolution. This is normally useful when scanning small to medium-sized transparencies within the full size of the scanner bed.

A range of optional add-on parts can provide additional functionality and productivity for many mid-range to high-end scanners. Two of the most common options for flatbed scanners are the automatic sheet/transparency feeder (ASF/ATF) and the transparency media adapter (TMA). An ASF or ATF is used to batch-scan quantities of single sheets or transparencies. An ASF/ATF is normally best for creating small, low-quality scans, either 1-bit black and white images of text for later optical character recognition (OCR) or small scans for thumbnail creation. A TMA provides an alternative light source within the scanner, which enables transparent artworks such as photo slides and larger colour transparencies to be scanned.
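How scan area, resolution and bit depth together determine file size can be estimated with a short Python sketch. It computes the raw size as pixel width × pixel height × bits per pixel ÷ 8, deliberately ignoring file headers, row padding and compression; the function names and the media-size table are illustrative assumptions.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan covering width_in x height_in inches at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bit_depth):
    """Approximate raw image size in bytes: pixels x bits per pixel / 8.

    File headers, row padding and compression are ignored (a simplifying assumption).
    """
    width_px, height_px = scan_dimensions(width_in, height_in, dpi)
    return (width_px * height_px * bit_depth + 7) // 8  # round up to whole bytes


MEDIA_SIZES = {"letter": (8.5, 11), "legal": (8.5, 14), "ledger": (11, 17)}

for dpi in (200, 300, 600):
    size = uncompressed_bytes(*MEDIA_SIZES["letter"], dpi, 24)
    print(f"letter, 24-bit colour, {dpi} dpi: {size:,} bytes (about {size / 2**20:.1f} MiB)")
```

Doubling the resolution from 300 to 600 dpi quadruples the raw size, because pixels are added in both directions. For a letter-size page at 24 bits, the estimates (about 10.7, 24.1 and 96.3 MiB at 200, 300 and 600 dpi) are close to the Canon BMP sizes reported in Table 4.3; the scan area used in that experiment is not stated, so this agreement is only suggestive.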

d) Scanner types

Selecting the right scanner is a more difficult job than selecting the right computer. Scanners are used to capture images of resources in printed form or on microfilm. There are two types of image scanner, based on how they interpret the image: vector scanners and raster scanners. A vector scanner interprets the image as a set of x, y coordinates. A raster scanner captures the image by passing light down the page and digitally encoding it row by row.

i) Drum scanner: Drum scanners use photomultiplier tubes (PMTs) to produce very high-quality results. They typically have a density range of … with a Dmax at the top of that range. They can offer an optical resolution of up to 8000 samples per inch (spi). Drum scanners are the tool of choice in the print industry and are normally used by professional digitization bureaux, owing to their expense and to their complexity, which requires skilful operation to get the best from them. Only flexible original artwork can be scanned in a drum scanner, as it has to be mounted on a transparent acrylic cylinder (drum) that is then spun at high speed around the photomultipliers within the cylinder. Mounting transparencies on the drum is a slow and skilled operation, and it is normal to have at least two drums in use so that one can be mounted while the other is being scanned.

Fig. 4.1: A Drum Scanner

Although the quality from these scanners is exemplary, they tend to be slow and cannot normally provide the level of productivity required by most digitization projects. There are also some preservation issues with the standard use of a mounting oil to avoid Newton's rings between the transparency and the drum. If mounting oil is used, the transparencies must be scrupulously cleaned after scanning.

ii) Flatbed scanner: A flatbed scanner is like a photocopier: a lamp moves slowly across the face of the original, and the reflected light is focused through a series of mirrors and lenses onto the recording medium. Here, the recording medium is a compact light sensor, either a Charge-Coupled Device (CCD) or a Contact Image Sensor (CIS), each of which is composed of hundreds or thousands of elements. When light strikes each element, the intensity of the light is assigned a number. The numeric readings of light intensity and element positions are recorded in sequence into a file, which forms the digital version of the original. To capture colour, the scanner must either make three passes, with a red, green or blue filter in front of the CCD, or have three lines of CCD elements, each with a red, green or blue filter on top.

Fig. 4.2: Flatbed Scanner (HP Scanjet G2410)

Flatbed scanners are much cheaper than drum scanners and also much easier to operate. CCD technology and quality have improved a great deal, yet flatbed scanners remain cheaper than drum scanners. Another advantage is that they can be operated by unskilled operators, as their functions are simple, and the document to be scanned does not need to be bent around a drum. Flatbed scanners also scan faster than drum scanners. Many flatbed scanners are available on the market; the major printer manufacturers offer low-cost flatbed scanners that can be used for scanning photographs and loose-sheet pages.

iii) Overhead scanner: This type of scanner is quite expensive compared with a flatbed scanner, but it is helpful when images of extremely fragile materials have to be captured. Overhead scanners that scan only in black and white should be avoided. A photograph of a Zeutschel overhead scanner, a popular scanner used by LICs and resource centres for digitization, is given below.

Fig. 4.3: An Overhead Scanner (Zeutschel OS 5000)

Zeutschel scanners can be used to digitize books, magazines and other large documents. Special, careful procedures and functions for books are used during scanning, including book cradles, radiographic tables, innovative lighting systems and the creation of documents with the text facing upwards. Depending on customer needs, Zeutschel offers different models for colour, greyscale and black and white.

iv) Sheet-fed scanner: In this type of scanner, sheets of paper are slid through the scanner. It is not suitable for capturing images of loose manuscripts, photographs, fragile materials, etc.

v) Microfilm scanner: This is a good choice for microfilm, photographs, slides and negatives, but it is limited in the size of material it can scan. Microfilm produced from original documents can be preserved in ideal conditions for a very long time.

Fig. 4.4: Microfilm Scanner (B-M-I EYECOM MIC5M)

The steady growth of digital imaging technology over the last five years has led to a vast range of professional and consumer scanners on the market. Quality and speed are steadily rising and costs are slowly falling. However, it remains true that although fast low-quality scanners or slow high-quality scanners can be bought at a lower price, productive and high-quality scanners still tend to be very expensive.

Digital Camera

A digital camera is a good choice not only for digitizing an organization's valuable documents but also for other purposes, such as photographing the organization, its different sections and its staff for uploading to the organization's website. When damaged materials cannot be moved and their images must be captured without disturbing their position, investing in a digital camera is the better choice. Any modern DSLR (digital single-lens reflex) or point-and-shoot digital camera can be used as a document scanner. A DSLR can be used with a dedicated flash and a lens with some measure of zoom (18-55 mm or … mm). To do this properly, the lighting in the room where the scanning is done must be good.

Fig. 4.5: Digital Camera Used as Document Scanner

The camera must be properly aligned with the document; otherwise the shots will be slightly skewed, which could be a problem. Holding arms can be used to fix the camera in place while the photographs are taken. Most tripods will not angle down far enough for this to work, but if the document is placed on an easel, it should be feasible to find the right angle for alignment. During a visit to the University Library of Osmania University, the researcher observed a Sony digital camera being used to capture images of rare documents.

4.6 Software Used for Digitization

A scanner can only capture the image of the source document, which has to be processed further to enhance image quality and clarity, or to make it searchable and accessible to users in future. For these purposes, software such as scanning software and Optical Character Recognition (OCR) software is needed.

4.6.1 Scanning Software

For the proper operation of a scanner, the driver and the scanning software for that particular scanner have to be installed. Scanner software controls the scanning process, drives the hardware that captures the image data, and passes the data on to the next stage of the image workflow. This software usually offers a range of image processing features. It can be either a device-specific program designed to work with one scanner or a plug-in based on a driver interface such as TWAIN or ISIS, which can be accessed from within a host program. Software can play an important role in a workflow, in terms of both productivity and scan quality, so it is important to consider how best to combine the work done by scanning software with that done by image processing software. In addition to setting resolution, scan area, colour/greyscale mode and reflective/transmissive mode, scanner software can also be used to control colour optimization, colour transmission, sharpening, tonal optimization, automated dust/scratch removal, negative-to-positive conversion, scan quality control, image rotation, batch scanning, etc. Using any of these facilities at the time of acquiring the image can save a lot of time in corrective manipulation later in the workflow, but it is worth comparing how well these functions perform in the scanner software and in the image processing software when deciding which will be more effective.

Some examples of scanning software are FreeKapture, VueScan and Scanitto Pro.

FreeKapture 2.0: A free TWAIN image capture application from TSoft that works on any TWAIN-compliant Windows system (Windows 98 onwards). TWAIN is, allegedly, an acronym for "Technology Without An Interesting Name"; it is software (a driver) supplied by the manufacturer of TWAIN-compliant devices. Using this driver, FreeKapture can scan, save and print images (photographs, etc.). Images are saved in JPG or BMP format.

VueScan: An easy-to-use replacement for the software that comes with a scanner, supporting most flatbed scanners, printer/scanners and film scanners. Over 10 million people have downloaded VueScan since its first release in …. VueScan is a powerful scanning tool with many useful features, and currently supports over 1200 scanners and 321 digital camera RAW file types.

Scanitto Pro: Scanitto Pro provides one-click scanning and copying using TWAIN drivers, which provide exceptional scan and copy quality. In addition, Scanitto Pro integrates with all major operating systems to provide a seamless document management environment that is intuitive and very simple to use. Scanitto Pro is extremely stable and has passed all the major security and operational tests. It supports multiple file formats, including PDF, BMP, JPG, TIFF, JP2 and PNG, and all major European languages, including English, French, German, Italian, Spanish and Russian.

4.6.2 OCR Software

A scanned document is nothing but a picture of a printed page. It cannot be edited, manipulated, managed or searched based on its content; in other words, scanned documents have to be referred to by their labels rather than by the characters they contain. OCR (Optical Character Recognition) software is used to transform a scanned page image of text into a word processing file. OCR software takes the captured image or set of images and generates a file containing the text in ASCII code or in a specified word processing format, leaving the image intact in the process. OCR does not actually convert an image into text; rather, it creates a separate file containing the text.

There are four types of OCR technology: matrix matching, feature extraction, structural analysis and neural networks. In matrix matching, each character is compared with stored templates of characters. In feature extraction, a character is recognized from its structure and shape based on a set of rules. In structural analysis, characters are determined on the basis of density gradations or character darkness. Neural network technology uses a form of artificial intelligence that attempts to minimize human effort by using fuzzy logic; it is also known as ICR (Intelligent Character Recognition).

Many OCR software packages are available on the market nowadays. ABBYY FineReader 11 and OmniPage Pro are two of the most widely used.

ABBYY FineReader 11: With new support for Arabic (Modern Standard), Vietnamese and Turkmen (Latin), ABBYY FineReader 11 detects any combination of 189 languages. FineReader 11 supports a wide range of output formats, and the OCR results can also be sent directly to applications such as Microsoft Word, Excel and PowerPoint, Adobe Acrobat, Corel WordPerfect and OpenOffice.org Writer. It has image correction tools that adjust motion blur, ISO noise, 3D image distortion, brightness, contrast, colour levels and curved text for the best possible results.
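Returning to the matrix-matching technique listed earlier in this subsection, a toy Python sketch shows the core idea: compare a scanned glyph pixel by pixel with each stored template and choose the closest. It assumes each character has already been isolated and scaled to a 3 × 5 grid of 0/1 pixels; the templates and function names are hypothetical, and real OCR must also isolate, deskew and normalise characters, which this sketch does not attempt.

```python
# Hypothetical 3 x 5 templates: "1" marks an inked pixel, rows run top to bottom.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def mismatches(glyph, template):
    """Count the pixel positions at which a scanned glyph and a template disagree."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognise(glyph):
    """Matrix matching: pick the character whose template differs from the glyph least."""
    return min(TEMPLATES, key=lambda char: mismatches(glyph, TEMPLATES[char]))


noisy_l = ["100", "100", "101", "100", "111"]  # an "L" with one stray dark pixel
print(recognise(noisy_l))  # L
```

The noisy glyph differs from the "L" template in one pixel but from "I" and "T" in nine and eleven, so it is still recognised. Because the comparison is purely pixel-based, matrix matching works best on clean text in known fonts, which is one reason the other approaches exist.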

OmniPage: The newest version of OmniPage uses the latest OCR technology, with greatly increased accuracy, cloud service capabilities and recognition of 123 languages. OCR loses its convenience if the software is too difficult or confusing to use, which is a risk with any multi-featured software. OmniPage avoids this risk with an intuitive design and logical layout, so that even a novice OCR user can navigate its many features.

4.7 Storage Space of Scanned Images: An Experimental Study

Two files were created, one textual (19.2 KB, in DOCX format) and the other containing an image (577 KB, in DOCX format), and printouts were taken. Both pages were scanned using two different flatbed scanners: an Avision FB6280E, which is an A3 book-edge scanner, and a Canon imageCLASS D520. The textual document was scanned at different resolutions using the black and white option, and the file was saved in different file formats on both scanners. The following table gives the sizes of the images saved in the different file formats.

Table 4.2 File Size of B/W Scanned Images

Sl. No. | File Format | FB6280: … dpi | FB6280: 300 dpi | FB6280: 600 dpi | Canon D520: 200 dpi | Canon D520: 300 dpi | Canon D520: 600 dpi
1 | PDF | 152 KB | 315 KB | 1.04 MB | 34.1 KB | 95 KB | 67 KB
2 | BMP | 10.6 MB | 23.1 MB | 95.8 MB | 464 KB | 1 MB | 4.02 MB
3 | TIFF | 6.08 MB | 15.5 MB | 59.1 MB | 457 KB | 1 MB | 1.78 MB
4 | JPG | 169 KB | 350 KB | 1.22 MB | | |
5 | GIF | 430 KB | 1.29 MB | 4.12 MB | | |

A similar process was applied to the image printout, which was scanned using the colour option. The sizes of the files produced by the two scanners in the different file formats are presented in the following table.

Table 4.3 File Size of Colour Scanned Images

Sl. No. | File Format | FB6280: … dpi | FB6280: 300 dpi | FB6280: 600 dpi | Canon D520: 200 dpi | Canon D520: 300 dpi | Canon D520: 600 dpi
1 | PDF | 127 KB | 283 KB | 1.09 MB | 57.3 KB | 101 KB | 1 MB
2 | BMP | 6.32 MB | 14.2 MB | 56.9 MB | 10.7 MB | 24 MB | 96.2 MB
3 | TIFF | 5.28 MB | 12.4 MB | 48.6 MB | 10.7 MB | 24 MB | 96.2 MB
4 | JPG | 459 KB | 336 KB | 1.28 MB | 277 KB | 629 KB | 2.70 MB
5 | GIF | 507 KB | 1.20 MB | 4.75 MB | | |

Tables 4.2 and 4.3 show that the same document, scanned on the two different scanners at the same resolution and saved in the same file format, produces files of different sizes. The file size of a scanned image also changes with the resolution used: in most cases, the higher the scanning resolution, the larger the file, although a few compressed files (for example, the Canon PDF at 600 dpi in Table 4.2 and the FB6280 JPG at 300 dpi in Table 4.3) do not follow this pattern. The scanned images were found to be of good quality at higher resolutions.

4.8 Summing Up

Digitization has many aspects to be dealt with, from scan area and resolution to the file formats used for storage. The selection of hardware and software is also a factor in the success of a digitization project. University libraries can opt for either in-house digitization or outsourcing to digitize their rich collections. They can approach institutions such as CDAC Noida, CDAC Pune, IIIT Allahabad and the Indira Gandhi National Centre for the Arts, New Delhi, to provide the necessary infrastructure and manpower for digitizing their valuable and rare documents, provided the conditions laid down by the respective bodies are acceptable to the university libraries.