Multilevel Rendering of Document Images

ANDREAS SAVAKIS
Department of Computer Engineering
Rochester Institute of Technology
Rochester, New York, 14623
USA
http://www.rit.edu/~axseec

Abstract: Rendering document images for scanning and printing applications typically involves binarization via adaptive thresholding, halftoning, or color dropout. While bitonal rendering is often adequate, there are cases where multilevel rendering is required to capture important image characteristics. In this paper, we present two methods for multilevel rendering of document images. The first method involves adaptive multilevel thresholding of gray scale images based on background tracking. The second method presents color form dropout using color quantization. Both methods are based on a computationally efficient version of the K-means algorithm. The selection of thresholding, halftoning or color dropout depends on the document type and can be applied to the whole image or to various image regions, as determined by a document categorization and segmentation module.

Key-Words: Multilevel rendering, color dropout, adaptive thresholding.

1. Introduction

High-speed scanners that are currently used in production scanning of document images typically process thousands of document images daily. The high processing volume does not allow setting scanning parameters for individual images, and performance requirements dictate the use of dedicated hardware for image processing. This requires algorithms that are not only effective, but also lend themselves to real-time implementation. Document images are captured in color or gray scale form by a linear array charge-coupled device and are converted to bitonal output images that are compressed and stored.
In many cases, documents consist of text or line graphics on a relatively uniform background. Converting them to binary form is therefore suitable for output and storage, because it significantly reduces file size and transfer bandwidth requirements without loss of important document information. Image binarization algorithms such as dithering and error diffusion [1-3] are applicable to pictorial images, because they are designed to effectively represent shades of gray. However, they are not optimal for processing document images, since they produce broken characters that are difficult to interpret. Thresholding algorithms [4-8] are better suited for separating characters and graphics from backgrounds, and often enhance the appearance of low contrast text. Thresholding methods are broadly categorized into techniques employing the histogram, moment preservation, entropy coding, and local adaptation [9-11].

When performing character recognition on color forms, it is desirable to eliminate the color background and lines that are part of the form, and keep only the textual information that is of relevance. Color dropout accomplishes this by converting the scanned color document to a binary image where the text colors are turned to black while the form colors and background are turned to white [12-15].

Document halftoning, thresholding and color dropout can all be options in a document scanning or printing system, as shown in Fig. 1. The selection of the appropriate processing method is based on the results obtained from the classification and segmentation module, where the image type is determined from properties such as color content, edge information, and pictorial vs. textual and graphic region classification. Depending on the classification and segmentation results, a single rendering method may be applied to the whole image, or different rendering methods may be applied to different regions.

Bitonal image rendering provides good results, but in some cases it cannot sufficiently capture important image characteristics. To address this issue, methods for multilevel halftoning and thresholding have been developed [16-22]. In addition, as printers become capable of printing multiple levels, multilevel rendering becomes more attractive. In this paper, two algorithms for multilevel document image thresholding are presented that are based on a computationally efficient variant of the K-means algorithm. The first algorithm is a multilevel adaptive algorithm based on background tracking and is presented in Section 2. The second algorithm is a multilevel color dropout algorithm based on color quantization and is presented in Section 3.

2. Multilevel Thresholding

Multilevel thresholding (multithresholding) is particularly suitable for representing text and graphics on a gray or colored background. The method presented here is an extension of the foreground/background tracking technique discussed in [11]. In the multilevel foreground and background clustering (MFBC) approach, each pixel is assigned to one of several clusters, where each cluster represents a background or foreground level. The relative intensity values of the foreground and background clusters are not important. Pixel clustering is based on a variant of the K-means algorithm due to MacQueen [11,23], where the cluster means are updated each time a data point is assigned to a cluster. Since multilevel thresholding is desirable, the number of foreground and background clusters is K>2. The following steps describe the way this approach works:

Step 1. Region selection. Divide the document into all-inclusive, mutually exclusive subregions.
Select the document subregion for which the thresholds will be computed, and a region containing that subregion whose pixels will be used to determine those thresholds. For example, the region may consist of N contiguous scanlines, where the subregion is the center M scanlines, with M<N.

Step 2. Initialization. Initialize the cluster means to the values computed for the previous subregion. If there is no previous subregion, set the initial cluster means with large separations between them.

For each pixel inside the region, iterate between Steps 3 and 4:

Step 3. Pixel assignment. Assign the pixel to the nearest cluster.

Step 4. Cluster mean update. After each new pixel assignment, update the relevant cluster mean.

Step 5. Threshold calculation. After all pixels in the region have been assigned, set each threshold for the subregion equal to the average of two adjacent cluster means.

The sizes of the region and subregion determine the amount of memory that is necessary, the speed of processing, and the adaptivity to local image variations.

3. Multilevel Color Dropout

Color forms constitute a large number of the documents that are scanned using high-speed scanners. Typical documents of this type include medical forms, insurance forms, census forms, etc. When performing character recognition on these forms, it is desirable to suppress the color background and lines that are part of the form, and store only the entered textual information of interest. The purpose of color dropout is to convert the scanned color document to a binary image where the form background colors are turned to white and the text colors are turned to black. To accomplish this, we need to distinguish between the colors of the background and the colors of the entered text. Color dropout may be viewed as a form of color image rendering, since the image is converted from full color to a black and white or indexed color image.
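Returning briefly to the multilevel thresholding procedure of Section 2, Steps 1 to 5 can be illustrated with a minimal Python sketch. It is not the implementation used in the scanner hardware. The function names, the default region and subregion sizes, the choice to count each initial mean as one sample, and the placement of each threshold midway between adjacent cluster means are all assumptions made for illustration.

```python
def sequential_kmeans_1d(pixels, means):
    """Steps 3 and 4: assign each pixel to the nearest cluster mean and
    update that mean immediately (sequential, MacQueen-style update)."""
    means = list(means)
    counts = [1] * len(means)  # assumption: each initial mean counts as one sample
    for p in pixels:
        k = min(range(len(means)), key=lambda i: abs(p - means[i]))
        counts[k] += 1
        means[k] += (p - means[k]) / counts[k]  # running-mean update
    return means


def thresholds_from_means(means):
    """Step 5: one threshold midway between each pair of adjacent means."""
    s = sorted(means)
    return [(a + b) / 2.0 for a, b in zip(s, s[1:])]


def mfbc_thresholds(image, k=3, region_rows=8, sub_rows=4):
    """Return (first_row, end_row, thresholds) for every subregion.

    image is a list of scanlines, each a list of gray levels in 0..255.
    region_rows plays the role of N and sub_rows the role of M (M < N).
    """
    n_rows = len(image)
    # Step 2, no previous subregion: widely separated initial means.
    means = [255.0 * i / (k - 1) for i in range(k)]
    margin = (region_rows - sub_rows) // 2
    results = []
    for start in range(0, n_rows, sub_rows):        # Step 1: subregions
        stop = min(start + sub_rows, n_rows)
        lo = max(0, start - margin)                 # enclosing region
        hi = min(n_rows, stop + margin)
        pixels = [p for row in image[lo:hi] for p in row]
        # Step 2, otherwise: start from the means of the previous subregion.
        means = sequential_kmeans_1d(pixels, means)
        results.append((start, stop, thresholds_from_means(means)))
    return results


def quantize(value, thresholds):
    """Map a gray level to an output level index in 0..K-1."""
    return sum(value > t for t in thresholds)
```

Calling `mfbc_thresholds(image, k=3)` on a gray scale page yields two thresholds per subregion, and `quantize` then maps each pixel of that subregion to one of three output levels. Because each subregion starts from the means of the previous one, the thresholds follow gradual background changes down the page.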

Color dropout may be accomplished using optical or digital methods. Recent work in [15] presented a method for color dropout in YCbCr luminance/chrominance space that is designed to operate in a fully automatic environment and is implemented in hardware. The basic assumption is to associate the ink colors with darker colors, such as black and dark blue, and treat lighter colors as part of the document background. During processing, the dark (non-dropout) colors are converted to black, while all other (dropout) colors are converted to white.

3.1 Multilevel Color Dropout using Color Quantization

The objective of an adaptive color dropout approach is to adjust the dropout parameters so that they reflect the colors that are present in the document. To accomplish this, it is essential to scan the entire document before setting the filter parameters. This means that the entire image should be buffered so that two passes take place, one for determining the form colors and setting the filter parameters and one for performing color dropout.

The operations that take place include a color space transformation to YCbCr color space followed by color quantization. Color quantization is more effective in YCbCr, because it is more uniform than RGB color space. The transformation involves a matrix multiplication:

\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix}
=
\begin{bmatrix}
 0.257 &  0.504 &  0.098 \\
-0.148 & -0.291 &  0.439 \\
 0.439 & -0.368 & -0.071
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
+
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\]

Color quantization is accomplished by clustering the image colors using a variant of the K-means algorithm [23]:

I. Initialize the centers of the color clusters. Initialization can be done automatically or by having the user specify the dropout and non-dropout colors. Automatic initialization uses 8 clusters: black, white, red, green, blue, cyan, magenta and yellow.
II. For each pixel do the following:
II.a Assign the pixel to the cluster with the closest mean color.
II.b Update the mean color of the cluster to which the pixel was assigned.
III. When all of the image pixels are assigned to a cluster, either repeat step II or exit.

After color quantization, pixels assigned to non-dropout clusters are turned black, and pixels assigned to dropout clusters are turned white if they are background, or levels of gray if they are part of the form. If no user input is available, the pixels associated with the black and blue clusters are turned black, while all other pixels are considered dropout colors.

This approach is adaptive, because the cluster centers are adapted to the colors of the image pixels. However, it requires more computations than the previous methods [25] and at least two passes through the image, one pass for color quantization and one for color dropout.

4. Results and Conclusions

Multilevel thresholding using foreground/background tracking works well and provides a more faithful representation of the original gray scale image, because it uses more levels to represent it, and it is less likely to miss low contrast characters. The drawback of multilevel thresholding is that it requires more bits per pixel, which results in a larger overall file size. Additionally, there are some practical issues associated with images that have M bits per pixel with 2<M<8. First, there are few file formats available that can handle bit depths other than one or eight bits per pixel. This may be overcome by selecting M=4, which may be implemented using the TIFF file format, or by using a proprietary file format. Second, multibit images are less efficient to compress than binary images, because the less significant bits contain detail information. However, with the increasing availability of low cost memory, storage issues are less of a problem.

Multilevel color dropout also provides several advantages compared to traditional black and white color dropout. First, the textual information of interest is enhanced, because it is

rendered black, while the background color, which may reduce the text contrast, is suppressed or reduced in contrast. In addition, the removal of the form lines minimizes interference with the text characters, and may reduce errors during character recognition. Another advantage is that the uncompressed file size is reduced, which may significantly reduce the storage requirements for the resulting document files.

It should be noted that bitonal color dropout involves some risk in cases where the full range of the non-dropout ink colors is not known. This risk is reduced when multilevel color dropout is employed. To further reduce the potential loss of information, a semiautomatic approach can be adopted, where additional dropout and non-dropout colors are interactively specified before processing the forms.

References

[1] R. Ulichney, Digital Halftoning, MIT Press, Cambridge MA, 1987.
[2] X. Kang, Digital Color Halftoning, Wiley-IEEE Press, 1999.
[3] D. Lau and G. Arce, Modern Digital Halftoning, Marcel Dekker, 2001.
[4] J. Weszka and A. Rosenfeld, Threshold Evaluation Techniques, IEEE Trans. Systems Man and Cybernetics, pp. 622-629, 1978.
[5] P. Palumbo, P. Swaminathan, and S. Srihari, Document Image Binarization: Evaluation of Algorithms, SPIE Applications of Digital Image Processing IX, vol. 697, pp. 278-285, 1986.
[6] P.K. Sahoo, S. Soltani, A.K.C. Wong, and Y.C. Chen, A survey of thresholding techniques, Computer Vision, Graphics and Image Processing, vol. 41, pp. 233-260, 1988.
[7] O.D. Trier and A. Jain, Goal-directed evaluation of binarization methods, IEEE Trans. PAMI, pp. 1191-1201, Dec. 1995.
[8] A.T. Abak, U. Baris, and B. Sankur, The Performance Evaluation of Thresholding Algorithms for Optical Character Recognition, ICDAR 97, Ulm, Germany, pp. 697-700, 1997.
[9] J.D. Yang, Y.S. Chen, and W.H. Hsu, Adaptive thresholding algorithm and its hardware implementation, Pattern Recognition Letters, pp. 141-150, 1994.
[10] J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikainen, Adaptive Document Binarization, ICDAR 97, Ulm, Germany, pp. 147-152, 1997.
[11] A. Savakis, Adaptive Document Image Thresholding using Foreground and Background Clustering, ICIP 98, Chicago, 1998.
[12] P. Rudak, Automatic Detection and Selection of a dropout color using zone calibration in conjunction with optical character recognition of preprinted forms, US Patent 5,014,329, 1991.
[13] Y. Murai and T. Amagai, Image processing apparatus with function of extracting visual information from region printed in dropout color on sheet, US Patent 5,664,031, 1997.
[14] B. Yu and A. Jain, A Generic System for Form Dropout, IEEE Trans. PAMI, 1998.
[15] A. Savakis and C. Brown, Document processing for automatic color dropout, SPIE Conference on Applications of Digital Image Processing, San Diego CA, July 2001.
[16] R.S. Gentile, E. Walowit, and J.P. Allebach, Quantization and Multilevel Halftoning of Color Images for Near Original Image Quality, J. Opt. Soc. Am. A, vol. 7, pp. 1019-1026, June 1990.
[17] J.-C. Yen, F.-J. Chang, and S. Chang, A new criterion for automatic multilevel thresholding, IEEE Trans. Image Proc., vol. 4, pp. 370-378, 1995.
[18] L. Hertz and R.W. Schafer, Multilevel Thresholding Using Edge Matching, Computer Vision, Graphics and Image Processing, vol. 44, pp. 279-295, 1988.
[19] Q. Yu, K.J. Parker, K. Spaulding, and R. Miller, Digital Multitoning with Overmodulation for Smooth Texture Transition, Journal of Electronic Imaging, vol. 8, no. 3, pp. 311-321, July 1999.
[20] J.L. Mitchell, G. Thompson, C.W. Wu, T.J. Trenary, and Y. Qiao, Multilevel color halftoning, The 9th Color Imaging Conference, Scottsdale, AZ, 2001.
[21] J.R. Goldschneider, E.A. Riskin, and P.W. Wong, Embedded multilevel error diffusion, IEEE Transactions on Image Processing, vol. 6, pp. 956-964, July 1997.
[22] Eastman Kodak Company, Adaptive quantization of grayscale images, US Patent 6,323,956, issued November 2001.
[23] A. Jain and R. Dubes, Algorithms for Clustering Data, pp. 96-101, Prentice Hall, 1988.

Figure 1. Image Rendering Block Diagram. Blocks: Document Image; Classification Segmentation Module; Method Selection Module; Halftoning, Thresholding, Color Dropout; Rendered Image.
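
Supplementary sketch for Section 3.1. The color quantization procedure (the YCbCr transformation followed by steps I to III) and the subsequent dropout mapping can be illustrated with the short Python example below. It is a sketch under stated assumptions, not the implementation evaluated in the paper. The function names, the squared Euclidean distance in YCbCr, the choice to count each initial center as one sample, the use of the white cluster as the only background cluster, and the gray value 192 for form colors are all illustrative assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB (0..255) to YCbCr using the transformation of Section 3.1."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return (y, cb, cr)


# Step I, automatic initialization: eight cluster centers given in RGB.
INITIAL_RGB = {
    "black": (0, 0, 0), "white": (255, 255, 255),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "cyan": (0, 255, 255), "magenta": (255, 0, 255), "yellow": (255, 255, 0),
}
NON_DROPOUT = {"black", "blue"}   # default when no user input is available
FORM_GRAY = 192                   # assumed gray level for form colors


def sq_dist(a, b):
    """Squared Euclidean distance between two color triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_colors(pixels, passes=1):
    """Steps II and III: return the cluster name assigned to each RGB pixel."""
    names = list(INITIAL_RGB)
    means = [list(rgb_to_ycbcr(*INITIAL_RGB[n])) for n in names]
    counts = [1] * len(names)     # assumption: initial center counts as one sample
    labels = [0] * len(pixels)
    for _ in range(passes):       # step III: optionally repeat step II
        for idx, rgb in enumerate(pixels):
            c = rgb_to_ycbcr(*rgb)
            # II.a: cluster with the closest mean color
            k = min(range(len(means)), key=lambda i: sq_dist(c, means[i]))
            # II.b: running-mean update of that cluster
            counts[k] += 1
            for d in range(3):
                means[k][d] += (c[d] - means[k][d]) / counts[k]
            labels[idx] = k
    return [names[k] for k in labels]


def multilevel_dropout(pixels, passes=1):
    """Non-dropout clusters to black, background to white, form colors to gray."""
    out = []
    for name in quantize_colors(pixels, passes):
        if name in NON_DROPOUT:
            out.append(0)
        elif name == "white":
            out.append(255)
        else:
            out.append(FORM_GRAY)
    return out
```

For example, `multilevel_dropout([(250, 250, 250), (10, 10, 10), (0, 0, 200), (230, 30, 30)])` maps paper white to 255, black and blue ink to 0, and a red form line to the assumed gray level. A user-specified set of dropout and non-dropout colors would replace `INITIAL_RGB` and `NON_DROPOUT`.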