BINARIZATION TECHNIQUE USED FOR RECOVERING DEGRADED DOCUMENT IMAGES

Miss. Nikita Mote, SCSMCOE, Ahmednagar, India
Miss. Shital Avhad, SCSMCOE, Ahmednagar, India
Miss. Sonali Jangale, SCSMCOE, Ahmednagar, India

Abstract: Document image binarization converts an acquired grayscale document image to binary format. Its objective is to automatically choose a threshold that separates the foreground from the background information. Binarization is usually carried out in the pre-processing stage of document image processing, and its primary aim is to extract the foreground text from the document background. For degraded document images this text extraction, or segmentation, is a difficult task. In this paper we propose a simple and efficient document image binarization technique that makes use of the adaptive image contrast and some noise reduction methods. First, the input degraded document image is normalized to improve the quality of the output binarized image. Second, an adaptive image contrast map is constructed for the normalized image. Third, the adaptive image contrast map is binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is then segmented by a local threshold that is estimated from the intensities of the detected text stroke edge pixels. Finally, the output document image is filtered to reduce noise. The proposed method requires only a minimum number of parameters and shows superior performance over various datasets in terms of various performance measures.

Keywords: Degradation, Equations, Histograms, Image edge detection, Image segmentation.

1. Introduction

In recent years the field of document image processing has seen increasingly widespread applicability and strong growth. Document image binarization is usually performed in the preprocessing stage of document image processing, and it is frequently used as a preprocessing step before Optical Character Recognition (OCR). Image binarization converts an image of up to 256 gray levels into a black-and-white image; document image binarization therefore converts a grayscale document image into a binary document image. The main aim of document image binarization is to segment, or extract, the foreground text from the document background. For degraded document images this foreground text extraction is a challenging task because of variations in the document image properties. By degradations we mean every sort of less-than-ideal property of a real document image, for example coarsening of the document image, ink or toner drop-outs, smear, thinning and thickening, and geometric deformations.

Figure 1. Input Image

Imperial Journal of Interdisciplinary Research (IJIR) Page 40

Handwritten text within document images also shows variations in stroke width, stroke connection, etc. In addition, historical document images are often degraded by bleed-through. Many binarization techniques have been reported for degraded document images. In 1998, a recursive thresholding technique for image segmentation was proposed. This approach is only applicable to grayscale images, specifically images of real-life bank checks. Performance analysis indicates that the method is more efficient at segmenting the darkest object in a given image. An iterative multimodel subimage binarization technique was proposed for handwritten document images in 2004. It can be used for different types of handwritten document images when there is no prior knowledge about the noisiness of the image. In 2005, a binarization technique based mainly on a decompose algorithm was proposed for degraded historical document images. Its main drawback is that the algorithm does not work well on document images with large patterns or pictures. To give better results on heavily degraded document images, a binarization technique using a Markov field model has been proposed; this method is more effective at detecting text than other local thresholding methods. An improved document image binarization technique was proposed in 2008. It is mainly based on a combination of different binarization techniques and edge information from the grayscale image. A document image binarization technique using background estimation and stroke edges was proposed in 2010. That thresholding method still has several limitations; one drawback is that it works only for scanned document images that have no or weak slanting. Another approach to document image binarization uses local maximum and minimum filters. We attempt to create robust and efficient document image binarization methods that can produce good results for severely degraded document images. Binarization methods can generally be classified into three major types: global binarization, local binarization and hybrid binarization.

1.1 Global Binarization

The global thresholding technique computes an optimal threshold for the entire image. These techniques need few computations and can work well in simple cases, but they fail on complex backgrounds, such as non-uniform color and poorly illuminated backgrounds. They are usually not suitable for degraded document images, because such images do not have a clear pattern that separates foreground text and background.

1.2 Local Binarization

Local binarization techniques set different thresholds for different target pixels depending on their neighbourhood (local) information. Generally, these techniques are sensitive to background noise because of the large variance found in poorly illuminated documents or documents with bleed-through degradation.

1.3 Hybrid Binarization

The hybrid binarization approach combines global and local thresholding. A first step carries out a global thresholding to classify a part of the background of the document image and keep only the part containing the foreground (graphics or text in our case). A second step refines the image obtained in the first step by applying an adaptive thresholding technique, in order to obtain a sharper result.
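The difference between the global approach of Section 1.1 and the local approach of Section 1.2 can be illustrated with a small sketch. It is only an illustration under stated assumptions: Otsu's method stands in for "an optimal global threshold", the local rule is a plain mean-minus-offset test, and the function names, the 3x3 window, the offset of 40 and the toy page (right half in shadow, dark text on a lighter background) are all hypothetical choices made for this example, not values from the paper.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes the between-class
    variance; pixels <= t form one class, pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # no pixels left in the bright class
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def global_binarize(img):
    """One threshold for the whole image; 1 = text, 0 = background."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p <= t else 0 for p in row] for row in img]


def local_binarize(img, radius=1, offset=40):
    """Per-pixel threshold: text if darker than (window mean - offset)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] < local_mean - offset else 0)
        out.append(row)
    return out


# Toy page: left half well lit (background 200, text 120),
# right half in shadow (background 100, text 20).
page = [[200] * 5 + [100] * 5 for _ in range(5)]
for y in (1, 2, 3):
    page[y][2] = 120
    page[y][7] = 20
truth = [[1 if x in (2, 7) and 1 <= y <= 3 else 0 for x in range(10)]
         for y in range(5)]

global_result = global_binarize(page)
local_result = local_binarize(page)
print(sum(map(sum, global_result)))  # 28: the whole shadowed half is labelled text
print(local_result == truth)         # True: only the 6 stroke pixels are labelled text
```

On this toy page the text in the lit half (120) is brighter than the background in the shadowed half (100), so no single threshold can separate text from background, while the local rule recovers both strokes. The local result depends on the assumed window size and offset, which reflects the sensitivity noted in Section 1.2.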
1.4 Dynamic Threshold Binarization

Dynamic threshold binarization, such as the iteration method, defines the threshold of a pixel from its own grey-level value, the grey-level values of its neighbouring pixels, and the coordinates of the pixel. This method is commonly used for poor-quality images, especially images with a single-peak histogram. However, owing to the dynamic threshold calculation, the method has high computational complexity and is slow.

2. Related Work

Many thresholding techniques have been reported for document image binarization. As many degraded documents do not have a clear bimodal pattern, global thresholding is usually not a suitable approach for degraded document binarization. Adaptive thresholding, which estimates a local threshold for each document image pixel, is often a better approach for dealing with the variations within degraded document images. For example, the early window-based adaptive thresholding techniques estimate the local threshold by using the mean and the standard deviation of the image pixels within a local neighborhood window. The main drawback of these window-based techniques is that the thresholding performance depends heavily on the window size and hence on the character stroke width. Other approaches have also been reported, including background subtraction, texture analysis, recursive methods, decomposition methods, contour completion, Markov random fields, matched wavelets, cross-section sequence graph analysis, self-learning, Laplacian energy, user assistance, and combinations of binarization techniques. These methods combine different types of image information and domain knowledge and are often complex.

The local image contrast and the local image gradient are very useful features for segmenting the text from the document background, because the document text usually has a certain image contrast with respect to the neighboring document background. They are very effective and have been used in many document image binarization techniques. In Bernsen's method, the local contrast is defined as follows:

$$C(x, y) = I_{\max}(x, y) - I_{\min}(x, y) \qquad (1)$$

where $C(x, y)$ denotes the contrast of an image pixel $(x, y)$, and $I_{\max}(x, y)$ and $I_{\min}(x, y)$ denote the maximum and minimum intensities within a local neighborhood window of $(x, y)$, respectively. If the local contrast $C(x, y)$ is smaller than a threshold, the pixel is set as background directly.
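As a hedged illustration, the local contrast of Equation (1) and the low-contrast rule can be sketched in plain Python. The function name `bernsen_binarize`, the 3x3 window, the contrast threshold of 15 and the toy image are assumptions made for this sketch, not values from the paper, and dark text on a light background is assumed. Pixels that pass the contrast test are compared with the midpoint of the window maximum and minimum.

```python
def bernsen_binarize(img, radius=1, contrast_threshold=15):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    Returns a same-sized image with 1 for text and 0 for background,
    assuming dark text on a lighter background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local neighborhood window, clipped at the image border.
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            i_max, i_min = max(window), min(window)
            contrast = i_max - i_min          # Equation (1)
            if contrast < contrast_threshold:
                out[y][x] = 0                 # low contrast: background directly
            else:
                midpoint = (i_max + i_min) / 2
                out[y][x] = 1 if img[y][x] < midpoint else 0
    return out


# Toy image: a dark vertical stroke (40) on a light background (200).
page = [
    [200, 200, 200, 200, 200],
    [200, 200,  40, 200, 200],
    [200, 200,  40, 200, 200],
    [200, 200,  40, 200, 200],
    [200, 200, 200, 200, 200],
]
result = bernsen_binarize(page)
for row in result:
    print("".join(str(v) for v in row))
```

Only the three stroke pixels in the middle column are labelled as text here. On a real degraded page the fixed contrast threshold would have to be tuned, which is one reason a method of this kind struggles with complex backgrounds.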
Otherwise it is classified as text or background by comparing its intensity with the mean of $I_{\max}(x, y)$ and $I_{\min}(x, y)$. Bernsen's method is simple, but it cannot work properly on degraded document images with a complex document background. We have earlier proposed a novel document image binarization method that uses the local image contrast.

3. Literature Review

Abdenour Sehad et al. (2013) presented an efficient scheme for the binarization of ancient and degraded document images, based on texture features. The suggested technique is adaptive-threshold based. The threshold is calculated using a descriptor based on a co-occurrence matrix. The scheme is verified objectively on degraded documents from the DIBCO dataset and subjectively on a set of ancient degraded documents provided by a national library. The results are acceptable and promising, and show an improvement over classical approaches.

Konstantinos Ntirogiannis et al. (2013) observed that document image binarization is of great importance in the document image analysis and recognition pipeline, because it affects the later stages of the recognition process. Evaluating a binarization technique helps in studying its algorithmic behaviour and in verifying its effectiveness, by giving qualitative and quantitative indications of its performance. They proposed a pixel-based binarization evaluation methodology for historical handwritten and machine-printed document images. In the proposed evaluation scheme, the recall and precision measures are suitably modified using a weighting scheme that reduces potential evaluation bias. Additional performance metrics of the scheme include the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging.

Djamel Gaceb et al. (2013) studied a smart binarization technique for document images with different degradations. The nature of each pixel is estimated using a hierarchical local thresholding in order to classify it as a foreground, background or ambiguous pixel. The ambiguous pixels, which represent the degraded zones, cannot be binarized with the same local thresholding. The global quality of the image is estimated from the density of these degraded pixels. If the image is degraded, a second separation is applied to the ambiguous pixels to split them into background or foreground. This second process uses their improved relaxation method.

Marian Wagdy et al. (2013) implemented a fast and efficient document image clean-up and binarization technique based on retinex theory and global thresholding. The technique combines local and global thresholding with the concept of retinex theory, which can efficiently enhance degraded and poor-quality document images. A fast global threshold is then used to convert the document image into binary form. The new method overcomes the limitations of the related global threshold techniques.

Vassilis Papavassiliou et al. (2012) discussed an efficient technique based on mathematical morphology for extracting text regions from degraded document images. The basic stages of the methodology are: (a) top-hat-by-reconstruction to construct a filtered image with a reasonable background, (b) region growing, starting from a set of seed points and attaching to each seed the neighbouring pixels of similar intensity, and (c) conditional extension of the initially detected text regions based on the values of the second derivative of the filtered image.

4. Proposed Methodology

This section describes the proposed document image binarization technique. Given a degraded document image, an inversion contrast map is first constructed, and the text stroke edges are then detected through the grayscale conversion of the contrast image. The text is then segmented based on a local threshold that is estimated from the detected text stroke edge pixels. Some post-processing is further applied to improve the document binarization quality.

Figure 2. Block diagram of the proposed system.

4.1 Construction of contrast image

The primary aim of the contrast image construction is to detect the text stroke edge pixels properly. Prior to the construction of the adaptive image contrast map, the input degraded document image is normalized to improve the quality of the output binarized image. The adaptive image contrast is a combination of the local image contrast and the local image gradient. The image gradient has been widely used for edge detection, and it can effectively detect the text stroke edges of document images that have a uniform document background. On the other hand, it often detects many non-stroke edges from the background of degraded documents, which often contains image variations due to noise, uneven lighting, bleed-through, etc. To extract only the stroke edges properly, the image gradient needs to be normalized to compensate for the image variation within the document background. For image pixels within dark regions, the normalization produces a small denominator and accordingly results in a relatively high image contrast. However, the image contrast has one typical limitation: it may not handle document images with bright text properly.

4.2 Detection of text stroke edge pixels

The foreground text can be extracted from the document background once the high-contrast edge pixels are detected properly. Text stroke edge pixels can be detected easily by using the previously constructed contrast image. The adaptive image contrast computed at the text strokes is considerably higher than that computed within the document background. The contrast map is then binarized using a global thresholding method, which can extract the stroke edge pixels properly. The constructed contrast image has a clear bimodal pattern, in which the inversion image contrast computed at text stroke edges is clearly larger than that computed within the document background.

4.3 Estimation of Local Threshold

After the high-contrast text stroke edge pixels are detected, the foreground text can be segmented from the document background by a local threshold that is estimated from the intensities of the detected text stroke edge pixels. Analysis of different kinds of document images shows that the text pixels lie close to the detected text stroke edge pixels and that there is a distinct intensity difference between the high-contrast stroke edge pixels and the surrounding background pixels.

4.4 Post-Processing

Once the initial binarization result is derived as described in the previous subsections, it can be further improved by incorporating certain domain knowledge. First, isolated foreground pixels that do not connect with other foreground pixels are filtered out to make the edge pixel set precise. Second, the neighborhood pixel pair that lies on symmetric sides of a text stroke edge pixel should belong to different classes; one pixel of the pair is therefore relabeled to the other category if both pixels belong to the same class. Finally, some single-pixel artifacts along the text stroke boundaries are filtered out by using several logical operators.

5. Acknowledgement

We would like to take this opportunity to express our profound gratitude and deep regard to our project guide, Prof. Deepa Agale, for his exemplary guidance, valuable feedback and constant encouragement throughout the duration of the project. His valuable suggestions were of immense help throughout our project work.
His perceptive criticism kept us working to make this project much better. Working under him was an extremely enriching experience for us.

6. Conclusion

This paper presents an adaptive image contrast based document image binarization technique that is tolerant to different types of document degradation, such as uneven illumination and document smear. The proposed technique is simple and robust, and only a few parameters are involved. Moreover, it works for different kinds of degraded document images. The technique makes use of the local image contrast, which is evaluated based on the local maximum and minimum. The proposed method has been tested on various datasets.

7. References

[1] O. D. Trier and T. Taxt, "Evaluation of binarization methods for document images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 3, pp. 312-315, Mar. 1995.
[2] J. Kittler and J. Illingworth, "On threshold selection using clustering criteria," IEEE Trans. Syst., Man, Cybern., vol. 15, no. 5, pp. 652-655, Sep.-Oct. 1985.
[3] G. Leedham, C. Yan, K. Takru, J. Hadi, N. Tan, and L. Mian, "Comparison of some thresholding algorithms for text/background segmentation in difficult document images," in Proc. Int. Conf. Document Anal. Recognit., vol. 13, 2003, pp. 859-864.
[4] B. Su, S. Lu, and C. L. Tan, "Robust document image binarization technique for degraded document images," IEEE Trans. Image Process., vol. 22, no. 4, Apr. 2013.
[5] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in Proc. Int. Conf. Document Anal. Recognit., Jul. 2009, pp. 1375-1382.
[6] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. Int. Conf. Document Anal. Recognit., Sep. 2011, pp. 1506-1510.
[7] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 handwritten document image binarization competition," in Proc. Int. Conf. Frontiers Handwrit. Recognit., Nov. 2010, pp. 727-732.
[8] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation and stroke edges," Int. J. Document Anal. Recognit., vol. 13, no. 4, pp. 303-314, Dec. 2010.
[9] B. Su, S. Lu, and C. L. Tan, "Binarization of historical handwritten document images using local maximum and minimum filter," in Proc. Int. Workshop Document Anal. Syst., Jun. 2010, pp. 159-166.
[10] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative performance evaluation," J. Electron. Imag., vol. 13, no. 1, pp. 146-165, Jan. 2004.