An Improved Binarization Method for Degraded Document Seema Pardhi 1, Dr. G. U. Kharat 2

1 Student, SPCOE, Department of E&TC Engineering, Dumbarwadi, Otur
2 Professor, SPCOE, Department of E&TC Engineering, Dumbarwadi, Otur
Email: seemapardhi07@gmail.com 1, govindukharat@gmail.com 2

Abstract- Binarization of degraded documents is a challenging task because of variation in both the background and the foreground. In recent years, degraded document binarization has been an active research topic. This paper presents an approach for segmenting text from degraded document images using adaptive image contrast, which is a combination of local image contrast and local image gradient. The performance of the proposed system is evaluated on the online Document Image Binarization Contest (DIBCO) 2014 database. The system gives an accuracy of 98.65% and an f-measure of 99.29% for γ = 1.

Index Terms- Degraded document binarization, text stroke edge, local thresholding, Canny edge.

1. INTRODUCTION

Binarization of old, degraded documents attracts researchers because segmenting the foreground text from a degraded background is challenging. The main aim is to segment the text from the document background accurately. Binarization is a technique that converts a gray or color image into a binary image; in this research area, it separates the foreground text from the document background. Document binarization is still an unsolved problem because of the variation in the nature of the background. In image processing, binarization is mostly performed in the pre-processing step. Basically, the binarization process is of three main types: local, global and hybrid.

A. Local thresholding
In this method the image is divided into sub-blocks of a particular size, which may be chosen statically or dynamically. A threshold value is then calculated for each block, and according to that local threshold the pixels of the block are converted into black (0) and white (1).

B. Global thresholding
Unlike local thresholding, this method calculates a single threshold value from the whole image and converts the whole image into binary according to that threshold.

C. Hybrid thresholding
The hybrid method is the combination of the local and global thresholding methods. It takes the advantages of both methods.

Fig. 1 shows the variation in the documents in terms of brightness, stroke edges, width, connections and background.

Fig. 1. Degraded document image examples from DIBCO 2014 database

The proposed system is simple and accurate, and it has the capability of handling any type of degraded document.
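The difference between the local and global thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of the mean as the threshold and the 2x2 block size are assumptions made only for the example.

```python
from statistics import mean


def global_threshold(img):
    """Binarize with a single threshold (here: the global mean, an assumption)."""
    t = mean(p for row in img for p in row)
    return [[1 if p > t else 0 for p in row] for row in img]


def local_threshold(img, block=2):
    """Binarize each block x block sub-block with its own mean threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            rows = range(r0, min(r0 + block, h))
            cols = range(c0, min(c0 + block, w))
            t = mean(img[r][c] for r in rows for c in cols)
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if img[r][c] > t else 0
    return out


# Hypothetical 2x4 image: bright left half, dark right half,
# each half containing one darker "text" pixel (120 and 20).
image = [
    [200, 120, 90, 20],
    [200, 200, 90, 90],
]

print(global_threshold(image))  # whole right half turns black
print(local_threshold(image))   # only the darker pixel of each block turns black
```

With a single global threshold the unevenly lit right half is lost entirely, while the block-wise threshold still separates the darker pixel from its local background, which is the motivation for local methods on degraded documents.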

The remainder of the paper is organized as follows. Section 2 reviews different techniques for the binarization of degraded documents. The proposed binarization technique is presented in Section 3. Results are discussed in Section 4. Section 5 concludes the paper.

2. LITERATURE SURVEY

Otsu's method is a global thresholding method. First the histogram of the grayscale image is calculated, and then the background and foreground clusters are separated by choosing an optimal threshold [1]. The optimal threshold t minimizes the weighted within-class variance, where q_1(t) and q_2(t) are the class probabilities of the two clusters of gray-level pixels and σ_1²(t) and σ_2²(t) are their variances:

$$\sigma_w^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t) \tag{1}$$

Niblack [2] presents a local thresholding method that calculates the local mean and the local standard deviation. This method is applicable to all kinds of images, but it is unable to remove unimportant details and it fails when the image contains a large amount of noise. The threshold by the Niblack method is given by

$$T(i,j) = m(i,j) + k\,\sigma(i,j) \tag{2}$$

where m(i, j) is the local mean, σ(i, j) is the local standard deviation and k is a constant.

Sauvola [3] presents a technique that is an improvement of the Niblack method and that works in the presence of noise. In this method the threshold is calculated using the dynamic range of the standard deviation. The threshold by Sauvola is given by

$$T(i,j) = m(i,j)\left[1 + k\left(\frac{\sigma(i,j)}{R} - 1\right)\right] \tag{3}$$

where R is a constant whose suggested value is 128. A disadvantage of this method is that it does not work for text pixels that are close to the background.

Wolf [4] proposed a method that normalizes the contrast and the mean of the image and calculates the threshold by

$$T = (1-k)\,m + k\,M + k\,\frac{s}{R}\,(m - M) \tag{4}$$

where m and s are the local mean and standard deviation, M is the minimum gray value of the image and R is the highest standard deviation of the gray values.

Afterwards, a number of authors presented document binarization techniques. Some of these techniques are explained below.

Bolan Su et al. [5] present a document binarization technique based on a Markov random model. This method classifies the document into three parts: background, binarized text and uncertain pixels generated during the process.

Sehad et al. [6] proposed an adaptive thresholding technique. In this technique the descriptor centre of the gray-level co-occurrence matrix (GLCM) is calculated, and the method is subjectively verified on the DIBCO database.

Lopes et al. [7] present a histogram threshold approach based on a fuzziness measure. In this method, two gray levels are initially chosen to represent the boundaries of the histogram. Fuzzy logic is then used to find the similarity in the image and decide the threshold.

3. PROPOSED WORK

The detailed process of the proposed degraded document binarization is shown in Fig. 2.

Fig. 2. Block diagram of the proposed system: Degraded document image -> Preprocessing -> Text stroke edge pixel detection -> Local threshold estimation -> Post processing -> Clear document.

The proposed system proceeds through four main steps: pre-processing, text stroke edge pixel detection, local threshold estimation and post-processing. Each step is explained in detail below.

A. Preprocessing
The input image from the DIBCO 2014 database is fed to the system. Old degraded documents show variation in both the background and the foreground text. To improve the contrast, and hence the quality and clarity of the image, histogram equalization is used. Median filtering with a 3x3 mask is then applied to minimize the noise.
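The two preprocessing operations named above can be sketched as follows. This is a hedged illustration using only the Python standard library on a grayscale image stored as a list of rows of 8-bit integers; the function names, the textbook cumulative-histogram form of equalization and the border handling of the median filter are assumptions, since the paper does not specify them.

```python
from statistics import median


def equalize_histogram(img, levels=256):
    """Spread the gray levels using the cumulative histogram (textbook form)."""
    h, w = len(img), len(img[0])
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value of the darkest level present
    n = h * w
    if n == cdf_min:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in row]
            for row in img]


def median_filter_3x3(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, h))
                      for cc in range(max(c - 1, 0), min(c + 2, w))]
            out[r][c] = int(median(window))  # truncates x.5 medians of even-sized windows
    return out


def preprocess(img):
    """Contrast enhancement followed by noise reduction, in the order given in the text."""
    return median_filter_3x3(equalize_histogram(img))
```

For example, `median_filter_3x3` applied to a 3x3 patch of value 10 with a single 255 impulse in the centre returns a patch of all 10s, which is the salt-and-pepper suppression the median filter is used for.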

B. Text stroke edge pixel detection
Gradients are mostly used for edge pixel detection. The gradients are calculated from the image and normalized to compensate for the effect of background variations. This step extracts the stroke edges. LMM [8] is the method used to find the stroke edges by differencing the local maxima and minima of the image. The adaptive contrast is given by

$$C_a(i,j) = \alpha\,C(i,j) + (1-\alpha)\,\bigl(I_{max}(i,j) - I_{min}(i,j)\bigr) \tag{5}$$

where C_a(i, j) is the adaptive local contrast of the image, C(i, j) is the local image contrast, and I_max(i, j) and I_min(i, j) are the local maximum and minimum, whose difference is the local image gradient. The weight α is derived from the standard deviation Std of the image and is given by

$$\alpha = \left(\frac{Std}{180}\right)^{\gamma} \tag{6}$$

The γ parameter is important for enhancing the contrast of the image and increasing the accuracy of the system. The resulting contrast map is used to extract the text stroke edges. Canny edge detection, which has the capability to reduce false edges, is then performed on the binary edge map.

C. Local Threshold Estimation
Once the text stroke edges are properly extracted, it is easy to select the text from the document image. After the stroke edges are extracted, the contrast is found to be high near the edge pixels, where there is a recognizable intensity change. The local threshold is estimated by considering the edge map: if an edge pixel is 0 and the next adjacent pixel is 1, it is considered an edge; otherwise the pixel is discarded. The local threshold is estimated using

$$R(x,y) = \begin{cases} 1, & E_{dg} > E_{mean} + E_{std} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

D. Postprocessing
The post-processing step collects all edge pixels and connects them while discarding the non-edge pixels. After that, the connected pixels are examined to determine whether each pixel belongs to the foreground or the background.

4. PROPOSED ALGORITHM

The experimentation is carried out on the DIBCO 2014 database. The results of the proposed system are analyzed by qualitative and quantitative analysis.

A. Qualitative analysis
The qualitative analysis is the non-statistical representation of the research.
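As an illustration of how a contrast map such as the one in Fig. 3(f) could be produced, the adaptive contrast of Eqs. (5) and (6) from Section 3 can be sketched as below. This is not the authors' code. The 3x3 window, the function name and the small constant `eps` are assumptions, and the local contrast C(i, j), which the paper does not define, is assumed here to be the commonly used normalized form (I_max - I_min) / (I_max + I_min + eps).

```python
from statistics import pstdev


def adaptive_contrast(img, gamma=1.0, eps=1e-6):
    """Adaptive contrast map following Eqs. (5) and (6) on a grayscale image."""
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    alpha = (pstdev(pixels) / 180.0) ** gamma            # Eq. (6)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # 3x3 neighbourhood (assumed size), clipped at the image border
            window = [img[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, h))
                      for cc in range(max(c - 1, 0), min(c + 2, w))]
            i_max, i_min = max(window), min(window)
            gradient = i_max - i_min                      # local image gradient
            contrast = gradient / (i_max + i_min + eps)   # assumed form of C(i, j)
            out[r][c] = alpha * contrast + (1 - alpha) * gradient  # Eq. (5)
    return out


# Hypothetical patch with a vertical stroke edge between columns 1 and 2.
patch = [[200, 200, 20] for _ in range(3)]
contrast_map = adaptive_contrast(patch, gamma=1.0)
```

In this patch the map is zero in the uniform left column and positive in the columns whose window straddles the edge, so high values mark candidate text stroke edge pixels.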

Fig. 3. Qualitative analysis: (a) input degraded document color image from the DIBCO 2014 dataset, (b) grayscale image, (c) output of the Sauvola binarization method, (d) output of the Bernsen binarization method, (e) output of the modified Bernsen binarization method, (f) output of contrast image map construction, (g) output of the Otsu global thresholding method, (h) output of the proposed system for γ = 1.

B. Quantitative analysis
The quantitative analysis is the statistical representation of the research. The proposed system is evaluated in terms of accuracy and f-measure, which are given by

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\text{f-measure} = \frac{2\,(\text{precision} \times \text{recall})}{\text{precision} + \text{recall}} \tag{9}$$

where

$$\text{precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{11}$$

and TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative pixels.

5. CONCLUSION

Degraded document binarization and analysis is an important research area in the fields of image processing, computer vision and pattern recognition. It is difficult to segment the text because of noise and illumination changes. The proposed system is evaluated on the DIBCO 2014 database. It gives a high accuracy of 98.65% and an f-measure of 99.29% for γ = 1.
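As a supplementary illustration, the performance measures of Eqs. (8) to (11) can be computed from a predicted binary image and its ground truth as sketched below. The function names are hypothetical, and treating 1 as the positive (text) label is an assumption of this example; the paper does not state which class is taken as positive.

```python
def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN pixel-wise; 1 is assumed to be the positive label."""
    tp = tn = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def scores(pred, truth):
    """Return accuracy, precision, recall and f-measure as in Eqs. (8)-(11)."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Hypothetical 1x4 prediction and ground truth: one TP, one FP, one FN, one TN.
print(scores([[1, 1, 0, 0]], [[1, 0, 1, 0]]))  # (0.5, 0.5, 0.5, 0.5)
```

The zero-denominator guards are a design choice of the sketch so that an empty prediction or an empty ground truth does not raise an error.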

REFERENCES

[1] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62-66, Jan. 1979.
[2] W. Niblack, An Introduction to Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[3] J. Sauvola and M. Pietikainen, "Adaptive document image binarization," Pattern Recognit., vol. 33, no. 2, pp. 225-236, 2000.
[4] C. Wolf and J.-M. Jolion, "Extraction and recognition of artificial text in multimedia documents," Pattern Analysis and Applications, vol. 6, no. 4, pp. 309-326, 2003.
[5] B. Su, S. Lu, and C. L. Tan, "Binarization of historical handwritten document images using local maximum and minimum filter," in Proc. Int. Workshop Document Anal. Syst., Jun. 2010, pp. 159-166.
[6] A. Sehad, Y. Chibani, M. Cheriet, and Y. Yaddaden, "Ancient degraded document image binarization based on texture features," in 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), Trieste, 2013, pp. 189-193.
[7] N. V. Lopes, P. A. Mogadouro do Couto, H. Bustince, and P. Melo-Pinto, "Automatic histogram threshold using fuzzy measures," IEEE Trans. Image Process., vol. 19, no. 1, pp. 199-204, 2010.
[8] J. Bernsen, "Dynamic thresholding of gray-level images," in Proc. Int. Conf. Pattern Recognit., Oct. 1986, pp. 1251-1255.