PHASE PRESERVING DENOISING AND BINARIZATION OF ANCIENT DOCUMENT IMAGE


Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
IJCSMC, Vol. 4, Issue 7, July 2015, pg. 16-20
RESEARCH ARTICLE
ISSN 2320-088X

Somadi Ajay
M. Tech (DSCE)
ajayrao402@gmail.com
JITS, Karimnagar

Ramesh Jitty
Assistant Professor
ramesh.jitty@gmail.com
JITS, Karimnagar

Abstract: In this paper, we present a phase-based binarization model for degraded document images, together with a post-processing method that can improve any binarization method and a ground truth generation tool. Many binarization techniques have been proposed in the literature for different types of binarization problems. These include an adaptive image-contrast-based document binarization technique that is tolerant to different types of document degradation, such as uneven illumination, document smear involving smudging of text, seeping of ink to the back side of the page, and degradation of paper and ink because of aging. They also include an objective evaluation methodology for handwritten document image binarization techniques that aims to reduce human involvement in ground truth construction and subsequent testing. Image binarization is the separation of pixel values into two classes: foreground (black) and background (white). It converts grayscale and color images into black-and-white images. The improvement of ancient and degraded documents using image processing has attracted many researchers in recent years, and binarization is a popular way of cleaning a document for further processing.

Keywords: Phase-based binarization model, degraded document images, grayscale images, binarization method, ground truth construction tool.

1. Introduction

Binarization is an important preprocessing step for degraded document images: it eliminates background noise and improves document quality.
This process consists of converting the original image into a binary image, which can be used for further processing such as optical character recognition (OCR), intelligent character recognition (ICR), and word spotting [6]. Libraries and archives around the world store an abundance of old and historically important documents and manuscripts. These documents hold a significant amount of human heritage accumulated over time, and many of them have suffered a high degree of degradation. The types of degradation include bleed-through, faded ink, show-through, uneven illumination, deterioration of the cellulose structure, and variation in image contrast [7]. There is now a strong move toward digitization of these manuscripts to preserve their content for future generations. In degraded documents, where extensive background noise or differences in contrast and brightness exist, there are many pixels that cannot be easily classified as foreground or background.

2015, IJCSMC All Rights Reserved

Binarization is one of the preprocessing steps; it separates the foreground and background of document images, converting a grayscale document image into a binary document image. The need for automatic archiving and processing of large volumes of old documents and manuscripts has attracted the attention of many researchers. To the best of our knowledge, none of the proposed methods can deal with all types of documents and degradation. In this paper, a robust phase-based binarization method is proposed for the binarization and enhancement of historical documents and manuscripts. There are three steps in the proposed method: i) preprocessing, ii) main binarization, and iii) postprocessing. The preprocessing step mainly performs image denoising with phase preservation and also contains some morphological operations. The phase congruency features are then used in the main binarization step. Phase congruency is used in the machine vision and image processing literature as a feature detector, for example in palmprint verification, finger-knuckle-print recognition, object detection, and biomedical applications [4]. We show that the foreground of ancient documents can be modeled by phase congruency. Phase congruency is a robust and stable way to process historical documents, both handwritten and machine-printed manuscripts. After completing the three binarization steps on the input images using phase congruency features and a denoised image, the enhancement processes are applied. A median filter and a phase congruency feature are used to construct an object exclusion map image. This map is then used to remove unwanted lines and interfering patterns [3][9]. With these purpose-designed steps, the proposed binarization method is stable and robust to various types of degradation and to different datasets; we provide comprehensive experimental results to demonstrate this robustness. Sample document images are selected from various sources.

Figure 1: Three degraded document image samples from DIBCO'11.

2. Literature Review

In this section, we describe some selected binarization methods. Otsu's method [5] assumes the presence of two distributions (one for the text and another for the background) and calculates a threshold value that maximizes the variance between the two distributions. Lu et al. [10][13] proposed a binarization method based mainly on background estimation. In the first step, the background of the document is estimated via a one-dimensional iterative Gaussian smoothing procedure. Then, for accurate binarization of strokes and sub-strokes, an L1-norm gradient image is used. This method was selected from among the 43 algorithms submitted to the DIBCO'09 competition. Su et al. [10] used the local maximum and minimum to build a local contrast image. A sliding window is then applied across the contrast image to determine local thresholds, where bright pixels indicate foreground and dark pixels indicate background. A version of this method was one of the two joint first-rank winners among the 17 algorithms that participated in the H-DIBCO'10 contest. Farrahi Moghaddam et al. [10] proposed a multi-scale binarization method in which the input document is binarized several times using different scales. These output images are then combined to form the final output image. This method was later extended to Otsu's method with better results, under the name AdOtsu.

3. Phase Based Binarization Model

In this section, we discuss the phase-based binarization model, which improves the visual quality of the text in degraded documents. There are three types of documents: handwritten, machine-printed, and graphics. Degradation can be classified into further categories depending on whether it affects the foreground or the background. In the foreground, degradation can appear as nebulous text and weak strokes or sub-strokes.
In the background, degradation can appear as global bleed-through, local bleed-through, unwanted lines and patterns, alien ink, and faded ink [11][13].
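
As a point of comparison for the phase-based model, the global thresholding idea behind Otsu's method from the literature review can be sketched in a few lines. This is a minimal illustration and not code from the paper: it assumes an 8-bit grayscale image flattened into a list of integers, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance.

    `pixels` is a flat iterable of integer gray values in [0, levels).
    (Assumed input layout; the paper does not prescribe one.)
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # no pixels left in the bright class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map dark pixels (<= threshold) to 0 (text) and the rest to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


# Toy example: dark ink values around 20-30, bright paper around 200-220.
sample = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
t = otsu_threshold(sample)
binary = binarize(sample, t)
```

A single global threshold of this kind is what tends to fail on bleed-through and uneven illumination, which is the motivation for the local and phase-based features discussed in this section.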

3.1 Phase Congruency-Based Feature Maps

Two features of phase congruency [5] are used to preprocess document images: i) the maximum moment of phase congruency covariance (MMPCC), and ii) the locally weighted mean phase angle (LWMPA). The MMPCC is a measure of edge strength and is used as an accurate edge detector. The LWMPA can be used to estimate the structure of the foreground text.

3.2 Binarization Model

The binarization model is an extended version of the one proposed in our previous work [19]. We have added a denoised image, which is another phase-based feature, to the binarization model, and achieved a 5% improvement on average.

Figure 2: Input image

Figure 3: Binarized image

3.2.1 Preprocessing

In the preprocessing step, we use a denoised image instead of the original image to obtain a binarized image in rough form. The image denoising method discussed in section III is applied to preprocess the binarization output.

3.2.2 Main Binarization

The next step is the main binarization, which is based on phase congruency features: i) the maximum moment of phase congruency covariance (IM); and ii) the locally weighted mean phase angle (IL).

1) IM: IM is used to separate the background from potential foreground parts [11]. This step performs very well, even in badly degraded documents, where it can reject a majority of badly degraded background pixels by means of a noise modeling method. To achieve this, we set the number of two-dimensional log-Gabor filter scales ρ to 2 and the number of two-dimensional log-Gabor filter orientations r to 10.

3.2.3 Post processing

In this step, we apply enhancement processes. First, a bleed-through removal process is applied. Then, a Gaussian filter is used to further enhance the binarization output and to separate background from foreground, and an exclusion process based on a median filter and IM maps is applied to remove background noise and objects [12]. Finally, a further enhancement process is applied to the denoised image.

The final binarized output image is thus obtained by processing the input image in three steps: preprocessing, main binarization, and post processing.
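
The log-Gabor filter bank configured in the main binarization step (2 scales, 10 orientations) can be sketched as follows. This is a minimal sketch of a standard frequency-domain log-Gabor construction, not the authors' implementation. Only the scale and orientation counts come from the text; the function name `log_gabor_bank` and the values of `min_wavelength`, `mult`, `sigma_on_f` and the angular spread are illustrative assumptions.

```python
import math


def log_gabor_bank(rows, cols, n_scales=2, n_orients=10,
                   min_wavelength=3.0, mult=2.1, sigma_on_f=0.55):
    """Build frequency-domain log-Gabor transfer functions.

    Returns a dict mapping (scale, orientation) to a rows x cols list of
    lists, with the zero frequency at index [0][0] (FFT ordering).
    All parameters except n_scales and n_orients are assumed values.
    """
    # Angular standard deviation, chosen (as an assumption) so that
    # neighbouring orientations overlap.
    theta_sigma = math.pi / n_orients / 1.2

    def freq(i, n):
        # Normalized frequency of FFT bin i on an axis of length n.
        return (i if i < (n + 1) // 2 else i - n) / n

    bank = {}
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)  # centre frequency of scale s
        for o in range(n_orients):
            angle = o * math.pi / n_orients      # filter orientation
            filt = [[0.0] * cols for _ in range(rows)]
            for y in range(rows):
                fy = freq(y, rows)
                for x in range(cols):
                    fx = freq(x, cols)
                    radius = math.hypot(fx, fy)
                    if radius == 0.0:
                        continue  # a log-Gabor filter has no DC component
                    # Radial part: Gaussian on a logarithmic frequency axis.
                    radial = math.exp(-(math.log(radius / f0)) ** 2
                                      / (2 * math.log(sigma_on_f) ** 2))
                    # Angular part: Gaussian in the angular distance
                    # between this frequency and the filter orientation.
                    theta = math.atan2(-fy, fx)
                    d = abs(math.atan2(math.sin(theta - angle),
                                       math.cos(theta - angle)))
                    spread = math.exp(-d * d / (2 * theta_sigma ** 2))
                    filt[y][x] = radial * spread
            bank[(s, o)] = filt
    return bank


bank = log_gabor_bank(16, 16)
```

Each transfer function would be multiplied with the Fourier transform of the image and transformed back to obtain the complex filter responses from which phase congruency measures are computed; that step is omitted here.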

Figure 4: Binarization results of an Arabic historical document image: (a) Original image, (b) Otsu's, (c) Niblack's, (d) Sauvola's, (e) NICK and (f) Proposed method.

3.2.4 Ground truth Generation tool

A ground truth binary image is produced using the proposed PhaseGT software. PhaseGT is an application for ground truthing historical documents. It uses phase congruency features [8][9] and a priori knowledge about the characteristics of the input document image to preprocess that image. Based on the provided information, PhaseGT generates a rough binarized image. Ground truth (GT) generation is a difficult and time-consuming task. Benchmark datasets are required for this, e.g. PHIBD 2012 and H-DIBCO 2012.

Figure 5: Sample original and ground truth image

We have also proposed a rapid method to determine the type of document image being studied, which will be of great interest. Ancient handwritten document images and machine-printed images behave differently in terms of binarization. The strokes and sub-strokes of handwritten images require accurate binarization, and the binarization of the interior pixels of the text of machine-printed images needs to be performed with care. Although the proposed binarization method works well on both handwritten and machine-printed documents, better results for both types of documents are achieved when a priori information about the type of input document is available. Finally, an efficient ground truthing tool called PhaseGT has been provided for degraded documents [2]. This tool is designed to reduce the manual correction involved in ground truth generation. In future work, we plan to expand the application of phase-derived features, which ensure the stable behavior of document images, to other cultural heritage fields, such as microfilm analysis and multispectral imaging.
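
A ground truth image of the kind PhaseGT produces is typically used to score a binarization result pixel by pixel. The paper does not name the metric behind its reported improvement, so the sketch below assumes the F-measure commonly used in the DIBCO contests. The function name `f_measure` and the convention that text pixels are 0 are assumptions.

```python
def f_measure(result, ground_truth, text_value=0):
    """Pixel-level F-measure of a binarized image against its ground truth.

    Both inputs are flat sequences of equal length; pixels equal to
    `text_value` are treated as foreground (assumed convention).
    """
    tp = fp = fn = 0
    for r, g in zip(result, ground_truth):
        if r == text_value and g == text_value:
            tp += 1      # text correctly detected
        elif r == text_value:
            fp += 1      # background wrongly marked as text
        elif g == text_value:
            fn += 1      # text missed by the method
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: five pixels, two hits, one false alarm, one miss.
score = f_measure([0, 0, 255, 255, 0], [0, 255, 255, 0, 0])
```

Because every error in the ground truth shifts this score, reducing manual correction during ground truth generation also affects how reliably methods can be compared.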

4. Conclusion

In this paper, we have introduced an image binarization method that uses the phase information of the input image; robust phase-based features extracted from that image are used to build a model for the binarization of ancient manuscripts. Phase-preserving denoising followed by morphological operations is used to preprocess the input image. Then, two phase congruency features, the maximum moment of phase congruency covariance and the locally weighted mean phase angle, are used to perform the main binarization [1][7]. For post-processing, we have proposed a few steps to filter various types of degradation; in particular, a median filter has been used to reject noise, unwanted lines, and interfering patterns.

5. Acknowledgments

I would like to thank Mayur S. Burange Sir in particular, who provided helpful information about this topic and helped to quickly resolve issues that I encountered.

References

[1] H. Z. Nafchi, R. F. Moghaddam, and M. Cheriet, "Historical document binarization based on phase information of images," in Proc. ACCV, 2012, pp. 1-12.
[2] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "An objective evaluation methodology for handwritten image document binarization techniques," in Proc. 18th IAPR Int. Workshop DAS, 2008, pp. 217-224.
[3] E. Saund, L. Jing, and P. Sarkar, "Pixlabeler: User interface for pixel-level labeling of elements in document images," in Proc. 10th ICDAR, Jul. 2009, pp. 646-650.
[4] H. Z. Nafchi, S. M. Ayatollahi, R. F. Moghaddam, and M. Cheriet, "An efficient ground truthing tool for binarization of historical manuscripts," in Proc. 12th ICDAR, Aug. 2013, pp. 807-811.
[5] R. F. Moghaddam and R. F. Nafchi, "Phase Based Binarization of Ancient document," IEEE Trans. Image Process., vol. 23, no. 7, 2014.
[6] M. Valizadeh and E. Kabir, "Binarization of degraded document image based on feature space partitioning and classification," Int. J. Document Anal. Recognit., vol. 15, no. 1, pp. 57-69, 2010.
[7] H. Z. Nafchi and H. R. Kanan, "A phase congruency based document binarization," in Proc. IAPR Int. Conf. Image Signal Process., 2012, pp. 113-121.
[8] E. Zemouri, Y. Chibani, and Y. Brik, "Enhancement of Historical Document Images by Combining Global and Local Binarization Technique," IJIEE, Vol. 4, No. 1, January 2014.