Skeletonization Algorithm for an Arabic Handwriting


MOHAMED A. ALI, KASMIRAN BIN JUMARI
Dept. of Elc., Elc. and sys, Faculty of Eng., Pusat Komputer
Universiti Kebangsaan Malaysia
Bangi, Selangor 43600
MALAYSIA

Abstract: - In this paper, we propose a thinning algorithm for Arabic handwriting that uses color coding both for the thinning process and for gap recovery in the final output skeleton. The algorithm is designed to accept unconstrained Arabic handwriting. Different colors are assigned to different pixels of interest in the original image, both at the beginning and during the skeletonization process. Color coding makes the process easy to optimize and to visualize, and yields an efficient skeletonization. Redundant pixels of the one-pixel-wide skeleton are removed to ease the task of the next stage (feature extraction). The algorithm preserves the shape of the original image well and yields a skeleton that can be effectively incorporated into an Arabic OCR system.

Key-Words: - Character recognition, image processing, thinning algorithm, skeletonization, connectivity preservation, Arabic handwriting

1 Introduction
Character recognition is a field of pattern recognition that has been the subject of considerable work during the past three decades [1]. Although the design of thinning algorithms has been an important research area, only a few researchers have considered the thinning of Arabic writing [2]. Thinning plays a major role in an OCR system, and since recognition depends in part on the effectiveness of the thinning algorithm, this paper focuses on developing an effective thinning algorithm for use in an Arabic OCR system. Thinning algorithms have been studied extensively in image processing and pattern recognition [2-7].
Skeletonization has proven effective in a wide range of image processing applications, for instance character recognition, fingerprint recognition, inspection of printed circuit boards and chromosome shape analysis [2]. In general, an effective skeletonization algorithm should remove all redundant pixels and retain the significant aspects of the pattern being processed. In addition, a good algorithm should fulfill the following requirements:
i) Preservation of skeleton connectivity and shape
ii) Obtaining the approximate medial axis
iii) Output of a skeleton of unit pixel width

Thinning algorithms can be classified into two types: sequential algorithms [8] and parallel algorithms [9]. Sequential algorithms follow one of two approaches: iterative or noniterative. In the iterative approach, pixels on the boundary are examined (either sequentially or in parallel) and successively deleted until a skeleton of one pixel width is obtained. The noniterative approach, on the other hand, produces a medial line of the original image directly (in one pass) without examining all pixels individually. Our algorithm falls under the first approach of the first type, that is, it is a sequential iterative algorithm, chosen for its simplicity and effectiveness.

In the proposed algorithm we use color coding in a bitmap file of sixteen colors. Different colors are chosen for different types of pixels throughout the steps of the thinning process (e.g. mark, examine, preserve or delete, and pixel recovery) to achieve thinning and to solve the problem of discontinuity. This technique has yielded a very fine skeleton of the original image of Arabic handwritten text, which in turn facilitates the feature extraction and recognition stages of any character recognition system.

2 Algorithm Procedure
Our algorithm uses the Windows color bitmap file format.
Six colors, black, white, yellow, blue, red and green, were chosen to represent on-pixels, off-pixels, noise pixels, start or end point pixels, deletable pixels and recovered pixels respectively. The input image file is a monochrome (black and white) bitmap file; however, as the algorithm starts assigning colors to the different types of pixels, the input file is converted to a Windows color bitmap file. There are seven main steps to achieve the task of skeletonization, as follows:

2.1 Start and end points marking
This is done by scanning the whole image from the top-left to the bottom-right corner, locating all pixels on the inner and outer borders of the image and distinguishing deletable from undeletable pixels. The algorithm considers every black pixel (on-pixel) that is surrounded by six or seven white pixels (off-pixels), in the directions given by the Freeman code diagram shown in Fig. 1, to be undeletable and assigns it the color blue. These pixels are expected to be start or end points of the image; they must stay undeletable for the sake of shape preservation and are not examined in any subsequent iteration, as shown in Fig. 2.

Fig. 1 Freeman chain code

Fig. 2 Start and end points detection

In the same manner, the algorithm considers every black pixel (on-pixel) that is surrounded by five or eight white pixels (off-pixels) to be noise, assigns it the color yellow during scanning and then deletes it, as shown in Fig. 3(a) and Fig. 3(b).

Fig. 3 Pixels that are considered as noise

2.2 Allocation of deletable pixels
In this step we locate all pixels on the boundary of the image that can be deleted for the sake of thinning; the algorithm marks these pixels in red. The allocation of these pixels follows the rules (templates) shown in Fig. 4.

Fig. 4 Templates for allocation of deletable pixels

Here PT is the pixel under test, and P0, P2, P4 and P6 are the four neighbor pixels of PT in the four directions given by the Freeman code. The conditions that make PT deletable are as follows:

If {(P2 = on) & (P6 = off) or (P0 = on) & (P4 = off) or (P2 = off) & (P6 = on) or (P0 = off) & (P4 = on)}
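The marking in section 2.1 and the template test in section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is taken to be a list of rows of 0 (off) and 1 (on), pixels outside the image count as off, and the Freeman code is assumed to number direction 0 as East and proceed counter-clockwise (consistent with the text, which places P3 to the North-West and P7 to the South-East). The names FREEMAN, neighbours, classify and matches_template are hypothetical. Note that the four template conditions reduce to "P2 differs from P6, or P0 differs from P4".

```python
# Freeman chain-code offsets as (row, col) steps. Direction 0 = East, counted
# counter-clockwise; this numbering is an assumption (see the lead-in).
FREEMAN = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def neighbours(img, r, c):
    """Return the eight neighbours P0..P7 of pixel (r, c).

    Positions outside the image are treated as off (0); this is an assumption.
    """
    h, w = len(img), len(img[0])
    out = []
    for dr, dc in FREEMAN:
        rr, cc = r + dr, c + dc
        out.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
    return out


def classify(img, r, c):
    """Label an on-pixel by its number of off-neighbours (section 2.1)."""
    off = neighbours(img, r, c).count(0)
    if off in (6, 7):
        return "blue"    # start or end point, kept undeletable
    if off in (5, 8):
        return "yellow"  # noise, as stated in the text
    return "black"       # ordinary on-pixel


def matches_template(img, r, c):
    """Template condition of section 2.2.

    True when exactly one pixel of the pair (P2, P6) is on, or exactly one
    pixel of the pair (P0, P4) is on.
    """
    p = neighbours(img, r, c)
    return (p[2] != p[6]) or (p[0] != p[4])
```

For example, an isolated on-pixel has eight off-neighbors and is labeled yellow, while the end pixel of a one-pixel-wide horizontal stroke has seven and is labeled blue. The sketch covers only the neighbor counting and the template test; the further conditions that decide whether a marked pixel is actually deleted are left out.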
In each of the four cases above, PT is a deletable pixel provided that it is connected to at least two other black pixels. Such pixels are temporarily turned red before the algorithm finally decides whether to delete or retain them, depending on whether further conditions are fulfilled. To avoid discontinuity, three more rules are applied before deleting the pixels marked as deletable (red):
i) The first rule is that a deletable pixel should not follow any of the patterns shown in Fig. 5.

Fig. 5 First rule for discontinuity prevention

If deletable pixels fall under any of the patterns shown in Fig. 5, one of them should be retained. Priority for retention goes to the deletable pixel that has more other deletable pixels connected to it; if both deletable pixels have the same number of connected deletable pixels, priority goes to the one that comes first in the scanning direction (top-left to bottom-right), and that pixel is marked as a black (retained) pixel.
ii) The second rule states that if a deletable pixel is connected to three other deletable pixels in the manner shown in Fig. 6(a), the algorithm marks the medial pixel as a black pixel, as shown in Fig. 6(b).

Fig. 6 Second rule for discontinuity prevention

iii) The third rule states that any pixel that has been marked as deletable (red) and has two white pixels (off-pixels) in the directions (P2 and P6) or (P0 and P4), as shown in Fig. 7, should be reverted to a black pixel.

Fig. 7 Third rule for discontinuity prevention

2.3 Deletion process
We now delete all pixels that are still marked as deletable (red) by turning them into white pixels. Pixel deletion follows the scanning of the image from the top-left corner to the bottom-right corner. We noticed that this deletion causes some discontinuities. The algorithm nevertheless completes the process without interruption and iterates, as described in the next section, until no more pixels are deleted (in other words, until the number of deleted pixels after each iteration is the same). Only then does the algorithm check for discontinuities and deal with them, as described in section 2.5.

2.4 Iteration
The algorithm now iterates, repeating the steps in sections 2.2 and 2.3 until there are no more red pixels to delete, in other words until the templates in Fig. 4 are no longer applicable. The number of iterations depends mainly on the thickness of the handwriting in the input image. For instance, the handwritten character (ha) shown in Fig. 8(a) took five iterations to reach its final skeleton, whereas the Arabic character (dal) shown in Fig. 8(b) took six iterations. We observed this by taking snapshots after each iteration. This technique also helps in monitoring the thinning process by following, step by step, the marking of pixels with different colors as explained above, so that any process malfunction can easily be detected.

Fig. 8 Two Arabic handwritten characters of different thickness and their skeletons

2.5 Discontinuity detection and recovery
After the last deletion we noticed some discontinuities at various places in the output skeleton. We therefore propose a technique that recovers the deleted pixels that cause this type of discontinuity, as follows. We move a 3x3 window over the whole thinned image, and if one of the templates shown in Fig. 9 is found, we check the missing pixel. If this pixel was present in the original image and was deleted by the thinning algorithm, we recover it (make it a black pixel), which resolves the discontinuity; otherwise we consider the gap a deliberate discontinuity (i.e. one of the character's features) and keep it as it is. In Fig. 9, PT is the pixel whose presence before thinning is checked: if it was present, this off-pixel is converted back to an on-pixel; otherwise it is left as it is.

Fig. 9 Templates for recovery of deleted pixels and preservation of connectivity

Solving this type of discontinuity does not prevent other types of discontinuity from occurring, such as the one shown in Fig. 10, where none of those templates is applicable and the length of the discontinuity is more than two pixels. This notably happens in lines or strokes that are inclined diagonally in the direction of P3 or P7 (i.e. lines running North-West or South-East).

Fig. 10 Type of discontinuity more than one pixel long

Fig. 10 shows, from left to right, the original image of the Arabic character (LamAleef), the skeleton with a discontinuity and the skeleton after the discontinuity has been recovered. This type of discontinuity is recovered as follows. The algorithm sweeps the whole skeleton image looking for black pixels that are connected to only one black pixel (excluding the pixels marked as start and end points, i.e. the blue pixels) and checks the neighbor at P3 or P7. If the tested pixel is connected to either P3 or P2, and P7 is white but was black before deletion, then P7 is converted back to black; likewise, if the tested pixel is connected to either P6 or P7, and P3 is white but was black before deletion, then P3 is converted back to black. Fig. 11 illustrates this mechanism. The mechanism is repeated until no pixels (excluding the blue pixels) are connected to only one black pixel. In this way the algorithm is able to resolve this type of discontinuity.

Fig. 11 Mechanism applied for discontinuities more than one pixel long

2.6 Redundant pixels removal
One of the main features of our algorithm is the removal of redundant pixels from the final skeleton. In Fig. 12, although the skeleton is one pixel wide, it still has one or more pixels that can be removed without causing any discontinuity. On the contrary, removing those redundant pixels enhances the feature extraction and character recognition processes in an OCR system, because the number of possibilities in the decision tree is dramatically reduced, which speeds up the process.

Fig. 12 Removal of redundant pixels

2.7 Experiments and results
The algorithm was tested on different samples of Arabic handwriting, both discrete and cursive, using an HP scanner (at 1200 dpi resolution) for image capture. A smooth, shape-preserving skeleton was obtained. Fig. 8 and Fig. 13 show examples of tests carried out on different Arabic handwriting images along with their output skeletons. Fig. 13 clearly shows, when the output skeleton is superimposed on the original image, that the skeleton preserves the shape of the original image, is smooth, lies along the middle of the strokes and is one pixel wide.

Fig. 13 Samples of original Arabic handwritten images and their skeletons

2.8 Optimization
To reduce the run-time, the algorithm confines itself to a minimum number of pixels to test in each iteration: in the first scan it records the locations of the first and last black pixels found as pixels of origin, so that in subsequent iterations it starts and ends at these pixels rather than scanning the whole image area as defined by the bitmap file format. In addition, to avoid inefficient iteration, the deletion (thinning) process is stopped and the final output image (skeleton) is saved when either there are no more pixels to delete or the number of deleted pixels in two successive iterations is the same. In this way excessive iterations are avoided and the program run-time is minimized.

3 Conclusion
The main goal of this work is to develop a reliable thinning algorithm for use in an Arabic handwritten character recognition system.
The proposed algorithm uses color coding to mark, delete and recover pixels in an image of Arabic handwriting, so that a fine, reliable skeleton of that image is produced in a very simple and effective manner compared with algorithms that are based on complex morphology and mathematical calculations, which make their overall time consumption relatively high. Color coding makes the process easy to optimize and to visualize, and yields an efficient skeletonization. The technique also helps in monitoring the thinning process by following, step by step, the marking of pixels with different colors, so that any process malfunction can easily be detected. Analysis of the skeletons produced shows that they are very representative of the original shape of the handwritten image. This paper introduces an interactive thinning algorithm for Arabic handwriting in particular; nevertheless, the algorithm can be used for Latin handwriting as well.

References:
[1] Mohamed A. Ali and Kasmiran Bin Jumari, A Survey and Comparative Evaluation of Selected Off-line Arabic Handwritten Character Recognition Systems, Jurnal Teknologi, No. 36, pp. 1-18, June 2002.
[2] M. M. Altuwaijiri and M. A. Bayoumi, A Thinning Algorithm for Arabic Characters Using ART2 Neural Network, IEEE Trans. Circuits & Systems, Analogue & Digital Signal Processing, Vol. 45, No. 2, pp. 260-264, Feb 1998.
[3] Edna Flores, Eder N. Rezende, Gilberto A. Carrijo and Joao B. T. Yabu-tti, A Fast Thinning Algorithm for Characters, 1995 IEEE Workshop on Nonlinear Signal and Image Processing, June 1995.
[4] Sabri A. Mahmoud, Ibrahim AbuHaiba and Roger J. Green, Skeletonization of Arabic Characters Using Clustering Based Skeletonization Algorithm, Pattern Recognition, Vol. 24, No. 5, pp. 453-464, 1991.
[5] A. I. El-Desouky, M. M. Salem, A. O. Abd El-Gwad and H. Arafat, A Handwritten Arabic Character Recognition Technique for Machine Reader, International Journal of Mini and Microcomputers, Vol. 14, No. 2, 1992.
[6] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Addison-Wesley, Canada, 1987, pp. 391-402.
[7] S. Mori, H. Nishida and H. Yamada, Optical Character Recognition, John Wiley & Sons, 1999, pp. 131-158.
[8] I. Zainodin, D. Khairuddin and S. Horani, Sequential Thinning of Binary Images, Sains Malaysiana, Vol. 32, No. 4, 1994, pp. 35-57.
[9] B. K. Jang and R. T. Chin, One-Pass Parallel Thinning: Analysis, Properties, and Quantitative Evaluation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 3, 1992, pp. 267-278.