DESIGNING AND DEVELOPMENT OF OFFLINE HANDWRITTEN ISOLATED ENGLISH CHARACTER RECOGNITION MODEL


4 DESIGNING AND DEVELOPMENT OF OFFLINE HANDWRITTEN ISOLATED ENGLISH CHARACTER RECOGNITION MODEL
  Introduction
  Designing of Offline Handwritten Isolated English Character Recognition Model
  Pseudo Code used to build Offline Isolated Handwritten Character Recognition Model
  Development of the Offline Isolated English Handwritten Character Recognition

CHAPTER 4
DESIGNING AND DEVELOPMENT OF OFFLINE HANDWRITTEN ISOLATED ENGLISH CHARACTER RECOGNITION MODEL

4.1 Introduction

As described in Section 2.7, the recognition rate for the characters G, N, O, Y and the digit 4 is below 35%. To address this deficiency, the researcher decided to develop a dedicated recognition model. In this chapter, a character recognition model is designed and developed for offline handwritten isolated English capital characters G, N, O, Y and the digit 4. To implement this model, handwritten character and digit samples are collected from various writers, and digitization, character segmentation, preprocessing, feature extraction and classification steps are performed on them. The proposed handwritten character recognition model converts an image into a text file.

4.2 Designing of Offline Handwritten Isolated English Character Recognition Model

To design the offline handwritten isolated English character recognition model, the following steps are performed:

4.2.1 Collect handwritten data samples.
4.2.2 Digitization of the data - scan each data sample and create an image file.
4.2.3 Character segmentation - separate each character.
4.2.4 Preprocessing - perform binarization, thresholding and noise removal.
4.2.5 Feature extraction - extract structural and statistical features.
4.2.6 Classification - identify each character using a decision tree algorithm and display the output.

Figure 4.1 shows the Offline Handwritten Isolated English Character Recognition Model diagram.

Figure 4.1 Offline Handwritten Isolated English Character Recognition Model: handwritten data collection (a writer writes the individual characters G, N, O, Y and digit 4 on a blank A4 size paper), digitization of the data (an image of the document is created using a scanner), the OCR process (character segmentation, feature extraction and classification) and output (the recognized character or digit is displayed).

4.2.1 Handwritten Data Collection

Many databases of handwritten English characters and digits are available: the MNIST database for handwritten digits, the IAM database for handwritten English text, the CEDAR database for handwritten words and ZIP codes, etc. For the proposed model, handwritten isolated English capital characters and digits are collected in the following manner:

1. Handwritten data samples are collected from seven persons of different ages, from 18 to 56 years.
2. Handwritten data samples are collected for the English capital characters G, N, O and Y.
3. Handwritten data samples are also collected for the digit 4.

4. Data is collected on blank A4 size paper. Each character and digit is written 10 times by each person.
5. Writers may use a pencil or a ball point pen of any ink color.

For the proposed work, the researcher collected handwritten data samples from herself and from six other persons: CKKSir, Mayur, Manoj, Manjulaben, Mitul and Jhanvi. Each character and digit is written 10 times by each person, so there are 70 samples of each character and digit.

4.2.2 Digitization of the Data

The proposed model requires a scanned image as input. This image is acquired through a scanner, digital camera or any other suitable digital device. For the proposed work, handwritten data samples are scanned using an HP DeskJet 1510 scanner at 300 dpi and saved in .jpg format, as shown in Figure 4.2. The researcher created seven directories to store the handwritten datasheets of the seven writers separately. Directories are named after the writers, e.g. Purna. As described in Section 2.4.1, handwritten data samples are collected for all English capital characters and digits, but the proposed model needs only the images of characters G, N, O, Y and digit 4, so these are cropped and stored in the following format:

Writer's name_Character, e.g. Purna_G

The file naming convention is shown in Table 4.1.

Figure 4.2 Digitization of Handwritten Datasheet: (a) Character G (b) Character N (c) Character O (d) Character Y (e) Digit 4

Table 4.1 Naming Convention of Handwritten Data Samples

Naming Convention | Meaning
Purna_G           | File contains handwritten data samples of character G written 10 times by writer Purna.
Purna_N           | File contains handwritten data samples of character N written 10 times by writer Purna.
Purna_O           | File contains handwritten data samples of character O written 10 times by writer Purna.
Purna_Y           | File contains handwritten data samples of character Y written 10 times by writer Purna.
Purna_4           | File contains handwritten data samples of digit 4 written 10 times by writer Purna.

4.2.3 Character Segmentation

In character segmentation, each character is separated and saved as an individual character image. The proposed model requires an image with a single character, so if an image contains more than one character, character segmentation is performed. To separate the characters, the researcher used OpenCV and developed a module named segmentcharacter. Pseudo code for character segmentation is presented in Table 4.2; it uses the cvFindContours(), cvDrawContours(), cvSetImageROI() and cvSaveImage() functions. First, the contours in the image are identified. A contour is an outline representing or bounding the shape of a character. The OpenCV cvFindContours() function finds contours in an image, cvDrawContours() draws contour outlines, cvSetImageROI() sets an image Region of Interest (ROI), and cvSaveImage() saves each separated individual character.

Table 4.2 Pseudo Code for Character Segmentation

Procedure Name : segmentcharacter(filename)
Purpose : Segment individual characters from multiple characters in an image file
Input : File with multiple handwritten characters/digits
Output : Segmented individual character images

Variables Used :
fname - file name
image_original - black and white image
image_color - color image
smooth - noise free image
threshold - thresholded image
open_morf - image after morphological operation
img_contornos - copy of open_morf image
contour - contours of a binary image
contourlow - optimized contours
rect - bounding rectangle points of a contour
xbox - rectangle x position
ybox - rectangle y position
wbox - rectangle width
hbox - rectangle height
img - image file
segmentedimage - image file array
i - index variable

Begin
  fname <- filename
  image_original <- Load black and white image
  image_color <- Load color image
  smooth <- Remove median noise and gaussian noise from the image
  threshold <- Threshold the image
  open_morf <- Apply morphological operations
  img_contornos <- copy open_morf image
  contour <- find contours in the image
  contourlow <- find optimized contours
  for i <- 1 to 10 by 1
    image_color <- draw contours
    rect <- find bounding rectangle of contour
    xbox <- x position of rectangle
    ybox <- y position of rectangle
    wbox <- width of rectangle
    hbox <- height of rectangle
    img <- create image of wbox by hbox size
    image_color <- extract region of interest
    img <- copy image_color
    Reset region of interest
    segmentedimage <- copy img into individual image array
    Save img
  end for
  return segmentedimage
End

The above procedure returns the segmented images as output. Segmented images are saved in the following format:

Writer's name_Character_<digit>, e.g. Purna_G_1, Purna_G_2, Purna_G_3, etc.

Figure 4.3 shows (a) an image with multiple characters and (b) the segmented images.
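As a present-day illustration of the segmentation step (the thesis uses the legacy OpenCV C API shown in Table 4.2), the bounding boxes of individual characters can also be found with a plain connected-component pass over the binarized page. Everything below is an independent sketch with illustrative names, not the thesis module:

```python
from collections import deque

def segment_characters(binary):
    """Find bounding boxes of connected foreground regions (1 = ink).

    binary: list of rows of 0/1 values.
    Returns a list of (x, y, w, h) boxes, ordered left to right.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Flood-fill one connected component (8-connectivity).
                q = deque([(y, x)])
                seen[y][x] = True
                ys, xs = [y], [x]
                while q:
                    cy, cx = q.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w \
                                    and binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                ys.append(ny)
                                xs.append(nx)
                                q.append((ny, nx))
                boxes.append((min(xs), min(ys),
                              max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    # Sort boxes left to right, as the characters appear on the page.
    return sorted(boxes)
```

Each box can then be cropped and saved as one character image, mirroring the cvSetImageROI()/cvSaveImage() steps of Table 4.2.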

Figure 4.3 (a) Image with Multiple Handwritten Characters (b) Result of Character Segmentation (segmented files Purna_G_1 through Purna_G_10)

4.2.4 Pre-Processing

Pre-processing modifies the raw data to correct deficiencies introduced during data acquisition by the limitations of the capturing device's sensor. Data pre-processing describes any type of processing performed on raw data to prepare it for another processing procedure. It is the preliminary step that transforms the data into a format that can be processed more easily and effectively, and it is an essential stage since it controls the suitability of the results for the successive stages.

The main objective of the pre-processing stage is to normalize the data and remove variations that would otherwise complicate classification and reduce the recognition rate [1].

Factors affecting the accuracy of character recognition:
- Older or discolored documents
- Low contrast documents
- Scanner quality
- Type of printed document
- Paper quality
- Fonts used in the document
- Scan resolution - the recommended scanning resolution for OCR accuracy is 300 dpi. Higher resolutions do not necessarily result in better accuracy and can slow down OCR processing; resolutions below 300 dpi may reduce the quality and accuracy of OCR results.

A series of operations is performed during the pre-processing stage. The main objective of pre-processing is to organize the information so that the subsequent character recognition task becomes simpler. Pre-processing includes the following stages.

Figure 4.4 Preprocessing Steps

4.2.4.1 RGB to Grayscale Image Conversion

RGB stands for Red, Green and Blue. In RGB to grayscale conversion, an RGB image is converted to a grayscale image, which is composed of different shades of grey. The researcher used Java to convert RGB to grayscale: a new grayscale image of TYPE_BYTE_GRAY is created and the original colored image is painted onto it. Pseudo code for RGB to grayscale image conversion is presented in Table 4.3.

Table 4.3 Pseudo Code for RGB to Grayscale Image Conversion

Procedure Name : togray(bimage)
Purpose : Convert a color image to a grayscale image
Input : BufferedImage
Output : Grayscale image

Variables Used :
image_width - width of the image file
image_height - height of the image file
image - grayscale image

Begin
  image_width <- get image width
  image_height <- get image height
  image <- create a new grayscale image
  image <- paint the original colored image onto it
  return image
End

The above pseudo code returns a grayscale image as output. (a) The RGB image and (b) the grayscale image are shown in Figure 4.5.
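The togray procedure can be sketched in Python using the standard luminosity weights. The 0.299/0.587/0.114 coefficients are the common ITU-R BT.601 choice; this is an illustrative stand-in for Java's TYPE_BYTE_GRAY conversion, not the thesis code:

```python
def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale.

    Uses the ITU-R BT.601 luminosity weights, so green contributes most
    to perceived brightness and blue the least.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]
```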

Figure 4.5 (a) RGB Image (b) Gray Scale Image

A grayscale image occupies less memory than an RGB image, since each pixel is represented by eight bits instead of twenty-four.

4.2.4.2 Thresholding

Thresholding converts a grayscale image into a binary (black and white) image. The goal of thresholding is to remove the background, by setting it to white, while leaving the foreground unchanged. Thresholding selects a proper threshold value for the image and then converts all pixels above the threshold to white and all pixels below it to black. In any image analysis or enhancement problem it is essential to separate the objects of interest from the rest; thresholding is the process of separating the objects of an image from its background.

Otsu's Thresholding Method

The proposed model uses Otsu's thresholding method, an efficient and frequently used technique. Otsu's method converts a grayscale image into a binary image by calculating a threshold that splits the pixels into two classes. More generally, Otsu's method can split a histogram into two classes while minimizing the intra-class variance of the data within each class.

Otsu's method takes the image histogram as input and finds a pixel value (threshold level) that separates the image into foreground and background (or even into multiple levels). It converts the grayscale image into a binary image depending on whether each pixel is below or above the chosen threshold. The researcher developed a thresholding module in Java. Pseudo codes for thresholding, Otsu's threshold value and the image histogram are presented in Tables 4.4, 4.5 and 4.6.

Table 4.4 Pseudo Code for Thresholding

Procedure Name : threshold(greyimg)
Purpose : Convert a grayscale image to a binary image
Input : Grayscale image with a single character/digit
Output : Binary image

Variables Used :
red - red channel value
newpixel - new pixel value
image_width - grey image width
image_height - grey image height
threshold - threshold value
binarized - BufferedImage
i, j - index variables
alpha - alpha channel value

Call to Sub Procedure : otsuthreshold(greyimg)

Begin
  threshold <- Call otsuthreshold(greyimg)
  for i <- 0 to image_width by 1
    for j <- 0 to image_height by 1
      red <- get red pixel value from the RGB value of the image
      alpha <- get alpha pixel value from the RGB value of the image
      if red is greater than threshold
        newpixel <- 255
      else
        newpixel <- 0
      newpixel <- set RGB color from alpha and newpixel
      binarized <- set newpixel value at location (i, j)
    end for
  end for
  return binarized
End

Table 4.5 Pseudo Code to Find the Threshold Value

Procedure Name : otsuthreshold(greyimg)
Purpose : Find the threshold value
Input : Grayscale image
Output : Threshold value

Variables Used :
histogram - histogram of the image
total - total number of pixels
i - index variable
sum, sumb - weighted sums of pixel values

wb, wf - background and foreground weights
varmax - maximum between-class variance
threshold - threshold value
mb, mf - background and foreground means
varbetween - between-class variance

Call to Sub Procedure : imagehistogram(greyimg)

Begin
  histogram <- Call imagehistogram(greyimg)
  total <- greyimg height * greyimg width
  sum <- 0
  for i <- 0 to 255 by 1
    sum <- sum + i * histogram[i]
  end for
  sumb <- 0
  wb <- 0
  wf <- 0
  varmax <- 0
  threshold <- 0
  for i <- 0 to 255 by 1
    wb <- wb + histogram[i]
    if wb is equal to 0
      continue
    wf <- total - wb
    if wf is equal to 0
      break
    sumb <- sumb + i * histogram[i]
    mb <- sumb / wb
    mf <- (sum - sumb) / wf
    varbetween <- wb * wf * (mb - mf) * (mb - mf)
    if varbetween is greater than varmax

      varmax <- varbetween
      threshold <- i
    end if
  end for
  return threshold
End

Table 4.6 Pseudo Code for the Image Histogram

Procedure Name : imagehistogram(greyimg)
Purpose : Find the histogram
Input : Grayscale image with a single character/digit
Output : Histogram values

Variables Used :
histogram - histogram values
i, j - index variables
red - red pixel value of the image
image_width - greyimg width
image_height - greyimg height

Begin
  histogram <- create an array of 256 zeros to store histogram values
  for i <- 0 to image_width by 1
    for j <- 0 to image_height by 1
      red <- get red pixel value of greyimg
      increment histogram[red] by 1
    end for
  end for
  return histogram
End
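Tables 4.4 to 4.6 can be condensed into a short Python sketch, a direct transcription of the pseudo code with illustrative names (a 256-level grayscale image given as rows of 0-255 values is assumed):

```python
def image_histogram(grey):
    """256-bin histogram of a grayscale image (Table 4.6)."""
    hist = [0] * 256
    for row in grey:
        for v in row:
            hist[v] += 1
    return hist

def otsu_threshold(grey):
    """Threshold maximizing between-class variance (Table 4.5)."""
    hist = image_histogram(grey)
    total = sum(hist)
    total_sum = sum(i * hist[i] for i in range(256))
    sum_b = wb = 0
    var_max, threshold = 0.0, 0
    for i in range(256):
        wb += hist[i]                  # background weight
        if wb == 0:
            continue
        wf = total - wb                # foreground weight
        if wf == 0:
            break
        sum_b += i * hist[i]
        mb = sum_b / wb                # background mean
        mf = (total_sum - sum_b) / wf  # foreground mean
        var_between = wb * wf * (mb - mf) ** 2
        if var_between > var_max:
            var_max, threshold = var_between, i
    return threshold

def binarize(grey):
    """Apply the Otsu threshold (Table 4.4): ink -> 0, background -> 255."""
    t = otsu_threshold(grey)
    return [[255 if v > t else 0 for v in row] for row in grey]
```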

The thresholding pseudo code returns a black and white image. (a) The grayscale image and (b) the thresholded image are shown in Figure 4.6.

Figure 4.6 (a) Gray Scale Image (b) Thresholded Image

4.2.4.3 Noise Removal

Scanned documents often contain noise arising from the printer, scanner, print quality, age of the document, etc. It is therefore necessary to filter this noise before character recognition. Image noise is a random variation of brightness or color information in images, and is usually an aspect of electronic noise. It can be produced by the sensor and circuitry of a scanner or digital camera. Image noise is an undesirable by-product of image capture that adds spurious and extraneous information [1]. Noise is visible as grain or film grain in an image. Optical scanning devices introduce noise such as disconnected line segments, bumps and gaps in lines, filled loops, etc. All these noise elements must be removed prior to character recognition. Noise removal is the process of removing or reducing unwanted noise; depending on the type of disturbance, noise can affect the image to different extents. The following types of noise occur:

I. Gaussian Noise (Amplifier Noise): Gaussian noise is caused by random fluctuations in the signal and is modeled by random values added to an image. In Gaussian noise, each pixel in the image is changed from its original value by a small amount: each pixel in the noisy image is the sum of the true pixel value and a random, Gaussian distributed noise value.

II. Salt and Pepper Noise: Salt and pepper noise is also called fat-tail distributed, impulsive or spike noise. An image containing salt and pepper noise has dark pixels in bright regions and bright pixels in dark regions; it presents itself as sparsely occurring white and black pixels. This noise arises because of sharp and sudden changes in the image signal. An effective noise reduction method for this type of noise is a median filter or a morphological filter.

The researcher used the OpenCV smooth() function to apply median and Gaussian filtering to the image, as described in Table 4.2. The median filter is a common noise removal technique that preserves edges while removing noise.

4.2.5 Feature Extraction

The heart of any character recognition system is the formation of the feature vector used in the recognition stage. Feature extraction can be considered as finding a set of parameters (features) that define the shape of the underlying character as precisely and uniquely as possible [1]. Features are selected so that they help discriminate between characters; there is no universally accepted set of feature vectors. In Chapter 3, structural and statistical features of characters and digits were identified. Among them, the following features are used in the proposed model:

1. Zoning: Divide the image into 9 equal zones (3 x 3) and count the number of foreground pixels in each zone.
2. Zone with Zero Foreground Pixel Value: Find the zones whose foreground pixel count is 0.
3. Number of Endpoints: Count the number of endpoints.
4. Endpoint Existence in Zone: Check in which zones endpoints exist.

1. Zoning

Zoning is the most popular method used for character recognition. In zoning feature extraction, a character is divided into zones of predefined size. In the proposed model, the character image is divided into 3 rows and 3 columns, i.e. 9 zones (3 x 3), and the number of foreground pixels in each zone is counted. In Figure 4.7, character O is divided into 9 zones.

Figure 4.7 Zoning of Character O

For zoning, the researcher used Java and developed a module named zoning. Pseudo code for zoning is presented in Table 4.7.

Table 4.7 Pseudo Code for Zoning

Procedure Name : getimagezones(filename)
Purpose : Divide an image into 9 zones and get each zone's information
Input : Binary image with a single character/digit
Output : Image with 9 zones and zone information

Variables Used :
image - BufferedImage
zonewidth - zone width
zoneheight - zone height
count - index variable
imgs - BufferedImage array

rows - number of rows
cols - number of columns
x, y, i, j - index variables
zoneblackpixelvalue - black pixel counts per zone
totalzones - number of zones
totalblackpixels - total number of black pixels in each zone
color - RGB value
zonedata - zone information

Begin
  image <- get BufferedImage of filename
  cols <- 3
  rows <- 3
  zonewidth <- image width divided by cols
  zoneheight <- image height divided by rows
  count <- 0
  imgs <- initialize BufferedImage array of size 9
  for i <- 0 to rows-1 by 1
    for j <- 0 to cols-1 by 1
      img <- crop image of zonewidth by zoneheight size from image
      count <- count + 1
    end for
  end for
  totalzones <- 0
  while count is greater than totalzones
    for i <- 0 to height of imgs by 1

      for j <- 0 to width of imgs by 1
        color <- get RGB value at location (j, i)
        if color is black
          increment totalblackpixels by 1
      end for
    end for
    Add totalblackpixels value to zoneblackpixelvalue
    increment totalzones by 1
  end while
  zonedata <- set zoneblackpixelvalue for each zone
  return zonedata
End

2. Zone with Zero Foreground Pixel Value

To identify this feature, the black and white (thresholded) image is used. In a black and white image, the background is white and the foreground is black, as shown in Figure 4.6(b). Zoning is performed on the black and white image as shown in Figure 4.7, and each pixel of every zone is checked to see whether it is black or white. If it is black it is a foreground pixel, otherwise it is a background pixel. The foreground pixels are counted and the zones whose foreground pixel count is 0 are identified; e.g. in Figure 4.7, the foreground pixel count of zone 5 is 0. To find the zones with zero foreground pixels, the zonedata of Table 4.7 is used. Pseudo code for the zone with zero foreground pixel value is presented in Table 4.8.
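The zoning feature of Table 4.7, together with the zero-pixel zones it feeds, can be sketched in a few lines of Python (an independent illustration, assuming a binary image where 1 marks foreground ink):

```python
def zone_pixel_counts(binary):
    """Split a binary image (1 = foreground ink) into a 3x3 grid and
    count the foreground pixels in each of the 9 zones.

    Zones are numbered 1..9 row by row; index 0 of the result is zone 1.
    """
    h, w = len(binary), len(binary[0])
    counts = [0] * 9
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                # min() guards the bottom/right edges when h or w is
                # not an exact multiple of 3.
                row = min(3 * y // h, 2)
                col = min(3 * x // w, 2)
                counts[3 * row + col] += 1
    return counts

def zones_with_zero_pixels(counts):
    """Return the 1-based zone numbers whose foreground count is zero."""
    return [i + 1 for i, c in enumerate(counts) if c == 0]
```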

Table 4.8 Pseudo Code for Zone with Zero Foreground Pixel Value

Procedure Name : zonewithzeropixel(zonedata)
Purpose : Find the zones with zero foreground pixels
Input : zonedata
Output : Zones with zero foreground pixels

Variables Used :
foregroundpixels - number of foreground pixels in a zone
zoneswithzerovalue - array of zones with zero foreground pixels

Begin
  for i <- 1 to 9 by 1
    foregroundpixels <- get value of zonedata[i]
    if foregroundpixels is equal to zero
      zoneswithzerovalue[i] <- i
  end for
  return zoneswithzerovalue
End

3. Number of Endpoints

An endpoint is a starting or ending point of a character stroke. In the proposed model, the number of endpoints is calculated for each character and digit. As shown in Figures 4.8 and 4.9, character O has zero endpoints and character Y has 3 endpoints respectively.

Figure 4.8 Character O having 0 Endpoints

Figure 4.9 Character Y having 3 Endpoints

4. Endpoint Existence in Zone

After finding the number of endpoints, the zones in which the endpoints exist are checked. As shown in Figure 4.9, Y has 3 endpoints; they exist in Zone 1, Zone 3 and Zone 7, as shown in Figure 4.10.

Figure 4.10 Endpoint Existence in Zone 1, Zone 3 and Zone 7

To find the number of endpoints and the zones in which they exist, the researcher used OpenCV and Java to develop a module named endpointexistence. cvConvexHull() and cvConvexityDefects() are used to count the number of endpoints, and Java's drawLine() is used to draw a point at each endpoint. Pseudo code to count the number of endpoints and find endpoint existence in zones is presented in Table 4.9.

Table 4.9 Pseudo Code for Endpoint and Endpoint Existence in Zone

Procedure Name : findendpointexistence(filename)
Purpose : Count the number of endpoints and find endpoint existence in zones
Input : File with a single character/digit
Output : Number of endpoints and zone numbers

Variables Used :
image - image name
contour - stored contour
hullseq - convex hull sequence
img_contours - contour image
contourlow - optimized contours
pntr - pointer to an endpoint element
imagen_color - color image
imagen - black and white image
contourstorage, approxstorage, hullstorage, defectsstorage - memory storage
fname - file name
defects - convexity defects (endpoints)
bimage - BufferedImage
endpoints - number of endpoints
i, j - index variables
endpt - endpoint
defect - convexity defect
totalredpixels - red pixel counts per zone
totalzones - number of zones
count - index variable
zones - array of zones in which an endpoint exists

Begin
  fname <- imagename
  imagen <- Load black and white image
  imagen_color <- Load color image
  // Count endpoints
  contour <- find contours in the image
  contourlow <- find optimized contours
  imagen_color <- draw contours
  hullseq <- find the convex hull of the point set
  defects <- find convexity defects of the contour
  bimage <- get BufferedImage of imagen_color
  endpoints <- total number of defects
  for i <- 0 to endpoints by 1
    pntr <- get element of endpoint at position i
    defect <- get convexity defect of pntr
    endpt <- get endpoint
    Draw endpt with red color
  end for
  // Find endpoint existence
  totalzones <- 0
  while count is greater than totalzones
    for i <- 0 to height of imgs by 1
      for j <- 0 to width of imgs by 1
        color <- get RGB value at location (j, i)
        if color is red
          increment totalredpixels by 1
      end for
    end for
    increment totalzones by 1
  end while
  for i <- 1 to 9 by 1
    if totalredpixels[i] is greater than zero
      zones[i] <- i
  end for
  return zones
End
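The thesis counts endpoints via convex hull defects. A simpler alternative sketch, shown here purely for illustration, works on a thinned (one-pixel-wide) stroke skeleton: a skeleton pixel is an endpoint when it has exactly one ink neighbor among its 8 surrounding pixels. The resulting (x, y) coordinates can then be mapped to zones with the same 3x3 split used for zoning:

```python
def count_endpoints(skel):
    """Return the (x, y) endpoints of a one-pixel-wide skeleton (1 = ink).

    A pixel is an endpoint when exactly one of its 8 neighbors is ink;
    a closed loop like character O therefore yields no endpoints.
    """
    h, w = len(skel), len(skel[0])
    endpoints = []
    for y in range(h):
        for x in range(w):
            if not skel[y][x]:
                continue
            neighbors = sum(
                skel[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x))
            if neighbors == 1:
                endpoints.append((x, y))
    return endpoints
```

This matches the behavior described above: a straight stroke has two endpoints, a Y-shaped skeleton has three, and a closed O has none.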

4.2.6 Classification

The classification stage is the main decision-making stage of a character recognition system. It uses the features extracted in the preceding feature extraction stage. Many classification algorithms are discussed in Chapter 2; among them, the decision tree classifier is used in the proposed model. The concept of a decision tree is the decomposition of a complex problem into smaller, more manageable ones, representing the relationship between attributes and decisions in a tree-like diagram [2] [3]. Decision trees are powerful and popular tools for classification and prediction. They represent rules that can be understood by humans and used in knowledge systems such as databases. In a decision tree, the root and internal nodes contain attribute test conditions that separate records with different characteristics. A decision table and a decision tree are designed for the 4 characters G, N, O, Y and the digit 4, shown in Table 4.10 and Figure 4.11.

Table 4.10 Decision Table for Characters G, N, O, Y and Digit 4

Character/Digit | No. of End Points | End Point Existence    | Zone with Zero Pixel Value
G               | 3                 | Zone 2, Zone 6, Zone 9 | -
N               | 2                 | Zone 3, Zone 7         | -
O               | -                 | -                      | Zone 5
Y               | 3                 | Zone 1, Zone 3, Zone 7 | Zone 2, Zone 9
4               | 2                 | Zone 6, Zone 8         | -

The proposed tree is designed on the basis of the structural and statistical features extracted in the previous stage.
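Table 4.10 can be read directly as a classifier. The sketch below follows the decision table row by row (names and the feature encoding are illustrative; endpoint checks with 3 endpoints must be tried in an order that separates Y from G):

```python
def recognize(zone_counts, n_endpoints, endpoint_zones):
    """Classify a sample using the decision table (Table 4.10).

    zone_counts    : list of 9 foreground pixel counts (zone 1 first)
    n_endpoints    : number of detected endpoints
    endpoint_zones : set of 1-based zone numbers containing an endpoint
    Returns the recognized character/digit, or '_' when nothing matches.
    """
    if n_endpoints == 3 and {1, 3, 7} <= endpoint_zones:
        return 'Y'
    if n_endpoints == 0 and zone_counts[4] == 0:  # zone 5 empty
        return 'O'
    if n_endpoints == 2 and {6, 8} <= endpoint_zones:
        return '4'
    if n_endpoints == 2 and {3, 7} <= endpoint_zones:
        return 'N'
    if n_endpoints == 3 and {2, 6, 9} <= endpoint_zones:
        return 'G'
    return '_'
```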

[Figure 4.11 shows the decision tree: the root node tests the number of end points (2 → N or 4; 0 → O; 3 → G or Y). An Endpoint Existence test then separates N (zones 3, 7) from 4 (zones 6, 8), and G (zones 2, 6, 9) from Y (zones 1, 3, 7).]

Figure 4.11 Decision Tree for Characters G, N, O, Y and Digit 4

Pseudo code for classification is presented in Table 4.11.

Table 4.11 Pseudo Code for Classification

Procedure Name : recognize(zonedata, endpoints, endpointexistence)
Purpose        : Recognize character
Input          : Zone data, number of endpoints and zones in which endpoints exist
Output         : Character or digit

Begin
    if endpoint is equal to 3 and endpoint exists in zone 1 and zone 3 and (zone 7 or zone 8)
        return Y
    else if foreground pixel value of zone 5 is equal to 0 and endpoint is equal to 0
        return O
    else if endpoint exists in (zone 2 and zone 8) or (zone 2 and zone 9)
        return 4
    else if endpoint exists in zone 3 and zone 7
        return N
    else if endpoint exists in (zone 3 or zone 6 or zone 9) and endpoint is equal to 3
        return G
    else
        return _
    end if
end

4.3 Pseudo Code used to build Offline Isolated Handwritten Character Recognition Model

Table 4.12 List of Pseudo Codes for Offline Handwritten Character Recognition Model

No.  Procedure                            Purpose
1    Main                                 Read an image file
2    Process                              Perform character segmentation, pre-processing, feature extraction and classification
3    Segment Character                    Segment character and save it as an individual character
4    RGB to Grayscale Image Conversion    Convert RGB image to grayscale image
5    Thresholding                         Convert grayscale image to black and white image
6    Otsu Threshold                       Find threshold value
7    Image Histogram                      Find image histogram
8    Zoning                               Divide an image into 9 zones

9    Zone with Zero Pixel                 Find the zone whose foreground pixel value is zero
10   End Point                            Count number of endpoints
11   End Point Existence                  Check if endpoint exists in zone
12   Classification                       Recognize character

4.3.1 Flow of the Proposed Model

[Figure 4.12 shows the flow of the proposed model: Main calls Process, which in turn invokes Segment Character, RGB to Grayscale Image Conversion, Thresholding (using Otsu Threshold and Image Histogram), Zoning (with Zone with Zero Pixel), End Point, End Point Existence and Classification.]

Figure 4.12 Flow of the Proposed Model
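Procedures 5-7 in the list above (Thresholding, Otsu Threshold, Image Histogram) together binarize the grayscale image. A compact, self-contained Java sketch of Otsu's method, assuming 8-bit grey values (class and method names are illustrative), might look like this:

```java
// Minimal sketch of Otsu's method: pick the threshold that maximises the
// between-class variance of the grey-level histogram (procedures 6 and 7
// in Table 4.12).
public class OtsuThreshold {

    /** Computes the 256-bin histogram of grey values (0-255). */
    public static int[] histogram(int[] pixels) {
        int[] hist = new int[256];
        for (int p : pixels) hist[p]++;
        return hist;
    }

    /** Returns the Otsu threshold for the given histogram. */
    public static int otsu(int[] hist) {
        int total = 0;
        long sum = 0;
        for (int i = 0; i < 256; i++) {
            total += hist[i];
            sum += (long) i * hist[i];
        }
        long sumB = 0;       // cumulative grey-level sum of the background class
        int wB = 0;          // background weight (pixel count)
        double bestVar = -1;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];
            if (wB == 0) continue;
            int wF = total - wB;          // foreground weight
            if (wF == 0) break;
            sumB += (long) t * hist[t];
            double mB = (double) sumB / wB;         // background mean
            double mF = (double) (sum - sumB) / wF; // foreground mean
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > bestVar) { bestVar = between; bestT = t; }
        }
        return bestT;
    }

    public static void main(String[] args) {
        int[] hist = new int[256];
        hist[10] = 50;   // dark peak (e.g. pen strokes)
        hist[200] = 50;  // bright peak (e.g. paper background)
        System.out.println(otsu(hist)); // 10
    }
}
```

Pixels at or below the returned threshold are then mapped to black and the rest to white, which is the black-and-white image that the Thresholding procedure hands on to Zoning.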

4.4 Development of Offline Handwritten Isolated English Character Recognition Model

The offline isolated handwritten character recognition model provides a graphical user interface for the proposed model, designed for the characters G, N, O, Y and the digit 4. The graphical interface lets the user select an image file containing one or more isolated handwritten characters; the pre-processing, character segmentation, feature extraction and classification steps are then performed on it, and the output is displayed in text format. The model is developed in OpenCV and Java, both of which are open-source and free software. The graphical interface of the model provides facilities to select an image file, process it to convert the image into text, display the output in text format and save it.

4.4.1 Features of the Offline Handwritten Isolated English Character Recognition Model

1. The proposed model recognizes the English capital characters G, N, O and Y.
2. The proposed model also recognizes the digit 4.
3. The graphical interface allows the user to open an image file and process it to convert it into a text file.
4. Operations can be performed by simply clicking on an icon.
5. The model is designed to be easy to use; it does not require extra skills to interact with this tool.

4.4.2 Components of the Offline Handwritten Isolated English Character Recognition Model

When the user double-clicks on the icon of the model, the program is executed and the screen below appears.

[Figure 4.13 shows the main screen, with a title bar, a tool bar containing Open, Process, Save and Help buttons, an input region and an output region.]

Figure 4.13 Main Screen of Handwritten English Character Recognition Model

The above window is divided into four parts: (1) Title Bar, (2) Tool Bar, (3) Input Region and (4) Output Region.

(1) Title Bar
It displays the title of the model, "Handwritten English Character Recognition Model", together with the minimize, maximize and close buttons. Screenshots of the title bar and buttons are shown in Figures 4.14 and 4.15 respectively.

Figure 4.14 Screenshot of Title Bar

Figure 4.15 Minimize, Maximize and Close Buttons

(2) Tool Bar
The tool bar consists of the Open, Process, Save and Help buttons. A screenshot of the tool bar is shown in Figure 4.16.

Figure 4.16 Screenshot of Tool Bar

Table 4.13 Toolbar Content

Command   Action
Open      Open an image file
Process   Process an image file and convert it into a text file
Save      Save the output file in .txt or .doc format
Help      Help on using the tool

(3) Input Region
The user has to select an image file to process. The selected image file is displayed in the Input Region. A screenshot of the input region is shown in Figure 4.17.

Figure 4.17 Screenshot of Input Region

(4) Output Region
The output for the image file is displayed in the Output Region. A screenshot of the output region is shown in Figure 4.18.

Figure 4.18 Screenshot of Output Region

A screenshot of the proposed Handwritten English Character Recognition Model with output is shown in Figure 4.19.

Figure 4.19 Screenshot of the Handwritten English Character Recognition Model with Output

References:

[1] http://shodhganga.inflibnet.ac.in:8080/jspui/bitstream/10603/4166/10/10_chapter%202.pdf
[2] Rajendra Lambodari and S. M. Kharad, "Review of Classification Methods for Character Recognition in Neural Network", International Journal of Electronics Communications and Computer Engineering, Vol. 4, Issue 2, 2013.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers, 2001.