Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org)


A Review On Various Techniques For Skew Detection And Correction In Handwritten Text Documents

Ambica Rani 1, Er. Harinderpal Singh 2
1 M. Tech Student, IT Department, AIET, Faridkot
2 Assistant Professor, CSE Department, AIET, Faridkot

Abstract
Skew detection and correction methods are used to align a handwritten text document by fitting rectangular shapes such as paragraphs, text lines and tables. Skew can be detected in a scanned document image as well as in a handwritten text document image. In this paper, we discuss different techniques for skew detection and correction, in particular for detecting skew in Indian scripts. These methods include the vertical projection profile, the horizontal projection profile and the Hough transform. Some of them detect skew accurately but are slow, so a new skew detection technique should aim to reduce time and cost.

Keywords: Skew detection, skew estimation, skew correction, projection profile analysis.

Introduction:
Nowadays skew detection has become vital for the recognition of scanned images, because the scanned image is often a degraded copy of the original. Sometimes the document is tilted when it is photocopied or scanned, and the resulting image is not up to the mark. These defects in the image are referred to as skew. The text in a skewed document is sometimes diminished, so the document becomes difficult to read. Here we examine skew detection and correction using several algorithms. Finding the tilt in the document is referred to as skew detection. After skew detection, the tilt needs to be corrected, and this is referred to as skew correction. Fig. 1 shows a skewed document.

Figure 1: Skewed document image

Skew can be broadly classified into two categories:
Single Skew: The whole document is skewed by a single angle. Most document images have this type of skew. This work deals with the single skew problem. A lot of work has been done in this field and research is still going on.
Multiple Skew: The scanned document can have many sections, each of which may be skewed by a different angle. Detecting this type of skew takes considerable effort. Multiple skew occurs rarely and has not received much attention from researchers.

Limitations of Skew in Documents
If a document is skewed, it is difficult for the reader to read. Skewed documents are of reduced quality. By direction, skew can be classified into two categories:
Clockwise Skew: The document is tilted in the same direction as the moving hands of a clock. This is also termed positive skew.
Anticlockwise Skew: The document is tilted in the direction opposite to the moving hands of a clock. This is also termed negative skew.

Different methods to find out the skew
When the skew is tilted in the downward direction, it is said to be downward skew. To correct this skew, we must make sure that the angle formed is 90° in the anticlockwise direction.

Figure 2: Downward Skew

When the skew is tilted in the upward direction, it is said to be upward skew. As with the correction made for downward skew, we must make sure that the angle formed is 90° in the clockwise direction.

Figure 3: Upward Skew

Literature Survey

Marian Wagdy, Ibrahima Faye, DayangRohaya, "Document Image Skew Detection and Correction Method Based on Extreme Points"
In this paper the authors present a method for estimating the skew angle of a document image. The method is based on the observation that any document image contains objects with a rectangular shape, such as paragraphs, text lines, tables and figures, and that these objects can be bounded by rectangles. The authors use the properties of extreme points to obtain the corners of the rectangle that fits the largest connected component of the document image. The angle of this rectangle represents the skew angle of the document. The experimental results show high performance in detecting the skew angle for a variety of documents with different levels of complexity. The proposed method was implemented in MATLAB R2009a and tested on a variety of documents such as journals and books. Each document image was skewed by

different ground-truth skew angles in the range [-89, 89] degrees.

Bishakha Jain, Mrinaljit Borah, "A Comparison Paper on Skew Detection of Scanned Document Image Based on Horizontal and Vertical Projection Profile Analysis"
Many techniques already exist, and more are being developed, for detecting skew in scanned document images. In this paper the authors describe skew detection and correction of scanned document images written in the Assamese language using horizontal and vertical projection profiles. The algorithm was implemented on input images of Assamese text. The horizontal profile technique can be used for skew correction on images with some noise. The algorithm estimates the skew only if the angle is within ±15°.

Naazia Makkar and Sukhjit Singh, "A Brief Tour to Various Skew Detection and Correction Techniques"
During scanning, skew is inevitably introduced into the document image. The scanned text image is not editable: although it contains text, one cannot edit or change it if required. This paper covers various skew detection and correction techniques. These methods provide an efficient way to calculate the skew. Correcting the skewed scanned document image is very important, because it directly affects the reliability and efficiency of the segmentation and feature extraction stages.

Ruby Singh, Ramandeep Kaur, "Skew Detection In Image Processing"
Many researchers have proposed different methodologies for skew estimation of text in binary and grayscale images. They have been widely used for the identification of printed text. Many algorithms exist for detecting and correcting a slant or skew in a given document or image. Some of them provide better accuracy but are slow, while others are limited in the range of angles they can handle. The paper therefore argues that a new technique for skew detection will reduce time and cost.

Lipi Shah, Ripal Patel, Shreyal Patel, Jay Maniar, "Skew Detection and Correction for Gujarati Printed and Handwritten Character using Linear Regression"

In this paper, the authors propose an approach for skew detection and correction of handwritten and printed Gujarati documents using linear regression. Skew detection and correction is important for any recognition system, as it directly affects the recognition of characters and documents. The proposed method uses the linear regression formula to detect the angle of rotation and to correct it for printed and handwritten documents and characters. With this approach the authors report accuracy of up to 59.63% for printed and 45.58% for handwritten documents and characters. The proposed method is simple and fast, both in detecting the angle of rotation and in correcting the skewed image.

Existing Techniques
Techniques can be broadly classified into three categories, namely:

Document enhancement and binarization
In this technique, the authors use retinex theory to solve the degradation problem. It is used to convert the lightness image into a bi-level image (black for text and white for background).

Dividing the document into connected components
To divide the document into connected components, the erosion morphological operation is used. A square structuring element of width 4 is used to give accurate results in detecting the extreme points of each connected component.

Skew angle estimation
In this technique, the authors estimate the angle of the largest connected component of the document image. They use the properties of extreme points to obtain the rectangle that fits the largest connected component with the same skew angle. Each connected component has eight extreme points (top-left, top-right, right-top, right-bottom, bottom-right, bottom-left, left-bottom and left-top).

Vertical Projection Profile Analysis
Algorithm:
1. Read the image data into a matrix and convert it to grayscale.

2. Convert the grayscale image to white writing on a black background by comparing each pixel with a threshold of 0.34.
3. Search for the first column that contains a white pixel, i.e. a written pixel.
4. Store the entire image column-wise in a variable (Skew_input).
5. Add the elements of the input image matrix column-wise to get the number of white pixels per column, and store the result in a variable Sum_col.
6. The sum of the squares of the Sum_col values gives the value of the energy function for the skew angle.
7. Rotate the input image by rot_angle and repeat steps 5 and 6 for this angle to obtain the value of the energy function.
8. Rotate the input image by -rot_angle and repeat steps 5 and 6 for this angle to obtain the value of the energy function.
9. rot_angle = rot_angle - 1.
10. Repeat steps 7, 8 and 9 while rot_angle != 0.
11. Find the angle for which the value of the energy function is maximum.
12. This angle gives the skew angle.
13. As output, display the values of the energy function for each angle, along with the bar graph of the column values for the skew angle and the corrected image segment.

Horizontal Projection Profile Analysis
Algorithm:
1. Read the image data into a matrix and convert it to grayscale.
2. Convert the grayscale image to white writing on a black background by comparing each pixel with a threshold of 0.34.
3. Search for the first column that contains a white pixel, i.e. a written pixel.
4. Store one-fourth of the image row-wise in a variable (Skew_input).
5. Add the elements of the input image matrix row-wise to get the number of white pixels per row, and store the result in a variable Sum_row.
6. The sum of the squares of the Sum_row values gives the value of the energy function for the skew angle.
7. Rotate the input image by rot_angle and repeat steps 5 and 6 for this angle to obtain the value of the energy function.

8. Rotate the input image by -rot_angle and repeat steps 5 and 6 for this angle to obtain the value of the energy function.
9. rot_angle = rot_angle - 1.
10. Repeat steps 7, 8 and 9 while rot_angle != 0.
11. Find the angle for which the value of the energy function is maximum.
12. This angle gives the skew angle.
13. As output, display the values of the energy function for each angle, along with the bar graph of the row values for the skew angle and the corrected image segment.

Thinning and Hough transform
The method has two stages. In the first stage, selected characters from the document image are blocked and thinning is performed over the blocked region [5]. In the second stage, the thinned coordinates are fed to the Hough transform (HT) to estimate the skew angle accurately. The detailed algorithm is shown below:
Algorithm
Step-1: Find the connected components in the document image and compute the average bounding height (AH).
Step-2: Select those connected components whose height is less than AH and remove very small connected components, so that the dots of the characters i and j and punctuation marks such as the full stop, comma and hyphen are deleted.
Step-3: Block the selected components present in the document.
Step-4: Perform the thinning operation over the selected block region.
Step-5: Remove the parallel straight lines using a prespecified threshold.
Step-6: Subject the obtained points to the Hough transform to estimate the skew angle accurately.
Step-7: Stop.

Topline
The topline algorithm [9] does not operate directly on the skewed image. First the skewed image is converted to a segment file or a thin segment file, and then the algorithm operates on one of these files to find the skew angle.
Algorithm
Step 1: Input the .bmp image to OCR.exe and convert the image to seg.dat and thinseg.dat.

Step 2: Apply both the seg and thinseg algorithms to the images.
Step 3: Calculate the h1 and h2 coordinates.
Step 4: Calculate the width and the margin.
Step 5: Input these values to the formula for detecting the skew angle.
Step 6: The values are entered into the formula: skew angle = atan(diff / (w - margin)).
Step 7: This angle is then used to rotate the image to get the skew-free image.
Step 8: The difference between the two angles can then be seen easily.
Step 9: Stop.

A few existing methods with their accuracy are given in the following table:

Author | Pub. Year | Method | Script | Comments
Rohit Sharma, Utkarsh Mathur, Naveen Srivastava | 2013 | Angular Skew Correction Algorithm | Devanagari | Angular skew was corrected with an accuracy of 94.91%
Bishakha Jain, Mrinaljit Borah | 2014 | Comparison of Horizontal and Vertical Projection Profile Analysis | Assamese Language | Skew could be estimated if the angle is within ±15°
Loveleen Kaur, Simple Jindal | 2011 | OCR | Gurumukhi Script | Achieved accuracy of 94%
Ruby Singh, Ramandeep Kaur | 2013 | Used different techniques | Gurumukhi Documents | The skew angle of the document can be determined in the range of -180° to 180°
Rajib Ghosh, Gouranga Mandal | 2012 | Words Recognition and Document Analysis System | Bangla Language | Accuracy achieved on 3839 words was 92.22%
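As an illustration of the horizontal projection profile analysis described under Existing Techniques, the following is a minimal Python sketch and not the implementation used in any of the surveyed papers. It makes several assumptions: instead of rotating a raster image, it rotates the (x, y) coordinates of the written pixels and counts them per row; the function names projection_energy and estimate_skew are hypothetical; the default search range of ±15 degrees is borrowed from the limit reported for the Assamese study; and the sign of the returned angle depends on the orientation of the coordinate axes.

```python
import math


def projection_energy(points, rotation_deg):
    """Energy of the row-wise projection after rotating the points.

    points: iterable of (x, y) coordinates of written (foreground) pixels.
    Returns the sum of the squared per-row pixel counts.
    """
    theta = math.radians(rotation_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    row_counts = {}
    for x, y in points:
        # Only the rotated y coordinate matters for a row-wise profile.
        row = math.floor(x * sin_t + y * cos_t + 0.5)
        row_counts[row] = row_counts.get(row, 0) + 1
    return sum(count * count for count in row_counts.values())


def estimate_skew(points, max_angle=15.0, step=1.0):
    """Return the skew in degrees whose correcting rotation maximizes the energy."""
    points = list(points)
    steps = int(max_angle / step)
    candidates = [k * step for k in range(-steps, steps + 1)]
    best_rotation = max(candidates, key=lambda r: projection_energy(points, r))
    # The best rotation undoes the skew, so the skew has the opposite sign.
    return -best_rotation


# Synthetic "document" (an assumption for the demo): four text lines,
# 20 units apart, each tilted by 5 degrees.
slope = math.tan(math.radians(5.0))
demo_points = [(x, x * slope + offset)
               for offset in (0, 20, 40, 60)
               for x in range(200)]
print(estimate_skew(demo_points))
```

When the candidate rotation cancels the tilt, every text line collapses into a single row, so the squared row counts peak sharply. This is why the energy function is maximized at the skew angle. A vertical profile version would bin the rotated x coordinate instead.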

Conclusion & Future Scope
In this paper, we have presented a review of skew detection and correction for handwritten text documents in Indian languages written in scripts such as Gurumukhi, Devanagari, Bangla and Assamese. Algorithms for various techniques have been presented. It is concluded that a lot of work remains to be done to detect and correct skew in handwritten text documents in Devanagari script. A more robust algorithm needs to be developed to detect the skew angle in text documents and to correct the detected angle. In future we plan to develop an algorithm that can efficiently detect and correct upward and downward skew in handwritten text documents written in various Indian scripts.

References
[1] Marian Wagdy, Ibrahima Faye, DayangRohaya, "Document Image Skew Detection and Correction Method Based on Extreme Points"
[2] Lipi Shah, Ripal Patel, Shreyal Patel, Jay Maniar, "Skew Detection and Correction for Gujarati Printed and Handwritten Character using Linear Regression"
[3] B. Aditya Vighnesh, Abhishek Kumar, B. Manikanta Yadav, "Skew Detection in Handwritten Documents"
[4] Tian Jipeng, G. Hemantha Kumar, H. K. Chethan, "Skew Correction for Chinese Character using Hough Transform"
[5] Naazia Makkar and Sukhjit Singh, "A Brief Tour to Various Skew Detection and Correction Techniques"
[6] Ruby Singh, Ramandeep Kaur, "Skew Detection In Image Processing"
[7] Rajib Ghosh, Gouranga Mandal, "Skew Detection and Correction of Online Bangla Handwritten Word"
[8] Bishakha Jain, Mrinaljit Borah, "A Comparison Paper on Skew Detection of Scanned Document Image Based on Horizontal and Vertical Projection Profile Analysis"
[9] Loveleen Kaur, Simple Jindal, "Skew Detection Technique for Various Script"

[10] Rohit Sharma, Utkarsh Mathur, Naveen Srivastava, "Angular Skew Correction Algorithm for Handwritten Hindi Text"
[11] Sepideh Barekat Rezaei, Abdholhossien Sarrafzadeh, and Jamshid Shanbehzadeh, "Skew Detection of Scanned Document Images"
[12] Utkarsh Mathur, Rohit Sharma, "Script Independent Angular Skew Detection and Correction Algorithms"
[13] Deepak Kumar, Dalwinder Singh, "Modified Approach of Hough Transform for Skew Detection and Correction in Documented Images"