OCR With Background Image Elimination-A Survey


Damini J. Patel
P. G. Scholar, CSE Department, Gujarat Technological University, Ahmedabad, India

Prof. Shital V. Patel
Professor, CSE Department, Gujarat Technological University, Ahmedabad, India

ABSTRACT
An Optical Character Recognition (OCR) system converts scanned input documents into editable text documents. This report presents, for OCR with background image elimination, the various stages (techniques) of OCR, its accuracy, and the different software used to implement the technique. OCR is widely used on food packages, invoices, FMCG products, metal parts, passport documents, printouts of static data, etc. An OCR system enables us to feed a book or a magazine article directly into an electronic computer file and edit it. The stages of an OCR system are: uploading a scanned image from the computer; segmentation, in which the text zone is extracted from the image; recognition of the text; and post-processing, in which the output of the previous stage goes through an error detection and correction phase. The method used in this project is effective in removing the background of the image and enhances the performance of OCR. The output image is clean after the background elimination.

Keywords: OCR, background image, image pre-processing, OpenCV

1. INTRODUCTION
OCR stands for Optical Character Recognition. A person is able to see images because of the communication between the eyes and the brain. The eyes act as an optical mechanism, the images they capture are the input to the brain, and the ability to understand and visualize these images varies from person to person. OCR is the analogous technology: through an automated mechanism, it allows easier recognition of characters and their processing. [1]

Earlier, scanners were the only working OCR application available in the market. The main disadvantage of scanners was that they were not portable and took a lot of time to capture an image. [2] With today's devices having better processing speeds, larger internal memory and an excellent rear camera, researchers have begun running OCR applications on devices such as smartphones to obtain real-time imaging results. Applications such as CamScanner and Google Translate are prime examples of OCR applications. They also show that OCR technology can be put to use in a wide array of fields, and hence it is an important concept that deserves more research attention. [3]

3826 www.ijariie.com 1158

HOW OUR OCR WORKS

a. What is OCR
OCR allows characters to be recognized automatically through an optical mechanism. It is capable of recognizing both handwritten and printed text. Its performance depends on the quality of the documents and of the camera used to capture the raw image. An OCR system is designed to process images that contain mostly text and very few graphic elements. [4]

As mentioned before, most character recognition programs and algorithms work efficiently only on images that are captured using a scanner or a digital camera and processed by computer software. Since size and portability were the factors hampering further growth and usability of this technology, a character recognition system based on Android devices is proposed to overcome these limitations. [5]

OCR is a technology that enables us to convert various types of documents, such as scanned papers, PDF files or images captured by a digital camera, into editable and searchable data. Images captured by a digital camera differ from scanned documents in that they often contain distortions. These distortions and noise make it difficult to recognize the text accurately. Pre-processing is done on the image to improve the accuracy of text recognition. [6]

b. Our OCR Works
Many document images are embedded with background images, which causes difficulties for OCR applications. Some parts of the background image can be bound-boxed as characters, which leads to immediate wrong recognition and causes trouble in the subsequent processing steps of the OCR pipeline. It is therefore very important to pre-process the documents by removing the background images before text detection, and we propose a method for background image elimination.

The existing work considers only good-quality printed documents without any touching or broken characters. The existing method is effective in removing the background image and thus enhances the performance of OCR. The method is based on the difference between the values of the R, G and B colors in background image pixels. The experiments showed that the output image is clean after this pre-processing.

OpenCV is chosen for OCR because of its widespread approbation, extensibility, and flexibility. OpenCV is an open-source library for image processing that is available on many operating systems, and it is considered highly accurate among open-source libraries. [7]

Figure 1. Our OCR work with OpenCV

In this approach, the input image is first converted into binary format using adaptive thresholding. The outlines of the components are stored by connected component analysis. The outlines are then nested, which gathers them together to form blobs. Text lines are analyzed for fixed-pitch and proportional text.
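The source states only that the background elimination relies on the difference between the R, G and B values of a pixel, without giving a rule. The sketch below is one possible reading, under the assumption that printed text is dark and nearly gray (small spread between channels) while the background image is colored or light. The function name `remove_background` and both threshold values are hypothetical, and plain Python lists are used instead of OpenCV so that the example runs on its own.

```python
def remove_background(pixels, spread_threshold=40, dark_threshold=100):
    """Map an RGB image (rows of (r, g, b) tuples) to a clean bi-level image.

    Assumption (not from the paper): text pixels are dark and nearly gray,
    so their channel spread max(r, g, b) - min(r, g, b) is small, while
    background pixels are either colored (large spread) or bright.
    Returns rows of 0 (text) and 255 (background).
    """
    cleaned = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            spread = max(r, g, b) - min(r, g, b)   # R, G, B difference
            brightness = (r + g + b) // 3
            is_text = spread <= spread_threshold and brightness < dark_threshold
            out_row.append(0 if is_text else 255)
        cleaned.append(out_row)
    return cleaned


# Tiny 2x3 example: dark gray "ink", colored background, white paper.
image = [
    [(250, 250, 250), (20, 20, 25), (200, 120, 60)],
    [(30, 28, 30), (90, 160, 220), (255, 255, 255)],
]
cleaned = remove_background(image)
print(cleaned)
```

Both thresholds would need tuning per document type; colored text, for example, would be erased under this assumption.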

Then the lines are broken into words according to the character spacing. Fixed-pitch text is chopped into character cells, and proportional text is broken into words using definite spaces and fuzzy spaces. The engine recognizes words in two passes. In the first pass it tries to recognize each word; if a match is found, the word is passed on to the adaptive classifier, which then recognizes text more accurately. During the second pass, the words that were not recognized at all, or were not recognized well, in the first pass are recognized again by running over the page once more. Finally, the engine resolves fuzzy spaces and, to locate small and capital text, checks alternative hypotheses for the x-height. [3]

OCR technology has a broad range of applications in document processing. Many document images are embedded with background images, e.g., checks, deposit books, driving licenses, passports, certificates, etc. While the background image enhances the document's security or visual effect, it causes difficulties for OCR applications. Some parts of the background image can be bound-boxed as characters, which leads to immediate wrong recognition and causes trouble in the subsequent processing steps of the OCR pipeline. Therefore, it is very important to pre-process the documents by removing the background images before text detection.

2. METHODOLOGY
Fig. 2 shows the overall functioning of Optical Character Recognition (OCR). The input image can be any document, live text, journal, magazine, etc. OCR consists of the following steps: scanning, segmentation, pre-processing, feature extraction, and recognition. [5] The input is first scanned using an Android mobile camera in order to digitize the document. Segmentation extracts the symbols in the text region. Noise is removed by pre-processing each symbol, and the characteristics of each symbol are extracted using feature extraction to finally recognize the text.

[Figure 2. Overall functioning of OCR. Blocks: Input Image, Scanning, Segmentation, Pre-processing (Background Elimination), Feature Extraction, Recognition, Output Image]

2.1 Scanning
An Android mobile camera is used to capture an image of the document. This process, called scanning, converts the document into a digital image. The digital image is converted to grayscale and then to a bi-level image using a thresholding function. Thresholding is the process that converts a multi-level image into a bi-level image, i.e., a black and white image. A pixel is represented as black if its gray level is below the threshold level and as white if its gray level is above the threshold level. This makes it easier to detect the text regions in an image. It also reduces memory use and processing time. [10]
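The thresholding rule described in Section 2.1 can be sketched in a few lines. This is a minimal pure-Python illustration, not the OpenCV call used in the project; the integer luma weights in `to_gray` and the default threshold of 128 are assumptions chosen for the example.

```python
def to_gray(r, g, b):
    """Convert one RGB pixel to a gray level (0-255).

    Uses an integer approximation of the common 0.299/0.587/0.114
    luma weights; the paper does not say which conversion it uses.
    """
    return (299 * r + 587 * g + 114 * b) // 1000


def binarize(gray_rows, threshold=128):
    """Global thresholding: below the threshold -> black (0), otherwise white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in gray_rows]


rgb_image = [
    [(10, 10, 10), (200, 200, 200)],
    [(255, 0, 0), (0, 255, 0)],
]
gray_image = [[to_gray(*pixel) for pixel in row] for row in rgb_image]
binary_image = binarize(gray_image)
print(gray_image)
print(binary_image)
```

A single global threshold is the simplest choice. The adaptive thresholding mentioned elsewhere in the paper instead picks a threshold per neighborhood, which copes better with uneven lighting in camera-captured images.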

2.2 Segmentation
Regions of text are detected using the process of segmentation, which differentiates the text from other graphical elements in the document. Splits and joints can cause confusion between text and graphic elements, resulting in incorrect segmentation of the text. [5] This generally occurs due to poor scanning, which increases the noise in the digital document. Joints in characters occur when the document is scanned at a low threshold, and splits occur when the document is scanned at a high threshold.

Figure 3. Example of character segmentation [9]

Figure 4. Example of character recognition [9]

Figure 5. Result of applying OCR [9]

2.3 Pre-Processing
During the scanning stage, some noise is produced in the scanned image, which results in poor recognition of characters. This noise can be reduced by pre-processing, which is done using smoothing and normalization. Smoothing is applied to the image using filling and thinning techniques. Normalization is responsible for producing a uniform size and for slant and skew correction. [5]
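To make the segmentation step of Section 2.2 concrete, the sketch below splits a binarized text line into character candidates using a vertical projection profile: columns with no black pixels are treated as gaps. This is a common textbook technique and an assumption here, since the paper does not state which segmentation algorithm it uses; the function name `segment_columns` is hypothetical.

```python
def segment_columns(binary_rows):
    """Split a bi-level text line into character candidates.

    binary_rows: rows of 0 (black ink) and 255 (white background).
    Returns a list of (start, end) column ranges, end exclusive,
    where each range contains at least one ink pixel per column.
    """
    width = len(binary_rows[0])
    # Vertical projection: number of ink pixels in each column.
    ink = [sum(1 for row in binary_rows if row[x] == 0) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(ink):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


line = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 0],
]
segments = segment_columns(line)
print(segments)
```

The sketch also shows why joints and splits matter: two touching characters leave no empty column between them and come out as one segment, while a broken character produces two.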

2.4 Feature Extraction
Feature extraction refers to the extraction of the features of symbols from the image. In this step, only important attributes are taken into account and any unnecessary attributes are ignored. This technique takes into account the abstract features present in the character, such as spaces, lines and intersections. Feature extraction is implemented using OpenCV. [6]

[Figure 6. Flow chart for our OCR system. Steps: Capture Image; Save Image; Adaptive Thresholding; Identify Foreground Pixels; Remove the Background Image; Segmentation; Find Fixed Pitch; Crop Word into Character; Measure Gaps between Characters; Apply Classifier; Match with Font File; Recognize Character]

2.5 Recognition
The OCR system uses OpenCV to identify characters from the foreground pixels of the image, also called blobs, and to recognize the lines. These lines are then resolved into words or characters. In this phase the image is converted into a character stream that represents letters. [7]
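As an illustration of the final matching step, the sketch below classifies a segmented glyph by comparing it with stored templates and choosing the one with the fewest differing pixels (Hamming distance). This is a deliberately simple stand-in, not the classifier used in the paper: the 3x3 templates, the `TEMPLATES` dictionary and the function names are all hypothetical.

```python
# Hypothetical 3x3 glyph templates: "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def hamming(glyph_a, glyph_b):
    """Count the pixel positions where two equally sized glyphs differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the label of the template closest to the glyph."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


noisy_t = ["111", "010", "011"]   # a "T" with one noisy pixel
print(recognize(noisy_t))
```

Real systems normalize glyph size first and usually match on extracted features rather than raw pixels, which is where the feature extraction step of Section 2.4 comes in.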

3. APPLICATIONS OF OCR
Data entry and text entry process automation [14]
Banking: reading and transferring the correct amount of money from printed cheques [15]
Food packages and FMCG
Automatic number plate recognition [16]
Legal: digitizing paper documents [8]

4. CONCLUSION
The presented work is effective in removing the background image to improve the performance of OCR using an adaptive threshold. The method uses pre-processing and contour tracking algorithms. In pre-processing, we use a Gaussian blur, a global threshold and a contour-finding method to remove the background of the image. This improves character quality. If characters are cut, a joining method is applied.

5. REFERENCES
[1] Sravan Ch, Shivanku Mahna, Nirbhay Kashyap, "Optical Character Recognition on Handheld Devices", International Journal of Computer Applications (0975-8887).
[2] Ali Farhat, Ali Al-Zawqari, Abdulhadi Al-Qahtani, Omar Hommos, Faycal Bensaali and Abbes Amira, "OCR Based Feature Extraction and Template Matching Algorithms for Qatari Number Plate", 978-1-4673-8743-9/16/$31.00 © 2016 IEEE.
[3] Pooja Sharma, Shanu Sharma, "An analysis of Vision Based Techniques for Quality Assessment and Enhancement of Camera Captured Document Images", 978-1-4673-8203-8/16/$31.00 © 2016 IEEE.
[4] Dave Desrochers, Zhihua Qu and Apiwat Saengdeejing, "OCR Readability Study and Algorithms for Testing Partially Damaged Characters", Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, May 2-4, 2001, Hong Kong.
[5] Wing-Soon Wilson Lian, "Heuristic-Based OCR Post-Correction for Smart Phone Applications", honors thesis, Department of Computer Science, the University of North Carolina at Chapel Hill, 2009.
[6] R. Smith, "An overview of the Tesseract OCR Engine", Proc. 9th Int. Conf. on Document Analysis and Recognition, IEEE, Curitiba, Brazil, Sep 2007.
[7] The Tesseract open source OCR engine, http://code.google.com/p/tesseract-ocr. International Journal of Computer Applications (0975-8887), Volume 115, No. 22, April 2015, 13.
[8] R. Smith, "An overview of the Tesseract OCR Engine", Proc. 9th Int. Conf. on Document Analysis and Recognition, IEEE, Curitiba, Brazil, Sep 2007.
[9] Junaid Tariq, Umar Nauman Muhammad UmairNaru, "α-soft: An English Language OCR", 2010 Second International Conference on Computer Engineering and Applications.
[10] "A survey of modern optical character recognition techniques (DRAFT)", February 2004.
[11] Mrs. B. Vani, Ms. M. Shyni Beaulah, "High accuracy Optical Character Recognition algorithms using learning array of ANN", 2014 International Conference on Circuit, Power and Computing Technologies [ICCPCT].

[12] Norizam Sulaiman, Sri Nor Hafidah Mohammad Jalani, Mahfuzah Mustafa, Kamarul Hawari, "Development of Automatic Vehicle Plate Detection System", 2013 IEEE 3rd International Conference on System Engineering and Technology, 19-20 Aug. 2013.
[13] Rohollah Mazrae Khoshki, Subramaniam Ganesan, "Improved Automatic License Plate Recognition (ALPR) system based on single pass Connected Component Labeling (CCL) and reign property function", 978-1-4799-8802-0/15/$31.00 © 2015 IEEE.
[14] Archana S. Sawant, Prof. D. G. Chougule, "Script Independent Text Pre-processing and Segmentation for OCR", International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO) 2015.
[15] Ayatullah Faruk Mollah, Nabamita Majumder, Subhadip Basu, Mita Nasipuri, "Design of an Optical Character Recognition System for Camera-based Handheld Devices", IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 4, No 1, July 2011.
[16] Norizam Sulaiman, Sri Nor Hafidah Mohammad Jalani, Mahfuzah Mustafa, Kamarul Hawari, "Development of Automatic Vehicle Plate Detection System", 2013 IEEE 3rd International Conference on System Engineering and Technology, 19-20 Aug. 2013.