Recognition of very low-resolution characters from motion images captured by a portable digital camera

Shinsuke Yanadume (1), Yoshito Mekada (2), Ichiro Ide (1), Hiroshi Murase (1)

(1) Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8603 Japan
yanadume@murase.nuie.nagoya-u.ac.jp, {ide,murase}@is.nagoya-u.ac.jp
(2) Life System Science and Technology, Chukyo University, 101 Tokodachi, Kaizu, Toyota, Aichi, 470-0393 Japan
y-mekada@life.chukyo-u.ac.jp

Abstract. Many kinds of digital devices, such as digital video cameras and camera-equipped cellular phones, can easily capture motion images. When images are taken with such devices in everyday situations, the resolution is not always high; moreover, hand vibration can cause blurring. Accurate recognition of characters from such poor images is therefore difficult. This paper presents a new character recognition algorithm for very low-resolution video data. The proposed method uses multi-frame images and integrates the information from each image based on a subspace method. Experimental results using a DV camera and a phone camera show that our method improves recognition accuracy.

1 Introduction

Recently, opportunities for taking videos with portable equipment such as digital video cameras (DV cameras) or camera-equipped cellular phones (phone cameras) continue to increase. If a system could automatically recognize characters from such video data, it could become a key technology for the next generation of human-machine interfaces. For example, in the future we will easily be able to scan and input URLs from magazines with a phone camera, or send text by e-mail by recognizing characters from images of captured notes. Many character recognition methods have already been proposed [1]. However, such methods generally assume that the image quality of the characters is quite high. The quality of characters captured by portable digital cameras, on the other hand, is often not sufficient for these methods: the image of each character might be too small when a full document is captured in a single shot, and hand movement or poor lens quality might blur the image. It is difficult to recognize such low-quality characters from a single image. Elms et al. [9] proposed a method to recognize low-quality images from an image scanner, but it is not sufficient in our case, where the resolution of a single image is too low. When we capture a character on video, we obtain a variety of character images as a sequence of motion images.

If we properly use such information, recognition of very low-resolution characters may become possible, even if we cannot recognize them from a single image. In this paper, we propose a method that recognizes characters from poor-quality video images. Cheeseman et al. [2] generated an image with higher resolution from multi-frame low-resolution images; restoration of low-resolution images is thus one solution [10]. We instead take information directly from multi-frame images at the recognition step and integrate that information with a subspace method [3-7]. We generate subspaces that approximate large sets of training images, compute the degree of similarity to each subspace, and use multi-frame images as input for recognition. The proposed method consists of three parts: gathering training data, constructing subspaces, and recognizing characters from input images. In the training step, our method uses many variations of characters segmented from video sequences at various resolutions. Unlike the method previously proposed by Sawaguchi et al. [8], the recognition step does not need to estimate camera movement. We describe the characteristics of characters in video data in section 2, propose the algorithm in section 3, and show experimental results in section 4.

2 Characters in video data

2.1 Portable digital cameras

Figure 1(c) shows a typical example of a character captured by a portable digital camera; it is obviously difficult to recognize from the single image shown. When we photograph a full document with a common portable camera, as shown in Fig. 1(a), each character is at low resolution. Our aim is to recognize such poor-quality characters as the one in Fig. 1(c) by using information from multi-frame images.

2.2 Characteristics of video data

When we take a video with a portable handheld camera, hand movement slightly shifts and rotates the camera, making it difficult to hold the camera position fixed. Therefore, a large variation generally exists within a sequence of video images, even for the same character. If we can properly integrate the information from these images, recognition of a very low-resolution character may become possible, even if we cannot recognize it from a single image. Figure 2 shows the character "A" obtained from two frames captured by a digital video camera. Typical character recognition algorithms might not be able to recognize these characters from a single image. However, the subtle difference between the two images provides a clue to improving the recognition accuracy.

Fig. 1. Taking a document image with a phone camera. (a): Taking an image with a phone camera. (b): Captured document image. (c): Segmented image of the character "a".

Fig. 2. Changes in pixel values due to hand motion: the pixel values change slightly, and the position of the character is shifted by the hand motion.

3 Recognition of characters from motion images

The proposed method consists of three parts: gathering training data, constructing subspaces, and recognizing characters from input images. As training data, we used character images captured by a portable camera, since this helps achieve a high recognition rate by covering the variations of the characters to be recognized. Eigenvectors computed from the training data are used to recognize input characters. Both training data and input data were generated from character images captured by a portable camera: we printed characters on a sheet with a fixed print pitch, segmented each character using this pitch information, and normalized the size of the segmented characters.

3.1 Creating training data

The target characters for recognition are:
- printed characters;
- upper- and lowercase letters of the alphabet and the Arabic numerals;
- characters whose images are larger than 6×6 pixels.

The training data consisted of printed characters captured by a portable camera. We used multi-frame images from a sequence of motion images as training data because they contain many variations of the same character. Since the size of the characters was unknown beforehand, we prepared training data captured at various resolutions by changing the distance between the camera and the sheet. Figure 3 is an excerpt from the training data.

Fig. 3. Excerpt from the training data for "A".
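To make this preprocessing concrete, the following is a minimal Python/NumPy sketch of pitch-based segmentation and size normalization. The function names and the nearest-neighbour interpolation are our assumptions; the paper does not state which interpolation was used.

```python
import numpy as np

def segment_by_pitch(page, pitch):
    """Cut a page image into per-character cells using the known print pitch.
    `page` is a 2-D grayscale array; `pitch` is the cell size in pixels."""
    h, w = page.shape
    return [page[r:r + pitch, c:c + pitch]
            for r in range(0, h - pitch + 1, pitch)
            for c in range(0, w - pitch + 1, pitch)]

def resize(img, size):
    """Normalize a cell to size x size pixels by nearest-neighbour sampling
    (an assumption; the paper only says the size was normalized)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]
```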

Fig. 4. Picturized eigenvectors of "A". (a): the first eigenvector. (b): the second eigenvector. (c): the third eigenvector.

3.2 Construction of the subspace from the training data

First, our method finds the orthogonal bases of the training data for each category. Each training image is converted to a unit vector with zero mean (normalization); the normalized vector is represented by $\mathbf{x}_i = [x_1, x_2, \dots, x_N]^T$, where N is the number of pixels. Next, the matrix $X = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k]$ is defined, where k is the amount of training data for the category. We then calculate the autocorrelation matrix of the category, $Q = XX^T$, and construct the subspace of each category from the R eigenvectors that correspond to the largest R eigenvalues. This set of eigenvectors is represented by $\{e^{(c)}_1, e^{(c)}_2, \dots, e^{(c)}_R\}$. Figure 4 shows an example of computed eigenvectors, picturized as images; they indicate that blurred characters at several resolutions are included.
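The construction above amounts to an eigendecomposition of each category's autocorrelation matrix. Below is a minimal NumPy sketch following the paper's definitions (zero-mean unit vectors, Q = XX^T, top-R eigenvectors); the function names are ours, not the authors'.

```python
import numpy as np

def normalize(img):
    """Flatten an image to a zero-mean unit vector (Sect. 3.2)."""
    v = img.astype(float).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def build_subspace(images, R=10):
    """Return the R eigenvectors of Q = X X^T with the largest eigenvalues,
    where X stacks the normalized training vectors of one category."""
    X = np.stack([normalize(img) for img in images], axis=1)  # N x k
    Q = X @ X.T                                # N x N autocorrelation matrix
    _, eigvecs = np.linalg.eigh(Q)             # eigenvalues in ascending order
    return eigvecs[:, -R:][:, ::-1]            # N x R basis, largest first
```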

Fig. 5. Advantage of multi-frame input: input multi-frame samples projected into the subspace.

3.3 Recognition

Each character is segmented from the input video and normalized. A set of vectors for the character is constructed from the multi-frame images and represented by $\{a_1, a_2, \dots, a_M\}$, where M is the number of input frames. The similarity between category c and the input images is defined as

$L^{(c)}(a) = \frac{1}{M} \sum_{m=1}^{M} \sum_{r=1}^{R} (a_m, e^{(c)}_r)^2,$

where $(x, y)$ denotes an inner product. The category of the input images is then determined as the one that maximizes this similarity. Even if a single input sample lies closer to an incorrect class than to the correct class, integrating multi-frame samples should still yield the correct category (see Fig. 5).
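As an illustration, the similarity and the category decision could be implemented as follows. This is a sketch that assumes the subspace bases from the previous snippet; it is not the authors' code.

```python
import numpy as np

def similarity(frames, basis):
    """L^(c) of Sect. 3.3: the mean over the M input frames of the squared
    inner products with the R basis vectors of category c.
    `frames` is an M x N array of normalized vectors; `basis` is N x R."""
    proj = frames @ basis                      # M x R inner products
    return float(np.mean(np.sum(proj ** 2, axis=1)))

def classify(frames, subspaces):
    """Return the category label whose subspace maximizes the similarity.
    `subspaces` maps a label to its N x R eigenvector matrix."""
    return max(subspaces, key=lambda c: similarity(frames, subspaces[c]))
```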

4 Experiments

We verified the capability of this method experimentally by capturing sequences of printed characters with either a portable digital video camera or a phone camera. An alphanumeric Century font was used in the experiments, with characters in 62 categories, as shown in Fig. 6.

Fig. 6. Set of target characters (Font: Century).

4.1 Recognition rate vs. number of input frames

To verify the performance of our method when applied to very low-resolution characters, we evaluated recognition rates while changing the number of frames and the size of the characters. The data used for this experiment are as follows:

Training data:
- Captured with a DV camera.
- Character size controlled by changing the distance (less than 70 cm) between the camera and the sheet on which the characters were printed.
- Average character size: 16×16, 11×11, 8×8, 7×7, or 6×6 pixels.
- Multiple frames of each character, 50 frames per character in total.

Dictionary data:
- Ten eigenvectors corresponding to the ten largest eigenvalues.

Test data:
- Captured with the same DV camera, but different from the training data.
- Two test sample sets, with character size controlled by the distance between the camera and the sheet: small, 70 cm (approximately 6×6 pixels); medium, 60 cm (approximately 7×7 pixels).
- Number of input samples: 30 sets for each character, 1,860 sets in total.

The results are shown in Fig. 7. Recognition rates increased with the number of input frames until reaching a saturation point at around 15 frames. For the medium size, the recognition rate almost reached 100%, indicating that our method improves recognition accuracy by inputting multi-frame images.

4.2 Lighting conditions vs. recognition rate

Since changes in lighting conditions are a serious problem for most computer vision systems, we checked the relationship between lighting conditions and recognition rate. The training data were captured under bright lighting; the remaining conditions of the training data and the dictionary data are identical to those in Section 4.1.

Test data:
- Captured with a DV camera.
- Character size: small.
- Lighting conditions: bright, middle, or dark.
- Number of frames: 20.
- Number of input samples: 30 sets for each character, 1,860 sets in total.

The results in Table 1 show that our method is largely independent of lighting conditions. We also found that the normalization (Section 3.2) was effective.
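For concreteness, a hypothetical harness for the frame-count evaluation above might look like the following; it reuses `classify` from the previous sketch, and the data layout is our assumption, not the paper's.

```python
def recognition_rate(test_sets, subspaces, num_frames):
    """Fraction of test sets recognized correctly when only the first
    `num_frames` frames of each set are used. `test_sets` is a list of
    (true_label, M x N frame array) pairs."""
    correct = sum(classify(frames[:num_frames], subspaces) == label
                  for label, frames in test_sets)
    return correct / len(test_sets)
```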

Fig. 7. Recognition rate vs. number of input frames.

Table 1. Recognition rates under different lighting conditions (character size: small).

  Light condition    Recognition rate (%)
  bright             88.1
  middle             82.8
  dark               85.4

4.3 Using different types of cameras

For a character recognition system to be practical, its algorithm must be applicable to any type of camera. Therefore, we also tried image sequences taken by a phone camera, whose image quality was worse than that of the DV camera used in the training stage. The specifications of the phone camera and the test samples are as follows:

- Actual number of pixels of the CCD: 0.31 megapixels.
- Captured image size: 164×220 pixels.
- Frame rate: 7.5 fps.
- Character size: medium.
- Distance between the camera and the printed sheet: approximately 20 cm.
- Number of frames: 20.
- Number of input samples: 30 sets for each character, 1,860 sets in total.

In these experiments, the 92.0% recognition rate obtained with the phone camera is slightly lower than the 99.9% obtained with the DV camera, because the image quality of the phone camera is inferior. The dictionary data for both cases were constructed from data captured by a DV camera.

5 Conclusion

In this paper, we proposed a new framework based on a subspace method for recognizing low-quality, especially low-resolution, characters.

We used image sequences at various resolutions to construct the subspace in the training step. Experimental results show that a recognition rate of 99.9% is obtained for low-resolution alphanumeric characters about 7×7 pixels in size. Our method performs well even when the device or the lighting conditions change. We conclude that our method is useful for recognizing very low-resolution characters captured by a portable digital camera. Although slight shifts and rotations of the camera are absorbed by the training data set, the method cannot cope with a large tilt or rotation. Future work includes supporting additional scripts, such as Japanese characters, and different fonts. When recognizing characters in document images, it is also difficult to segment characters from low-resolution images of sentences; since we used printed, pre-segmented character images in this research, we must in the future apply this algorithm to words and sentences and explore the ramifications.

Acknowledgments

The authors thank their colleagues for useful suggestions and discussion. Parts of this research were supported by a Grant-in-Aid for Scientific Research (16300054) and the 21st Century COE program from the Ministry of Education, Culture, Sports, Science and Technology.

References

1. S. Mori, K. Yamamoto, and M. Yasuda, "Research on machine recognition of handprinted characters," IEEE Trans. PAMI, vol. PAMI-6, no. 4, pp. 386-405, July 1984.
2. P. Cheeseman, B. Kanefsky, R. Hanson, and J. Stutz, "Super-resolved surface reconstruction from multiple images," Technical Report FIA-94-12, NASA Ames Research Center, Artificial Intelligence Branch, October 1994.
3. E. Oja, Subspace Methods of Pattern Recognition, Research Studies Press, Hertfordshire, UK, 1983.
4. H. Murase, H. Kimura, M. Yoshimura, and Y. Miyake, "An improvement of the auto-correlation matrix in the pattern matching method and its application to handprinted HIRAGANA recognition," IECE Trans., vol. J64-D, no. 3, pp. 276-283, March 1981.
5. H. Murase and S. K. Nayar, "Visual learning and recognition of 3-D objects from appearance," International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.
6. S. Omachi and H. Aso, "A qualitative adaptation of subspace method for character recognition," IEICE Trans., vol. J82-D-II, no. 11, pp. 1930-1939, November 1999.
7. S. Uchida and H. Sakoe, "Handwritten character recognition using elastic matching based on a class-dependent deformation model," Proc. ICDAR, vol. 1, pp. 163-167, August 2003.
8. M. Sawaguchi, K. Yamamoto, and K. Kato, "A proposal of character recognition method for low resolution images by using cellular phone," Technical Report of IEICE, PRMU2002-247, March 2003.
9. A. J. Elms, S. Procter, and J. Illingworth, "The advantage of using an HMM-based approach for faxed word recognition," IJDAR, vol. 1, no. 1, pp. 18-36, 1998.
10. P. D. Thouin and C.-I Chang, "A method for restoration of low-resolution document images," IJDAR, vol. 2, no. 4, pp. 200-210, 2000.