An Hybrid MLP-SVM Handwritten Digit Recognizer


A. Bellili (1, 2), M. Gilloux (2), P. Gallinari (1)

(1) LIP6, Université Pierre et Marie Curie, 4, Place Jussieu, 75252 Paris Cedex 05, France
(2) La Poste, 10, rue de l'Ile Mabon, BP 86334, 44263 Nantes Cedex 02, France
{Abdel.Bellili, Patrick.Gallinari}@lip6.fr, Michel.Gilloux@laposte.fr

Abstract

This paper presents an original hybrid MLP-SVM method for unconstrained handwritten digit recognition. Specialized Support Vector Machines (SVMs) are introduced to significantly improve the MLP performance in local areas around the separation surfaces between each pair of digit classes in the input pattern space. This hybrid architecture is based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs, and that a few pairs of digit classes account for the majority of MLP substitutions (errors). Specialized local SVMs are introduced to detect the correct class among these two classification hypotheses. The hybrid MLP-SVM recognizer achieves a recognition rate of 98.01% on a real mail zipcode digit recognition task, a performance better than that of several classifiers reported in recent research.

1. Introduction

The recognition of handwritten characters has been an active research domain in recent years. This research has produced many classification algorithms that use either the raw pixel representation of the character or a feature vector representation. Most of these algorithms achieve good performance in terms of correct recognition rate. However, in critical real applications such as automatic bank check reading systems or zipcode recognition systems, errors are very expensive to correct. Two situations reduce the classification confidence and call for a good rejection mechanism: 1) patterns might be ambiguous, or 2) patterns might be unrelated to the data used to train the classifier.
In this paper, we demonstrate the advantage of using Support Vector Machines (SVMs) [9] to improve the performance of an OCR (Optical Character Recognition) system based on an MLP neural network [1]. In Section 2, we justify the use of MLPs and explain their limitations in real OCR applications. In Section 3, we present a brief description of the support vector machine method. Section 4 details the idea of using SVMs in a hybrid combination architecture to improve the global performance of MLPs. In Section 5, we discuss the experiments which demonstrate the relevance of our hybrid architecture on a real zipcode character recognition task.

2. MLP Networks for OCR systems

The most commonly used family of neural networks for handwritten character recognition is the feed-forward network, which includes multilayer perceptron (MLP) [1] and Radial-Basis Function (RBF) [4] networks. MLP networks are widely used in handwritten character recognition systems because they are easy to train and fast at classification time. This popularity is related to the use of the gradient back-propagation algorithm in the training process. MLPs generally achieve good performance in terms of correct recognition rate in handwritten character classification. Unfortunately, MLPs have two limitations in classification tasks. First, there is no theoretical relationship between the MLP structure (e.g. the number of hidden layers and the number of neurons per layer) and the classification task. Second, the MLP derives separating hyperplanes in the feature representation space that are not optimal in terms of the margin (for the notion of margin, see [2]) between the examples of two different classes. When an unlabelled pattern falls in this margin area, MLPs often make an erroneous classification decision with a high level of confidence.
This type of classification error is very hard to avoid by using a rejection mechanism.

In recent years, to achieve an optimal recognition rate, much research has gone into classification systems that combine multiple classifiers [6]. The idea is to compensate for the weakness of one classifier, in a given local area of the feature space, by the strength of the other classifiers once they are correctly optimized. The combination method can use Local Accuracy Estimates [11], a Local Learning Algorithm [3], Adaptive Mixtures of Local Experts [5], or aggregate the decisions obtained from individual classifiers to derive the best final decision from a statistical point of view [7]. The disadvantage of most of these methods is the complexity of optimizing each classifier and the definition of the local area in terms of K-nearest neighbours, which requires storing all the training examples in system memory. These constraints are prohibitive in real character recognition systems, where some training character sets can contain one million characters.

Our method is founded on the observation that, when an MLP is used as a handwritten digit recognizer, the correct class is almost always one of the two maximum outputs of the MLP. Table 1 summarizes the presence rate of the correct class among the k largest MLP outputs (MLP training set $Set_1$ = 44081 real digit characters; MLP test set $Set_2$ = 44075 real digit characters).

k-th maximum MLP outputs    presence rate (%)
1                           97.45
2                           99.00
3                           99.42
4                           99.66
5                           99.74

Table 1. Presence rate of the correct class in the k-th maximum MLP outputs.

The second observation is that a few pairs of classes account for the majority of the errors (confusions) made by the MLP (e.g. (7, 1) or (9, 3)). In order to improve the performance of the MLP, our approach consists in introducing support vector machines (SVMs) [8, 9] to detect the correct class among the two largest outputs of the MLP output layer, for certain pairs of classes. In the ideal case, i.e. if the introduced SVMs could always pick the correct class among the two largest MLP outputs, the combination of the MLP and a limited number of SVMs would achieve a correct recognition rate of 99.00%.

3. Support Vector Machines

One of the most important recent developments in classifier design is the introduction of the support vector machine classifier by V. Vapnik [8, 9]. The idea is to map the space $S_x$ of the input examples into a high-dimensional (possibly infinite-dimensional) feature space. With an adequate mapping, the input examples become linearly or almost linearly separable in the high-dimensional space [9]. The SVM is primarily a two-class classifier whose optimization criterion is the width of the margin between the classes, i.e. the area around the decision surface defined by the distance to the nearest training examples in the feature space. These examples, called support vectors, define the classification function of the support vector machine. Optimizing a support vector machine consists in minimizing the number of support vectors by maximizing the margin between the two classes. The decision function derived by the SVM classifier for a two-class problem can be formulated, using a kernel function $K(x, x_i)$ of a new example $x$ (to classify) and a training example $x_i$, as follows:

$$f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + \alpha_0 \qquad (1)$$

where $SV$ is the support vector set (a subset of the training set) and $y_i = \pm 1$ is the label of example $x_i$. The parameters $\alpha_i$ and $\alpha_0$ are optimized during the training process. There are many kernel functions $K$. The simplest is the dot product between the input pattern to classify $x$ and a member $x_i$ of the support vector set, $K(x, x_i) = (x \cdot x_i)$, which yields a linear classifier. Nonlinear SVM classifiers, such as the Gaussian radial basis function SVM or the polynomial SVM, are obtained with the RBF kernel $K(x, x_i) = \exp(-\|x - x_i\|^2 / \sigma^2)$ or the $p$-th order polynomial kernel $K(x, x_i) = ((x \cdot x_i) + 1)^p$. Support vector machine classifiers are increasingly used, alone or combined with other types of classifiers, in character recognition systems [10].

4. A hybrid MLP-SVM combined architecture

Our hybrid MLP-SVM combination method is motivated by the fact that, in handwritten digit recognition, an MLP achieves a very good correct recognition rate if we consider its two maximum outputs (i.e. the correct class is almost systematically the first or the second maximum MLP output). This observation motivates the search for a method that can pick the correct class among these two maximum MLP outputs with the highest possible confidence in the classification decision. Once the MLP decision is made, the problem is to choose the right class among two classification hypotheses. This choice is a two-class (binary) problem.
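The observation behind the architecture, that the correct class is almost always among the two largest MLP outputs, is a top-k presence rate of the kind reported in Table 1. The following is a minimal sketch of how such a rate could be computed in plain Python. The function name `presence_rate` and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def presence_rate(outputs: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Percentage of patterns whose true label is among the k largest outputs.

    `outputs[n]` holds the classifier scores for pattern n (one per class),
    `labels[n]` its true class index. Hypothetical helper, for illustration.
    """
    hits = 0
    for scores, label in zip(outputs, labels):
        # Class indices sorted by decreasing score: ranked[0] plays the role
        # of MLP_max1, ranked[1] the role of MLP_max2, and so on.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (3 classes, 3 patterns, all of true class 0); not real MLP outputs.
toy_outputs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
toy_labels = [0, 0, 0]
for k in (1, 2, 3):
    print(k, round(presence_rate(toy_outputs, toy_labels, k), 2))
```

With real MLP outputs, the gap between the k = 1 and k = 2 rates (97.45% and 99.00% in Table 1) is the margin that a perfect arbiter between the two top hypotheses could recover.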

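Once the choice is reduced to a binary problem, an SVM decides by the sign of the decision function of Section 3, Eq. (1). The sketch below evaluates that function in plain Python for the three kernel families mentioned there. It is an illustration under assumptions: the helper names are hypothetical, the RBF width convention (division by sigma squared) is one common choice and may differ from the authors' setting, and the support vectors are toy values rather than trained ones.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, xi: Vector) -> float:
    """Dot product kernel, K(x, x_i) = (x . x_i)."""
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(p: int) -> Kernel:
    """p-th order polynomial kernel, K(x, x_i) = ((x . x_i) + 1)^p."""
    return lambda x, xi: (linear_kernel(x, xi) + 1.0) ** p


def rbf_kernel(sigma: float) -> Kernel:
    """Gaussian RBF kernel; the sigma**2 scaling is an assumed convention."""
    def kernel(x: Vector, xi: Vector) -> float:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
        return math.exp(-squared_distance / sigma ** 2)
    return kernel


def decision(x: Vector, support: List[Tuple[float, int, Vector]],
             kernel: Kernel, alpha0: float) -> float:
    """f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + alpha_0.

    `support` is a list of (alpha_i, y_i, x_i) triples with y_i in {+1, -1}.
    """
    return sum(alpha * y * kernel(xi, x) for alpha, y, xi in support) + alpha0


# Toy support vectors (not the result of any training).
toy_support = [(1.0, +1, [1.0, 0.0]), (1.0, -1, [-1.0, 0.0])]
print(decision([2.0, 0.0], toy_support, linear_kernel, 0.0))
print(decision([2.0, 0.0], toy_support, rbf_kernel(1.0), 0.0) > 0)
```

A positive value of `decision` corresponds to the SVM label +1 and a negative value to the label -1; its magnitude is the signed distance that a rejection mechanism could later threshold.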
One of the most effective methods for solving a binary classification problem with maximum confidence in the decision is to use support vector machines. This combination specializes the SVMs in the local area around the separation surface between each pair of the ten digit classes. However, this method can seem very tedious because it needs an SVM for each pair of classes (45 SVMs for the ten digit classes). The second original aspect of our method is to introduce SVMs only for the pairs of classes that account for the majority of MLP errors (substitutions or confusions). Figure 1 shows substitutions made by the MLP for two pairs of digit characters, (9,3) and (7,1): for these four examples, the correct class is the second maximum output of the MLP.

Figure 1. Examples of (9,3) and (7,1) MLP substitutions

4.1. Hybrid MLP-SVM architecture

The design and validation of this hybrid architecture needs three (3) different digit sets, denoted $Set_1$, $Set_2$ and $Set_3$. $Label(x)$ and $Class(x)$ denote respectively the label of the digit pattern $x$ and the class assigned to the digit pattern $x$ by the hybrid MLP-SVM recognizer. The first and second maximums of the MLP outputs are denoted $MLP_{max1}$ and $MLP_{max2}$.

a) Training process

- Train an optimized MLP with $Set_1$ and determine the pairs $(i, j)$ of digit classes causing the majority of MLP substitutions (i.e. the pairs of classes for which the sum of the $(i, j)$ and $(j, i)$ MLP substitution rates given in Table 2 reaches a fixed threshold rate). The set containing all these pairs is denoted $S_{MajSubPair}$.
- For each pair $(i, j)$ in $S_{MajSubPair}$, extract from $Set_2$ all the patterns $x$ ($Label(x) = i$ or $Label(x) = j$) for which ($MLP_{max1}(x) = i$ and $MLP_{max2}(x) = j$) or ($MLP_{max1}(x) = j$ and $MLP_{max2}(x) = i$). The resulting subset is denoted $SubSet(i, j)$. Obviously $SubSet(i, j) = SubSet(j, i)$.
- For each pair $(i, j)$ in $S_{MajSubPair}$, train and optimize a support vector machine $SVM_{(i,j)}$ using the $SubSet(i, j)$ examples. This is a two-class classification problem (for example, $i$ is associated with the binary SVM label $+1$ and $j$ with the other label $-1$). Since $SubSet(i, j) = SubSet(j, i)$, there is only one SVM classifier $SVM_{(i,j)}$ for the two pairs of classes $(i, j)$ and $(j, i)$.

b) Decision process

For an unknown (unlabelled) digit pattern $x$ in the test-validation set $Set_3$:

- Compute the MLP outputs for the input pattern $x$.
- If ($MLP_{max1}(x) = i$ and $MLP_{max2}(x) = j$) and the pair $(i, j)$ of classes belongs to $S_{MajSubPair}$, then $Class(x) = SVM_{(i,j)}(x)$, that is, $i$ if the distance returned by the SVM is positive (SVM label $+1$), or $j$ otherwise (negative distance returned).
- Else $Class(x) = MLP_{max1}(x)$.

5. Experiments

All the digit image sets used in our experiments contain only segmented characters from real mail (postal) zipcodes. To validate our approach, we used a one-hidden-layer MLP with an input layer of 138 nodes (the dimension of the feature vector representation), trained with a digit character set ($Set_1$) of 44081 digits. The inter-class substitutions (errors) for the digit character set $Set_2$ (44081 digits) are summarized in Table 2 as percentages for each pair of digit classes. The SVMs derived for some pairs of classes (e.g. (7,1), (9,3), (6,0), etc.), which account for the majority of the MLP substitutions as shown in Table 2, were implemented with the SVM-LIGHT software provided by T. Joachims. SVMs with different kernel functions (linear, polynomial and RBF) were used, and the best performance was obtained with the RBF kernel SVMs. We tested our hybrid MLP-SVM handwritten digit recognizer on the digit image set $Set_3$ (44075 patterns). Table 3 summarizes the performance, without any rejection, in terms of the global correct recognition rate of our hybrid MLP-SVM recognizer (Reco.1) and the theoretical recognition rate (Th. Reco.) that the same hybrid MLP-SVM recognizer would reach if the introduced specialized SVMs were able to systematically detect the correct class among the two maximum MLP output hypotheses.
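The training and decision processes of Section 4.1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`top_two`, `select_pairs`, `extract_subset`, `hybrid_decide`) are hypothetical, the MLP is represented only by its output vector, and each pairwise SVM is any callable returning a signed distance (the paper itself used SVM-LIGHT). Pairs are stored as (i, j) with i < j, and class i is mapped to the SVM label +1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # always stored as (i, j) with i < j


def top_two(outputs: Sequence[float]) -> Pair:
    """Return (MLP_max1, MLP_max2): indices of the two largest MLP outputs."""
    order = sorted(range(len(outputs)), key=lambda c: outputs[c], reverse=True)
    return order[0], order[1]


def select_pairs(subst: Sequence[Sequence[float]], threshold: float) -> List[Pair]:
    """Pairs whose summed (i, j) and (j, i) substitution rates reach the threshold."""
    n = len(subst)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if subst[i][j] + subst[j][i] >= threshold]


def extract_subset(samples, pair: Pair):
    """SubSet(i, j): patterns labelled i or j whose top-two MLP classes are {i, j}.

    `samples` is an iterable of (pattern, label, mlp_outputs) triples.
    """
    i, j = pair
    return [(x, label) for x, label, outputs in samples
            if label in (i, j) and set(top_two(outputs)) == {i, j}]


def hybrid_decide(x, outputs: Sequence[float],
                  svms: Dict[Pair, Callable[[object], float]]) -> int:
    """Class(x): consult SVM(i, j) if the top-two pair has one, else keep MLP_max1."""
    first, second = top_two(outputs)
    pair = (min(first, second), max(first, second))
    svm = svms.get(pair)
    if svm is None:
        return first              # no specialized SVM: trust the MLP
    i, j = pair
    return i if svm(x) > 0 else j  # positive distance -> label +1 -> class i


# Toy usage: the MLP hesitates between 1 and 7, a stand-in SVM votes for 7.
mlp_outputs = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0]
stand_in_svms = {(1, 7): lambda pattern: -1.0}
print(hybrid_decide(None, mlp_outputs, stand_in_svms))
```

In the worst case the decision costs one MLP evaluation plus one SVM evaluation, because at most one pair is ever consulted per pattern.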

Substit(i,j) (%)
i \ j      0      1      2      3      4      5      6      7      8      9
0          -  15.44   6.61   6.61   3.67   5.14  13.23   0.73   2.94  45.68
1       2.29      -   9.16  10.68  20.23  12.21   2.67  37.78   4.19   3.05
2      12.29  13.11      -   6.55   5.73   0.00   0.82  41.80  11.88   7.78
3       9.09   6.91   3.27      -   0.36  13.19   0.00  23.63   7.27  35.63
4      15.75  27.88  10.91   3.63      -   3.03   9.09  13.94   0.00  15.75
5      14.18  17.91   0.67  21.64   4.47      -  15.67  14.92   2.98   7.46
6      32.80   6.40   1.60   5.60   8.00  36.80      -   0.00   8.80   0.00
7       3.43  38.62  11.58  23.60  11.16   0.85   0.00      -   2.75   8.15
8      27.53   6.52  10.14  25.00   2.17  12.68   3.98   4.71      -   7.24
9      18.47   6.65   7.39  35.71   3.49   8.12   0.00  11.33   8.86      -

Table 2. Inter-class substitutions made by the MLP on $Set_2$

Classifier                Reco.1 (%)   Th. Reco. (%)
MLP                       97.45        99.00
MLP + 5 local SVMs        97.71        98.10
MLP + 10 local SVMs       97.90        98.41
MLP + 15 local SVMs       98.01        98.60

Table 3. Performance of the hybrid MLP-SVM recognizer

These results show that our hybrid MLP-SVM recognizer improves both the recognition rate and the error rate on a real zipcode digit classification task: with 15 local SVMs, the recognition rate rises from 97.45% to 98.01%. The performance can seem modest in comparison with the optimal limit (99.00%) because we used only fifteen of the forty-five possible local SVMs, and because the test set $Set_3$ contains many digit patterns that are impossible to recognize (for example broken digit patterns), even for a human (some 7 patterns cannot be distinguished from some 1 patterns out of context).

5.1. Conclusions

In this paper, a hybrid MLP-SVM handwritten digit recognition method is introduced. The method takes advantage of the simplicity of use and good classification performance of MLP networks and compensates for their weakness, namely non-optimal separation surfaces between classes, by introducing specialized support vector machines. These SVMs improve the performance in local areas around the class separation surfaces in the input pattern space.
The originality of our approach consists in introducing SVMs only for the pairs of classes which account for the majority of the MLP network substitutions (errors). To classify an unknown pattern, the system makes one MLP decision and, in the worst case, one SVM decision (i.e. if the first and second maximums of the MLP outputs belong to one of the pairs of classes causing errors). With reasonable SVM optimization, our hybrid MLP-SVM recognizer improves the recognition performance on a real handwritten digit classification task in comparison with an MLP recognizer. Our current research aims to exploit the SVM margin definition to introduce a rejection mechanism in order to further reduce the error rate of the hybrid MLP-SVM recognizer. Another way to improve the performance of the hybrid MLP-SVM recognizer is to develop a theory that allows us to create SVM kernels that enforce desirable invariances for digit patterns.

References

[1] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1995.
[2] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. Fifth Annual Workshop on Comput. Learn. Theory, Pittsburgh, USA, July 1992.
[3] L. Bottou and V. Vapnik. Local learning algorithms. Neural Computation, 4(6):888-900, 1992.
[4] S. Haykin. Neural Networks, A Comprehensive Foundation. Macmillan Publishing Company, London, UK, 1994.
[5] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79-87, 1991.
[6] J. Kittler. Combining classifiers: A theoretical framework. Pattern Analysis and Applic., 1(1):18-27, 1998.
[7] C. Y. Suen and Y. S. Huang. A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Trans. on PAMI, 17(1):90-94, January 1995.
[8] V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, USA, 1998.
[9] V. N. Vapnik. An overview of statistical learning theory. IEEE Trans. on Neural Networks, 10(5):988-999, 1999.
[10] L. Vuurpijl and L. Schomaker. Two-stage character classification: A combined approach of clustering and support vector classifiers. In Proc. Seventh International Workshop on Frontiers in Handwriting Recognition, Amsterdam, Netherlands, September 2000.
[11] K. Woods, W. P. Kegelmeyer Jr., and K. Bowyer. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. on PAMI, 19(4):405-410, 1997.