Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available.

Title: Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens
Author(s): Corr, Philip J.; Silvestre, Guenole C.; Bleakley, Christopher J.
Publication date: 2017-09-01
Conference details: Irish Machine Vision and Image Processing Conference (IMVIP), Maynooth University, Ireland, 30 August - 1 September 2017
Publisher: The Irish Pattern Recognition & Classification Society
Link to online version: http://eprints.maynoothuniversity.ie/8841/
Item record/more information: http://hdl.handle.net/10197/9349

Some rights reserved. For more information, please see the item record link above.

Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens

Philip J. Corr, Guenole C. Silvestre and Chris J. Bleakley
School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.

Abstract

This paper presents an evaluation of deep neural networks for recognition of digits entered by users on a smartphone touchscreen. A new large dataset of Arabic numerals was collected for training and evaluation of the network. The dataset consists of spatial and temporal touch data recorded for 80 digits entered by 260 users. Two neural network models were investigated. The first model was a 2D convolutional neural network (ConvNet) applied to bitmaps of the glyphs created by interpolation of the sensed screen touches; its topology is similar to that of previously published models for offline handwriting recognition from scanned images. The second model used a 1D ConvNet architecture applied to the sequence of polar vectors connecting the touch points. The models were found to provide accuracies of 98.50% and 95.86%, respectively. The second model was much simpler, reducing the number of parameters from 1,663,370 to 287,690. The dataset has been made available to the community as an open source resource.

1 Introduction

Touchscreens are now pervasively used in smartphones and computing tablets. Text input on a touchscreen commonly uses a virtual keyboard. Unfortunately, the virtual keyboard occupies a significant portion of the screen. This loss of screen is noticeable on smartphones but is especially problematic on smaller devices, such as smartwatches. Text entry by means of handwriting using the finger or thumb has the advantage that the gestures can be performed on top of a screen image or background. Smaller screens can be easily accommodated by entering characters individually, one on top of another [Kienzle and Hinckley, 2013].

Previous work on handwriting recognition has mainly focused on processing images of pen-on-paper writing, i.e. offline character recognition. Notably, the MNIST dataset was created using images of handwritten US census returns [LeCun et al., 1998]. Excellent recognition accuracy (99.2%) was demonstrated on the MNIST dataset using a convolutional neural network (ConvNet) [LeCun et al., 1998]. In contrast, online character recognition systems take input in the form of the continuously sensed position of the pen, finger, or thumb. Online systems have the advantage of recording temporal information as well as spatial information. To date, most work on online character recognition has focused on pen-based systems [Guyon et al., 1991, Bengio et al., 1995, Verma et al., 2004, Bahlmann, 2006]. LeCun et al.'s paper proposed a ConvNet approach to the problem, achieving 96% accuracy. The method involved considerable preprocessing, without which accuracy falls to 60%. The preprocessing step requires that the entire glyph is known a priori, removing the possibility of early recognition and completion of the glyph.

To date, there has been almost no work on using neural networks for online recognition of touchscreen handwriting using a finger or thumb. Our observation is that digits formed using a finger or thumb have greater variability than those formed using a pen, with more examples of poorly formed glyphs. Most likely, this is because users have better fine-grained control of a pen.
Furthermore, to enable operation on low-cost, small-form-factor devices, it is desirable that the resource footprint of the recognizer is low in terms of computational complexity and memory requirements. A further, as yet unexplored, dimension of the problem is that online entry allows early recognition and confirmation of the character entered, enabling faster text entry.

Herein, we report on an investigation seeking to address these challenges. A large dataset of Arabic numerals was collected using a smartphone. A number of deep learning models were explored and their accuracy evaluated on the collected dataset. Of these architectures, two are reported herein. The first model uses an approach similar to offline character recognition systems, i.e. a 2D ConvNet taking the bitmap of the completed glyph as input. The second model uses a 1D ConvNet applied to the polar vectors connecting touch positions. The accuracy and size of the networks are reported herein, together with an analysis of some of the errors. In addition, initial results on early digit recognition are provided. To the best of our knowledge, this is the first work to report on a low-footprint recognizer using polar vector inputs for online finger or thumb touch digit recognition.

2 Dataset

A software application was developed to record the dataset. Prior to participation, subjects signed a consent form. The application first asked subjects to enter their age, sex, nationality and handedness. Each subject was then instructed to gesture digits on the touchscreen using their index finger. The digits 0 to 9 were entered four times, in random order. Instructions were provided using voice synthesis to avoid suggesting a specific glyph rendering. The process was repeated for input using the thumb while holding the device with the same hand, to allow for applications where the user may have only one hand free.

Cubic interpolation of the touches during gesture input was rendered on the screen to provide visual feedback to the subject and to compute arc lengths. The screen was initially blank (white) and the gestures were displayed in black. The subject could use most of the screen to gesture, with small areas at the top and bottom reserved for instructions, interactions and guidance. The subject was permitted to erase and repeat the entry, if desired. The dataset was acquired on a 4.7-inch iPhone 6 running iOS 10. Force touch data was not available. The touch panel characteristics are not publicly available; specifically, the sampling frequency and spatial accuracy are unknown. Values of 60 Hz and ±1 mm are typically reported (OptoFidelity datasheet).

Data was stored in a relational database. Subject details such as handedness, sex and age were recorded along with the associated glyphs. Glyphs were stored as a set of associated strokes, each corresponding to a period when the subject's finger was in continuous contact with the device panel. The coordinate of each touch position was sampled by the touch panel device and stored along with the timestamp of the touch. The dataset was reviewed manually and any incorrectly entered glyphs were marked as invalid. The final dataset contained input from 260 subjects with a total of 20,217 digits gestured; demographic details are summarized in Table 1a.

3 Deep Learning Models

Two deep learning models were developed. One takes an offline glyph bitmap as input and the other takes the polar vectors connecting touch points as input. The models were implemented using Keras with a TensorFlow backend and trained on an NVIDIA TITAN X GPU.

3.1 Model with Bitmap Input

The first architecture investigated, listed in Table 1b, consists of two convolutional layers and two fully connected layers. Each convolutional layer is followed by a rectified linear unit (ReLU) activation layer and a max pooling layer.
In the convolutional layers, kernels of size 5x5 were used with a stride of 1. Padding was set so that the height and width of the output match those of the input. The max pooling layers use non-overlapping windows of size 2x2, so the output of the second max pooling layer is 7x7. The two fully connected layers follow these layers. Dropout of 50% is applied during training to prevent overfitting, and a momentum optimizer, implementing a variation of stochastic gradient descent, was used to minimise the error. The learning rate for this optimiser was 0.9, with exponential decay at a rate of 0.95. When run for 10 epochs, the network took approximately 8 seconds to train on the NVIDIA TITAN X graphics card.
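For concreteness, the following is a minimal Keras sketch of this bitmap model, following the layer shapes in Table 1b. The 28x28 single-channel input is inferred from the table, and the momentum value and decay step size are assumptions; the text specifies only the learning rate (0.9) and decay rate (0.95).

```python
# Minimal sketch of the 2D bitmap model (Table 1b), assuming 28x28
# single-channel glyph bitmaps and 10 digit classes.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bitmap_model():
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (5, 5), strides=1, padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),   # 28x28 -> 14x14, non-overlapping windows
        layers.Conv2D(64, (5, 5), strides=1, padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),   # 14x14 -> 7x7
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),           # 50% dropout during training
        layers.Dense(10, activation='softmax'),
    ])
    # Momentum optimizer with exponentially decaying learning rate, per the
    # text; decay_steps and the momentum value (0.9) are assumptions.
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=0.9, decay_steps=1000, decay_rate=0.95)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=schedule,
                                                    momentum=0.9),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

With these shapes the parameter counts reproduce Table 1b: 832 + 51,264 + 1,606,144 + 5,130 = 1,663,370 parameters in total.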

3.2 Model with Polar Vector Input

The coordinates of the touch samples were converted to a series of polar vectors. For each touch point, the vector to the next touch point was calculated. The angle of the vector was measured relative to the positive x axis in the range ±π, where +π/2 is vertically upwards. The length of the vector was expressed in pixels. The network architecture is listed in Table 1c. The input sequences were padded with zeros so that they were all the same length as the longest sequence in the dataset, 130 points.

Dropout layers with a dropout rate of 25% were used to avoid co-adaptation to the training data and hence reduce overfitting. Max pooling layers with a pool size of 2 were used to progressively reduce the number of parameters in the network and hence the computation required during training. In the convolutional layers, a kernel size of 5 was used, as this was found to capture local features within the sequence. The activation function used was ReLU, as it was found to provide the highest accuracy among the commonly used activation functions. Softmax was used to perform the final classification.

Three input cases were considered: angle-only, vector-length-only, and both angle and length. Some of the glyphs comprise multiple strokes; only the longest stroke was input to the network, as this was found to give better accuracy than inputting the entire multi-stroke gesture. Training was considered finished when the validation accuracy did not change for 18 epochs, which typically occurred after about 80 epochs.

(a) Database Demographics

Parameter        Number of Entries
Male             126
Female           134
Right Handed     228
Left Handed      32
Nationalities    12
Age Range        18-80

(b) 2D Model with Bitmap Input

Layer             Output Size   F#    P#
2D Convolution    28x28         32    832
Max Pooling       14x14         -     0
2D Convolution    14x14         64    51,264
Max Pooling       7x7           -     0
Fully Connected   512           -     1,606,144
Dropout           512           -     0
Fully Connected   10            -     5,130

(c) 1D Model with Polar Vector Input

Layer             Output Size   F#    P#
1D Convolution    126           32    352
Dropout           126           -     0
1D Convolution    122           32    5,152
Max Pooling       61            -     0
Dropout           61            -     0
1D Convolution    57            64    10,304
Max Pooling       28            -     0
Dropout           28            -     0
1D Convolution    28            128   41,088
Max Pooling       14            -     0
Dropout           14            -     0
Flatten           1,792         -     0
Fully Connected   128           -     229,504
Dropout           128           -     0
Fully Connected   10            -     1,290

Table 1: Dataset and Network Architectures. F# and P# refer to the number of features and the number of parameters.
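Similarly, a sketch of the polar-vector preprocessing and the 1D model of Table 1c is given below. The helper names are illustrative; the zero-padding to 130 points follows the text, the y-axis flip assumes screen coordinates grow downwards, and the 'same' padding on the fourth convolution is inferred from the output sizes in the table.

```python
# Sketch of the polar-vector preprocessing and the 1D model (Table 1c).
# Strokes are assumed to be (N, 2) arrays of (x, y) touch coordinates in pixels.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_LEN = 130  # longest sequence in the dataset

def to_polar_vectors(stroke):
    """Convert an (N, 2) stroke to a zero-padded (MAX_LEN, 2) sequence of
    (angle, length) vectors between consecutive touch points."""
    d = np.diff(np.asarray(stroke, dtype=np.float32), axis=0)
    # Angle to the positive x axis in [-pi, pi]; dy is negated so that +pi/2
    # points vertically upwards, assuming screen y grows downwards.
    angle = np.arctan2(-d[:, 1], d[:, 0])
    length = np.hypot(d[:, 0], d[:, 1])   # vector length in pixels
    v = np.stack([angle, length], axis=1)
    out = np.zeros((MAX_LEN, 2), dtype=np.float32)
    n = min(len(v), MAX_LEN)
    out[:n] = v[:n]
    return out

def build_polar_model(channels=2):  # 2 = angle & length; 1 = angle- or length-only
    return models.Sequential([
        layers.Input(shape=(MAX_LEN, channels)),
        layers.Conv1D(32, 5, activation='relu'),                   # -> 126
        layers.Dropout(0.25),
        layers.Conv1D(32, 5, activation='relu'),                   # -> 122
        layers.MaxPooling1D(2),                                    # -> 61
        layers.Dropout(0.25),
        layers.Conv1D(64, 5, activation='relu'),                   # -> 57
        layers.MaxPooling1D(2),                                    # -> 28
        layers.Dropout(0.25),
        layers.Conv1D(128, 5, padding='same', activation='relu'),  # -> 28
        layers.MaxPooling1D(2),                                    # -> 14
        layers.Dropout(0.25),
        layers.Flatten(),                                          # -> 1792
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.25),
        layers.Dense(10, activation='softmax'),
    ])
```

As a check, build_polar_model(2).count_params() returns 287,690, matching Table 1c, and drops to 287,530 with a single input channel, as in Table 3.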

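Section 4 below also reports initial early-recognition results (Figure 1, accuracy versus stroke completion). One plausible way to produce such a curve, reusing the sketches above, is to truncate each stroke before the polar conversion; the truncation scheme here is an assumption, not the authors' stated procedure.

```python
# Sketch: accuracy of the trained 1D model on partially completed strokes.
# Assumes `to_polar_vectors` and a trained `model` from the sketch above,
# `strokes` as a list of (N, 2) arrays and `labels` as integer digit classes.
def accuracy_vs_completion(model, strokes, labels,
                           fractions=(0.2, 0.4, 0.6, 0.8, 1.0)):
    labels = np.asarray(labels)
    results = {}
    for p in fractions:
        # Keep the first fraction p of each stroke's touch points (at least two).
        x = np.stack([to_polar_vectors(s[:max(2, int(len(s) * p))])
                      for s in strokes])
        pred = model.predict(x, verbose=0).argmax(axis=1)
        results[p] = float((pred == labels).mean())
    return results
```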
4 Results and Discussion

The networks were evaluated on the dataset using a 60% training, 20% validation and 20% test set split. The accuracy of the networks is listed in Table 3. The network with bitmap input gives the highest accuracy. The accuracy is close to the results reported in [LeCun et al., 1998] for the MNIST dataset, suggesting that the network is able to cope with the variability of finger and thumb touch gestures. In the case of the polar vector input, the best results are obtained by using both angle and distance data. Also for the polar vector model, using only the longest stroke provided better results than using the full multi-stroke gesture. This may be due to a dataset deficiency or to the artificial concatenation of the multiple strokes. The size of the networks is compared in Table 3. The 2D network is clearly larger due to the number of points on the screen, whereas the 1D network takes only the sequence as input.

Model Input           Accuracy (%)   # of Parameters
2D bitmap             98.50          1,663,370
1D distance           76.52          287,530
1D angle              93.77          287,530
1D distance & angle   95.86          287,690

Table 3: Network Accuracy.

Figure 1: Accuracy (%) vs. stroke completion (%) for the polar and Cartesian input models. [Plot not reproduced here.]

Figure 2: Selection of classification errors. A & B show glyphs where mis-classification occurs due to omission of subsequent strokes. C & D are ambiguous glyphs. E & F show mis-classification due to glyph formation. [Images not reproduced here.]

5 Conclusions and Future Work

A dataset was created consisting of Arabic numerals recorded on a smartphone touchscreen using single finger or thumb gestures. Two deep neural networks were trained to recognise the digits. Both models achieved high accuracy. One of the models used a novel polar vector data format and had a significantly lower footprint. In future work, we plan to enhance the accuracy of early digit recognition to accelerate the digit entry process. It is hoped that the open source dataset described here will facilitate further work on this topic. The dataset is available at [Corr et al., 2017].

References

[Bahlmann, 2006] Bahlmann, C. (2006). Directional features in online handwriting recognition. Pattern Recognition, 39(1):115-125.

[Bengio et al., 1995] Bengio, Y., LeCun, Y., Nohl, C., and Burges, C. (1995). LeRec: A NN/HMM hybrid for on-line handwriting recognition. Neural Computation, 7(6):1289-1303.

[Corr et al., 2017] Corr, P., Silvestre, G., and Bleakley, C. (2017). Numeral gesture dataset. https://github.com/philipcorr/numeral-gesture-dataset. Accessed: 2017-07-14.

[Guyon et al., 1991] Guyon, I., Albrecht, P., Le Cun, Y., Denker, J., and Hubbard, W. (1991). Design of a neural network character recognizer for a touch terminal. Pattern Recognition, 24(2):105-119.

[Kienzle and Hinckley, 2013] Kienzle, W. and Hinckley, K. (2013). Writing handwritten messages on a small touchscreen. In Proc. Int. Conf. HCI with Mobile Devices and Services, pages 179-182.

[LeCun et al., 1998] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324.

[LeCun et al., 1998] LeCun, Y., Cortes, C., and Burges, C. J. (1998). MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. Accessed: 2017-06-22.

[Verma et al., 2004] Verma, B. et al. (2004). A feature extraction technique for online handwriting recognition. In Proc. IEEE Int. Joint Conf. on Neural Networks, volume 2, pages 1337-1341.