Visual Interpretation of Hand Gestures as a Practical Interface Modality

Visual Interpretation of Hand Gestures as a Practical Interface Modality

Frederik C. M. Kjeldsen

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

1997

© 1997 Frederik C. M. Kjeldsen. All Rights Reserved.

Abstract

This dissertation describes a user interface in which many tasks traditionally performed by a mouse are instead performed using visual recognition of hand gestures. The goals are to explore both how a vision system should be designed to recognize hand gestures and how those gestures are best used in a general-purpose interface. Observed by a camera below the screen, the user manipulates objects directly with gestures incorporating both motion and pose. Task and domain knowledge provide context, allowing real-time recognition on standard PC hardware.

A color-based algorithm is trained to segment the user's hands from complex backgrounds without visual aids. Training uses a novel combination of both positive and negative data to improve segmentation quality. The apparent path of the hand is smoothed with an algorithm that reduces the types of noise inherent in the domain but leaves a cursor motion on the screen that feels natural to the user. Salient features of the motion are extracted, including a newly discovered natural gesture (a "Comma"), which helps provide punctuation for each gestural sentence. Neural networks are trained to classify the pose of the user's hand from cropped and preprocessed images. The nets correctly classify 90-95% of the hand images in real time.

A transition network encodes the interaction language. It controls the application of feature extraction operators and interprets their results to determine when to perform actions on the user's behalf. The style of interaction is based on studies of natural gesticulation and incorporates various features designed to make it natural and easy for the user to remember.

The system demonstrates an 80-90% success rate on most tasks. Object selection time for large objects is demonstrated to be equal or superior to that of a mouse. Object selection performance is modeled accurately by augmenting Fitts' Law with terms for lag and random cursor noise.

Finally, the suitability of gesture for this type of task is considered. Various interaction styles are examined, and problems specific to hand gesture are discussed.
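The idea of training a color-based segmenter from both positive (hand) and negative (background) examples can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the coarse RGB quantization (`BINS`), the simple vote counting, the `neg_weight` parameter, and the function names are all hypothetical. The sketch only shows why negative data helps: a color that appears on the hand but appears more often in the background is excluded from the predicate.

```python
from collections import defaultdict

BINS = 8  # hypothetical quantization: 8 levels per color channel


def quantize(pixel, bins=BINS):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color cell."""
    step = 256 // bins
    r, g, b = pixel
    return (r // step, g // step, b // step)


def train_color_predicate(positive, negative, neg_weight=1.0):
    """Vote per color cell: +1 per hand pixel, -neg_weight per background pixel."""
    votes = defaultdict(float)
    for p in positive:
        votes[quantize(p)] += 1.0
    for p in negative:
        votes[quantize(p)] -= neg_weight
    # A cell is accepted only if positive evidence outweighs negative evidence.
    return {cell for cell, v in votes.items() if v > 0}


def segment(image, predicate):
    """Label each pixel of an image (list of rows of pixels) as hand (1) or not (0)."""
    return [[1 if quantize(px) in predicate else 0 for px in row] for row in image]


# Toy data: the third "hand" sample shares its color with the background.
hand = [(200, 150, 120), (205, 155, 125), (90, 60, 50)]
background = [(90, 60, 50), (95, 62, 52), (20, 20, 200)]
cp = train_color_predicate(hand, background)
mask = segment([[(201, 151, 121), (92, 61, 51)]], cp)
```

In this toy run the ambiguous brown color is outvoted by the background samples, so only the lighter skin-like cell remains in the predicate. Labeling the largest connected component of such a mask as the hand would be a natural next step, but that step is not shown here.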

Acknowledgments

I would like to express my thanks to IBM for the generous support of this work, via the Resident Study Program. Several individuals deserve special mention. My advisor, John Kender, has given his support in many ways. Ross Bevridge's suggestions helped shape the direction of this work, and Steve Feiner's comments helped to make the thesis much more complete. Several members of the T.J. Watson community have been very helpful, both as colleagues and laboratory rats. In particular, thanks to Jon Connell for both inspiration and his many excellent comments, as well as to Sharatchandra Pankanti, Michael Yao, Chitra Dorai, and Lisa Brown. Finally, apologies to my son, Joseph, for having to do without his father so much during the precious first year of life, and sincere thanks to his mother, Lorraine, both for picking up the slack when I was not there, and suffering my moods when I was.

Contents

Chapter 1: Introduction
  1.1 Why Gesture
    1.1.1 How should it be used?
  1.2 Why Vision
  1.3 Scope of problem
  1.4 Difficulties
  1.5 Overview of Thesis

Chapter 2: Background
  2.1 Hand Gesture Theory
  2.2 Hand Gesture Recognition
    2.2.1 Hand Segmentation
    2.2.2 Pose Recognition
    2.2.3 Motion Interpretation
  2.3 Applications of gesture recognition
    2.3.1 Virtual Environments
    2.3.2 Gesture in Traditional Interfaces

Chapter 3: System Description
  3.1 Overview
    3.1.1 Design Discussion
  3.2 Hand Segmentation
    3.2.1 Overview
    3.2.2 Color Predicate and Training
    3.2.3 Segmentation Process
    3.2.4 Design Discussion
  3.3 Hand Tracking
    3.3.1 Design Discussion
  3.4 Motion
    3.4.1 Smoothing the Hand Path
    3.4.2 Extraction Motion Features
    3.4.3 Design Discussion
  3.5 Pose Recognition
  3.6 Gesture Interpretation
    3.6.1 Design Discussion
  3.7 Implementation Details and Parameters
    3.7.1 Hardware
    3.7.2 Segmentation
    3.7.3 Tracking
    3.7.4 Motion
    3.7.5 Pose Recognition
    3.7.6 Interaction Language Details
    3.7.7 Window System Interface

Chapter 4: Performance Evaluation
  4.1 Segmentation
    4.1.1 Overall Performance
    4.1.2 Calibration
    4.1.3 Performance in Different Environmental Conditions
    4.1.4 Performance on Different Skin Tones
    4.1.5 Non-Hand Skin Regions
    4.1.6 Other Issues Affecting Segmentation Quality
  4.2 Hand Motion Tracking
    4.2.1 Smoothing Algorithm Performance
    4.2.2 Object Selection Performance
    4.2.3 Subjective Evaluation of Tracking Performance
  4.3 Pose Recognition
    4.3.1 Evaluating Network Performance
    4.3.2 Network Training
    4.3.3 Sources of Error
    4.3.4 Network Weight Analysis
    4.3.5 Variations
  4.4 The System as a Whole
    4.4.1 Speed
    4.4.2 Task Performance
    4.4.3 User Comments

Chapter 5: Discussion
  5.1 Vision Systems for Hand Gesture Recognition
    5.1.1 Segmentation
    5.1.2 Tracking
    5.1.3 Motion Feature Extraction
    5.1.4 Pose Recognition
    5.1.5 Language Representation
    5.1.6 General Considerations
  5.2 Hand Gestures as an Interface Modality
    5.2.1 Characteristics of Free-Hand Gesture
    5.2.2 Designing an Interface for Hand Gestures
    5.2.3 Climbing the Learning Curve
    5.2.4 Design of a Practical Gesture System

Chapter 6: Summary and Conclusions
  6.1 Summary
  6.2 In Conclusion

References

Appendix

List of Figures and Tables

Chapter 3
  Physical layout
  System's view of user
  Image labeled by CP and the largest connected component
  Weighting function around CP training data
  User training system
  Segmentations of Figure 2 for tracking and pose recognition
  Optimal CP and CP produced by histogramming the positive training examples
  Color Predicates trained using simple update and with Gaussian smoothing and subtraction
  Images segmented using the CPs in Figure 8
  Segmented images of the user pointing to the four corners of the screen
  The centroid of the user's hand pointing to the corners of the screen forms a quadrilateral in image space
  The centroid of the hand as it follows a grid in screen space, forms a warped grid in image space
  Sigmoid force scaling functions
  The hand backing up behind the cursor between cycles and causing overshoot in a simple smoothing algorithm
  Force applied to the cursor versus hand displacement
  Table: Motion Features
  Pose recognition network architecture
  Various appearances the Point pose takes on
  Hand pointing to the top and bottom of the screen
  Color to gray conversion
  Two poses very similar in joint angle space, but easy to differentiate in image space
  Two pointing poses with corrupted outlines
  A pointing pose with the finger removed and a fist pose
  Two extremes of roll in a pointing pose
  Transition network for window control task
  Table: The actions which can be performed at each node
  Interaction language using only motion features
  Three CP training templates

Chapter 4
  Segmentation performance, the good, the average, and the ugly
  Point missing a finger
  Fist with hole
  Example of the face and arm extracted with the hand
  Example hand images from the PCN training set
  Hand location before and after smoothing
  Table: Selection times for 1 inch target with free-hand pointing
  Time to select a screen object versus its size in inches
  Table: Selection time in seconds versus target size in inches
  Predicted and actual mouse selection time for objects of various sizes
  Table: Probability of cursor landing in target in any one cycle for various cursor error distributions and target sizes
  Table: Expected number of cycles it will take for the cursor to land inside the target for 3.5 consecutive cycles at various levels of noise
  Predicted and actual selection time for targets of various sizes using free-hand pointing
  Predicted selection time from simply increasing tracking rate
  Predicted free-hand selection time with a reduced level of random noise and with no noise
  Selection time performance for realistic targets of tracking rate and noise
  Predicted free-hand selection times under ideal conditions
  Examples of the three pose classes differentiated by one of the PCNs
  Total classification performance versus training cycle for the training and test sets
  Weights in a typical pose classification network
  Example images for network weights discussion
  Total classification performance during training for binary pose images
  Classification performance for palm poses during training for binary pose images
  Table: Results of system task testing
  Table: Percentage of total errors by category

Chapter 5
  Alternate interaction language for the window control task, using the pose of the hand and the motion that occurs after it to signal an action
  Interaction language allowing multiple actions, separated by a comma
  Menu layout better suited for hand gesture