Multi-modal Human-computer Interaction


Multi-modal Human-computer Interaction Attila Fazekas Attila.Fazekas@inf.unideb.hu SSIP 2008, 9 July 2008

Hungary and Debrecen

Debrecen Big Church

University of Debrecen

Coming Soon: Summer School on Image Processing 2009, http://www.inf.unideb.hu/ssip

Road Map
Multi-modal interactions and systems (main categories, examples, benefits)
Turk-2, a multi-modal chess player; facial gesture recognition
Experimental results

Defining Multi-Modal Interaction
There are two views on multi-modal interaction:
The first focuses on the human side: perception and control. There the word modality refers to human input and output channels.
The second focuses on the synergistic use of two or more computer input or output modalities.

Human-Centered View
The focus is on multi-modal perception and control, that is, human input and output channels. Perception means the process of transforming sensory information into a higher-level representation.

The Modalities
We can divide the modalities into seven groups:
Internal chemical (blood oxygen, etc.)
External chemical (taste, etc.)
Somatic senses (touch, etc.)
Muscle sense (stretch, etc.)
Sense of balance
Hearing
Vision

System-Centered View
In computer science, multi-modal user interfaces have been defined in many ways. Chatty's explanation of multi-modal interaction is the one most computer scientists use: a multi-modal user interface is a system that accepts many different inputs that are combined in a meaningful way.

Definition of Multimodality
Multi-modality is the capacity of a system to communicate with a user along different types of communication channels. Both multimedia and multi-modal systems use multiple communication channels, but a multi-modal system strives for meaning.

Two Types of Multi-modal Systems
The computer used as a tool.
The computer as a dialogue partner.

History
Bolt's Put-That-There system: the user could move objects on screen by pointing and speaking.
CUBRICON: a system that combines mouse pointing and speech.
Oviatt presented a multi-modal system for dynamic interactive maps.

Benefits
Efficiency follows from using each modality for the task it is best suited for.
Redundancy increases the likelihood that communication proceeds smoothly, because there are many simultaneous references to the same issue.
Perceptibility increases when the tasks are facilitated in a spatial context.

Benefits
Naturalness follows from the free choice of modalities and may result in human-computer communication that is close to human-human communication.
Accuracy increases when another modality can indicate an object more accurately than the main modality.

Benefits
Synergy occurs when one channel of communication can help refine imprecision, modify the meaning, or resolve ambiguities in another channel.

Applications
Mobile telecommunication
Hands-free devices for computers
Use in a car
Interactive information panels

Turk-2

System Components

Introduction
Faces are our interfaces in our emotional and social lives. Automatic analysis of facial gestures is rapidly becoming an area of interest in multi-modal human-computer interaction. The basic goal of this research area is a human-like description of the shown facial expression.

Introduction
The solution to this problem can be based on the ideas of some face detection approaches.

Related Research Topics
Face detection (one face/image)
Face localization (more faces/image)
Facial feature detection (eyes, mouth, etc.)
Facial expression recognition
Face recognition, face identification
Face tracking

Problems of Face Detection
Pose: the images of a face vary due to the relative camera-face pose.
Presence or absence of structural components (beards, mustaches, glasses, etc.).
Facial expression: the appearance of a face is directly affected by its expression.

Problems of Face Detection
Occlusion: faces may be partially occluded by other objects.
Image orientation: face images vary under rotation about the optical axis of the camera.
Imaging conditions (lighting, background, camera characteristics).

Face Detection in a Single Image
Knowledge-based methods (G. Yang and T. S. Huang, 1994).
Feature invariant approaches (T. K. Leung, M. C. Burl, and P. Perona, 1995), (K. C. Yow and R. Cipolla, 1996).

Face Detection in a Single Image
Template matching methods (A. Lanitis, C. J. Taylor, and T. F. Cootes, 1995).
Appearance-based methods (E. Osuna, R. Freund, and F. Girosi, 1997), (A. Fazekas, C. Kotropoulos, I. Pitas, 2002).

Face Detection
Scan the picture with a running window in a multiresolution pyramid.
Normalize the window.
Hide some parts of the face.
Normalize the local variance of the brightness in the picture.

Face Detection
Equalize the histogram.
Localize the face (decision).
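The scanning steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the classifier is a placeholder standing in for the SVM used later, and the window size, scan step, and pyramid factor are assumptions.

```python
import numpy as np

def pyramid(image, min_size=(20, 16)):
    """Yield progressively downsampled copies of a grayscale image,
    halving resolution by averaging non-overlapping 2x2 blocks."""
    while image.shape[0] >= min_size[0] and image.shape[1] >= min_size[1]:
        yield image
        h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
        image = image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def equalize(window):
    """Histogram equalization of an 8-bit window."""
    hist = np.bincount(window.astype(np.uint8).ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255
    return cdf[window.astype(np.uint8)]

def detect(image, classify, win=(20, 16), step=4):
    """Scan every pyramid level with a running window; `classify` is
    assumed to return True for face windows (e.g. a trained SVM)."""
    hits = []
    for level, img in enumerate(pyramid(image)):
        for y in range(0, img.shape[0] - win[0] + 1, step):
            for x in range(0, img.shape[1] - win[1] + 1, step):
                w = equalize(img[y:y + win[0], x:x + win[1]])
                if classify(w):
                    hits.append((level, y, x))
    return hits
```

Each hit records the pyramid level together with the window position, so larger faces are found at coarser levels with the same fixed-size window.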

Face Gesture Recognition
Let us consider a set of facial pictures, and set up a finite system of features related to the pictures. It is known that each picture belongs to exactly one class:
face with the given gesture,
face without the given gesture.

Face Gesture Recognition
The problem is to find a method that determines the class of the examined picture. One possible way to solve this problem: the Support Vector Machine.

Experimental Results
For all experiments, the SVM Light package developed by T. Joachims was used. For a complete test, several routines were added to the original toolbox. The database recorded by our institute was used.

Experimental Results
Training set of 40 images (20 faces with the given gesture, 20 faces without it). All images are recorded in 256 grey levels and are of dimension 640×480.

Experimental Results
The procedure for collecting face patterns is as follows. A rectangular part of dimension 256×320 pixels, containing the actual face, has been determined manually.

Experimental Results
This area has been subsampled four times. At each subsampling, non-overlapping regions of 2×2 pixels are replaced by their average. Training patterns of dimension 16×20 are built.
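The four-step subsampling can be reproduced in a few lines of NumPy; random data stands in here for a 256×320 face crop:

```python
import numpy as np

def subsample_2x2(img):
    """Replace non-overlapping 2x2 regions by their average."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Placeholder for the manually cropped 256x320 face region.
face = np.random.default_rng(0).integers(0, 256, size=(256, 320)).astype(float)

pattern = face
for _ in range(4):            # four subsampling steps, as described
    pattern = subsample_2x2(pattern)
print(pattern.shape)          # (16, 20)
```

Each halving divides both dimensions by 2, so four steps take 256×320 down to 16×20, matching the pattern size stated above.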

Experimental Results
The class label +1 has been appended to each face pattern. Similarly, 20 non-face patterns have been collected from images in the same way and labeled −1.

Facial Gesture Database
Surprised face, smiling face, sad face, angry face.

Classification Error
Gesture:  Angry   Happy   Sad     Serial  Surprised
Error:    22.4%   10.3%   11.8%   9.4%    18.9%

Support Vector Machine
Statistical learning from examples aims at selecting, from a given set of functions {f_α(x) : α ∈ Λ}, the one which best predicts the correct response.

Support Vector Machine
This selection is based on the observation of l pairs that build the training set: (x_1, y_1), ..., (x_l, y_l), x_i ∈ R^m, y_i ∈ {+1, −1}, which contains input vectors x_i and the associated ground truth given by an external supervisor. The response f_α(x) of the learning machine is assumed to belong to a set of indicator functions.

Support Vector Machine
Define the loss function:
L(y, f_α(x)) = 0 if y = f_α(x), and 1 if y ≠ f_α(x).
The expected value of the loss is given by:
R(α) = ∫ L(y, f_α(x)) p(x, y) dx dy,
where p(x, y) is the joint probability density function of the random variables x and y.
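The 0-1 loss is simple to state in code. Since p(x, y) is unknown in practice, the integral above is usually approximated by the sample average of the loss over the training set (the empirical risk); a small sketch with made-up labels:

```python
import numpy as np

def zero_one_loss(y, pred):
    """L(y, f(x)) = 0 if y == f(x), else 1."""
    return (y != pred).astype(float)

def empirical_risk(y, pred):
    """Sample average of the 0-1 loss: the usual stand-in for R(alpha)
    when the joint density p(x, y) is unknown."""
    return zero_one_loss(y, pred).mean()

# Illustrative labels and classifier outputs (not from the experiments).
y    = np.array([+1, +1, -1, -1, +1])
pred = np.array([+1, -1, -1, -1, +1])
print(empirical_risk(y, pred))   # 0.2: one of five misclassified
```

The classification-error percentages reported earlier are exactly this quantity evaluated on test data.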

Support Vector Machine
We would like to find the function f_α0(x) which minimizes the risk functional R(α). The basic idea is to construct the optimal separating hyperplane.

Support Vector Machine
Suppose that the training data can be separated by a hyperplane, f_α(x) = α^T x + b = 0, such that:
y_i (α^T x_i + b) ≥ 1, i = 1, 2, ..., l,
where α is the normal to the hyperplane. For the linearly separable case, one simply seeks the separating hyperplane with the largest margin.
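The margin constraints can be checked numerically. In this sketch the toy points and the hyperplane parameters α, b are hand-picked for illustration, not learned by an SVM solver:

```python
import numpy as np

# Toy linearly separable data in R^2 with labels y_i in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

alpha = np.array([0.5, 0.5])   # normal to the candidate hyperplane
b = 0.0

# y_i * (alpha^T x_i + b) >= 1 for every training pair.
margins = y * (X @ alpha + b)
print(margins)                 # all >= 1: constraints satisfied
```

An SVM solver would choose, among all (α, b) satisfying these constraints, the pair with the smallest ||α||, which is what "largest margin" means geometrically.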

Support Vector Machine
Linearly non-separable data are handled by mapping the input vectors, the elements of the training set, into a high-dimensional feature space through a so-called kernel function. The optimal separating hyperplane is then constructed in the feature space to get a binary decision.
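A common kernel choice is the Gaussian (RBF) kernel; the slides do not name a specific kernel, so both the kernel and γ here are illustrative. The trick is that the solver only ever needs the Gram matrix of pairwise kernel values, never the feature-space coordinates themselves:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2),
    an inner product in an implicit high-dimensional feature space."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# XOR-style data: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
K = rbf_kernel(X, X)
print(K.shape)   # (4, 4) Gram matrix used in place of dot products
```

Replacing every dot product x_i^T x_j in the linear formulation with k(x_i, x_j) yields a hyperplane in the feature space, hence a nonlinear decision boundary in the original input space.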

Thank you for your attention!