Multimodal Research at CPK, Aalborg

Multimodal Research at CPK, Aalborg

Summary:
- The IntelliMedia WorkBench ("Chameleon")
- Campus Information System
- Multimodal Pool Trainer
- Displays, Dialogue Walkthru
- Speech Understanding
- Vision Processing
- Other (student) projects
- New projects: Multimodality in Wireless Networks

The IntelliMedia WorkBench ("Chameleon")

A suite of modules for vision and speech processing, dialogue management, laser pointing, a blackboard, etc.

Purpose:
- Cross-disciplinary collaboration at CPK
- Exploring cross-media fusion techniques
- Exploring multimodal human-machine interaction
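The slides do not show how the modules are wired together; as a rough illustration of the blackboard idea only, here is a minimal Python sketch in which hypothetical vision, speech and dialogue modules exchange messages through a shared blackboard (all topic names, module roles and message contents are invented for illustration):

    from collections import defaultdict

    class Blackboard:
        """Minimal publish/subscribe blackboard (illustrative only)."""
        def __init__(self):
            self.subscribers = defaultdict(list)  # topic -> list of callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def post(self, topic, message):
            # Deliver the message to every module subscribed to this topic.
            for callback in self.subscribers[topic]:
                callback(message)

    bb = Blackboard()

    # A hypothetical dialogue manager reacting to events from other modules.
    bb.subscribe("vision.ball_positions", lambda msg: print("dialogue saw:", msg))
    bb.subscribe("speech.utterance", lambda msg: print("dialogue heard:", msg))

    # Hypothetical vision and speech modules posting their results.
    bb.post("vision.ball_positions", {"cue": (120, 340), "target": (400, 180)})
    bb.post("speech.utterance", "show me the exercise")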

Workbench Application 1: A Campus Information System

Workbench Application 2: Multimodal Pool Trainer

Architecture

- The initially designed WorkBench architecture, as used in the Campus Information System
- The architecture as used in the Pool Trainer

The Game of Pool

Pool is a game that requires a combination of strategic thinking and physical skill; without one, the other is not of much use. Basically, the most important requirement for any pool player is the ability to shoot the target balls into the pocket while ensuring a good position of the cue ball for the next shot.

Target Pool

The automatic Pool Trainer is based on the widely used Target Pool system, developed by the professional pool player Kim Davenport.

[Figure: example of a typical Target Pool exercise]

The Computer Vision Sub-system

The main functions of the image analysis subsystem are:
- Calibration and detection of the positions on the empty pool table, i.e. the rails, diamonds and pockets
- Detection of still and moving balls placed on the pool table
- Detection of when the cue ball is hit
- Recording of the shot

The Computer Vision Sub-system

All image analysis is carried out on binary difference images. This greatly reduces the time and space requirements for the image processing (a toy sketch of the idea follows below).
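The slide gives no code; purely as an illustration of the difference-image idea, a binary difference image could be computed along these lines with OpenCV (the filenames and the threshold value are placeholders, not from the original system):

    import cv2

    # Reference image of the empty table and a current camera frame
    # (placeholder filenames for illustration).
    empty_table = cv2.imread("empty_table.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # The absolute difference keeps only what changed: balls, cue, hands.
    diff = cv2.absdiff(frame, empty_table)

    # Thresholding yields the binary image; 40 is an arbitrary example value.
    _, binary = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)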

Image Processing

Detection of still and moving balls benefits from the distinctive patterns created by the CCD chip's line-scan effect. Close-lying balls are detected by removing edge pixels (one possible reading of this step is sketched below).
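The slide does not say how the edge pixels are removed; one plausible reconstruction is morphological erosion followed by connected-component labelling, as in this hypothetical OpenCV sketch (the synthetic image, kernel size and iteration count are all guesses):

    import cv2
    import numpy as np

    # Synthetic binary image with two balls that touch each other.
    binary = np.zeros((200, 200), np.uint8)
    cv2.circle(binary, (90, 100), 20, 255, -1)
    cv2.circle(binary, (125, 100), 20, 255, -1)

    # Eroding removes edge pixels, so the touching balls separate into blobs.
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(binary, kernel, iterations=5)

    # Each remaining blob is labelled and counted as one ball.
    num_labels, labels = cv2.connectedComponents(eroded)
    print(num_labels - 1, "balls detected")  # label 0 is the background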

The Laser Sub-system

The laser is placed above the pool table and is used to:
- Draw the target and the optimal paths of the cue and target balls
- Mark the positions where the user must place the balls
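The slides do not describe how table positions are converted into laser coordinates; for a flat table, a standard approach would be a planar perspective mapping estimated from the four table corners, sketched here with OpenCV (all coordinate values are made up for illustration):

    import cv2
    import numpy as np

    # Table corners in table coordinates (cm) and in laser deflection units
    # (hypothetical numbers, not measurements from the actual installation).
    table_pts = np.float32([[0, 0], [254, 0], [254, 127], [0, 127]])
    laser_pts = np.float32([[310, 205], [1720, 190], [1735, 905], [300, 920]])

    H = cv2.getPerspectiveTransform(table_pts, laser_pts)

    # Map a ball position from table coordinates into laser coordinates.
    ball = np.float32([[[127.0, 63.5]]])
    laser_xy = cv2.perspectiveTransform(ball, H)
    print(laser_xy)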

The Speech Sub-system

A number of speech recognition engines have been used in the development of the system. Speech recognition is presently carried out by the IBM ViaVoice recogniser. Previously, Entropic's GraphVite/HAPI recognition engine was used. We are currently extending the interface (JSAPI) to include the public-domain HVite recognition engine from Cambridge University. This will in turn allow us to support a larger number of languages, e.g. through the COST 249 Task Force reference recogniser initiative.

The Speech Sub-system

The CPK Natural Language Processing Suite is presently being integrated into the trainer. Apart from enabling a compound feature-based language model, the suite supports a number of popular SR grammar formats, such as HTK and JSGF.

Synthetic speech output is used to achieve the high degree of flexibility needed in the spoken output. IBM's ViaVoice and the Infovox speech synthesisers have been used, but any SAPI-compliant synthesiser is supported. Speech output is synchronized with the laser, graphics and text output to form an integrated output to the user (a toy synchronization sketch follows below).
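The synchronization mechanism itself is not described on the slide; as a minimal sketch under that caveat, the outputs of one instruction step could be released together like this (the speak/laser/screen functions are stubs standing in for the SAPI synthesiser, laser pointer and display):

    import threading

    def speak(text):   print("TTS:", text)      # stub for a SAPI synthesiser
    def laser(shape):  print("laser:", shape)   # stub for the laser pointer
    def screen(item):  print("screen:", item)   # stub for graphics/text output

    def present(step):
        """Release all modalities of one instruction step at the same time."""
        outputs = [
            threading.Thread(target=speak,  args=(step["speech"],)),
            threading.Thread(target=laser,  args=(step["laser"],)),
            threading.Thread(target=screen, args=(step["screen"],)),
        ]
        for t in outputs:
            t.start()
        for t in outputs:
            t.join()   # wait until every modality has finished

    present({"speech": "Place the cue ball on the circle.",
             "laser": "circle at (127, 63)",
             "screen": "table layout with cue-ball marker"})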

Examples

[Figure: an example of a user interacting with the system]
[Figure: the same example as seen by the system's camera]

The Display Sub-system

To issue commands and receive instructions, the user communicates by speech via the interface agent, James. James is animated and understands simple commands corresponding to the menus. He instructs the user by speaking, pointing and moving around on the screen.

Example of the interaction during an exercise

The system is activated; Q takes the initiative:

Q: Welcome to SMARTPOOL. Tell me your name.
Svend: Svend.
[SMARTPOOL looks up Svend and checks if he is known. Svend is known.]
Q: Hi Svend. Do you want to continue where you left off last time?
Svend: Yes.
Q: That was Course 2, Exercise 3.
Screen: [The exercise is shown on the projector screen. It consists of the layout of the pool table (positions and routes of the balls), a close-up of where to hit the cue ball, and a verbal instruction.]
Q: (reads the verbal instruction aloud)

Example of the interaction during an exercise

Laser: [The position of the cue ball is indicated with a circle on the table and the target ball with a cross.]
Screen: [The same is shown on the table drawn on the screen.]
Svend: [Places the balls on the table, but is not careful and does not place them right.]
Screen: [SMARTPOOL checks the positions of the balls when no more activity can be detected on the table. A ball in the wrong position is shown as red. When a pool ball is placed correctly, it turns from red to white/yellow on the projector screen.]
Screen: [When all balls are in place, the path of the cue ball, the pocket for the target ball, and the target are drawn on the table shown on the screen.]
Laser: [The target is drawn on the table.]
Svend: [Shoots the target ball into the pocket and manages to get the cue ball fairly close to the target drawn on the table.]

Example of the interaction during an exercise

Q: Nice, Svend, you got 2 points.
Screen: [The score is shown on the screen. The status automatically returns to the setup of the exercise.]
Laser: [The laser switches back from showing the target to the balls' initial positions.]
Svend: [Pauses]
Q: Do you want to see your stroke?
Svend: Yes, please.
Screen: [The path of the shot is shown together with the original path in different colours.]
Q: Do you want to see a replay of your stroke?
Svend: Yes, please.
Screen: [A movie is compiled from the images captured by the camera and shown on the screen.]
Q: Would you like to repeat the exercise or go on to a new one?
Svend: No, thank you.

Comments on the Dialogue

- The spoken dialogue can be carried out using the touch screen instead.
- Dialogue is most intensive during setup and evaluation of the exercise.
- Although the example does not illustrate this, the user can take the initiative at almost any point.
- An extensive help function (covering playing pool, the exercises and the system) is available.
- During the exercise the interaction is almost exclusively non-verbal, via physical interaction with the pool table and the display on the wall screen.

User Tests

All users were asked to fill out a questionnaire after performing the test.

Usability aspects of interacting with the interface agent (rated Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree):
- The language was suitable
- The dialogue was satisfactory
- The possibility to interrupt the agent was satisfactory
- The on-screen visualization of the agent was nice

User Tests

However, the participating pool instructors pointed out a number of issues not addressed by the system, e.g.:
- Stance
- Bridges

Discussion

Overall, the Pool Trainer has been successful. However, some improvements are needed:
- The image analysis subsystem, although fast and accurate, needs to be made more robust against changes in e.g. the lighting conditions if the system is to be placed in a non-controlled environment.
- Giving detailed feedback on the user's errors would require knowledge of the direction and speed of the balls.

Other (student) projects

- Affective computing: classification of emotional speech
- Recognition of hummed tunes
- Enhancing LEGO MindStorms with vision
- GPS systems using touristic (non-true-scale) maps
- Whiteboard application using gesture recognition

Multimodality in Wireless Networks

Handheld client / remote server distribution: what is executing where, and what is transmitted?

Selection of modality:
- based on information type (e.g. speech is temporal; don't use it for timetables!)
- based on situation (e.g. speech enables eyes-free / hands-free operation)
- based on network conditions: is your modality (what you transmit) sensitive to packet loss? Is it sensitive to delays? What bandwidth does it require?

A toy sketch of such a selection policy follows below.
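Purely as an illustration of the selection criteria listed above (the modality properties, thresholds and bandwidth figures are invented, not measurements):

    # Properties of each modality (illustrative guesses only).
    MODALITIES = {
        "speech":   {"temporal": True,  "eyes_free": True,
                     "loss_tolerant": False, "kbps": 24},
        "graphics": {"temporal": False, "eyes_free": False,
                     "loss_tolerant": False, "kbps": 200},
        "text":     {"temporal": False, "eyes_free": False,
                     "loss_tolerant": True,  "kbps": 1},
    }

    def select_modality(info_is_tabular, hands_busy, packet_loss, bandwidth_kbps):
        for name, m in MODALITIES.items():
            if info_is_tabular and m["temporal"]:
                continue    # e.g. don't read a timetable aloud
            if hands_busy and not m["eyes_free"]:
                continue    # the situation demands eyes-free/hands-free use
            if packet_loss > 0.05 and not m["loss_tolerant"]:
                continue    # modality is sensitive to packet loss
            if m["kbps"] > bandwidth_kbps:
                continue    # not enough bandwidth for this modality
            return name
        return "text"       # fall back to the cheapest modality

    # A timetable over a 50 kbit/s link with little loss -> "text".
    print(select_modality(info_is_tabular=True, hands_busy=False,
                          packet_loss=0.01, bandwidth_kbps=50))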