AAU SUMMER SCHOOL PROGRAMMING SOCIAL ROBOTS FOR HUMAN INTERACTION LECTURE 10 MULTIMODAL HUMAN-ROBOT INTERACTION

COURSE OUTLINE
1. Introduction to Robot Operating System (ROS)
2. Introduction to isociobot and NAO robot, and demos
3. Social Robots and Applications
4. Machine Learning and Pattern Recognition
5. Speech Processing I: Acquisition of Speech, Feature Extraction and Speaker Localization
6. Speech Processing II: Speaker Identification and Speech Recognition
7. Image Processing I: Image Acquisition, Pre-processing and Feature Extraction
8. Image Processing II: Face Detection and Face Recognition
9. User Modelling
10. Multimodal Human-Robot Interaction

5 AUGUST 2015, AALBORG UNIVERSITY

MULTIMODAL INTERACTION: WHAT?
"In the context of human-computer interaction, a modality is the classification of a single independent channel of sensory input/output between a computer and a human. A system is designated unimodal if it has only one modality implemented, and multimodal if it has more than one."
Karray, Fakhreddine, et al. "Human-Computer Interaction: Overview on State of the Art." (2008).

MULTIMODAL INTERACTION: WHY?
Extended functionality, e.g. we can speak to the robot instead of typing
Human-human-like communication

MULTIMODAL INTERACTION: WHY?
Robustness against noise
  Data from one modality might be very noisy while the others are not, so combine the modalities.
  Person identification: background music corrupts the recorded speech, but vision is unaffected.
  Speech recognition: background music corrupts the recorded speech, but vision can be used to recognize lip movements and classify words.
  How do we know which modality to trust?

MULTIMODAL INTERACTION: WHY?
Provide new information that no individual modality could provide
  Combination of sound + facial expression = emotion
Learning
  Sometimes only one modality is available, and it is noisy.
  Use knowledge from one modality to re-train/adapt the model in the other domain.
  Examples: person identification, direction of attention
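
One possible answer to "which modality to trust?" is to weight each modality by how confident its own classifier is. The sketch below is an illustration only, not the method used in the lecture or in isociobot: it assumes each modality outputs a posterior over person identities, and it uses normalized entropy as a hypothetical reliability score (a flat, uncertain posterior gets a low weight). The names `reliability`, `fuse`, and the example numbers are invented for this illustration.

```python
from math import log


def entropy(probs):
    """Shannon entropy (in nats) of a label -> probability mapping."""
    return -sum(p * log(p) for p in probs.values() if p > 0)


def reliability(probs):
    """Heuristic reliability in [0, 1]: 1 for a one-hot posterior, 0 for a uniform one.

    Assumption: a noisy modality produces a flat posterior. This is a
    simplification; a classifier can also be confidently wrong.
    """
    n = len(probs)
    if n < 2:
        return 1.0
    return 1.0 - entropy(probs) / log(n)


def fuse(posteriors):
    """Late fusion: reliability-weighted average of per-modality posteriors.

    posteriors maps a modality name to a label -> probability mapping.
    Returns (fused posterior, weight used for each modality).
    """
    weights = {m: reliability(p) for m, p in posteriors.items()}
    total = sum(weights.values())
    if total == 0:
        # Every modality is maximally uncertain: fall back to equal weights.
        weights = {m: 1.0 for m in posteriors}
        total = float(len(posteriors))
    labels = set().union(*posteriors.values())
    fused = {
        label: sum(weights[m] * posteriors[m].get(label, 0.0) for m in posteriors) / total
        for label in labels
    }
    return fused, weights


# Person identification with background music: the speech posterior is nearly
# flat (and slightly favours the wrong person), while vision is clear.
speech = {"alice": 0.40, "bob": 0.35, "carol": 0.25}
vision = {"alice": 0.05, "bob": 0.90, "carol": 0.05}

fused, weights = fuse({"speech": speech, "vision": vision})
print("weights:", weights)
print("decision:", max(fused, key=fused.get))
```

With these numbers speech alone would pick "alice", but the fused decision follows the more confident vision channel. The same weighting could be replaced by reliabilities learned from data, such as an estimated signal-to-noise ratio for audio.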

DURABLE INTERACTION WITH SOCIALLY INTELLIGENT ROBOTS
isociobot
Research project supported by The Danish Council for Independent Research Technology and Production Sciences, Ministry of Science, Innovation and Higher Education
To make robots socially intelligent and capable of establishing durable relationships with their users
Multimodal: speech, vision, facial expression, etc.

HARDWARE
First generation

HARDWARE
First generation (continued)

HARDWARE
Second generation

HARDWARE
Second generation: what changed?
  New body material and shape
  New ears
  iPad (input and output)
  New robot base (Pioneer P3-DX)

SOFTWARE
System OS: Ubuntu
Robot OS: ROS
  A framework that lets each module/function communicate with the others
  Widely used, with high-quality software available
  Open source
  Supports Python and C++
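
The way modules communicate through a framework like ROS can be pictured as publish/subscribe over named topics. The sketch below is a toy, in-process stand-in written in plain Python so that it runs without a ROS installation. It is not the ROS API (in ROS these roles are played by nodes, topics, publishers and subscribers), and the class `ToyBus` and the topic names are hypothetical.

```python
from collections import defaultdict


class ToyBus:
    """Minimal in-process message bus: callbacks registered per topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback to be called with every message on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every callback subscribed to topic."""
        for callback in self._subscribers.get(topic, ()):
            callback(message)


bus = ToyBus()
received = []  # what the (hypothetical) fusion module has seen so far


def on_speech_id(message):
    # Fusion module's handler for the speech-based identification result.
    received.append(("speech", message))


def on_vision_id(message):
    # Fusion module's handler for the vision-based identification result.
    received.append(("vision", message))


# The fusion module listens to both modalities without knowing who produces them.
bus.subscribe("/person_id/speech", on_speech_id)
bus.subscribe("/person_id/vision", on_vision_id)

# The speech and vision modules publish independently of each other.
bus.publish("/person_id/speech", "alice")
bus.publish("/person_id/vision", "bob")
bus.publish("/unused_topic", "ignored")  # no subscribers, so nothing happens

print(received)
```

The point of the design is decoupling: a speech or vision module can be replaced or restarted without changing the module that consumes its output, as long as the topic name and message format stay the same.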

SOFTWARE

DEMOS
The Day of Research 2014

DEMOS
Sikker 7 in Nibe

DEMOS
The Culture Night 2014

DEMOS
The People's Meeting 2015

FUTURE WORK
Research
  User modeling
  Reinforcement fusion
Collaboration/application: Future Nursing Home
Potential application: playing/learning with children
