Handling Emotions in Human-Computer Dialogues


Johannes Pittermann, Angela Pittermann, Wolfgang Minker

Johannes Pittermann
Universität Ulm
Inst. Informationstechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
johannes.pittermann@alumni.uni-ulm.de

Angela Pittermann
Universität Ulm
Inst. Informationstechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
angelapittermann@gmx.de

Wolfgang Minker
Universität Ulm
Fak. Ingenieurwissenschaften And Elektrotechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
wolfgang.minker@uni-ulm.de

ISBN 978-90-481-3128-0
e-ISBN 978-90-481-3129-7
DOI 10.1007/978-90-481-3129-7
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009931247

© Springer Science+Business Media B.V. 2010

No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Cover design: Boekhorst Design b.v.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

"The finest emotion of which we are capable is the mystic emotion." (Albert Einstein, 1879–1955)

In recent years, the mystery of emotions has attracted increasing interest in research on human-computer interaction. In this work we investigate how to incorporate the user's emotional state into a spoken language dialogue system. The book describes the recognition and classification of emotions and proposes models that integrate emotions into adaptive dialogue management.

In computer and telecommunication technologies, the way people communicate with each other is changing significantly, from a strictly structured and formatted transfer of information to a flexible and more natural form of communication. Spoken language is the most natural way for humans to communicate, and it also provides an easy and quick way to interact with a computer application. Such applications range from information kiosks where travelers can book flights or buy train tickets to handheld devices that show tourists around cities while interactively giving information about points of interest. Spoken language dialogue not only offers simplicity, comfort and time savings but also contributes to safety in critical environments such as cars, where hands-free operation is indispensable to keep driver distraction to a minimum. Within the context of ubiquitous computing in intelligent environments, dialogue systems facilitate everyday tasks, e.g., at home, where lights or household appliances can be controlled by voice commands, and, especially in assisted living, they make it possible to summon help quickly in emergencies. In parallel with the progress made in technical development, customers' demands on the products have increased.
While car owners in the 1920s might have been completely satisfied once they arrived at a destination without any major complications, people in the 1970s would already have become annoyed if their engine refused to start on the first turn of the ignition key. Nowadays, a navigation system showing the wrong way might cause even more anger. For ubiquitous technology like cars, this means on the one hand that the driver is literally at the mercy of sophisticated technology; on the other hand, this does not prevent him/her from building some kind of personal relationship with the car, ranging from decorations like air fresheners or fuzzy dice to expensive tuning. Such a relationship also includes the expression of emotions towards the car: just imagine drivers spurring on their cars when climbing a steep hill and being glad to have reached the top, or drivers shouting at their malfunctioning navigation system, hitting or kicking their cars...

Similar behavior can be observed among computer users. Having successfully written a book with word processing software might arouse happiness; a sudden hard disk crash destroying all documents, however, will probably drive the author up the wall. Normally, neither the car nor the computer is capable of responding to the user's affect. So why not enable devices to react accordingly? Think of a car that refuses to start and the driver shouting angrily, "Stupid car, I paid more than $40,000 and now it's only causing trouble!" Here a reply from the car such as "I am sorry that the engine does not run properly. This is due to a defective spark plug which needs to be replaced." would certainly defuse the tense situation, and it also provides useful information on how to solve the problem. This again contributes to safety in the car, as the driver can be calmed down, e.g., when a traffic jam causes a delay and the driver tries to make up for the lost time by speeding. Here the car's computer could try to rearrange the planned meeting and inform the user: "Due to our delay I have rescheduled your meeting for one hour later. So there is no need to hurry."

To implement a more flexible system, the typical architecture of a spoken language dialogue system needs to be equipped with additional functionality. This includes the recognition of emotions and the detection of situation-based parameters, as well as user-state and situation managers that compute models based on these parameters and influence the course of the dialogue accordingly.
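The preface describes this adaptation only in prose. The following is a minimal sketch of a rule-based user-state manager, assuming for brevity just two emotion classes and a hypothetical speech recognizer confidence score in [0, 1] as the measure of how clearly the user spoke. It mirrors the behavior described in this preface: a calm, clearly understood user gets a short dialogue without confirmations, while an angry, poorly understood user gets a calming prompt plus a confirmation question. The names `UserState`, `choose_strategy` and `realize_prompt`, the 0.7 threshold and all prompt wording are illustrative assumptions, not the book's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Emotion(Enum):
    NEUTRAL = "neutral"
    ANGER = "anger"
    # Further classes (e.g. the remaining emotions of a six-class recognizer) would go here.


@dataclass
class UserState:
    emotion: Emotion        # output of the emotion recognizer
    asr_confidence: float   # hypothetical speech recognizer confidence in [0, 1]


@dataclass
class StrategyDecision:
    confirm: bool       # ask the user to confirm the recognized value?
    prompt_style: str   # stylistic realization of the next prompt


def choose_strategy(state: UserState, threshold: float = 0.7) -> StrategyDecision:
    """Rule-based sketch: map emotion and recognition confidence to a strategy."""
    unclear = state.asr_confidence < threshold
    if state.emotion is Emotion.ANGER:
        # Angry user: calming style; confirm whenever recognition is unreliable.
        return StrategyDecision(confirm=unclear, prompt_style="calming")
    # Calm user: keep the dialogue short; confirm only if recognition is unreliable.
    return StrategyDecision(confirm=unclear, prompt_style="neutral")


def realize_prompt(decision: StrategyDecision, slot: str, value: str) -> str:
    """Hypothetical prompt realization for a single slot-filling turn."""
    if decision.confirm:
        text = f"Did you say {value} for the {slot}?"
    else:
        text = f"OK, {slot}: {value}."
    if decision.prompt_style == "calming":
        text = "I am sorry for the inconvenience. " + text
    return text


calm = choose_strategy(UserState(Emotion.NEUTRAL, asr_confidence=0.92))
angry = choose_strategy(UserState(Emotion.ANGER, asr_confidence=0.41))
print(realize_prompt(calm, "destination", "Ulm"))   # OK, destination: Ulm.
print(realize_prompt(angry, "destination", "Ulm"))  # I am sorry for the inconvenience. Did you say Ulm for the destination?
```

Tying confirmation to recognition confidence and prompt style to emotion keeps the two control parameters separate. A stochastic dialogue model would instead choose among replies with probabilities conditioned on these parameters.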
Emotion classification is a topic of great current research interest, and several approaches exist to classify the user's emotions. These include measuring physiological values with biosensors, interpreting gestures and facial expressions captured by cameras, using natural language processing to spot emotive keywords and fillers in recognized utterances, and classifying prosodic features extracted from the speech signal. Concentrating on a monomodal system without video input and trying to minimize inconvenience to the user, this work focuses on recognizing emotions from the speech signal using Hidden Markov Models (HMMs). Based on a database of emotional speech, a set of prosodic features has been selected, and HMMs have been trained and tested for six emotions and ten speakers. Because performance varies with the model parameters, multiple recognizers have been implemented.

The output of the emotion recognizer(s) influences the course of the dialogue. With the help of a user-state model and a situation model, the dialogue strategy is adapted and an appropriate stylistic realization of its prompts is chosen. For example, if the user is in a neutral mood and speaks clearly, no confirmations are necessary and the dialogue can be kept relatively short. However, if the user is angry and consequently speaks unclearly, the system has to try to calm the user down, but it also has to ask for confirmation often, which again makes the user angry... In principle, there are two methods to model the influence of these so-called control parameters, such as emotions: a rule-based approach, where every eventuality in the user's behavior is covered by a rule that contains a suitable reply, or a stochastic approach, which models the probability of a certain reply depending on the user's previous utterances and the corresponding control parameters.

So how is this book organized? An introduction to the research topic is followed by an overview of theories of emotion and of emotions in speech. The third chapter describes dialogue strategy concepts for integrating emotions into spoken dialogue. Signal processing and speech-based emotion recognition are discussed in Chapter 4, and improvements to our proposed emotion recognizers as well as the implementation of our adaptive dialogue manager are discussed in Chapter 5. Chapter 6 presents evaluation results for the emotion recognition component and for the end-to-end system with respect to existing evaluation paradigms for spoken language dialogue systems. The book concludes with a final discussion and an outlook on future research directions.

Ulm, May 2009
Johannes & Angela Pittermann
Wolfgang Minker
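The preface states that emotions are recognized from the speech signal with HMMs trained for six emotions. A common decision rule for such classifiers, assumed here rather than taken from the book, is to score an utterance against one HMM per emotion with the forward algorithm and choose the emotion whose model yields the highest likelihood. The sketch below uses discrete observation symbols standing in for quantized prosodic features (such as pitch and energy) and hand-set toy parameters. `DiscreteHMM`, `classify`, the symbol coding and all probabilities are hypothetical; a real recognizer would train its models on emotional speech.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class DiscreteHMM:
    start: List[float]        # initial state probabilities pi_i
    trans: List[List[float]]  # transition probabilities a_ij
    emit: List[List[float]]   # emission probabilities b_i(k) for symbol k

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.start)
        alpha = [_log(self.start[i]) + _log(self.emit[i][obs[0]]) for i in range(n)]
        for o in obs[1:]:
            alpha = [
                _logsumexp([alpha[i] + _log(self.trans[i][j]) for i in range(n)])
                + _log(self.emit[j][o])
                for j in range(n)
            ]
        return _logsumexp(alpha)


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Maximum-likelihood decision: the emotion whose HMM best explains obs."""
    return max(models, key=lambda emotion: models[emotion].log_likelihood(obs))


# Toy two-state left-to-right models over two hypothetical prosodic symbols:
# 0 = "low pitch/energy frame", 1 = "high pitch/energy frame".
_START = [1.0, 0.0]
_TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "neutral": DiscreteHMM(_START, _TRANS, [[0.8, 0.2], [0.7, 0.3]]),
    "anger": DiscreteHMM(_START, _TRANS, [[0.2, 0.8], [0.3, 0.7]]),
}

print(classify([1, 1, 1, 0, 1], MODELS))  # anger
print(classify([0, 0, 0, 0], MODELS))     # neutral
```

Working in log space avoids numerical underflow when the likelihoods of long utterances become very small. The decision rule is unchanged because the logarithm is monotonic.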

Contents

1 Introduction
  1.1 Spoken Language Dialogue Systems
  1.2 Enhancing a Spoken Language Dialogue System
  1.3 Challenges in Dialogue Management Development
  1.4 Issues in User Modeling
  1.5 Evaluation of Dialogue Systems
  1.6 Summary of Contributions
2 Human Emotions
  2.1 Definition of Emotion
  2.2 Theories of Emotion and Categorization
  2.3 Emotional Labeling
  2.4 Emotional Speech Databases/Corpora
  2.5 Discussion
3 Adaptive Human-Computer Dialogue
  3.1 Background and Related Research
  3.2 User-State and Situation Management
  3.3 Dialogue Strategies and Control Parameters
  3.4 Integrating Speech Recognizer Confidence Measures into Adaptive Dialogue Management
  3.5 Integrating Emotions into Adaptive Dialogue Management
  3.6 A Semi-Stochastic Dialogue Model
  3.7 A Semi-Stochastic Emotional Model
  3.8 A Semi-Stochastic Combined Emotional Dialogue Model
  3.9 Extending the Semi-Stochastic Combined Emotional Dialogue Model
  3.10 Discussion
4 Hybrid Approach to Speech Emotion Recognition
  4.1 Signal Processing
  4.2 Classifiers for Emotion Recognition
  4.3 Existing Approaches to Emotion Recognition
  4.4 HMM-Based Speech Recognition
  4.5 HMM-Based Emotion Recognition
  4.6 Combined Speech and Emotion Recognition
  4.7 Emotion Recognition by Linguistic Analysis
  4.8 Discussion
5 Implementation
  5.1 Emotion Recognizer Optimizations
  5.2 Using Multiple (Speech-)Emotion Recognizers
  5.3 Implementation of Our Dialogue Manager
  5.4 Discussion
6 Evaluation
  6.1 Description of Dialogue System Evaluation Paradigms
  6.2 Speech Data Used for the Emotion Recognizer Evaluation
  6.3 Performance of Our Emotion Recognizer
  6.4 Evaluation of Our Dialogue Manager
  6.5 Discussion
7 Conclusion and Future Directions
A Emotional Speech Databases
B Used Abbreviations
References
Index