Detecting perceived quality of interaction with a robot using contextual features. Ginevra Castellano, Iolanda Leite & Ana Paiva.


Autonomous Robots. ISSN 0929-5593. DOI 10.1007/s10514-016-9592-y


Ginevra Castellano (1), Iolanda Leite (2), Ana Paiva (2)
Received: 13 March 2015 / Accepted: 28 June 2016. © Springer Science+Business Media New York 2016.

Corresponding author: Ginevra Castellano, ginevra.castellano@it.uu.se. Iolanda Leite: iolanda.leite@gmail.com. Ana Paiva: ana.paiva@inesc-id.pt.
(1) Division of Visual Information and Interaction, Department of Information Technology, Uppsala University, Uppsala, Sweden.
(2) INESC-ID and Instituto Superior Tecnico, Technical University of Lisbon, Porto Salvo, Portugal.

Abstract

This work aims to advance the state of the art in exploring the role of task, social context and their interdependencies in the automatic prediction of affective and social dimensions in human-robot interaction. We explored several SVM-based models with different features extracted from a set of context logs collected in a human-robot interaction experiment in which children play a chess game with a social robot. The features include information about the game and the social context at the interaction level (overall features) and at the game turn level (turn-based features). While overall features capture game and social context at the interaction level, turn-based features attempt to encode the dependencies of game and social context at each turn of the game. Results showed that game and social context-based features can be successfully used to predict dimensions of quality of interaction with the robot. In particular, overall features proved to perform equally well or better than turn-based features, and game context-based features were more effective than social context-based features. Our results also show that the interplay between game and social context-based features, combined with features encoding their dependencies, leads to higher recognition performances for a subset of dimensions.

Keywords: Affect recognition, Quality of interaction, Social robots, Human-robot interaction, Contextual features

1 Introduction

Recent advances in affective behavioural computing have brought a lot of attention to the design of systems for the automatic prediction of users' affective states in computer-based applications. Several efforts towards the design of systems capable of perceiving multimodal social and affective cues (e.g., facial expressions, eye gaze, speech, body movement, physiological data) (Castellano et al. 2012) have been made in recent years. Researchers have been investigating how accurately these systems can predict users' affective states (Zeng et al. 2009) and other social variables related to engagement and quality of user experience, mainly in the areas of human-computer and human-robot interaction (Rani and Sarkar 2005; Mandryk et al. 2006; Rich et al. 2010; Lassalle et al. 2011).

Despite the increasing interest in this area, the role of context in the automatic recognition of affective and social dimensions is still underexplored. This is somewhat surprising, as behavioural expressions are strictly dependent on the context in which they occur (Castellano et al. 2010). On the other hand, the process of capturing context and its dependencies with behavioural expressions, social signals and affective states is extremely challenging (Ochs et al. 2011). Capturing the role of context in the automatic recognition of affective and social dimensions is of high importance in applications where some form of interaction is taking place, for example in human-robot interaction. Here, affective and social dimensions are related to contextual aspects such as the task the interactants are involved in, their behaviour, actions, preferences and history (Riek and Robinson 2011).

As robotic companions are increasingly being studied as partners that collaborate and perform tasks along with people (Aylett et al. 2011), the automatic prediction of affective and social dimensions in human-robot interaction must take into account the role of task and social interaction-based features. In the area of automatic affect recognition, there are some good examples of studies exploring the role of task-related context (Kapoor and Picard 2005; Martinez and Yannakakis 2011; Sabourin et al. 2011). However, research investigating task-related context, social context and their interdependencies for the purpose of automatically predicting affective and social dimensions is still limited, especially in the areas of human-computer and human-robot interaction.

In this work, we investigate the role of social and game context and their interdependencies in the automatic recognition of quality of interaction between children and social robots. To do so, we measure perceived quality of interaction with affective [i.e., social engagement (Poggi 2007; Sidner et al. 2004)], friendship [i.e., help and self-validation (Mendelson and Aboud 1999)] and social presence [i.e., perceived affective interdependence (Biocca 1997)] dimensions, which have previously been shown to be successful in measuring the influence of a robot's behaviour on the relationship between the robot itself and children (Leite 2013). The interaction scenario involves a social robot, the iCat (Breemen et al. 2005), which plays chess with children and provides real-time feedback through non-verbal behaviours, supportive comments and actions based on the events unfolding in the game and the affective states experienced by the children. We derived a set of contextual features related to the game and the social behaviour displayed by the robot from automatically extracted context log files. These features encode information at the interaction level (overall features) and at the game turn level (turn-based features). The overall features capture game and social context in an independent way at the interaction level, while turn-based features encode the interdependencies of game and social context at each turn of the game. The extracted features were used to train Support Vector Machine (SVM) models to predict dimensions of quality of interaction with the robot during an interaction session.

Our main research questions in this work are the following:
- How can game and social context-based features be used to predict children's quality of interaction with a robot?
- What is the role of game and social context in the automatic prediction of dimensions of quality of interaction?
- How do overall features and turn-based features compare in terms of recognition performance?
- Is the combination of overall and turn-based features more successful than the two sets of features used independently for predicting dimensions of quality of interaction?

We trained several SVM-based models using different sets of game and social context-based features. Results showed that game and social context-based features can successfully predict children's quality of interaction with a robot in our investigation scenario. In particular, overall features proved to perform equally well or better than turn-based features, and game context-based features were more effective than social context-based features. Our results also showed that the integration of game and social context-based features with features encoding their interdependencies leads to higher recognition performances for the dimensions of social engagement and help.

This paper is organised as follows. In the next section we present related work on context-sensitive affect and social signal recognition and on socially intelligent robots. Section 3 describes our scenario and contains an overview of the architecture of the robotic game companion, while Sect. 4 presents the experimental methodology. Finally, Sect. 5 provides an overview of the experimental results, Sect. 6 discusses the results and Sect. 7 summarises our main findings.

2 Related work

2.1 Context-sensitive affect recognition

A limited number of studies have investigated the problem of context-sensitive affect recognition. Kapoor et al. (2005), for example, proposed an approach for predicting interest in a learning environment by using a combination of non-verbal cues and information about the learner's task. Malta et al. (2008) proposed a system for the multimodal estimation of a driver's irritation that exploits information about the driving context. Peters et al. (2010) modelled the user's interest and engagement with a virtual agent displaying shared attention behaviour, using contextualised eye gaze and head direction information. More recently, Sabourin et al. (2011) used Dynamic Bayesian Networks modelling personality attributes for the automatic prediction of learner affect in a learning environment. In our own previous work, we demonstrated the importance of game context for discriminating the user's affect (Castellano et al. 2009), and showed that children's engagement with the robot can be recognised using posture data (Sanghvi et al. 2011) and a combination of task and social interaction-based features (Castellano et al. 2009).

One of the main challenges in this area of research is the question of how to model and encode relationships between different types of context and between context and other modalities.

In this work we investigate contextual feature representation in a human-robot interaction scenario and explore how to encode task and game context and their relationships over time in a human-robot interaction session. We take inspiration from the work by Morency et al. (2008), who proposed a context-based recognition framework that integrates information from human participants engaged in a conversation to improve visual gesture recognition. They proposed the concept of an encoding dictionary, a technique for contextual feature representation that captures different relationships between contextual features and visual gestures. Similarly, we aim to propose a method to encode relationships between different types of features, but we focus primarily on different types of contextual information (rather than on features encoding information from different modalities) and on predicting social variables such as quality of interaction (rather than on automatic gesture recognition). Other work has used contextual information for the purpose of automatic affect recognition. For example, Martinez and Yannakakis (2011) investigated a method for the fusion of physiological signals and game-related information in a game play scenario. Their approach uses frequent sequence mining to extract sequential features that combine events across different user input modalities. In contrast, our work attempts to model interdependencies of different contextual features not only at the turn level, but also at the interaction level.

2.2 Socially intelligent robots

Recent research on socially intelligent robots shows that robots are increasingly being studied as partners that collaborate, perform tasks and learn from people, making the use of robotic platforms as tools for experimental human-robot interaction more approachable (Breazeal 2009). Social perception abilities are of key importance for a successful interaction with human users. They allow robots to respond to human users in an appropriate way and facilitate the establishment of human-centred multi-modal communication with the robot. Breazeal et al. were amongst the first to design socially perceptive abilities for robots: they developed an attention system based on low-level perceptual stimuli used by robots such as Kismet and Leonardo (Breazeal et al. 2001; Breazeal 2009). Other examples of socially perceptive abilities are described in the works by Mataric and colleagues. Mead et al., for example, developed a model for the automatic recognition of interaction cues such as initiation, acceptance and termination in a human-robot interaction scenario (Mead et al. 2011), while Feil-Seifer and Mataric proposed a method to automatically determine whether children affected by autism are interacting positively or negatively with a robot, using distance-based features (Feil-Seifer and Mataric 2011).

There are several types of robots that benefit from expressive behaviour and socially perceptive abilities, from those employed in socially assistive robotics (Scassellati et al. 2012) to those acting as companions of people. Examples of the latter include companions for health applications (von der Putten et al. 2011), for education purposes (Saerbeck et al. 2010), for children (Espinoza et al. 2011) and for the elderly (Heerink et al. 2008). In this paper we focus on a game companion for young children, endowed with expressive behaviour, social perception and affect recognition abilities. We investigate social perception abilities in relation to the development of a computational model that allows a robot to predict a user's perception of the quality of interaction with the robot itself based on contextual features.

3 Scenario

The human-robot interaction scenario used in this work consists of an iCat robot that acts as a chess game companion for children. The robot is able to play chess with children, providing affective feedback based on the evaluation of the moves played on an electronic chessboard. The affective feedback is provided by means of facial expressions and verbal utterances, and by reacting empathically to the feelings children experience throughout the game (see Fig. 1). If the user is experiencing a negative feeling, the robot generates an empathic response to help or encourage the user. Alternatively, if the user is experiencing a positive feeling, no empathic reaction is triggered; in this case, the robot simply generates an emotional facial expression (Leite et al. 2008). After every move by the user, the robot asks the user to carry out the robot's own move on the board by announcing the desired chess coordinates. The robot then waits for the user's next move, and so on. Initial user studies using this scenario showed that affect sensing and empathic interventions increase children's engagement and perception of friendship towards the robot (Leite et al. 2012a, b).

3.1 System overview

Our system was built on a novel platform for affect-sensitive, adaptive human-robot interaction that integrates an array of sensors in a modular client-server architecture (Leite et al. 2014). The main modules of the system are displayed in Fig. 2. The Vision module captures children's non-verbal behaviours and interactions with the robot (Fig. 1). This module tracks head movements and salient facial points in real time using a face tracking toolkit (http://www.seeingmachines.com/) and extracts information about the user's gaze direction and probability of smile.

[Fig. 1: User interacting with the robot during the experiment. Fig. 2: iCat system architecture.]

The Game Engine is built on top of an open-source chess engine (http://www.tckerrigan.com/chess/tscp). In addition to deciding the next move for the robot, this module includes evaluation functions that determine how well the robot (or the child) is doing in the game. The values of these functions are updated after every move played in the game. Based on the history of these values, the Game Engine automatically extracts game-related contextual features, as well as features related to certain behaviours displayed by the robot.

After every move played by the user, the Affect Detection module predicts the probability values for valence, i.e., whether the user is more likely to be experiencing a positive, neutral or negative feeling (Castellano et al. 2013), using synchronised features from the Vision module (i.e., the user's probability of smile and gaze direction) and the Game Engine. This SVM-based module was trained with a corpus of contextualised children's affective expressions (Castellano et al. 2010).

The Appraisal module assesses the situation of the game and determines the robot's affective state. The output of this module is used by the Action Selection module, which is responsible for generating affective facial expressions in tune with the robot's affective state. If the user is experiencing a negative feeling with probability over a certain threshold (fine-tuned after preliminary tests with a small group of users), this module also selects one of the robot's empathic strategies to be displayed to the user.
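To make the data flow between these modules concrete, a minimal sketch of the sense-detect-respond loop is given below. All class names, method signatures and numeric values are hypothetical illustrations under the assumptions stated in the comments, not the authors' implementation.

```python
# Minimal sketch of the module pipeline described above; class and method names
# are hypothetical and only illustrate the data flow, not the actual system.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Percept:
    smile_probability: float   # from the Vision module
    gaze_on_robot: bool        # gaze direction reduced to a binary flag
    game_evaluation: float     # chess evaluation after the user's move (Game Engine)

class AffectDetection:
    """Stand-in for the SVM-based valence classifier (positive/neutral/negative)."""
    def predict_valence(self, percept: Percept) -> dict:
        # Placeholder output; the real module fuses smile, gaze and game features.
        return {"positive": 0.2, "neutral": 0.3, "negative": 0.5}

class ActionSelection:
    """Triggers an empathic strategy when negative affect is sufficiently likely."""
    NEGATIVE_THRESHOLD = 0.6  # illustrative; the paper fine-tuned this in pilot tests

    def select(self, valence: dict, strategies: list) -> Optional[str]:
        if valence["negative"] > self.NEGATIVE_THRESHOLD:
            return strategies[0]   # in the real system: chosen randomly or adaptively
        return None                # otherwise only an emotional facial expression

# One interaction turn: sense, detect affect, possibly respond empathically
percept = Percept(smile_probability=0.1, gaze_on_robot=False, game_evaluation=-1.5)
valence = AffectDetection().predict_valence(percept)
strategy = ActionSelection().select(valence, ["encouraging comment", "scaffolding"])
print(valence, strategy)
```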

4 Experimental methodology

To investigate the role of game and social context in the automatic prediction of dimensions of quality of interaction with the robot, we conducted a data collection field experiment. Our main goal was to explore new ways in which context can improve the design of empathic robots.

4.1 Experimental setting

The study was conducted in a Portuguese school where children have chess lessons as part of their maths curriculum. The setting consists of the robot, an electronic chessboard, a computer where all the processing takes place and a webcam that captures the children's expressions to be analysed by the affect detection module (Fig. 1). Two video cameras were also used to record the interactions from frontal and lateral perspectives. The setting was installed in the room where the children have their weekly chess lessons.

4.2 Procedure

A total of 40 children (19 male and 21 female), aged between 8 and 10 years old, participated in the experiment. Participants were randomly assigned to three different conditions, corresponding to three different parameterisations of the robot's behaviour:

1. Neutral: the robot simply comments on the children's moves in a neutral way (e.g., "you played well", "good move").
2. Random empathic: when the user is experiencing a negative feeling (i.e., when the probability that the user is experiencing a negative feeling, as computed by the Affect Detection module, is high), the iCat randomly selects one of the available empathic strategies.
3. Adaptive empathic: when the user is experiencing a negative feeling (as above), the empathic strategies are selected using an adaptive learning algorithm that captures the impact of each strategy on different users. In this case, the robot adapts its empathic behaviour considering the previous reactions of a particular user to an empathic strategy (Leite et al. 2014).

In addition to the differences in how empathic strategies are selected, the robot's affective feedback also differs across the three conditions. While in the two empathic conditions the affective behaviours are user-oriented (i.e., the robot displays positive emotions if the user makes good moves and negative emotions if the user makes bad moves), in the neutral condition the robot's behaviour is self-oriented (i.e., it shows sadness if the user makes good moves, etc.). Our final sample consisted of 14 children in the adaptive empathic condition, 13 in the random empathic condition and 13 in the neutral condition.

Participants were instructed to sit in front of the robot and play a chess exercise. The exercise was suggested by the chess instructor so that the difficulty was appropriate for the children, and it was the same for all participants. Two experimenters were in the room observing the interaction and controlling the experiment. On average, each participant played 15 minutes with the robot. After that period, depending on the state of the game, the iCat either gave up (if it was at a disadvantage) or proposed a draw (if the child was losing or if neither player had an advantage), arguing that it had to play with another user.

4.3 Data collection

For each human-robot interaction, context logs including information about the game and the social context throughout the game were automatically collected.
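The paper does not specify the format of these logs; purely for illustration, a single per-turn entry could carry the kind of information described in the following subsections. Every field name below is a guess, not the authors' actual schema.

```python
# Purely illustrative shape of one per-turn context-log entry; all field names
# are assumptions based on the information described in Sect. 4.3.
example_turn = {
    "turn": 12,
    "game_state": -0.8,           # chess evaluation: negative = iCat in advantage
    "game_evolution": 0.3,        # difference to the previous game_state value
    "emotivector": "punishment",  # reward / punishment / neutral
    "condition": "adaptive_empathic",
    "empathic_strategy": 1,       # 0-3, or None when no strategy was triggered
}
```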
After playing with the robot, participants were directed to another room where they filled in a questionnaire to rate their quality of interaction with the robot. Note that children in the age group considered here are sufficiently developed to answer questions with some consistency (Borgers et al. 2000). We collected a total of 40 context logs and ratings for five dimensions of quality of interaction.

4.3.1 Quality of interaction: dimensions

We measured quality of interaction using a set of affective, friendship and social presence dimensions that have previously been shown to be successful in measuring the influence of a robot's behaviour on the relationship between the robot itself and children (Leite 2013). Children were asked to rate these dimensions for the whole interaction using a 5-point Likert scale, where 1 meant "totally disagree" and 5 meant "totally agree". For each dimension presented below, we considered the average rating of all the questionnaire items associated with it.

Social engagement: this metric has been extensively used to measure quality of interaction between humans (Poggi 2007) and between humans and robots (Sidner et al. 2004). The engagement questionnaire we used is based on the questions formulated by Sidner et al. (2004) to evaluate users' responses towards a robot capable of using social capabilities to attract users' attention. It included the questions below:
- iCat made me participate more in the game.
- It was fun playing with iCat.
- Playing with the iCat caused me real feelings and emotions.
- I lost track of time while playing with iCat.

Help: measures how much the robot is perceived to have provided guidance and other forms of aid to the user throughout the game.

In particular, we refer to help as a friendship dimension that measures the degree to which a friend fulfils the support function present in most definitions of friendship, as suggested by Mendelson and Aboud in the McGill Friendship Questionnaire (Mendelson and Aboud 1999). Questions included:
- iCat helped me during the game.
- iCat's comments were useful to me.
- iCat showed me how to do things better.
- iCat's comments were helpful to me.

Self-validation: another friendship dimension taken from the McGill Friendship Questionnaire (Mendelson and Aboud 1999), which measures the degree to which children perceived the robot as reassuring and encouraging, and as helping them maintain a positive self-image. To measure self-validation we asked children the following questions:
- iCat encouraged me to play better during the game.
- iCat praised me when I played well.
- iCat made me feel smart.
- I felt that I could play better in the presence of iCat.
- iCat highlighted the things that I am good at.
- iCat made me feel special.

Perceived affective interdependence: this dimension was used to measure social presence, that is, the degree to which a user feels access to the intelligence, intentions and sensory impressions of another (Biocca 1997). Here, perceived affective interdependence measures the extent to which the user's emotional states are perceived to affect and be affected by the robot's emotional states. Items:
- I was influenced by iCat's mood.
- iCat was influenced by my mood.

4.3.2 Game context

For each interaction with the robot, the game context logs include information on the events happening in the game. These include:
- Game state: a value that represents the advantage or disadvantage of the child in the game. This value is calculated by the same chess evaluation function that the iCat uses to plan its own moves. The more positive the game state, the greater the user's advantage with respect to the iCat; the more negative it is, the greater the iCat's advantage.
- Game evolution: the difference between the current and the previous value of the game state. A positive value indicates that the user is improving in the game with respect to the previous move, while a negative value means that the user's performance has worsened since the last move.
- User emotivector: after every move made by the user, the chess evaluation function in the Game Engine returns a new value, updated according to the current state of the game. The robot's Emotivector System (Martinho and Paiva 2006) is an anticipatory system that captures this value and, using the history of evaluation values, computes an expected value of the chess evaluation function associated with the user's moves. Based on the mismatch between the expected and the actual value, the system generates a set of affective signals describing a sentiment of reward, punishment or neutral effect for the user (Leite et al. 2008), which are encoded into the following features: UserEmotivectorReward, UserEmotivectorPunishment and UserEmotivectorNeutral.
- Game result: the outcome of the game (e.g., user won, iCat won, draw, iCat gave up).
- Number of moves: total number of turns in the game.
- Game duration: duration of the game in seconds.

4.3.3 Social context

These logs contain information on the behaviour displayed by the iCat at specific times of the interaction. Additionally, social context logs also include information about the robot's condition (i.e., neutral, random empathic, adaptive empathic) as well as, in the case of the empathic conditions, the type of empathic strategy adopted. The empathic strategies employed by the robot are the following:
- Encouraging comments (strategy n. 0): statements such as "don't be sad, you can still recover your disadvantage".
- Scaffolding (strategy n. 1): providing feedback on the user's last move and allowing the user to take back a move (in case the move is not good).
- Offering help (strategy n. 2): suggesting a good move for the user to play.
- Intentionally playing a bad move (strategy n. 3): deliberately playing a bad move, for example choosing a move that allows the user to capture a piece.
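As a rough illustration of how the game-context signals of Sect. 4.3.2 could be derived from the history of chess evaluation values, consider the sketch below. The expectation model is deliberately simplified to a running mean; the actual Emotivector System (Martinho and Paiva 2006) is an anticipatory model and is not reproduced here.

```python
# Sketch of deriving game-context signals from a history of chess evaluation values.
# The "expected" value here is a naive mean of past evaluations, used only to
# illustrate the reward/punishment/neutral mismatch idea.

def game_evolution(history):
    """Difference between the current and the previous game-state value."""
    if len(history) < 2:
        return 0.0
    return history[-1] - history[-2]

def emotivector_signal(history, tolerance=0.1):
    """Label the latest move as reward / punishment / neutral by comparing the
    actual evaluation with an expectation built from the preceding history."""
    if len(history) < 2:
        return "neutral"
    expected = sum(history[:-1]) / (len(history) - 1)  # naive expectation
    mismatch = history[-1] - expected
    if mismatch > tolerance:
        return "reward"
    if mismatch < -tolerance:
        return "punishment"
    return "neutral"

# Example: evaluations after each of the user's moves (positive = user in advantage)
evals = [0.2, 0.5, -0.3, -0.1]
print(game_evolution(evals))      # -0.1 - (-0.3) = 0.2 -> user improving
print(emotivector_signal(evals))  # compares -0.1 with the mean of the first three
```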

5 Experimental results

In this section, we first describe the feature extraction process and then the results of training and testing models for the prediction of quality of interaction with the robot. Our goal is to investigate whether we can predict the perceived quality of interaction with the robot over a whole interaction session using solely information about the game and the social context throughout the interaction.

5.1 Features extraction

Game and social context-based features were extracted from the context logs for each of the 40 user interactions with the robot. These features were extracted at the interaction level (overall features) and at the game turn level (turn-based features).

5.1.1 Overall features

These features were calculated over the whole interaction. They capture game and social context in an independent way at the interaction level. The complete list of overall features is displayed in Table 1.

Table 1 Overall features (feature name, category of context, description):
1. AvgGameState (Game): game state averaged over the whole interaction.
2. AvgGameEvol (Game): game evolution averaged over the whole interaction.
3. NofMoves/GameDur (Game): number of moves played during the game divided by the duration of the game (in seconds).
4. EmotivReward/NofMoves (Game): number of moves associated with a feeling of reward divided by the total number of moves.
5. EmotivPunish/NofMoves (Game): number of moves associated with a feeling of punishment divided by the total number of moves.
6. EmotivNeut/NofMoves (Game): number of moves associated with a neutral feeling divided by the total number of moves.
7. icatwon (Game): whether or not the iCat won (binary feature).
8. UserWon (Game): whether or not the user won (binary feature).
9. Draw (Game): whether or not the iCat proposed a draw (binary feature).
10. icatgaveup (Game): whether or not the iCat gave up (binary feature).
11. Neutral (Social): whether or not the iCat displayed neutral behaviour (binary feature).
12. RandomEmpat (Social): whether or not the iCat displayed random empathic behaviour (binary feature).
13. AdaptEmpat (Social): whether or not the iCat displayed adaptive empathic behaviour (binary feature).
14. NofStrat0/NofStrategies (Social): number of times empathic strategy n. 0 was displayed divided by the number of displayed strategies.
15. NofStrat1/NofStrategies (Social): number of times empathic strategy n. 1 was displayed divided by the number of displayed strategies.
16. NofStrat2/NofStrategies (Social): number of times empathic strategy n. 2 was displayed divided by the number of displayed strategies.
17. NofStrat3/NofStrategies (Social): number of times empathic strategy n. 3 was displayed divided by the number of displayed strategies.

5.1.2 Turn-based features

These features were extracted after each game turn. They encode relationships between social and game context after every move, and are therefore more focused on the dynamics of the interaction. Note that the turn-based features are not dynamic per se, but are encoded in a way that attempts to capture the dynamic changes of the game context as a consequence of the robot's behaviour. We are particularly interested in features that capture whether the social context affects the game context at each turn of the game. The complete set of turn-based features is listed in Table 2.

Table 2 Turn-based features (all counts are normalised by the number of turns):
1-3. Neut-No-strat, game state: number of times the game state increases, decreases or stays the same after a turn when the condition is neutral.
4-6. Emp-strat, game state: number of times the game state increases, decreases or stays the same after a turn when the condition is empathic and a strategy is employed.
7-9. Emp-No-strat, game state: number of times the game state increases, decreases or stays the same after a turn when the condition is empathic and a strategy is not employed.
10-12. Neut-No-strat, emotivector: number of times the emotivector is reward, punishment or neutral after a turn when the condition is neutral.
13-15. Emp-strat, emotivector: number of times the emotivector is reward, punishment or neutral after a turn when the condition is empathic and a strategy is employed.
16-18. Emp-No-strat, emotivector: number of times the emotivector is reward, punishment or neutral after a turn when the condition is empathic and a strategy is not employed.
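A compact sketch of how the normalised counts of Table 2 could be computed from a sequence of per-turn records is shown below; the record fields mirror the illustrative log entry from Sect. 4.3 and are assumptions, not the authors' actual encoding.

```python
# Sketch: turn-based features as co-occurrence counts of robot behaviour context
# with game-state change and emotivector signal, normalised by the number of turns.

def robot_context(turn):
    """Collapse the robot's behaviour into the three contexts used in Table 2."""
    if turn["condition"] == "neutral":
        return "Neut-No-strat"
    return "Emp-strat" if turn["empathic_strategy"] is not None else "Emp-No-strat"

def turn_based_features(turns):
    counts = {}
    for turn in turns:
        ctx = robot_context(turn)
        if turn["game_evolution"] > 0:
            change = "Game-state-higher"
        elif turn["game_evolution"] < 0:
            change = "Game-state-lower"
        else:
            change = "Game-state-same"
        for name in (f"{ctx}-AND-{change}", f"{ctx}-AND-Emotiv-{turn['emotivector']}"):
            counts[name] = counts.get(name, 0) + 1
    # normalise every count by the number of turns, as in Table 2
    return {name: value / len(turns) for name, value in counts.items()}

turns = [
    {"condition": "neutral", "empathic_strategy": None,
     "game_evolution": 0.3, "emotivector": "reward"},
    {"condition": "adaptive_empathic", "empathic_strategy": 1,
     "game_evolution": -0.2, "emotivector": "punishment"},
]
print(turn_based_features(turns))
# e.g. {'Neut-No-strat-AND-Game-state-higher': 0.5, ...}
```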

5.2 Methodology

The overall and turn-based features were used to train and test models based on Support Vector Machines (SVMs) to predict children's perceived quality of interaction with the robot over a whole session. SVMs are state-of-the-art algorithms for multi-class classification, widely used for the automatic prediction of affect-related states (Batrinca et al. 2011; Hammal and Cohn 2012). The classification experiments followed three stages. First, we performed training and testing to predict perceived quality of interaction using only overall features. In the second stage, we focused on prediction of perceived quality of interaction using turn-based features only. Finally, we merged overall and turn-based features into a single feature vector.

We used the LibSVM library (Chang and Lin 2011) and a Radial Basis Function (RBF) kernel for training and testing. RBF kernels are the most common kernels for SVM-based detection of discrete affect-related states, as they support cases in which the relation between class labels and attributes is not linear. Previous studies on the automatic recognition of affective and social dimensions indicate that they provide the best performances (Gunes et al. 2012).

We followed a leave-one-subject-out cross-validation approach for training and testing. Given the limited size of the dataset, which includes one sample for each user interaction with the robot, this approach allowed us to train the system with as many samples as possible at each step, increasing the chance that the classifier is accurate, while still providing a measure of the classifier's ability to generalise to new users.

In all experiments, to optimise the cost parameter C and the kernel parameter γ, we carried out a grid search using cross-validation, as suggested in Hsu et al. (2010). The parameter values producing the highest cross-validation accuracy were selected.
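The training procedure described above can be approximated with scikit-learn, whose SVC class wraps LibSVM. The grid values, the random data and the Likert threshold in the sketch below are illustrative and are not parameter values reported in the paper.

```python
# Approximation of the training/evaluation loop: binarise the Likert ratings,
# grid-search C and gamma for an RBF-kernel SVM, and evaluate with
# leave-one-subject-out cross-validation. All values are illustrative.
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 17))                 # one feature vector per interaction
ratings = rng.uniform(1, 5, size=40)          # average questionnaire score per child
y = (ratings >= 3.5).astype(int)              # two classes split around a threshold

param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)

# With one sample per participant, leave-one-subject-out reduces to leave-one-out.
scores = cross_val_score(search, X, y, cv=LeaveOneOut())
accuracy = scores.mean()
baseline = max(y.mean(), 1 - y.mean())        # share of the most frequent class
print(f"accuracy = {accuracy:.3f}, majority-class baseline = {baseline:.3f}")
```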
Before the training and testing phase of each classification experiment, a feature selection step was performed. To identify the features with the most discriminative power, we used a LibSVM tool that ranks features by their F-scores (Chen and Lin 2006).
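The F-score used for ranking (Chen and Lin 2006) is, roughly, the between-class separation of a feature divided by the sum of its within-class variances. A sketch following that common definition, assuming binary labels with both classes present:

```python
# F-score feature ranking in the spirit of Chen and Lin (2006): for each feature,
# between-class separation divided by the sum of within-class variances.
import numpy as np

def f_scores(X, y):
    """X: (n_samples, n_features) array, y: binary labels (0/1)."""
    pos, neg = X[y == 1], X[y == 0]
    overall_mean = X.mean(axis=0)
    numerator = (pos.mean(axis=0) - overall_mean) ** 2 + (neg.mean(axis=0) - overall_mean) ** 2
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return numerator / (denominator + 1e-12)   # small constant avoids division by zero

def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative."""
    return np.argsort(f_scores(X, y))[::-1]

# Illustrative usage with random data: rank once, then (as in Sect. 5.3.1) add one
# feature at a time and keep the subset beyond which accuracy stops improving.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 17))
y_demo = (rng.uniform(size=40) > 0.5).astype(int)
print(rank_features(X_demo, y_demo)[:5])       # indices of the five top-ranked features
```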

5.3 Social engagement: results

The first step consisted of encoding social engagement with the objective of producing ground truth labels for this dimension for each interaction with the robot. We did not employ coders for this purpose, but instead used the answers provided by the children in the 5-point Likert scale questionnaire (self-reports). As the number of samples is relatively small and, on average, the children reported high levels of engagement with the robot, we decided to divide the 40 interactions into two (rather than several) groups that separate the samples with medium-high to high engagement from the rest. The two groups correspond to two classes, one for medium-high to high engagement (21 interactions) and one for medium-high to low engagement (19 interactions). This distinction was made based on the children's answers: all interactions rated at or above a specified threshold were included in the medium-high to high engagement group. The threshold was chosen so as to identify two groups with a similar number of samples, in order to train the system with as many samples as possible from each group. Therefore 21 samples labelled as medium-high to high engagement and 19 samples labelled as medium-high to low engagement were used in the evaluation.

Table 3 Social engagement prediction: comparison between different classification experiments.
- Overall features: 77.5 % (baseline = 52.5 %). Best features: icatwon, Neutral, NofStrat0/NofMoves, NofStrat1/NofMoves, AvgGameEvol, icatgaveup, Draw, AvgGameState, RandomEmpat, NofStrat2/NofMoves.
- Turn-based features: 72.5 % (baseline = 52.5 %). Best features: Neut-No-strat-AND-Game-state-lower, Neut-No-strat-AND-Emotiv-reward.
- Overall and turn-based features: 80 % (baseline = 52.5 %). Best features: icatwon, Neut-No-strat-AND-Game-state-lower, Neut-No-strat-AND-Emotiv-reward, Neutral.

5.3.1 Social engagement prediction using overall features

We ran a series of classification experiments in which models were trained and tested with different combinations of the overall features listed in Table 1. Experiments were performed by adding one feature at a time, based on the ranked output of the feature selection process: we started with a single-feature experiment using the top-ranked feature, then performed a two-feature experiment with the two highest-ranked features, and so on, until performance started to decrease. We followed a leave-one-subject-out cross-validation approach for training and testing.

The best recognition performance for predicting social engagement using overall features was 77.5 % (baseline = 52.5 %), achieved using 10 features (Table 3). (Given that the samples in the training datasets are not evenly distributed across classes, in the following sections we use the percentage of samples of the most frequent class as the baseline for the classifiers' performance.) When using solely game context-based features, the models achieved a recognition rate of 77.5 % (baseline = 52.5 %), while 65 % (baseline = 52.5 %) was the best performance when social context-based features were used (Table 7).

5.3.2 Social engagement prediction using turn-based features

Classification experiments using turn-based features were performed following the same approach. We ran a series of classification experiments, training and testing different combinations of the turn-based features. A leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting social engagement using turn-based features was 72.5 % (baseline = 52.5 %), achieved using only 2 features (Table 3).

5.3.3 Social engagement prediction using overall and turn-based features

Following the same procedure, we trained and tested a series of models using different combinations of the overall and turn-based features. As in the previous two experiments, a leave-one-subject-out cross-validation approach was followed. The best performance for predicting social engagement using overall and turn-based features was 80 % (baseline = 52.5 %), achieved using 4 features (Table 3).

5.4 Help: results

The first step consisted of encoding help in order to produce ground truth labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. As the number of samples is relatively small and, on average, the children reported high levels of perceived help, we divided the 40 interactions into two (rather than several) groups that separate the samples with medium-high to high help from the rest. The two groups correspond to two classes, one for medium-high to high help (23 interactions) and one for medium-high to low help (17 interactions).

Therefore 23 samples labelled as medium-high to high help and 17 samples labelled as medium-high to low help were used in the evaluation, in order to train the system with as many samples as possible from the two different groups.

Table 4 Help prediction: comparison between different classification experiments.
- Overall features: 75 % (baseline = 57.5 %). Best features: NofStrat3/NofStrategies, icatgaveup.
- Turn-based features: 75 % (baseline = 57.5 %). Best features: Emp-strat-AND-Game-state-same, Emp-strat-AND-Game-state-higher.
- Overall and turn-based features: 80 % (baseline = 57.5 %). Best features: Emp-strat-AND-Game-state-same, NofStrat3/NofStrategies, Emp-strat-AND-Game-state-higher, Emp-strat-AND-emotiv-reward, icatgaveup.

Table 5 Self-validation prediction: comparison between different classification experiments.
- Overall features: 69.23 % (baseline = 56.41 %). Best features: AdaptEmpat, NofStrat1/NofStrategies.
- Turn-based features: 69.23 % (baseline = 56.41 %). Best features: Emp-strat-AND-Game-state-same, Emp-No-strat-AND-Game-state-higher, Emp-No-strat-AND-Game-state-same.
- Overall and turn-based features: 69.23 % (baseline = 56.41 %). Best features: Emp-strat-AND-Game-state-same, Emp-No-strat-AND-Game-state-higher, Emp-No-strat-AND-Game-state-same.

Table 6 Perceived affective interdependence prediction: comparison between different classification experiments.
- Overall features: 72.5 % (baseline = 57.5 %). Best features: NofStrat0/NofStrategies.
- Turn-based features: 67.5 % (baseline = 57.5 %). Best features: Emp-strat-AND-emotiv-punish.
- Overall and turn-based features: 72.5 % (baseline = 57.5 %). Best features: NofStrat0/NofStrategies, Emp-strat-AND-emotiv-punish.

5.4.1 Help prediction using overall features

These classification experiments were performed using the overall features (Table 1). As we did for social engagement, we ran a series of classification experiments in which models were trained and tested with different combinations of the overall features. A leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting help using overall features was 75 % (baseline = 57.5 %), achieved using 2 features (Table 4). We also trained and tested models using first solely game context-based features and then solely social context-based features. The models achieved a recognition rate of 72.5 % (baseline = 57.5 %) using game context-based features and 70 % (baseline = 57.5 %) using social context-based features (Table 7).

5.4.2 Help prediction using turn-based features

Classification experiments using turn-based features were performed following the same approach adopted for the experiments using overall features. As we did for social engagement, we ran a series of classification experiments in which models were trained and tested with different combinations of the turn-based features. A leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting help using turn-based features was 75 % (baseline = 57.5 %), achieved using 2 features (Table 4).

5.4.3 Help prediction using overall and turn-based features

Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features. As in the previous two experiments, a leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting help using overall and turn-based features was 80 % (baseline = 57.5 %), achieved using 5 features (Table 4).

Table 7 Predicting social engagement, help, self-validation and perceived affective interdependence using overall features: comparison between game and social context.
- Overall features, game context: social engagement 77.5 % (baseline = 52.5 %); help 72.5 % (baseline = 57.5 %); self-validation 74.36 % (baseline = 56.41 %); perceived affective interdependence 70 % (baseline = 57.5 %).
- Overall features, social context: social engagement 65 % (baseline = 52.5 %); help 70 % (baseline = 57.5 %); self-validation 64.1 % (baseline = 56.41 %); perceived affective interdependence 67.5 % (baseline = 57.5 %).

5.5 Self-validation: results

The first step consisted of encoding self-validation in order to produce ground truth labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. Note that all self-validation recognition experiments were performed on 39 subjects rather than 40 (as for social engagement, help and perceived affective interdependence), because one of the subjects did not provide a score for self-validation. As the number of samples is relatively small and, on average, the children reported high levels of perceived self-validation, we divided the 39 interactions into two (rather than several) groups that separate the samples with medium-high to high self-validation from the rest. The two groups correspond to two classes, one for medium-high to high self-validation (17 interactions) and one for medium-high to low self-validation (22 interactions). Therefore 17 samples labelled as medium-high to high self-validation and 22 samples labelled as medium-high to low self-validation were used in the evaluation, in order to train the system with as many samples as possible from the two different groups.

5.5.1 Self-validation prediction using overall features

These classification experiments were performed using the overall features (Table 1). As we did for social engagement and help, we ran a series of classification experiments in which models were trained and tested with different combinations of the overall features. A leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using overall features was 69.23 % (baseline = 56.41 %), achieved using 2 features (Table 5). We also trained and tested models using first solely game context-based features and then solely social context-based features. The models achieved a recognition rate of 74.36 % (baseline = 56.41 %) using game context-based features and 64.1 % (baseline = 56.41 %) using social context-based features (Table 7).

5.5.2 Self-validation prediction using turn-based features

Classification experiments using turn-based features were performed following the same approach adopted for the experiments using overall features. As we did for social engagement and help, we ran a series of classification experiments in which models were trained and tested with different combinations of the turn-based features.

A leave-one-subject-out cross validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using turn-based features was 69.23 % (baseline = 56.41 %), achieved using 3 features (Table 5). 5.5.3 Self-validation prediction using overall and turn-based features Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features. As for the previous two experiments, a leave-one-subjectout cross validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using overall and turn-based features was 69.23 % (baseline = 56.41 %), achieved using 3 features (Table 5). 5.6 Perceived affective interdependence: results The first step consisted of encoding perceived affective interdependence in order to produce ground truth labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. As the number of samples is relatively small and, on average, the children reported high levels of perceived affective interdependence, we divided the 40 interactions in two (rather than several) groups that differentiate the samples with medium-high to high perceived affective interdependence from the rest of the samples. The two groups correspond to two classes, one for medium-high to high perceived affective interdependence (23 interactions) and one for medium-high to low perceived affective interdependence (17 interactions). Therefore 23 samples labelled as medium-high to high perceived affective interdependence and 17 samples labelled as medium-high to low perceived affective interdependence were used in the evaluation, in order to train the system with as many samples as possible from the two different groups. 5.6.1 Perceived affective interdependence prediction using overall features These classification experiments were performed using the overall features (Table 1). As we did for social engagement, help, and self-validation, we ran a series of classification experiments, where models were trained and tested with different combinations of the overall features. A leave-one-subject-out cross validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using overall features was 72.5 % (baseline = 57.5 %), achieved using 1 feature (Table 6). We also trained and tested models using first solely game context-based features and then solely social context-based features. We found that models achieved a recognition rate of 70 % (baseline = 57.5 %) using game context-based features and 67.5 % (baseline = 57.5 %) using social context-based features (Table 7). 5.6.2 Perceived affective interdependence prediction using turn-based features Classification experiments using turn-based features were performed by following the same approach adopted for the experiments using overall features. As we did for social engagement, help, and self-validation, we ran a series of classification experiments, where models were trained and tested with different combinations of the turn-based features. A leave-one-subject-out cross validation approach was followed during the training and testing phase. 
5.6.2 Perceived affective interdependence prediction using turn-based features

Classification experiments using turn-based features were performed by following the same approach adopted for the experiments using overall features. As we did for social engagement, help and self-validation, we ran a series of classification experiments in which models were trained and tested with different combinations of the turn-based features. A leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using turn-based features was 67.5 % (baseline = 57.5 %), achieved using 1 feature (Table 6).

5.6.3 Perceived affective interdependence prediction using overall and turn-based features

Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features. As for the previous two experiments, a leave-one-subject-out cross-validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using overall and turn-based features was 72.5 % (baseline = 57.5 %), achieved using 2 features (Table 6).

6 Discussion

This paper aims to investigate the relationship between social and game context-based features and the effect that their interdependencies have on the perception of quality of interaction. Overall, the results showed that game and social context-based features can be successfully used to predict children's perceived quality of interaction with the robot over a whole interaction session. Specifically, overall features proved to perform equally well (for the dimensions of help and self-validation) or better (for social engagement and perceived affective interdependence) than turn-based features.
It may be that overall features, i.e., features that encode game and social context in an independent way at the interaction level, better capture the variables that affect the level of engagement with the robot (e.g., variables that relate to the result of the chess game, the type of robot condition (neutral vs. empathic) and the type of strategy employed by the robot; see Table 3) and the extent to which the user's affective state is perceived by the user to affect and be affected by the robot's affective state (e.g., variables that relate to the type of strategy employed by the robot; see Table 6), compared to turn-based features.

The second result is that overall game context features proved more successful than overall social context features. This applies to all dimensions of quality of interaction (Table 7). These results are somewhat surprising, except for the social engagement dimension, where game-related features, such as the result of the game, are likely to have affected children's reported level of engagement with the robot. It is worth noting, however, that the difference in performance between classifiers trained with game context features and classifiers trained with social context features is less pronounced for help and perceived affective interdependence.

Finally, the experiments showed that the integration of game and social context-based features with features encoding their interdependencies leads to higher recognition performance for a subset of dimensions (i.e., social engagement and help; see Tables 3, 4). For the dimensions of self-validation and perceived affective interdependence, instead, the performance of the classifiers trained with a combination of overall and turn-based features is, in the worst case, comparable with that of the classifiers trained with the overall and turn-based features separately (see Tables 5, 6).

One should note that some of the recognition rates are smaller than 75 %. Therefore, in order to evaluate success, we compare the recognition rates with a suitable baseline. While an option for the baseline would be chance level (50 %), in our case, given that the samples in the training datasets are not perfectly evenly distributed across classes, we used as the baseline the percentage of samples belonging to the most frequent class. When the achieved recognition accuracies are compared with the respective baselines calculated with this method, one can observe that in all cases the recognition accuracies are above the baselines. We consider the classifiers successful when they beat the baseline. Moreover, in most cases the margin above the baseline is 10 % or higher (reaching a maximum of 27.5 % in the case of recognition of social engagement using overall and turn-based features), the exception being the recognition of self-validation using social context-based features (7.69 %).
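As a concrete illustration of this baseline, the sketch below computes the majority-class proportion and a classifier's margin over it. The 23/17 split and the 72.5 % accuracy are the perceived affective interdependence figures reported above; the function and variable names are otherwise assumptions.

```python
# Minimal sketch of the baseline used here: the proportion of samples in the
# most frequent class, and a classifier's margin over it. The 23/17 split and
# the 72.5 % accuracy match the numbers reported in Sect. 5.6.1.
import numpy as np

def majority_class_baseline(y):
    """Fraction of samples belonging to the most frequent class."""
    counts = np.bincount(np.asarray(y, dtype=int))
    return counts.max() / counts.sum()

y = np.array([1] * 23 + [0] * 17)       # 23 vs. 17 samples
baseline = majority_class_baseline(y)    # 23 / 40 = 0.575
margin = 0.725 - baseline                # best overall-features accuracy: 72.5 %
print(f"baseline = {baseline:.1%}, margin = {margin:.1%}")  # -> 57.5%, 15.0%
```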
There are some important aspects encountered in the presented work that require further discussion. The first aspect concerns the reliability of the ground truth for the dimensions of quality of interaction, which, in this work, is provided by the users themselves, i.e., the children. First, we would like to point out, as mentioned earlier in this paper, that children of this age group are sufficiently developed to answer questions with some consistency (Borgers et al. 2000). It was noted that, on average, children reported high levels of social engagement, help, self-validation and perceived affective interdependence. While we were not able to identify a large number of interactions characterised by low levels of the selected dimensions, it was clearly possible to identify, for all dimensions, two separate groups of interactions. This affected the way we decided to encode social engagement, help, self-validation and perceived affective interdependence for each interaction with the robot: we aimed to differentiate interactions rated with medium-high to high levels of these dimensions from those rated with medium-high to low levels. It is also worth noting that in this work we measure perceived quality of interaction at the end of the game/interaction with the robot. We are aware that children may have been biased, in their judgement, by the way the game ended. Finding ways of identifying a ground truth for affective and social dimensions throughout an ongoing interaction is still an open question for researchers in the field. This is especially true if such a ground truth is required from the users themselves, and not from external coders, and if one aims to measure perceived quality of interaction, as in our case. To support self-assessments with external assessments, an alternative could be the use of systems for the automatic detection of indicators relating to user behaviour (Bae et al. 2012), or brain-computer interface techniques (Poel et al. 2012), although the latter is not a viable solution when experimenting with children in a real-world environment rather than in the laboratory. Other studies employed human coders to obtain external assessments (Castellano et al. 2014), but in this work we are interested in the perception of quality of interaction, hence our preference for self-reports. Another solution currently being explored in the literature is the use of implicit probes (Corrigan et al. 2014) to find a ground truth for affect-related states: a non-intrusive, pervasive and embedded method of collecting informative data at different stages of an interaction.

Second, it is worth pointing out that there may be more dimensions that contribute to the definition of quality of interaction. In this work we chose to explore a subset of affective (i.e., social engagement), friendship (i.e., help and self-validation) and social presence (i.e., perceived affective interdependence) dimensions, which have been previously identified as playing a key role in measuring the effect that a robot's behaviour had on the relationship established between the robot and children (Leite 2013). However, there may be other dimensions that carry additional important information about quality of interaction, for example the successful establishment of social bonding between users and robots.
This will be the subject of future work, for example how the dimensions considered here relate to social bonding, and how contextual information could be further exploited to allow the robot to make predictions of perceived quality of interaction that may help it plan more personalised, appropriate empathic responses.

Third, we note that, while the chosen contextual features can successfully predict dimensions of quality of interaction with a robot in a chess game context, they could not be used, exactly as they stand, in a scenario that is substantially different. However, the proposed approach for feature encoding and representation can be generalised to other types of human-machine interaction applications. Examples in the human-robot interaction domain include scenarios characterised by (1) social interaction between a user and a robot, and (2) a task that the user and robot are engaged in, for example human-robot collaborative scenarios. More specifically, the results point in the direction of choosing feature vectors that model task and social context and their interdependencies at the turn-based and interaction levels. Social human-robot interaction scenarios involving some form of turn taking are an example of applications that could benefit from this proposed modelling of social and task context.

Finally, it is important to acknowledge the potential impact that age and culture may have had on our results. Previous studies, for example, investigated the effect of cultural variability on opinions and attitudes towards robots (Bartneck et al. 2007; Riek et al. 2010; Mavridis et al. 2012). While the study presented in this paper was conducted with a limited population of Portuguese school children, future work in the area is likely to benefit from the exploration of the effect of culture and age on the perception of quality of interaction with a robot.

We believe that the results presented in this paper contribute to a better understanding of what quality of interaction is in a task-oriented social human-robot interaction, and how it can be measured, especially in relation to the effect that the interdependencies of game and social context have on perceived quality of interaction, rather than the role that game and social context-based features play in isolation.

7 Conclusions

Despite recent advances in automatic affect recognition, the role of task and social context for the purpose of automatically predicting affective and social dimensions is still unclear. This work aimed to advance the state of the art in this direction, exploring the role of task, social context and their interdependencies in the automatic prediction of the quality of interaction between children and a robotic game companion. Several SVM models with different features extracted from context logs collected in a human-robot interaction experiment were investigated. The features included information about the game and the social context during the interaction with the robot. Our experimental results lead us to the following conclusions.

Game and social context-based features can be successfully used to predict perceived quality of interaction with the robot. Using only information about the task and the social context of the interaction, our results suggest that it is possible to predict with high accuracy the perceived quality of interaction in a human-robot interaction scenario.

Game context proved more successful than social context in the automatic prediction of perceived quality of interaction.
Despite the high impact of both game context and social context on the automatic detection of perceived quality of interaction, game context seems to play a more important role in this case.

Overall features performed equally well or better than turn-based features; the combination of game and social context-based features with features encoding their interdependencies leads to higher recognition performance for a subset of the selected dimensions of quality of interaction. Contextual feature representation (e.g., how to encode game context, social context and their relationships over time) is a key issue in context-sensitive recognition of affective and social dimensions. Our results indicate that overall features are more successful than turn-based features for predicting social engagement and perceived affective interdependence, and that the fusion of overall features with turn-based features encoding relationships between social and game context achieved the best performance for a subset of the considered dimensions, i.e., social engagement and help.

This paper represents a first step towards investigating the role of different types of context and their relations in the automatic prediction of perceived quality of interaction with a robot. We consider context-sensitive detection of perceived quality of interaction a key requirement for the establishment of empathic and personalised human-computer and human-robot interactions that are more natural and engaging (Leite et al. 2012a; Castellano et al. 2010). While our proposed approach was tested on a specific human-robot interaction scenario, we believe that the modelling of interdependencies between social and game context-based features could be applied to other task-oriented social human-robot interactions.

Acknowledgments This work was partially supported by the European Commission (EC) and funded by the EU FP7 ICT-317923 project EMOTE. The authors are solely responsible for the content of this publication. It does not represent the opinion of the EC, and the EC is not responsible for any use that might be made of data appearing therein.

References

Aylett, R., Castellano, G., Raducanu, B., Paiva, A., & Hanheide, M. (2011). Long-term socially perceptive and interactive robot companions: Challenges and future perspectives. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.
Bae, B. C., Brunete, A., Malik, U., Dimara, E., Jermsurawong, J., & Mavridis, N. (2012). Towards an empathizing and adaptive storyteller system. In: Proceedings of the 8th AAAI conference on artificial intelligence and interactive digital entertainment (pp. 63–65). Stanford, CA.
Bartneck, C., Suzuki, T., Kanda, T., & Nomura, T. (2007). The influence of people's culture and prior experiences with Aibo on their attitude towards robots. AI and Society, The Journal of Human-Centred Systems, 21(1–2), 217–230.
Batrinca, L. M., Mana, N., Lepri, B., Pianesi, F., & Sebe, N. (2011). Please, tell me about yourself: Automatic personality assessment using short self-presentations. In: Proceedings of the 13th international conference on multimodal interaction (ICMI 2011) (pp. 255–262). New York: ACM.
Biocca, F. (1997). The cyborg's dilemma: Embodiment in virtual environments. In: Proceedings of the second international conference on cognitive technology: Humanizing the information age (pp. 12–26). IEEE.
Borgers, N., de Leeuw, E., & Hox, J. (2000). Children as respondents in survey research: Cognitive development and response quality. Bulletin de Methodologie Sociologique, 66(1), 60.
Breazeal, C. (2009). Role of expressive behaviour for robots that learn from people. Philosophical Transactions of the Royal Society B, 364, 3527–3538.
Breazeal, C., Edsinger, A., Fitzpatrick, P., & Scassellati, B. (2001). Active vision for sociable robots. IEEE Transactions on Systems, Man and Cybernetics-Part A, 31(5), 443–453.
Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2009). It's all in the game: Towards an affect sensitive and context aware game companion. In: Proceedings of the 3rd international conference on affective computing and intelligent interaction (ACII 2009) (pp. 29–36). IEEE.
Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. W. (2010). Inter-ACT: An affective and contextually rich multimodal video corpus for studying interaction with robots. In: Proceedings of the ACM international conference on multimedia (pp. 1031–1034). Florence: ACM.
Castellano, G., Pereira, A., Leite, I., Paiva, A., & McOwan, P. W. (2009). Detecting user engagement with a robot companion using task and social interaction-based features. In: International conference on multimodal interfaces and workshop on machine learning for multimodal interaction (ICMI-MLMI '09) (pp. 119–126). Cambridge, MA: ACM Press.
Castellano, G., Caridakis, G., Camurri, A., Karpouzis, K., Volpe, G., & Kollias, S. (2010). Body gesture and facial expression analysis for automatic affect recognition. In K. R. Scherer, T. Baenziger, & E. B. Roesch (Eds.), Blueprint for affective computing: A sourcebook. Oxford: Oxford University Press.
Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2010). Affect recognition for interactive companions: Challenges and design in real-world scenarios. Journal on Multimodal User Interfaces, 3(1–2), 89–98.
Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2013). Multimodal affect modelling and recognition for empathic robot companions. International Journal of Humanoid Robotics, 10, 1350010.
Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2014). Context-based affect recognition for a robotic game companion. ACM Transactions on Interactive Intelligent Systems, 4(2), 10.
Castellano, G., Mancini, M., Peters, C., & McOwan, P. W. (2012). Expressive copying behavior for social agents: A perceptual analysis. IEEE Transactions on Systems, Man and Cybernetics Part A: Systems and Humans, 42(3), 776–783.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, Y.-W., & Lin, C.-J. (2006). Combining SVMs with various feature selection strategies. In I. Guyon, S. Gunn, M. Nikravesh, & L. Zadeh (Eds.), Feature extraction, foundations and applications. New York: Springer.
Corrigan, L., Basedow, C., Kuster, D., Kappas, A., Peters, C., & Castellano, G. (2014). Mixing implicit and explicit probes: Finding a ground truth for engagement in social human-robot interactions. In: Proceedings of the ACM/IEEE international conference on human-robot interaction (HRI '14) (pp. 140–141). Bielefeld: ACM.
Espinoza, R. R., Nalin, M., Wood, R., Baxter, P., Looije, R., Demiris, Y., Belpaeme, T., Giusti, A., & Pozzi, C. (2011). Child-robot interaction in the wild: Advice for the aspiring experimenter. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.
Feil-Seifer, D., & Mataric, M. (2011). Automated detection and classification of positive versus negative robot interactions with children with autism using distance-based features. In: Proceedings of the 6th international conference on human-robot interaction (HRI '11) (pp. 323–330). New York: ACM.
Gunes, H., Shan, C., Chen, S., & Tian, Y. (2012). Bodily expression for automatic affect recognition. In A. Konar & A. Chakraborty (Eds.), Advances in emotion recognition. New York: Wiley.
Hammal, Z., & Cohn, J. F. (2012). Automatic detection of pain intensity. In: Proceedings of the 14th ACM international conference on multimodal interaction (ICMI '12) (pp. 47–52). New York: ACM.
Heerink, M., Krose, B., Evers, V., & Wielinga, B. (2008). The influence of social presence on acceptance of a companion robot by older people. Journal of Physical Agents, 2(2), 33–40.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification.
Kapoor, A., & Picard, R. W. (2005). Multimodal affect recognition in learning environments. In: ACM international conference on multimedia (pp. 677–682).
Lassalle, J., Gros, L., & Coppin, G. (2011). Combination of physiological and subjective measures to assess quality of experience for audiovisual technologies. In: Third international workshop on quality of multimedia experience (QoMEX) (pp. 13–18). IEEE.
Leite, I. (2013). Long-term interactions with empathic social robots. Ph.D. dissertation, Universidade Técnica de Lisboa, Instituto Superior Técnico.
Leite, I., Castellano, G., Pereira, A., Martinho, C., & Paiva, A. (2012b). Modelling empathic behaviour in a robotic game companion for children: An ethnographic study in real-world settings. In: ACM/IEEE international conference on human-robot interaction (HRI). Boston: ACM.
Leite, I., Pereira, A., Castellano, G., Mascarenhas, S., Martinho, C., & Paiva, A. (2012a). Modelling empathy in social robotic companions. In: Advances in user modeling: Selected papers from UMAP 2011 workshops. Lecture notes in computer science (Vol. 7138). New York: Springer.
Leite, I., Pereira, A., Martinho, C., & Paiva, A. (2008). Are emotional robots more fun to play with? In: The 17th IEEE international symposium on robot and human interactive communication (RO-MAN 2008) (pp. 77–82).

Leite, I., Castellano, G., Pereira, A., Martinho, C., & Paiva, A. (2014). Empathic robots for long-term interaction. International Journal of Social Robotics, 6(3), 1–13.
Malta, L., Miyajima, C., & Takeda, K. (2008). Multimodal estimation of a driver's affective state. In: Workshop on affective interaction in natural environments (AFFINE), ACM international conference on multimodal interfaces (ICMI '08). Chania.
Mandryk, R., Inkpen, K., & Calvert, T. (2006). Using psychophysiological techniques to measure user experience with entertainment technologies. Behaviour and Information Technology, 25(2), 141–158.
Martinez, H. P., & Yannakakis, G. N. (2011). Mining multimodal sequential patterns: A case study on affect detection. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.
Martinho, C., & Paiva, A. (2006). Using anticipation to create believable behaviour. In: American Association for Artificial Intelligence technical conference (pp. 1–6). Boston, July 2006.
Mavridis, N., Katsaiti, M.-S., Naef, S., Falasi, A., Nuaimi, A., Araifi, H., et al. (2012). Opinions and attitudes towards humanoid robots in the Middle East. Springer Journal of AI and Society, 27(4), 517–534.
Mead, R., Atrash, A., & Mataric, M. J. (2011). Recognition of spatial dynamics for predicting social interaction. In: Proceedings of the 6th international conference on human-robot interaction (HRI '11) (pp. 201–202). New York: ACM.
Mendelson, M. J., & Aboud, F. E. (1999). Measuring friendship quality in late adolescents and young adults: McGill friendship questionnaires. Canadian Journal of Behavioural Science, 31(1), 130–132.
Morency, L.-P., de Kok, I., & Gratch, J. (2008). Context-based recognition during human interactions: Automatic feature selection and encoding dictionary. In: ACM international conference on multimodal interfaces (ICMI '08) (pp. 181–188). Chania.
Ochs, M., Niewiadomski, R., Brunet, P., & Pelachaud, C. (2011). Smiling virtual agent in social context. In: Cognitive processing. Special issue on social agents: From theory to applications, September 2011.
Peters, C., Asteriadis, S., & Karpouzis, K. (2010). Investigating shared attention with a virtual agent using a gaze-based interface. Journal on Multimodal User Interfaces, 3(1–2), 119–130.
Poel, M., Nijboer, F., van den Broek, E. L., Fairclough, S., & Nijholt, A. (2012). Brain-computer interfaces as intelligent sensors for enhancing human-computer interaction. In: 14th ACM international conference on multimodal interaction (pp. 379–382). ACM.
Poggi, I. (2007). Mind, hands, face and body: A goal and belief view of multimodal communication. Berlin: Weidler.
Rani, P., & Sarkar, N. (2005). Operator engagement detection and robot behavior adaptation in human-robot interaction. In: Proceedings of the 2005 IEEE international conference on robotics and automation (ICRA 2005) (pp. 2051–2056). IEEE.
Rich, C., Ponsler, B., Holroyd, A., & Sidner, C. L. (2010). Recognizing engagement in human-robot interaction. In: HRI '10: Proceedings of the 5th ACM/IEEE international conference on human-robot interaction (pp. 375–382). New York: ACM.
Riek, L., Mavridis, N., Antali, S., Darmaki, N., Ahmed, Z., Al-Neyadi, M., & Alketheri, A. (2010). Ibn Sina steps out: Exploring Arabic attitudes toward humanoid robots. In: Proceedings of the symposium on new frontiers in human-robot interaction (SSAISB), the 36th annual convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB 2010) (pp. 88–94).
Riek, L., & Robinson, P. (2011). Challenges and opportunities in building socially intelligent machines. IEEE Signal Processing, 28(3), 146–149.
Sabourin, J., Mott, B., & Lester, J. (2011). Modeling learner affect with theoretically grounded dynamic Bayesian networks. In: Proceedings of the 4th international conference on affective computing and intelligent interaction (ACII '11). New York: Springer.
Saerbeck, M., Schut, T., Bartneck, C., & Janse, M. (2010). Expressive robots in education: Varying the degree of social supportive behavior of a robotic tutor. In: Proceedings of the 28th ACM conference on human factors in computing systems (CHI 2010) (pp. 1613–1622). Atlanta: ACM.
Sanghvi, J., Castellano, G., Leite, I., Pereira, A., McOwan, P. W., & Paiva, A. (2011). Automatic analysis of affective postures and body motion to detect engagement with a game companion. In: ACM/IEEE international conference on human-robot interaction. Lausanne: ACM.
Scassellati, B., Admoni, H., & Mataric, M. (2012). Robots for use in autism research. Annual Review of Biomedical Engineering, 14, 275–294.
Sidner, C. L., Kidd, C. D., Lee, C. H., & Lesh, N. B. (2004). Where to look: A study of human-robot engagement. In: IUI '04: Proceedings of the 9th international conference on intelligent user interfaces (pp. 78–84). New York: ACM.
van Breemen, A., Yan, X., & Meerbeek, B. (2005). iCat: An animated user-interface robot with personality. In: AAMAS '05: Proceedings of the fourth international joint conference on autonomous agents and multiagent systems (pp. 143–144). New York: ACM.
von der Putten, A. M., Kramer, N. C., & Eimler, S. C. (2011). Living with a robot companion: Empirical study on the interaction with an artificial health advisor. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.
Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39–58.

Ginevra Castellano is an Associate Senior Lecturer at the Department of Information Technology, Uppsala University, where she leads the Social Robotics Lab. Her research interests are in the areas of social robotics and affective computing. She was the coordinator of the EU FP7 EMOTE (EMbOdied-perceptive Tutors for Empathy-based learning) project (2012–2016). She is the recipient of a project grant for young researchers awarded by the Swedish Research Council (2016–2019) and co-investigator of the COIN (Co-adaptive human-robot interactive systems) project, funded by the Swedish Foundation for Strategic Research (2016–2021). She was an invited speaker at the Summer School on Social Human-Robot Interaction 2013; she is a member of the management board of the Association for the Advancement of Affective Computing (AAAC); co-founder and co-chair of the AFFINE (Affective Interaction in Natural Environments) workshops; co-editor of special issues of the Journal on Multimodal User Interfaces and the ACM Transactions on Interactive Intelligent Systems; and a Program Committee member of several international conferences, including ACM/IEEE HRI, IEEE/RSJ IROS, ACM ICMI, IEEE SocialCom, AAMAS, IUI and ACII.

Iolanda Leite is an Associate Research Scientist at Disney Research, Pittsburgh. She received her Ph.D. in Information Systems and Computer Engineering from Instituto Superior Técnico, University of Lisbon, in 2013. During her Ph.D., she was a Research Associate at the Intelligent Agents and Synthetic Characters Group of INESC-ID in Lisbon. From 2013 to 2015, she was a Postdoctoral Associate at the Yale Social Robotics Lab, conducting research within the NSF Expedition on Socially Assistive Robotics. Her doctoral dissertation, "Long-term Interactions with Empathic Social Robots", received an honorable mention in the IFAAMAS-13 Victor Lesser Distinguished Dissertation Award. Iolanda received the Best Student Paper Award at the International Conference on Social Robotics in 2012 and the Best Submission Award at the ACM Special Interest Group on Artificial Intelligence (SIGAI) Career Conference in 2015. Her main research interests lie in the areas of child-robot interaction, artificial intelligence and affective computing.

Ana Paiva is the research group leader of GAIPS at INESC-ID and a Full Professor at Instituto Superior Técnico, University of Lisbon. She is well known in the areas of artificial intelligence applied to education, intelligent agents and affective computing. Her research is focused on the affective elements in the interactions between users and computers, with concrete applications in robotics, education and entertainment. She has served as a committee member of numerous international conferences and workshops. She has (co)authored over 100 publications in refereed journals, conferences and books. She was a founding member of the Kaleidoscope Network of Excellence SIG on Narrative and Learning Environments, and has been very active in the area of synthetic characters and intelligent agents. She coordinated the participation of INESC-ID in several European projects, such as the Safira project (IST, 5th Framework), where she was the prime contractor; the VICTEC project in the area of virtual agents to help victims of bullying; the LIREC project (7th Framework) in the area of robotic companions; the SIREN project in the area of games to teach conflict resolution; and, recently, the EMOTE project in the area of empathic robotic tutors.