Context-sensitive speech recognition for human-robot interaction Pierre Lison Cognitive Systems @ Language Technology Lab German Research Centre for Artificial Intelligence (DFKI GmbH) Saarbrücken, Germany. [ pierrel@coli.uni-sb.de ] European Summer School in Logic, Language and Information, Universität Hamburg, Germany. August 2008. 1 Pierre Lison Context-sensitive speech recognition for HRI
Introduction
Basic research question: how do we make talking robots?
Talking robots?
Our long-term aim: «Hi, I am C-3PO, Human-Cyborg Relations.» (And he knows over 6 million languages...)
For the time being, we'll obviously need to scale down our expectations...
Human-robot interaction
We seek to develop robots which are able to interact with humans using (spoken) natural language to perform a variety of service-oriented tasks. To do so, we must integrate a (rather sophisticated) dialogue system into the cognitive architecture. This dialogue system must encompass multiple processing stages, from speech recognition up to semantic interpretation. In our poster, we present a technique we developed to significantly improve the speech recognition stage.
The first step in comprehending spoken dialogue is automatic speech recognition (ASR). For robots operating in real-world noisy environments, and dealing with utterances pertaining to complex, open-ended domains, this step is particularly error-prone.
Potential problems: noise, wide variety of voices and accents; out-of-vocabulary words; poor performance of current ASR technology; presence of various disfluencies (filled pauses, speech repairs, corrections, repetitions, etc.).
The intuition underlying our approach: use context! More precisely, we prime utterance recognition by exploiting information about:
1. the salient entities in the situated visual environment;
2. the dialogue state.
Our claim: for HRI, speech recognition performance can be significantly enhanced by using contextual knowledge.
A simple example
1. Let's imagine we are in the lab with the robot. There is a big red ball in front of it (= high saliency).
2. The red ball is perceived by the robot's sensors (camera, laser scanner, etc.), and recognised as a red ball.
3. In the robot's knowledge base, the red ball object is associated with words like "ball", "round", "pick up", etc.
4. As a final step, we adapt the language model included in the speech recogniser to increase the probability of hearing these words.
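The four steps of the example can be sketched in a few lines of Python. This is only a minimal illustration, not the system presented on the poster. It assumes a toy unigram language model stored as a dictionary. The knowledge base `ASSOCIATED_WORDS`, the function `prime_language_model`, the `boost` factor, and all probabilities are hypothetical names and values chosen for the example.

```python
from typing import Dict, Iterable

# Hypothetical knowledge base: maps a recognised object to the words
# associated with it (the red ball example from the slide).
ASSOCIATED_WORDS: Dict[str, list] = {
    "red_ball": ["ball", "round", "pick", "up"],
}


def prime_language_model(
    unigram: Dict[str, float],
    salient_objects: Iterable[str],
    boost: float = 3.0,  # assumed boost factor, purely illustrative
) -> Dict[str, float]:
    """Return a renormalised copy of `unigram` in which words associated
    with the salient objects have their probability multiplied by `boost`."""
    primed = set()
    for obj in salient_objects:
        primed.update(ASSOCIATED_WORDS.get(obj, []))

    # Scale the primed words, leave the others unchanged.
    weighted = {
        word: prob * (boost if word in primed else 1.0)
        for word, prob in unigram.items()
    }

    # Renormalise so that the result is again a probability distribution.
    total = sum(weighted.values())
    return {word: prob / total for word, prob in weighted.items()}


# Toy baseline model (made-up probabilities summing to 1).
baseline = {
    "ball": 0.05, "wall": 0.05, "round": 0.02,
    "pick": 0.04, "up": 0.04, "the": 0.80,
}

# The red ball is salient, so "ball" becomes more likely than "wall".
adapted = prime_language_model(baseline, ["red_ball"])
```

With this adapted model, an acoustically ambiguous input such as "ball" versus "wall" is more likely to be resolved in favour of the word that matches the visual scene. A real recogniser would typically use an n-gram model rather than unigrams, but the priming idea is the same.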
We evaluated our approach using a test suite of 250 spoken utterances recorded during Wizard of Oz experiments. The participants were asked to interact with the robot while looking at a specific visual scene. The evaluation results showed a significant reduction of the word error rate: 16.1% compared to the baseline.
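For readers unfamiliar with the metric, the word error rate is the word-level edit distance (substitutions, deletions and insertions) between the recognised utterance and the reference transcription, divided by the number of words in the reference. The sketch below shows the standard computation. The function name `word_error_rate` and the example utterances are illustrative assumptions, not taken from the evaluation itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    hypothesis and the reference, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five.
wer = word_error_rate("pick up the red ball", "pick up the red wall")
```

Here `wer` is 0.2, since one of the five reference words was misrecognised.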
The end
Want to know more, and see how it works? See my poster in the main hall! (Or check our website: www.dfki.de/cosy)