Being natural: On the use of multimodal interaction concepts in smart homes

Joachim Machate
Interactive Products, Fraunhofer IAO, Stuttgart, Germany

1 Motivation

Smart home or the home of the future: a notion often used to describe the connection of electronic home devices and the exchange of information between actuators and sensors via home bus protocols. In her book "Computers as Theatre", Brenda Laurel put forward the question: "Why don't all the great electronic and micro-processor-based gadgets we have or might have in our home talk to each other?" (Laurel 1991). Meanwhile, this question can be answered in the affirmative. Negroponte, for example, described a scenario in which a toaster burns the latest rates of the owner's favorite stocks into the morning toast (Negroponte 1995). So the technology is ready to use. But how can it contribute to more safety and a higher quality of life? How do we interact with it? Which concepts are easy to learn and to use and, moreover, are fun to use?

This paper focuses on the use of multimodal interaction concepts in smart home environments for elderly and disabled people. The hypothesis is that using speech and gestures, combining them into deictic expressions, and establishing clearly organized graphical user interfaces will contribute to a natural interaction behavior that is accepted by the majority of users and thus supports a "design for all" philosophy (Cooper & Ferreira 1998).

2 The Vision

When talking about smart homes or home automation concepts today, the focus often lies on technical grounds regarding home network infrastructures, ecological aspects such as energy saving, and higher comfort. Smart home providers and manufacturers put most of their effort into developing new home network topologies, but little effort into interaction concepts that are
suitable for the average customer. The impression arises that smart home systems (Ziegler & Machate 1997) are developed as "toys for the boys" rather than as assistive systems for everyone.

In order to make this technology accessible to everybody, we need to consider what constitutes natural communication. Besides natural spoken language as a primary communication channel, facial expressions, gestures, and postures provide other natural ways of communicating. Moreover, to get the complete picture of the meaning of a message, not just one modality is interpreted, but rather the combination of them all. So, wouldn't it be a great idea to enable people to use whatever modality or modality combination they like when interacting with their "smart home"?

This question, however, leads to a rather philosophical discussion. Seriously, do you want to talk to your home as you would to your neighbors or friends? Or would you rather talk to it as you would to your dog (leaving aside lonely people who need their dog as a substitute for a human being)? Wouldn't it be even more natural to use only command-style communication, thereby avoiding misunderstandings, the need for disambiguation, or even having to justify your plans? A clear advantage for effective and unambiguous smart home interaction emerges when command-style dialogues are established. This does not mean that a smart home should not provide any help or explanation it can. It means that your smart home will not hold a conversation with you, but will simply react to your commands, no matter in which modality they were issued.

3 Multimodal Interaction

Nowadays, after having turned your former ordinary home into a smart home, one or more control stations will be available which serve as contact points for the interaction with your home system.
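The command-style interaction argued for above can be pictured as a home that merely maps (device, command) pairs to state changes, regardless of which modality produced the command. The following is a minimal illustrative sketch in Python, not project code; all device names and commands are hypothetical:

```python
# Sketch: command-style, modality-independent home control.
# Every input channel (speech, gesture, touch panel) is assumed to reduce
# to the same small set of command tokens, so the home never converses;
# it only reacts to valid commands or rejects invalid ones.

class Home:
    def __init__(self):
        # current state and accepted commands per device (hypothetical)
        self.state = {"lamp": "off", "radio": "off"}
        self.valid = {"lamp": {"on", "off"}, "radio": {"on", "off"}}

    def command(self, device, action):
        # command-style dialogue: no negotiation, no clarification
        # subdialogue; the command is either executed or rejected
        if action in self.valid.get(device, ()):
            self.state[device] = action
            return f"{device} {action}"
        return f"cannot {action} {device}"

home = Home()
# the same command token may originate from speech, a gesture, or touch:
home.command("lamp", "on")    # returns 'lamp on'
home.command("radio", "on")   # returns 'radio on'
```

The point of the sketch is the separation of concerns: recognition (in whatever modality) ends at the command token, and everything behind that line is modality-agnostic.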
Generally, these contact points are installed at fixed positions; sometimes remote control units are available, which in most cases serve only one particular kind of device, e.g. shutter or light control. Environment control systems for people with special needs often come with either specific remote control units or use the TV set as the primary control station. With a large enough menu displayed on the TV screen, users only have to navigate through the menu and select the desired options. Not many commands are required for doing so. Even people with very limited abilities, e.g. people suffering from multiple sclerosis, can manage this by simply issuing "yes" and "no" by means of a specific input device. With more capabilities, navigation and selection can become more comfortable, e.g. by also using a "next" or "back" option. Menus for these kinds of systems are usually organized hierarchically and displayed in alpha-numerical form.

Putting speech recognition at the disposal of the users opens a new communication channel. Even with limited recognition capabilities, i.e. fewer than
100 words, it is possible to provide commands for menu navigation and shortcuts for the most important or preferred options and actions. If the speech recognition system is speaker-dependent and hence must be trained by its user, this modality also gains importance for people with speech deficiencies. For these people, an adaptable speech recognition system contributes to a better quality of life by leaving it up to the user to assign a specific command to any spoken utterance, no matter whether that utterance is understandable to a human being. With this, speech-handicapped people experience that they can use speech to get things done.

Summarizing the home control facilities discussed so far: for the general public, there are portable device-specific control units and fixed-position control units with alpha-numeric menu capabilities; for people with special needs, there are specific environment control units, sometimes enhanced by speech recognition systems or specific input devices.

Taking all of this into consideration, the question arises how devices in a real house can easily be controlled. You surely do not want to carry a bouquet of remote control units around. You probably also do not want a home control unit installed at a fixed position in every room. You may like talking to your home system, but do you want to carry a microphone at all times? Or would you prefer to turn your walls into so-called acoustic walls, prepared in such a way that they can record any word you say no matter where you are? These questions illustrate the challenges and dimensions of designing a consistent and intuitive interaction concept for smart home systems.

4 The HOME-AOM approach

The approach of the HOME-AOM project is to develop a multimodal and multimedia interaction concept (Bekiaris et al. 1997) which is highly adaptive and modular with respect to its control modules (Shao et al. 1998).
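The navigation scheme described in the previous section, a hierarchical menu driven solely by the commands "yes", "no", "next" and "back", can be sketched as a cursor over a menu tree. This is an illustrative sketch in Python, not project code; the menu structure and device actions are hypothetical:

```python
# Sketch: hierarchical menu navigation with the four commands
# "yes", "no", "next", "back", as used by environment control systems
# for users with very limited input abilities.

class Menu:
    def __init__(self, label, children=None, action=None):
        self.label = label
        self.children = children or []   # sub-menus or leaf options
        self.action = action             # callable for leaf options

class Navigator:
    """'yes' descends into a sub-menu or activates a leaf option;
    'no'/'next' move the highlight to the next sibling; 'back' returns
    to the parent menu level."""

    def __init__(self, root):
        self.path = [(root, 0)]          # stack of (menu, child index)

    @property
    def current(self):
        menu, i = self.path[-1]
        return menu.children[i]          # currently highlighted item

    def step(self, command):
        menu, i = self.path[-1]
        if command in ("no", "next"):    # highlight the next sibling
            self.path[-1] = (menu, (i + 1) % len(menu.children))
        elif command == "back" and len(self.path) > 1:
            self.path.pop()              # up one menu level
        elif command == "yes":
            item = menu.children[i]
            if item.children:            # descend into the sub-menu
                self.path.append((item, 0))
            elif item.action:            # leaf: trigger the device action
                item.action()
        return self.current.label

# Hypothetical menu: Lights -> Kitchen / Living room, Shutters -> Open
log = []
root = Menu("Home", [
    Menu("Lights", [
        Menu("Kitchen", action=lambda: log.append("kitchen on")),
        Menu("Living room", action=lambda: log.append("living on"))]),
    Menu("Shutters", [
        Menu("Open", action=lambda: log.append("shutters open"))]),
])
nav = Navigator(root)
nav.step("yes")          # enter "Lights"
nav.step("no")           # highlight "Living room"
nav.step("yes")          # activate: log now contains "living on"
```

With only "yes" and "no" available, "no" doubles as "next"; richer input devices simply map additional keys or utterances onto the same four tokens.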
HOME-AOM is a Telematics project sponsored by the European Commission, focusing especially on the requirements of elderly people and people with special needs. The project does not claim to have final solutions ready that solve the problems of multiple-room control described above. This is surely an issue requiring long-term research with people living in such houses for a longer period. But it is anticipated that the project's multimodal and modular interaction concept contributes to finding a usable, highly acceptable and enjoyable solution. With regard to the interaction modalities, HOME-AOM establishes the following control facilities:

- a portable electronic tablet operated by touch input,
- speaker-dependent speech recognition for speech commands in a wide set of European languages,
- speaker-independent literal speech understanding for French (demonstrated in a restricted scenario),
- a gesture recognition system for simple hand gestures,
- control via mobile phone.

The usability and acceptability of the envisaged concept is tested in a series of Wizard of Oz tests, expert evaluations, and usability tests (cf. Nielsen & Mack 1994). Users from the envisaged target groups participated in workshops, interviews and questionnaires during the early design phase and in the above-mentioned tests during the development phase, and will also provide their scores and comments during a final validation and demonstration phase.

Besides the combination of speech and pointing gestures into deictic expressions, simple hand gestures can be used to select and activate objects on a graphical user interface which represent real-world devices and their respective control panels. Instead of the direct manipulation approach employed in GUIs by means of "point and click", deictic control employs a direct manipulation approach which we may call "point and say". In a spoken utterance, deictic expressions provide the means to refer to a specific device while pointing at it, e.g. "Switch on this lamp!", or, in a shorter way, "Switch this off!". Even with low speech recognition capabilities, this kind of modality combination can still be applied: just saying "Off" while pointing at the desired device carries the same information as the previous utterances. Furthermore, if a default action is assigned to each device, pointing at it alone may initiate this particular action. Of course, the system needs to check whether any relevant speech signal has been detected which might overrule the default action.

Using simple hand gestures to control a graphical user interface is a challenging situation for novice users.
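The "point and say" behavior described above, pointing selects the device, speech within a short window overrules its default action, can be sketched as a small fusion step. This is an illustrative sketch in Python, not project code; the time window, device names and default actions are assumed values:

```python
# Sketch: fusing a pointing gesture with an optional speech command into
# one deictic action ("point and say"). A pointing event selects a
# device; if a relevant speech command arrives close enough in time, it
# overrules the device's default action.

SPEECH_WINDOW = 1.5   # seconds; assumed value, not from the project

# hypothetical per-device default actions triggered by pointing alone
DEFAULT_ACTION = {"lamp": "toggle", "shutter": "open"}

def fuse(point_event, speech_events):
    """point_event: (timestamp, device); speech_events: list of
    (timestamp, command). Returns the (device, action) to execute."""
    t, device = point_event
    # check for a speech command temporally aligned with the gesture
    for ts, command in speech_events:
        if abs(ts - t) <= SPEECH_WINDOW:
            return device, command          # e.g. pointing + "off"
    return device, DEFAULT_ACTION[device]   # pointing alone -> default

# pointing at the lamp while saying "off":
print(fuse((10.0, "lamp"), [(10.4, "off")]))  # -> ('lamp', 'off')
# pointing alone falls back to the default action:
print(fuse((20.0, "lamp"), [])) 	      # -> ('lamp', 'toggle')
```

Even a one-word recognizer suffices here: the deictic gesture contributes the referent, the utterance only the action, which is why the combination remains usable at very low recognition capability.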
In order to identify the most suitable and favored gestures, also taking into account requirements from the project's specific target groups, a Wizard of Oz experiment was conducted which aimed particularly at this question. The test was designed so that participants went through four phases with increasing restrictions on their freedom to choose gestures. At the end of each test session, users were asked which appliances in their own homes they would most like to control via hand gestures. Light control achieved the highest ranking, followed by the control of a TV set or radio receiver. White goods (washing machine, dishwasher, percolator, etc.) did not receive a comparable rating. The study led to a recommendation of a navigation gesture alphabet for mediated gesture control via a graphical user interface, which is surely applicable to a broader range of applications. The advantage of this set of gestures is that they are easy to learn and highly memorable. However, this set
will be used only as a default setting, which can be overruled by the user's own preferred set.

5 Conclusion and Acknowledgements

This paper has illustrated how multimodal interaction can contribute to a more natural interaction behavior for the residential user, particularly taking into account the situation of elderly and disabled people. Using multimodal communication channels and combining the information provided on different input channels moves the user into a new dimension of computer-mediated interaction with a broad range of electronic appliances. Furthermore, smart home concepts provide the technological basis for the development of consistent and unified interaction concepts, which will finally lead to more fun and satisfaction in the use of electronic home appliances.

The author wishes to thank the HOME-AOM project consortium for the chance to present results of the project to a larger international audience, and the TAP-DE Programme of the European Commission for funding this work.

6 References

Bekiaris, E., Machate, J. & Burmester, M. (1997). Towards an intelligent multimodal and multimedia user interface providing a new dimension of natural HMI in the teleoperation of all home appliances by E&D users. La Lettre de l'IA, Proceedings of Interfaces 97 (Montpellier, France, May 28-30, 1997), 226-229, EC2 & Développement.

Cooper, M. & Ferreira, J. (1998). Home networks for independent living, support and care services. In I.P. Porrero & E. Ballabio (Eds.), Improving the Quality of Life for the European Citizen: Technology for Inclusive Design and Equality, 359-363. Amsterdam: IOS Press.

HOME-AOM Internet Home Page: http://www.swt.iao.fhg.de/home

Laurel, B. (1991). Computers as Theatre. New York: Addison-Wesley.

Negroponte, N. (1995). Being Digital. New York: Alfred A. Knopf.

Nielsen, J. & Mack, R.L. (1994). Usability Inspection Methods. New York: John Wiley & Sons.

Shao, J., Tazine, N.-E., Lamel, L., Prouts, B. & Schroeter, S. (1998).
An Open System Architecture for a Multimedia and Multimodal User Interface. In I.P. Porrero & E. Ballabio (Eds.), Improving the Quality of Life for the European Citizen: Technology for Inclusive Design and Equality, 374-378. Amsterdam: IOS Press.

Ziegler, J. & Machate, J. (1997). Integrated User Interfaces for the Home Environment. In Proceedings of the 7th International Conference on Human-Computer Interaction (HCI International 97, San Francisco, USA, August 1997), 807-811.