Project Multimodal FooBilliard

Adding two multimodal user interfaces to an existing 3D billiard game

Dominic Sina, Paul Frischknecht, Marian Briceag, Ulzhan Kakenova

March to May 2015, for Future User Interfaces 2015, Prof. Denis Lalanne

Table of Contents

- Objectives
- Project outline
- Research questions
- Description of the interfaces
  - Interface 1: Keyboard, Mouse & Voice Commands
    - Keyboard, Mouse
    - Voice Commands
    - Additional Feedback
  - Interface 2: Touch & Voice Commands
    - Additional Feedback
- Positioning of the interfaces' commands according to the CASE and CARE models
  - CASE (system side)
  - CARE (user side)
- Evaluation
- (Technical) difficulties
- Results
  - Does the use of a multimodal interface give better results in playing the game? & Which multimodal user interface is more attractive for users?
  - Which interface do people learn better without instruction?
- References
- Appendix: Evaluation Data
  - Score and Satisfaction
  - Comments from users
  - Discovered Commands and Learning Time
    - Interface 1: Keyboard, Mouse, Voice Commands
    - Interface 2: Touch, Voice Commands

This is a short summary; the reader is expected to be familiar with the concepts of (multimodal) user interface analysis and to have had a look at the game or the demo video of this project.

Objectives

- To add two multimodal user interfaces¹ to an existing game.
- To pose and attempt to answer research questions related to these multimodal user interfaces.

Project outline

We adapt the open-source game FooBilliard, made in 2004, to support the following interfaces:

- Interface 1: Keyboard, Mouse & Voice Commands
- Interface 2: Touch & Voice Commands

The game is written in C and uses OpenGL for 3D rendering. It is a billiard simulation/game that is originally played with a mouse/keyboard interface. To develop the above-mentioned interfaces we use:

- the Windows platform's speech recognition API in a C# component, and
- the Windows Touch APIs (plus custom swipe detection) in a C++ component,

and we fuse and submit the generated interaction commands in another C++ module plugged into the game. (An illustrative sketch of the speech component is given after the research questions.)

Research questions

1. Which interface do people learn better without instructions?
2. The use of which interface gives better results when playing the game?
3. Which interface is more attractive/satisfying for users?

¹ Called simply "interface" in the rest of the report.
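The speech component itself is not reproduced in this summary. As a minimal sketch, assuming the standard System.Speech API that ships with .NET on Windows, a command grammar for our voice commands can be set up roughly as follows; the class name and handler are invented for the example, and the real component forwards recognized commands to the C++ fusion module instead of printing them:

```csharp
using System;
using System.Speech.Recognition; // standard .NET speech API (System.Speech assembly)

class VoiceCommandSketch
{
    static void Main()
    {
        // Alternatives that stand for the same command, as described below
        // ("shoot"/"push"/"hit" all trigger a shot).
        var commands = new Choices(
            "shoot", "push", "hit",
            "stronger", "much stronger", "weaker", "much weaker",
            "put here", "cue", "birdview", "menu", "up", "down",
            "select", "revert", "undo", "commands", "help", "what can I say");

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));
            engine.SetInputToDefaultAudioDevice();

            // The real component hands the command to the game's fusion
            // module; here we just echo what the recognizer understood.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized: " + e.Result.Text);

            engine.RecognizeAsync(RecognizeMode.Multiple); // keep listening
            Console.ReadLine();
        }
    }
}
```

Restricting the recognizer to a small fixed grammar, rather than free dictation, is what makes short commands such as "shoot" reasonably robust.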

Description of the interfaces

Interface 1: Keyboard, Mouse & Voice Commands

Keyboard, Mouse

We keep the mouse and keyboard commands that come with the game. As the original game comes with no manual, the user is expected to know that pressing the F1 key brings up a list of the available keyboard commands. Struggling with this interface ourselves inspired the question of how well people can learn to use an interface without instruction.

Voice Commands

No voice commands exist in real billiard; nevertheless, we made some up. We included alternative words for the same command, with the goal of making the commands more intuitive and memorable:

- shoot/push/hit: shoot the ball
- stronger/much stronger/weaker/much weaker: change the power of the shot
- put here: position the ball where it currently is (can also be used synergistically with the gestures, see below)
- cue: toggle the cue
- birdview: toggle birdview mode
- menu: display the menu
- up/down: change the highlighted item in the menu
- select: select the highlighted menu item
- revert/undo: revert to the state before the last shot
- commands/help/what can I say: show the list of available voice commands

The voice commands are used to:

- toggle discrete values like the view mode
- access more complex settings in the menu
- control the game play: shooting, changing the power
- handle errors, with the revert/undo command

Additional Feedback

We show the user what the voice recognition understood (both wrong and correct recognitions) and ask him to open the command list.

Interface 2: Touch & Voice Commands

The second interface uses the same voice commands as the first one. We implemented the following gestures (an illustrative sketch of the swipe detection follows at the end of this section):

- The user puts his finger on the ball and swipes up to shoot.
- The user taps, holds and drags the white ball on the table when this is possible (with the keyboard interface, this is done with the unintuitive combination shift + left mouse button + mouse movement).
- The user changes the viewpoint by dragging one finger over the screen.
- To zoom in/out, the user spreads/pinches two fingers (not shown in the video).

Additional Feedback

- A circle around the ball indicates from where you can swipe.
- A clickable "Put Here!" button appears when you can place the ball.
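The swipe detection is custom code in the C++ touch component; purely as an illustration of the logic, the sketch below (in C#, for consistency with the other listings) records where and when a touch starts and, on release, classifies a fast, mostly vertical upward movement that started near the ball as a shot whose power grows with swipe speed. All names and thresholds are invented for the example:

```csharp
using System;

// Illustrative swipe classifier; the project's real detector is a C++
// component fed by Windows Touch input. This only shows the idea.
class SwipeDetector
{
    private double startX, startY;
    private DateTime startTime;

    public void OnTouchDown(double x, double y)
    {
        startX = x;
        startY = y;
        startTime = DateTime.Now;
    }

    // Returns the shot power in [0, 1], or null if the movement was not
    // a valid "swipe up from the ball" gesture.
    public double? OnTouchUp(double x, double y, double ballX, double ballY)
    {
        double dx = x - startX;
        double dy = startY - y; // screen y grows downward; positive dy = upward
        double seconds = (DateTime.Now - startTime).TotalSeconds;

        bool startedOnBall = Math.Abs(startX - ballX) < 40 &&
                             Math.Abs(startY - ballY) < 40;  // px, invented threshold
        bool mostlyVertical = dy > 2 * Math.Abs(dx);
        bool fastEnough = seconds > 0 && dy / seconds > 200;  // px/s, invented threshold

        if (!startedOnBall || !mostlyVertical || !fastEnough)
            return null;

        return Math.Min(1.0, dy / seconds / 2000.0); // power scales with speed
    }
}
```

A caller would invoke OnTouchDown/OnTouchUp from its touch event handlers and trigger a shot whenever OnTouchUp returns a non-null power.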

Positioning of the interfaces' commands according to the CASE and CARE models

CASE (system side) [1]

- Concurrent: using independent modalities in parallel for distinct tasks. The user can at the same time use voice to increase/decrease the power of the strike and use touch gestures to zoom in/out; the same could be done using the keyboard and mouse.
- Alternate: using combined modalities in sequential order for one task. You can position the ball by dragging it and then say "put here" to confirm the placement.
- Synergistic: using combined modalities in parallel for the same task. The ball is placed at the finger's position when you hold your finger on the screen and say "put here".
- Exclusive: using independent modalities in a sequential way for distinct tasks. Each shot can be considered a distinct task; you can always choose whether to use a voice, keyboard or touch command to shoot.

CARE (user side) [2]

- Complementarity: multiple modalities must be used within a temporal window to reach a given state. The user touches the screen, holds the finger at a position and uses the voice command "put here"; the ball moves to the position at which the finger is pointing.
- Assignment: only one modality can be used to reach a given state. The list of all voice commands is only accessible through the voice commands "commands"/"help"/"what can I say".
- Redundancy: multiple modalities have the same expressive power and are all used within the same temporal window. The user can shoot by using voice and keyboard at once (only one shot is performed; the rest of the input is ignored).
- Equivalence: it is necessary and sufficient to use any one of the available modalities. The user can shoot the ball using gestures, voice or the keyboard.
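The synergistic/complementary "put here" behaviour above boils down to simple temporal fusion: when the voice command arrives, check whether a finger is (or very recently was) held on the table, and use its position if so. The following C# sketch illustrates that decision; the class, the event plumbing and the one-second window are our own assumptions, and the project's actual fusion module is written in C++:

```csharp
using System;

// Illustrative fusion of the "put here" voice command with touch state.
class PutHereFusion
{
    private (double X, double Y)? lastTouch;   // last known held finger position
    private DateTime lastTouchTime = DateTime.MinValue;

    private static readonly TimeSpan Window = TimeSpan.FromSeconds(1); // invented

    // Called continuously while a finger is held on the table.
    public void OnTouchHold(double x, double y)
    {
        lastTouch = (x, y);
        lastTouchTime = DateTime.Now;
    }

    // Called by the speech component when "put here" is recognized.
    // Returns where the ball should be placed.
    public (double X, double Y) OnPutHere(double ballX, double ballY)
    {
        // Synergistic case: a finger is held within the temporal window,
        // so voice and touch combine into a single placement command.
        if (lastTouch.HasValue && DateTime.Now - lastTouchTime < Window)
            return lastTouch.Value;

        // Voice-only case: confirm the ball at its current position.
        return (ballX, ballY);
    }
}
```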

Evaluation

We let six users below the age of 30 play our game. User 1 was female, the rest male. We let them play for 5 minutes with each interface (a within-group design, with the interface as the independent variable), first with interface 1, then with interface 2. We did no counterbalancing, so every user played with the interfaces in the same order. We gave no instructions except that there is voice recognition (and thus the environment should be quiet) and that they were looking at a touchscreen. The environment was not fixed. We seated the user in front of the computer and told them to "please make as much score as possible while playing this game".

We recorded the following data (dependent variables) for each user/interface combination:

Quantitative:
- After how much time was a certain command from a set of commands used for the first time, and how often was this command used?*
- How much score did the user make (how many balls disappeared from the table)?
- Satisfaction score

Qualitative:
- Comments and answers from the user to the question "What would you improve?"
- Videos of the users

* This is used to measure the amount of time the user spent on learning how to play the game.

See the appendix for the detailed data we collected for the evaluation. We observed and filmed the users while playing, to be able to draw, and have evidence for, further qualitative conclusions.

(Technical) difficulties

Not all of the implementation or evaluation proceeded smoothly. The game code was quite messy C code and thus hard to work with; the code we added to it was not of much better quality. Sometimes the friction of the balls would drop to 0 for no apparent reason and we would have to restart the game. Also, the voice recognition is not very tolerant of speakers with an accent. We were also not sure how to make use of the quantitative learning data*: it sometimes seemed as if the user purposefully chose not to use voice commands.

Results

Based on the obtained evaluation results, we try to answer the questions stated in the introduction.

Does the use of a multimodal interface give better results in playing the game? & Which multimodal user interface is more attractive for users?

                                          Interface 1                      Interface 2
                                          (keyboard, mouse, voice)         (touch, voice)
Average score (number of balls pocketed)  4.8                              5.7
Average satisfaction                      6.8                              7.5

The above results hint that the touch interface (interface 2) is easier to use than the keyboard interface, at least for the basic tasks required to play the game (aiming and shooting): the use of interface 2 gives better results in playing the game. From the average satisfaction scores we might say that users seem to prefer interface 2 over interface 1 for playing the game.

We ran a one-sided, paired t-test to check whether the differences in satisfaction and score are significant (null hypothesis: the means are the same; working hypothesis: the mean obtained with interface 2 is significantly larger). We obtained the following p-values (with Excel's TTEST function, parameters: tails = 1 (one-sided), type = 1 (paired)):

- Score: 0.11
- Satisfaction: 0.23

Neither of these values is smaller than, say, 0.1 (90% confidence), so we cannot reject the null hypotheses.

Which interface do people learn better without instruction?

Interface 2 seems more learnable without instructions than interface 1. The touch interface seems more familiar and intuitive to learn and use, whereas it is more complicated to find out the key combinations on the keyboard or the voice commands of the voice interface. We did not, however, make use of our quantitative learning data.

References

[1] Laurence Nigay, Joëlle Coutaz: "A design space for multimodal systems: concurrent processing and data fusion", Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, Amsterdam, The Netherlands, pp. 172-178, ACM, 1993.

[2] Joëlle Coutaz, Laurence Nigay, Daniel Salber, Ann Blandford, Jon May, Richard M. Young: "Four Easy Pieces for Assessing the Usability of Multimodal Interaction: The CARE Properties", 1995.
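Note on the t-test in the Results section: Excel's TTEST with type 1 computes the standard paired t statistic over the n = 6 per-user differences d_i (interface 2 value minus interface 1 value),

$$ t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, $$

and the one-sided p-value is the tail probability of this statistic under Student's t distribution with n - 1 = 5 degrees of freedom.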

Appendix: Evaluation Data

The following table shows, for each interface, the number of balls pocketed (one score point per pocketed ball) and the user's satisfaction with playing the game (a rating from 1 to 10). Each tester could play for 5 minutes per interface. User 6* was one of our team members; no learning data and no comments were recorded for him.

Score and Satisfaction

Score is the number of balls that were pocketed; the maximum is (1 + 2 + 3 + 4 + 5) = 15. Satisfaction goes from 1 (lowest) to 10.

          Interface 1            Interface 2
          Score   Satisfaction   Score   Satisfaction
User 1    0       8              0       8
User 2    6       6              5       4
User 3    0       7              2       7
User 4    3       5              4       9
User 5    8       5              11      7
User 6*   12      10             12      10

Comments from users regarding the game, the modalities used, and their answers to the question "What would you improve?":

User 1

The game looked good. For me, as a person playing billiard for the first time, the game was good for exploring the modalities I can use to play it. I would improve the number of commands, so that I could choose menu options using voice commands (using the name of the option instead of the "select" and "up"/"down" commands). Using gestures was easy; it felt familiar what you should do to zoom in/out, to shoot or to move the ball. The same goes for using keyboard and mouse: I learned easily which buttons to press to play the game. Learning the voice commands takes a little more time; however, voice commands like "shoot", "menu" and "birdview" are very suitable and easy to learn. In general, the game is well developed, easy and interesting to learn. I give the game an 8 in my evaluation since I didn't feel comfortable with the number of unknown keyboard commands that have to be learned, and because of the table view, where I didn't know how to switch from one camera view to another. The score I achieved while playing the game is 0; I believe this is mostly because I have never played billiard before, either as a computer game or a real one.

User 2

- Half of the voice commands are somehow useless.
- The line that shows the direction of the ball should have the right angle [be 3D].
- Leaving birdview seems impossible.
- "Put here!" does not look like a button.
- Birdview does not help much.
- The game crashes.
- The cue and the camera cannot be moved independently.
- "up", "stronger", "much stronger": somehow "up" doesn't seem as if it has anything to do with the strength [our note: it doesn't, but there was a delay and the commands got mixed]. On the command list it is close to "birdview", so you could think it has to do with the camera.
- The cue mode seems useless; you cannot change the direction in which you strike anymore [note: it is hard to figure out how the cue can be controlled with the keyboard and mouse interface].
- The mouse seems to be there only to move the camera.

User 3

Interface 1 (started with this):
- The design of the game looked nice.
- Regarding the interface, it made playing the game a bit hard.
- I needed more hints, like that you can shoot with space, etc.

Interface 2:
- It was a better experience than the first interface.
- Problems: I didn't know how to control the power. I couldn't read the "Say commands" above because I thought it might automatically shoot [user felt time pressure from seemingly automatic shooting due to background noise].

User 4

Interface 1:
- No instructions! This makes it difficult to figure out what the goal is. (Touch? Speech? Keyboard?)
- After a while there was no target line.

Interface 2:
- Didn't find out what features the touch interface has, but much better and more natural and intuitive.

User 5

Interface 1:
- The 3D nature of the graphics is not helpful.
- It is not clear whose turn it is [note: there is only one player, but the ball is sometimes reset].
- The assignment of keys is strange. (Why not hold the key longer to hit more strongly?)
- The voice commands are useless; you only feel like an idiot.
- It is not clear why I sometimes could not shoot [note: you cannot shoot when looking straight down on the ball; the strength is modulated by the sine of the angle at which you look at the ball].

Interface 2:
- Ouch, my arm, this is exhausting! [note: we had the users play with the touch screen upright in front of them instead of flat on the desk]
- This is better than the keyboard.
- Sometimes there is strange behaviour (you cannot play while in birdview mode?).
- The [voice] commands are more of a nuisance than helpful.
- It is not easy to estimate the strength.
- It is hard to keep an overview of the situation.

User 6*

No comment.

Discovered Commands and Learning Time

The following lists record, per interface and command, after how much time (in seconds of play) each user first used the command and how often they used it; "40s, 4x" means a first use after 40 seconds and 4 uses in total. Entries are given in user order; a user without an entry did not use the command. Many of the voice commands were not actually intended by the user but resulted from commenting or noise.

Interface 1: Keyboard, Mouse, Voice Commands

- Shoot (Voice): 40s, 4x; 13s, 8x; 25s, 1x; 80s, 4x; 90s, 1x
- Shoot (Keyboard): 80s, 3x; 34s, 7x; 210s; 32s, 5x; 20s, 56x
- Bird view (Voice): 80s, 10x; 59s, 2x; 220s, 1x
- Bird view (Keyboard): not used
- Stronger/Weaker (Mouse): 65s, 3x; 6s, 4x
- Commands (Voice): 60s, 1x; 55s; 26s, 1x; 70s, 1x
- Stronger/Weaker (Voice): 120s, 4x; 20s, 2x
- Cue (Keyboard): not used
- Cue (Voice): 70s, 4x; 180s, 2x; 75s, 1x
- Help F1 (Keyboard): not used

Interface 2: Touch, Voice Commands

- Shoot (Voice): 20s, 4x; 13s, 6x; 2s, 8x
- Shoot (Gesture): 50s, 3x; 15s, 14x; 70s, 10x; 4s, 35x
- Zoom (Gesture): 50s, 1x; 120s, 2x
- Camera Move (Gesture): 15s, 16x; 40s, 39x
- Place the Ball (Gesture): 20s, 4x; 1s, 3x; 1s, 5x; 5s, 5x; 1s, 5x
- Bird view (Voice): 80s, 10x; 180s, 2x
- Commands (Voice): 50s, 1x; 140s, 1x; 1s, 1x
- Stronger/Weaker (Voice): 34s, 5x; 130s, 2x; 54s, 14x
- Cue (Voice): 55s, 1x; 30s, 2x; 33s, 1x
- Place the Ball (Voice): 28s, 2x