ModaDJ: Development and Evaluation of a Multimodal User Interface
Course: Master of Computer Science
Professor: Denis Lalanne
Renato Corti (1), Alina Petrescu (2)
(1) Institute of Computer Science, University of Bern
(2) Department of Informatics, University of Fribourg

Contents
1 Introduction
  1.1 Objectives and research question
  1.2 Overview and project description
2 Input devices
  2.1 Conventional interface (mouse and touch screen)
  2.2 Multimodal interface (Kinect)
3 Modalities
4 Implementation
5 Evaluation
6 Conclusions
7 References

1 Introduction

This project report describes a mini-project prepared and carried out during the course. We developed a simple music application that can be controlled either by mouse, by touch screen, or by a Kinect used for gesture recognition. In accordance with terms commonly used in the music world, "velocity" and "volume" are used synonymously and describe the loudness of a note.

1.1 Objectives and research question

The main question behind this project is how different age groups make use of the different modalities provided by the application when creating music. Since the idea is to make live music, i.e. without pressing pause to add notes and take time to think, another focus lies on the reaction time the modalities require to carry out given tasks.

1.2 Overview and project description

The application is composed of a grid of notes on the left and a control panel on the right, as depicted in figure 1. The grid of notes has twelve rows and sixteen columns: each row corresponds to a certain pitch, and the columns represent the elapsed time. Each square can be filled with one of four colours (black, white, red or green); black represents a void, and the other three colours represent different instruments. Additionally, each note can have a velocity value, which translates to the volume of the note. The velocity and colour are chosen in the control panel on the right. Furthermore, there are three buttons: Play, which starts or pauses the playback; Stop, which stops the playback; and Clear, which deletes all the entered notes. Two sliders, Tempo and Pitch Bend, complete the panel.

Figure 1: Main window of the application

The Tempo slider controls the speed of the playback, and the Pitch Bend slider applies a pitch-bending effect to the playing melody. Both sliders can be changed during playback. The application is written in Python 3.5 and uses the Tkinter UI toolkit. The audio output is not handled by the application itself; instead, MIDI messages are emitted through the mido library and can then be processed by any MIDI-compatible program, such as the FluidSynth audio synthesizer.

2 Input devices

We compared two modes of input for our application: a conventional mouse or touch-screen interface for the right hand, and a multimodal interface using the Kinect for the left hand. Without loss of generality, the inputs can be switched for left-handed persons.

2.1 Conventional interface (mouse and touch screen)

The user places notes on the grid with either the mouse or a finger. Before placing a note, the user chooses a colour by clicking or tapping it and sets the velocity with its slider, again either with the mouse or a finger. Once all the desired notes are placed on the grid, the user presses the Play button and the music playback starts to loop continuously. The user can also change the velocity or the pitch bend by moving the corresponding slider while the application is running.

2.2 Multimodal interface (Kinect)

First, the user has to calibrate the colours on the cube and the hand recognition, with the Kinect facing down on the table as depicted in figure 2. The cube can show up to six colours, one per side. For this, the user goes through two calibration screens. The first one allows the user to crop the image around the hand to make its detection easier, as shown in figure 3. The second one lets the user calibrate the cube's colour faces accurately enough to use the application properly.
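As a rough illustration of the data model behind these interfaces, the note grid and the Pitch Bend slider mapping could be sketched as below. This is not the project's actual code: the names, the chromatic row-to-pitch mapping, and the 0.0-1.0 slider range are assumptions; only the 14-bit MIDI pitch-wheel range of -8192..8191 is standard.

```python
# Hypothetical sketch of the 12x16 note grid and the pitch-bend mapping.
# BASE_NOTE and the chromatic row mapping are assumptions for illustration.

ROWS, COLS = 12, 16      # 12 pitches, 16 time steps
BASE_NOTE = 60           # assumed MIDI note (middle C) for the bottom row

# Each cell is None (black = void) or an (instrument_colour, velocity) pair.
grid = [[None] * COLS for _ in range(ROWS)]

def place_note(row, col, colour, velocity):
    """Fill one square with an instrument colour and a velocity (clamped to 0-127)."""
    grid[row][col] = (colour, max(0, min(127, velocity)))

def notes_at_step(col):
    """Collect (midi_note, colour, velocity) for every filled cell of a column,
    i.e. everything that sounds at one time step of the looping playback."""
    notes = []
    for row in range(ROWS):
        if grid[row][col] is not None:
            colour, velocity = grid[row][col]
            notes.append((BASE_NOTE + (ROWS - 1 - row), colour, velocity))
    return notes

def slider_to_pitchwheel(position):
    """Map a slider position in [0.0, 1.0] to the 14-bit MIDI pitch-wheel
    range -8192..8191, with 0.5 as the centre (no bend)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("slider position must be within [0, 1]")
    return round(position * 16383) - 8192

place_note(11, 0, "white", 100)   # bottom-left square, white instrument
place_note(0, 0, "red", 90)       # top-left square, red instrument
```

A value produced by `slider_to_pitchwheel` could then be wrapped in a mido pitch-wheel message (`mido.Message('pitchwheel', pitch=...)`) and forwarded to FluidSynth.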

Figure 2: The Kinect device setup
Figure 3: The first calibration screen
Figure 4: The second calibration screen

Once the calibration is done, the user arrives at the actual application shown in figure 1. With the help of the Kinect, the user can adjust any note's velocity by sliding a hand vertically up and down under it, or adjust the whole melody's pitch bend by sliding a hand horizontally left and right. These two hand gestures were well appreciated and fast to use. Finally, the user can choose any instrument implemented in the application simply by rotating the cube to the desired colour face.

3 Modalities

On the device side, the Kinect provides one possibility to select a specific note and multiple ways to modify or adjust the selection. This can be done in sequence or at the same time, for example selecting a note and changing the instrument in one movement. According to the CASE model, this qualifies as a synergistic or alternate user interface. In addition, the two input modalities can be redundant: for example, the pitch can be changed with either input modality, which adds a concurrent communication type to the whole application. An exclusive input is neither required by the application nor intended. In the case of contradicting inputs, for example raising the volume with one gesture while decreasing it with the touch interface, the touch input device takes precedence.

On the user side, the CARE model describes the usability concepts the user is confronted with when composing live music. Normally the inputs are carried out complementarily, i.e. within a given temporal window, to reach a given state, i.e. the live change of the notes. There is no fixed assignment of a specific input modality to a certain state: the application offers different choices to carry out a task. The user can also perform modalities in a redundant way, for example selecting a different instrument on screen while turning the cube to show a different face; this was also one of the objectives. The same goes for the equivalence property, which allows the user to choose between different modalities. Such choices sometimes yield interesting results, as explained in chapter 5. Fission of the output channels to provide the users with appropriate feedback was unobtrusive and worked without noteworthy problems. Fusion of the different modalities was also largely unproblematic and limited to a few particular situations, such as the contradicting inputs described earlier in this chapter.

4 Implementation

Unfortunately, under Linux the choice of libraries to interface with the Kinect is limited. We opted for libfreenect2 and its Python binding pyfreenect2. In addition, numpy, a library for scientific computing with Python, was used for certain image and array manipulation functions. Before launching the program, the user is presented with two calibration screens. The first screen allows the user to choose a subregion of the Kinect's field of view (see figure 3) to make the detection of their hand easier. On the second screen, displayed in figure 4, the user must calibrate the colour detection by showing each of the cube's faces to the Kinect's camera and clicking the corresponding colour's button. Note that we initially planned on having six possible colour choices, but the detection results were poor. We therefore reduced the number of choices to four colours to make the colour detection more accurate.
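The detection code itself is not part of this report. As a minimal sketch of the general idea (the function names, calibration format, and sample values below are hypothetical), calibration can store one reference RGB value per cube face, and detection can pick the face whose reference is closest to the mean colour of the cropped region:

```python
import numpy as np

def mean_colour(region):
    """Average RGB value of a cropped image region (any nested H x W x 3 structure)."""
    return np.asarray(region, dtype=float).reshape(-1, 3).mean(axis=0)

def classify_face(region, calibration):
    """Return the calibrated colour name whose reference RGB value is closest
    (in Euclidean distance) to the region's mean colour."""
    sample = mean_colour(region)
    names = list(calibration)
    refs = np.array([calibration[n] for n in names], dtype=float)
    distances = np.linalg.norm(refs - sample, axis=1)
    return names[int(np.argmin(distances))]

# Hypothetical calibration values, one reference RGB triple per cube face.
calibration = {
    "red":   (200, 40, 40),
    "green": (40, 180, 60),
    "white": (230, 230, 230),
    "black": (20, 20, 20),
}

# A tiny 2x2 reddish patch stands in for the cropped camera image.
patch = [[(190, 50, 45), (205, 38, 42)],
         [(198, 44, 40), (202, 41, 39)]]
```

With only four well-separated reference colours, the between-class distances stay large, which is consistent with the observation above that reducing six faces to four made the detection more accurate.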

5 Evaluation

The user feedback included both positives and negatives. For example, the cube was often perceived negatively, because people tend to tap directly on the touch screen to select the desired colour. Additionally, instead of colours on the cube's faces, we could show the actual image of each instrument, which would also let users with colour blindness distinguish the cube's sides. In contrast, the up/down and left/right hand gestures were highly appreciated, as they were more intuitive: changing the velocity and the pitch bend with the Kinect's hand recognition was faster than touching the screen. Moreover, a general improvement would be to make the buttons on the right panel of the application bigger, so that they are easier to manipulate with a finger.

The following table summarizes the users' evaluations. We had six different testers of different ages and genders, mostly family and friends. The third column answers the questions "Do you know music? Do you play an instrument?". The fourth column indicates the approximate time the user took to adapt to the application. The fifth and sixth columns show what users liked and disliked about the overall system. Finally, the last column reports a little challenge we proposed to them: reproduce a song, for example "Frère Jacques", only by ear, and see how much time they needed to replicate it in the application with their preferred modalities.

Gender | Age | Familiar with music? | Accommodation time | Preferred modalities | Did not like | Successful task?
Female | 20 | Yes | ~5 minutes | Hand gestures (up/down, left/right) + touch screen | The cube + the mouse for the instruments | ~15 minutes
Female | 58 | No | >10 minutes | Touch screen only | The cube + the Kinect | ~30 minutes
Female | 25 | Yes | <5 minutes | Hand gestures (up/down, left/right) + touch screen | The cube + the mouse for the instruments | ~10 minutes
Male | 28 | Yes | <5 minutes | Hand gestures (up/down, left/right) + touch screen | The cube + the mouse for the instruments | ~15 minutes
Male | 59 | No | ~8 minutes | Mouse + touch only | The cube + the Kinect | ~25 minutes
Male | 23 | No | >10 minutes | Hand gestures (up/down, left/right) + touch screen | The cube + the mouse for the instruments | Abandoned

6 Conclusions

The results showed that older participants preferred the touch and mouse input modalities over the Kinect. Younger people took less time to get accustomed to the whole system, not only the application. The last column also shows that users who used the Kinect as an input device were slightly faster at successfully completing the given task. Future improvements might include a voice modality limited to certain commands such as Start, Pause and Stop. An additional third gesture axis might also be set up to provide the user with a further input channel. An important aspect at that point would be studying users' behaviour to evaluate how intuitive the modalities and their assignments are.

7 References

Lalanne, Denis. Slides from the course, 2017. Department of Informatics, University of Fribourg.