LCC 3710 Principles of Interaction Design

Class agenda:
- Readings
- Speech, Sonification, Music

Readings
- Hermann, T. and Hunt, A. (2005). "An Introduction to Interactive Sonification" in IEEE Multimedia, April-June 2005.
- Schmandt, Chris (1998). "Audio Hallway: A Virtual Acoustic Environment for Browsing" in Proceedings of UIST '98, ACM Press.

Sound in Interfaces
Some examples:
- Speech interfaces: require sound input
- Auditory displays: sound as output only
- Synthetic sound / music creation

Motivation for Speech Interfaces
- I/O can be difficult on smaller devices
- People can talk at ~90 wpm -> high speed
- Potentially very large set of commands
- Frees the hands and other body parts
  - Imagine you are working on your car and need to know something from the manual
- Natural interaction for humans

Speech Applications
Ideal applications:
- Hands are already busy
- Manual input is already overloaded
- Disabled users
- Pervasive information access, mobile devices
Examples:
- Voice mail
- Handheld voice recorders
- Audio books
- Instructional systems
Speech Interface Requirements
- Speech recognition: the computer needs to understand what the user is saying
- Speech production: the computer needs to respond to the user
- Unlike visual interfaces, sound can also serve as input

Sound Output Methods
- To a loudspeaker
  - Annoys others nearby
  - Allows multiple users to hear what is happening
- To headphones
  - Private to the headphone wearer

Basic Method of Recognition
- Recognition vocabulary is represented as stored patterns
- Speech is sampled and digitized
- Waveforms or their parameters are compared against the stored patterns

But speech interfaces are still infrequently used (though they are getting better over time), due to:
- Implementation issues
- Interface issues

Implementation Issues
Some major issues:
- Vocabulary size?
- Single user or multiple users?
- Isolated words or continuous speech?
- Many individual characteristics of speech (pitch, volume, habits, idioms, accent, etc.)
Command interfaces are generally simpler:
- Handling applications, commands inside an application
- Generally a small, application-dependent vocabulary
- Form is usually <verb> <noun>
  - E.g. run firefox, select image1.jpg, hide word, etc.

Interface Issues
- Speech recognition is far from perfect, and users don't cope well with unpredictable machines
  - Imagine inputting commands with the mouse and getting the wrong result 5-20% of the time
- Speech UIs have no visible state
  - Users can't see what they have done or what effect their commands have had
- Speech UIs are hard to learn
  - How do you explore the interface? How do you find out what you can say?
- Speech UIs have limited vocabularies and impose greater human memory requirements
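The basic recognition method above (digitize the utterance, then compare its waveform parameters against stored vocabulary patterns) can be sketched with dynamic time warping, a classic way to match feature sequences that differ in speaking rate. This is a minimal illustration, not any particular recognizer's implementation; the toy "templates" and feature values are invented for the example.

```python
# Template-based word recognition sketch: each word is assumed to be
# reduced to a 1-D feature sequence (e.g. a frame-energy envelope).

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences,
    tolerating differences in speaking rate."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # skip a frame of the input
                                 d[i][j - 1],      # skip a frame of the template
                                 d[i - 1][j - 1])  # match frames one-to-one
    return d[n][m]

def recognize(sample, templates):
    """Return the vocabulary word whose stored pattern is closest."""
    return min(templates, key=lambda word: dtw_distance(sample, templates[word]))

# Toy vocabulary: two stored patterns, and an utterance that is a slightly
# time-stretched version of "yes".
templates = {"yes": [0.1, 0.9, 0.8, 0.2], "no": [0.7, 0.3, 0.3, 0.7]}
print(recognize([0.1, 0.8, 0.9, 0.8, 0.2], templates))  # -> yes
```

The warping step is what lets a small-vocabulary, isolated-word system cope with one of the "individual characteristics of speech" listed above (pace); it does nothing for accent or pitch, which is one reason real recognizers moved to statistical models.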
Speech vs. Visual Interfaces
Consider speech interfaces when information:
- is short and simple
- is needed immediately, but not later
- refers to events over time
- requires a verbal response
- adds to visual data
- can't be displayed visually (e.g. insufficient light)
Consider visual interfaces when information:
- is long and complex
- must be remembered
- refers to spatial relationships
- requires a graphical response
- adds to auditory data
- can't be presented through speech (e.g. due to noise)

Speech Interface Guidelines
Follow the general interaction design guidelines (consistency, feedback, etc.), but also:
- Allow for user interruptions
- Be concise (consider memory constraints)
- Provide input alternatives (e.g. keyboard and gestures)
- Consider synthesized voice vs. natural voice depending on the application (e.g. voice mail system, news stories)
- Be flexible; always offer human assistance
- Minimize intrusiveness (e.g. activating speech at a meeting)
- Test the pace of the interface with actual users
- Exploit traits of utterances such as pauses and abrupt signal changes (e.g. TalkBack)

Speech Interfaces: Some Example Research

Put That There (1982)
Bolt & Schmandt, MIT Media Lab
- Early example of voice commands augmented by physical gestures
  - Enables disambiguation
  - Still difficult to generalize
- Put That There video clip

Conversational Desktop (1985)
Barry Arons, MIT Media Lab
- Interactive conversational office environment
- Vision of a fully conversational workspace
- Still hasn't become a reality today
- Conversational Desktop video clip

Hyperspeech (1991)
Barry Arons, MIT Media Lab
- Speech-only hypermedia system
- Browsing speech interface: navigation through digital recordings that are manually segmented and structured
- No visual display
- Speech recognition input, synthetic speech feedback
- Technique practical only in limited domains
- Hyperspeech video clip
Impromptu (2001)
Kwan Hong Lee, Chris Schmandt, MIT Media Lab
- Mobile IP-based audio computing platform
- Multi-tasking interaction with multiple simultaneous streaming audio applications
- Speech services for user interaction
- Impromptu demo (Chris Schmandt)

TalkBack (2002)
Vidya Lakshmipathy, Natalia Marmasse, MIT Media Lab
- Interactive messaging device
- Respond to voice mail messages as you listen
- Waveform analysis: inferences about content based on patterns
  - Long breaks identify discourse segments
  - A sharp increase then decrease in amplitude followed by a pause is assumed to be a question
  - Recording time is inserted to allow conversation with the message
- TalkBack video clip

Auditory Displays

Audio Information
- The sounds we hear can help us understand what is going on around us in a given situation
- What are some examples when driving?
  - Status indicators (e.g. revving engine)
  - Action feedback (e.g. gears grinding)
  - Warning sounds (e.g. screeching tires, blowout)

Roles for Auditory Displays
- Provide feedback, confirm actions
- Status indicators and monitors
- Alarms and warnings
- For visually-impaired users
- Music to provide mood context, e.g. in games
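TalkBack's two heuristics (long breaks mark discourse segments; a sharp amplitude rise then fall before a pause marks a question) can be sketched over a per-frame amplitude envelope. This is an illustrative reconstruction, not TalkBack's actual code; the thresholds (`silence`, `min_break`, `window`) are made-up values for the example.

```python
# Pause/question heuristics over an amplitude envelope (one value per frame).

def find_breaks(envelope, silence=0.05, min_break=8):
    """Return (start, end) frame ranges of long breaks: runs of at least
    min_break consecutive near-silent frames (candidate segment boundaries)."""
    breaks, run_start = [], None
    for i, amp in enumerate(envelope + [1.0]):   # sentinel closes a trailing run
        if amp < silence:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_break:
                breaks.append((run_start, i))
            run_start = None
    return breaks

def looks_like_question(envelope, break_start, window=6):
    """Heuristic from the slide: a sharp rise then fall in amplitude
    just before a pause is taken to be a question."""
    before = envelope[max(0, break_start - window):break_start]
    if len(before) < 3:
        return False
    peak = before.index(max(before))
    # rise into the peak and fall out of it: the peak must be interior
    # and clearly above the level where the window starts
    return 0 < peak < len(before) - 1 and max(before) > 2 * before[0]

# Toy message: steady speech, a rise-fall, a long pause, then more speech.
env = [0.3] * 5 + [0.3, 0.8, 0.3] + [0.0] * 10 + [0.3] * 5
print(find_breaks(env))                 # one long break
print(looks_like_question(env, 8))      # rise-fall before it -> question
```

Inserting "recording time" at each detected break is then what lets a listener's spoken reply interleave with the original message.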
Sonification: Some Examples

SonicFinder (1989)
Bill Gaver, UCSD
- Desktop interface "earcons"
  - Hear the trash can through a "tinny crash"
  - Hear the amount of space on disk through reverberation
  - Hear the status of scrolling through ascending or descending tones
- Issues
  - Appropriate acoustic design
  - Acoustic pollution

The vOICe (2005)
Peter Meijer
- Vision technology for the blind
- Sonification applet for visual imagery
  - Vertical positions represent pitch
  - Horizontal positions represent time
  - Brightness represents amplitude
- http://www.seeingwithsound.com/javoice.htm

Synthetic Sound and Music

Real-Time Abstract Animation and Synthetic Sound
Golan Levin, MIT Media Lab, 2000
- AVES: Audiovisual Environment Suite
- Set of 5 interactive systems that allow people to create and perform abstract animation and synthetic sound in real time
- Yellowtail demo video
- Screen examples: Aurora clip, Floo clip, Loom clip
- Live performance: Scribble clip

Hyperscore (2005)
Harmony Line, Inc.
- Expressive music creation software for children
- Creates music by drawing
- http://www.hyperscore.com/
- Run demo
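The vOICe-style mapping described above (columns scanned left to right over time, rows mapped to pitch, brightness mapped to amplitude) can be sketched directly. This is a minimal illustration of the mapping only; the sample rate, frequency range, and scan duration are illustrative choices, not Meijer's actual parameters.

```python
# Image-to-sound sonification sketch: each image column becomes a short
# slice of audio, each row contributes one sine tone, and pixel brightness
# sets that tone's amplitude.
import math

def sonify(image, rate=8000, col_dur=0.05, f_lo=200.0, f_hi=2000.0):
    """image: rows of brightness values in [0, 1], row 0 = top = highest
    pitch. Returns a list of mono samples in [-1, 1]."""
    n_rows, n_cols = len(image), len(image[0])
    # Exponentially spaced frequencies so equal row steps sound like
    # equal pitch steps; top row gets the highest frequency.
    freqs = [f_hi * (f_lo / f_hi) ** (r / max(1, n_rows - 1))
             for r in range(n_rows)]
    spc = int(rate * col_dur)              # samples per column
    samples = []
    for c in range(n_cols):                # scan columns left to right
        for i in range(spc):
            t = (c * spc + i) / rate
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(n_rows))
            samples.append(s / n_rows)     # keep within [-1, 1]
    return samples

# A 2x2 image with one bright top-left pixel: a high tone, then silence.
audio = sonify([[1.0, 0.0], [0.0, 0.0]])
```

Writing `audio` out as a WAV file and listening to a few test images is a quick way to experience why blind users can learn to "see" scenes this way: spatial structure becomes a repeatable temporal-spectral pattern.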