Android Speech Interface to a Home Robot July 2012

Deya Banisakher, Undergraduate, Computer Engineering, dmbxt4@mail.missouri.edu
Tatiana Alexenko, Graduate Mentor, ta7cf@mail.missouri.edu
Megan Biondo, Undergraduate, Computer Science, mmbvfb@mail.missouri.edu
Prof. Marjorie Skubic, Faculty Mentor, SkubicM@missouri.edu

Abstract

A growing elderly population and shortages of nursing staff create a need for innovative technologies in eldercare. The use of a home robot to assist with daily tasks, such as fetching objects, is one such example. However, there is also a need for effective and convenient human-robot interaction in this scenario, which a simple speech interface may provide. We investigated the use of the built-in speech recognition in Android phones for the fetch task, as well as methods of implementing a reliable and efficient two-way client-server connection between a smartphone and a home robot over a practical wireless network. An Android application was developed, together with the underlying network and process communication system that supports its use. Finally, tests were performed comparing the accuracy of speech recognition on the Android phone for older and younger adult voices.

Introduction

Recent studies have shown that one of the top five tasks noted by seniors for assistive robots is help with fetching objects, for example, retrieving missing eyeglasses (Beer et al., 2012), and that the preferred form of communication with the robot is a speech interface (Scopelliti et al., 2005). We investigated the use of the built-in speech recognition in Android phones for this scenario. We created an Android application and implemented the underlying network and process communication system to support its use. We also collected speech recognition transcriptions from older and younger people, who spoke into an Android device running a testing application that we developed. We then compared the accuracy of speech recognition on the Android phone for older and younger adults, as well as for men and women.

Previous Works

Skubic et al. have studied spatial language in older and younger populations. In collaboration with Carlson et al. at the Notre Dame Dept. of Psychology, they collected speech samples of older and younger adults giving spatial descriptions (Carlson et al., in review). They also created a robot capable of recognizing furniture and processing textual spatial descriptions, in addition to common robot capabilities such as obstacle avoidance. The robot received commands from the user through a computer keyboard wired to the robot itself. Since it is impractical to type the spatial descriptions, there is a need for accurate speech recognition, which we addressed in our research.

Why Android?

We decided to test Android's speech interface, created by Google, because it is known for high accuracy and is freely available on Android-based devices, which are being activated at a rate of 1 million devices per day worldwide (Android, 2012). Google's approach to speech recognition is also unique because it relies on crowdsourcing in addition to integration of existing acoustic models. We created an Android application that handles the audio data and sends the transcription to the robot for processing.

The use of Android devices for this purpose also has technical benefits: the audio processing and transcription are handled by Google's servers; Android applications are easy to install on any Android device; Android devices and the operating system support a wide range of accessibility features that help the elderly use the installed applications; Android devices have built-in microphones, eliminating the need for the user to purchase a headset or other microphone; and a speech recognition application lets the user decide when they want to communicate with the robot, which prevents the robot from reacting to speech directed at other people.

System Components

The system as a whole consists of two main components, an Android phone and a robot. The two components exchange information using a specific networking protocol, the Transmission Control Protocol (TCP).

Android is based on the Linux kernel, an open-source base for the growing operating system. It also uses Java's API in its development, which allows applications to be written in an object-oriented way. Furthermore, Android makes it easy and practical for developers to change, switch, and supply more resources to their applications through XML-based resources. XML is a simple language that Android allows developers to use to create and reference sophisticated screen layouts and other resources such as pictures and videos. Android's platform and its use of Java packages such as java.net allow developers to use the phone's hardware in a manner no different from that of a fully featured computer. The networking capabilities of the Android phone leave developers free to choose which networking protocol to integrate into their applications.

ROS [Robot Operating System] is an open-source, meta-operating system for your robot. It provides the services you would expect from an operating system, including hardware abstraction, low-level device control, implementation of commonly-used functionality, message-passing between processes, and package management. It also provides tools and libraries for obtaining, building, writing, and running code across multiple computers (ROS, 2012). Ultimately, for the purposes of this undertaking, ROS is a tool that can be used to program and control a robot.

The robot uses ROS, which is based around the publish-subscribe pattern. The server process inside ROS publishes the textual transcriptions it receives from the Android device, while other processes in the robot (primarily language processing) subscribe to the server's feed. In order for the robot to receive these transcriptions, a TCP server had to be integrated into ROS.

There are many ways to achieve a link between two devices, but for reliability a TCP server was used. TCP, or Transmission Control Protocol, uses acknowledgments and retransmission, managed through what is called a sliding window, to ensure that all packets reach their destination. Although large amounts of vital data are not being sent, only something as simple as a sentence or two, it is important that all the pieces make it to the destination.

Figure 1. Server communication within ROS. (Diagram labels, server side (the robot): Data In (from device); TCP Server: Process Data / Create Message; ROS Topic; Robot Node: Move According to Message / Sensors.)
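The server role described above can be illustrated with a minimal java.net sketch. This is not the authors' implementation: the class name TranscriptionServer, the newline-delimited message framing, the literal "ACK" reply, and the use of a Consumer<String> callback as a stand-in for publishing to a ROS topic are all assumptions made for illustration. The send method plays the part of the phone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical sketch: a TCP server that receives one transcription per connection. */
public class TranscriptionServer {
    private final ServerSocket serverSocket;
    // Stand-in for "publish to a ROS topic"; a real system would call a ROS client library here.
    private final Consumer<String> publisher;

    public TranscriptionServer(int port, Consumer<String> publisher) throws IOException {
        this.serverSocket = new ServerSocket(port); // port 0 asks the OS for any free port
        this.publisher = publisher;
    }

    public int getPort() {
        return serverSocket.getLocalPort();
    }

    /** Accepts one client, reads one line, acknowledges it, then hands it to the publisher. */
    public void serveOnce() throws IOException {
        try (Socket client = serverSocket.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String transcription = in.readLine(); // held temporarily in memory
            if (transcription == null) {
                return; // client disconnected without sending anything
            }
            out.println("ACK");                   // tell the phone the text arrived
            publisher.accept(transcription);      // pass it on to the subscribers
        }
    }

    public void close() throws IOException {
        serverSocket.close();
    }

    /** Client side (the phone's role): sends one transcription and returns the server's reply. */
    public static String send(String host, int port, String transcription) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(transcription);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> received = new CopyOnWriteArrayList<>();
        TranscriptionServer server = new TranscriptionServer(0, received::add);
        Thread worker = new Thread(() -> {
            try {
                server.serveOnce();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        worker.start();
        String reply = send("127.0.0.1", server.getPort(), "the eyeglasses are on the table");
        worker.join();
        server.close();
        System.out.println(reply + " -> " + received);
    }
}
```

Sending the acknowledgment before publishing mirrors the order given in the text (store, confirm receipt, then forward to the other nodes). A deployed server would loop over accept() and handle each client on its own thread rather than serving a single connection.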

System Functionality

Everything begins when the user decides to use the robot. When they open the application and begin, networking is established. The user speaks into the Android device, and the phone connects to Google's voice engines and obtains a set of transcriptions. The user, if satisfied with any of the transcriptions, selects one to send to the robot. The phone prompts the user to confirm the selection; if the user accepts, the transcription is sent to the robot as shown in Figure 1.

Figure 2. (a) User connects to robot. (b) User chooses to speak into phone. (c) User speaks into phone. (d) Phone displays the possible transcriptions to user. (e) Phone prompts user to send selected transcription to the robot.

Upon receipt of the transcription from the Android phone, the server, which is part of the robot, stores the transcription temporarily and then sends the Android phone a message stating that it was received. It then sends the transcription to the other nodes on the robot to be processed.

Figure 3. View of overall system communication. (Diagram labels: Client, Server, Wireless Router, Internet (Google).)

User Testing Methods

Originally we used prerecorded statements from both young adults and older adults. Recordings were played through a speaker directed at the Android phone. After a total of 29 transcriptions were taken from 16 different older adult voices, it was apparent that this method was not effective, as the phone did not recognize any one sentence in its entirety. In fact, it recognized less than half of the sentence for the majority of those recordings. The recordings of the younger group fared better, but the phone still recognized only a few (13 out of 49). This, however, was not necessarily a result of poor speech recognition but a consequence of using recordings rather than live human voices. One major issue with this method is the loss of quality and clarity in the audio, as well as electronic interference, which could alter the outcome of the transcription.

So it was decided that we must recruit people to speak into the phone. We were able to obtain 53 transcriptions from older adults and 48 from younger adults. It was immediately obvious that this was a much better method of testing the voice engine. The subjects were given descriptions to read, taken from the recordings used earlier. These descriptions ranged in length from 11 to 28 words.

After collecting the data, we calculated the accuracy of each transcription as well as a binary value. Accuracy was computed by dividing the number of correctly transcribed words by the total number of words spoken, while the binary value simply represented whether or not the sentence was transcribed perfectly.

Results

We found that there was a difference of approximately 10% between the average accuracy of transcriptions of older and younger voices, with the younger voices being better.

(a) Younger Adult Voices

         # Trans.   Average   Std. Dev.   Min.      % Perfect
Men      28         94.25%    9.69%       66.67%    60.71%
Women    20         90.18%    14.67%      37.50%    40.00%
All      48         92.55%    12.05%      37.50%    52.08%

(b) Older Adult Voices

         # Trans.   Average   Std. Dev.   Min.      % Perfect
Men      22         79.25%    15.86%      42.86%    9.09%
Women    31         84.66%    16.96%      16.67%    32.26%
All      53         82.41%    16.58%      16.67%    22.64%

Figure 4. (a) Accuracy results of younger adult voices. (b) Accuracy results of older adult voices.
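The two measures defined in the testing methods, per-transcription word accuracy and the binary "perfect" flag, can be sketched as follows. The paper does not say how a correctly transcribed word was identified, so this sketch assumes one plausible rule: words are lower-cased, punctuation is ignored, and the number of correct words is the length of the longest common subsequence of the spoken and transcribed word sequences. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch of the word-accuracy and perfect-transcription measures. */
public class TranscriptionAccuracy {

    /** Lower-cases the sentence, drops punctuation, and splits it into words. */
    static String[] words(String sentence) {
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9' ]", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    /** Assumed matching rule: longest common subsequence of the two word sequences. */
    static int matchedWords(String[] spoken, String[] heard) {
        int[][] lcs = new int[spoken.length + 1][heard.length + 1];
        for (int i = 1; i <= spoken.length; i++) {
            for (int j = 1; j <= heard.length; j++) {
                lcs[i][j] = spoken[i - 1].equals(heard[j - 1])
                        ? lcs[i - 1][j - 1] + 1
                        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
        return lcs[spoken.length][heard.length];
    }

    /** Correctly transcribed words divided by the total number of words spoken. */
    static double accuracy(String spoken, String heard) {
        String[] reference = words(spoken);
        if (reference.length == 0) {
            throw new IllegalArgumentException("spoken sentence has no words");
        }
        return (double) matchedWords(reference, words(heard)) / reference.length;
    }

    /** Binary value: true only if every word matches, in order, with nothing extra. */
    static boolean isPerfect(String spoken, String heard) {
        return Arrays.equals(words(spoken), words(heard));
    }

    public static void main(String[] args) {
        String spoken = "The cup is on the table.";
        String heard = "the cap is on the table";
        System.out.println(accuracy(spoken, heard));
        System.out.println(isPerfect(spoken, heard));
    }
}
```

In the example, five of the six spoken words survive transcription, so the accuracy is 5/6 and the sentence is not counted as perfect. As a consistency check on Figure 4, the "All" averages agree with count-weighted means of the per-group rows: (28 × 94.25 + 20 × 90.18) / 48 ≈ 92.55 for younger adults and (22 × 79.25 + 31 × 84.66) / 53 ≈ 82.41 for older adults.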

Also, according to other research, there seems to be better recognition of elderly female voices than elderly male voices (Anderson et al., 1999). We had similar results but also noticed that within the younger group, male transcriptions were actually more accurate. These findings appeared not only in the accuracy percentage but also in the number of perfectly recognized sentences in each group. Overall, Google's voice recognition seems reasonably effective.

Figure 5. Accuracy of speech recognition comparing older adults vs. younger adults as well as male and female. (Plotted values: older 82.41%, younger 92.55%; men 87.65% (older 79.25%, younger 94.25%); women 86.83% (older 84.66%, younger 90.18%); all 87.23%.)

However, for the purposes of the fetching goal, there needs to be more structure around the results, such as which words are important and which are not needed at all. For instance, some people may give very detailed descriptions, but the robot is only going to pick out certain words. So if the speech-to-text mostly has trouble with words such as "it" and "the" and other less important words, even some low-accuracy transcriptions may still be functional.

Conclusion

In this paper, we have researched Android's networking capabilities and the accuracy of Google's voice recognition engine. An Android application was developed using Android's API libraries. The application was designed to listen to the user's commands and access Google's engines through a wireless router connected to the internet in order to obtain a set of transcriptions. An algorithm was written and coded to use one of Android's networking features, TCP networking, to send the desired transcription to the server, the robot, wirelessly through the router's local network. We also obtained and compared live transcriptions from both older and younger adults in order to investigate the voice recognition engine's accuracy.

The research results have shown the effectiveness of the networking algorithm developed alongside Android's networking features. Moreover, the results have shown a difference of roughly 10 percentage points in transcription accuracy between older and younger adults, favoring the younger adults, and young men in particular. The results may be explained by the fact that Google's engines rely on crowdsourcing, and one can argue that young adults are the larger suppliers of audio recordings to Google's engines and acoustic models. Overall, though, the results stated in this paper support the effectiveness and accuracy of Google's engines in transcribing voices over the cloud.

References

1. Android. 2012. Android, the world's most popular mobile platform. http://developer.android.com/about/index.html
2. Beer, J.M., Smarr, C., Chen, T.L., Prakash, A., Mitzner, T.L., Kemp, C.C., and Rogers, W.A. 2012. The domesticated robot: design guidelines for assisting older adults to age in place. In Proc., ACM/IEEE Intl. Conf. on Human-Robot Interaction, 335-342, March 2012, Boston, MA.
3. Carlson, L., Skubic, M., Miller, J., Huo, Z., and Alexenko, T. In review. Investigating Spatial Language Usage in a Robot Fetch Task to Guide Development and Implement of Robot algorithms for Natural Human-Robot Interaction. Topics in Cognitive Science.
4. ROS. Ken Conley. 02 March 2012. 21 June 2012. http://www.ros.org/wiki/ros/introduction
5. Anderson, S., Liberman, N., Bernstein, E., Foster, S., Cate, E., Levin, B., and Hudson, R. 1999. Recognition of elderly speech and voice-driven document retrieval. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), Vol. 1. IEEE Computer Society, Washington, DC, USA, 145-148. DOI=10.1109/ICASSP.1999.758083 http://dx.doi.org/10.1109/icassp.1999.758083
6. Scopelliti, M., Giuliani, M., and Fornara, F. 2005. Robots in a domestic setting: a psychological approach. Universal Access in the Information Society, 4(2): 146-155.