Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired


Bing Li 1, Manjekar Budhai 2, Bowen Xiao 3, Liang Yang 1, Jizhong Xiao 1
1 Department of Electrical Engineering, The City College, City University of New York, New York, USA
2 Department of Computer Engineering, The City College, City University of New York, New York, USA
3 Bergen County Academies, New Jersey, USA
bli@ccny.cuny.edu (contact author), mbudhai000@citymail.cuny.edu, bowxia@bergen.org, lyang1@ccny.cuny.edu, jxiao@ccny.cuny.edu (presenting author)

Introduction

Indoor assistive navigation systems play an essential role in the independent mobility of the Blind & Visually Impaired (BVI) in unfamiliar environments. The topic has been researched extensively in recent years alongside the fast evolution of mobile technologies, from applying robotics simultaneous localization and mapping (SLAM) approaches and deploying infrastructure sensors to integrating GIS indoor map databases. Although these efforts provide useful prototypes that help BVI users travel independently, cognitive assistance is still at an early stage in the era of deep learning for computer vision. The goal of our project is to develop an intelligent assistive navigation system with a cognitive perception solution that helps BVI users in indoor environments, built on cutting-edge deep learning approaches. Using our ISANA App, the user can navigate to a specified destination, query the spatial-context information of the environment, be aware of moving objects in front of him/her (moving people are evaluated in this paper), and understand "what is going on there?" or "what is the scene in front of me?", a capability we call Scene Tell in this research.

System Architecture

Building on our previous research on indoor assistive navigation (Li et al.), an indoor semantic map database is first built to model the spatial context-aware information of the environment and to perform way-point navigation based on the Google Tango visual positioning service (VPS); a TinyYOLO (You Only Look Once) convolutional neural network (CNN) model is then trained on the Cloud Server and applied on the Tango Android phone for real-time moving-person recognition and tracking; finally, scene understanding using a CNN and a long short-term memory (LSTM) network is performed on the Cloud Server. The architecture of the system is illustrated in Fig. 1.

Semantic Maps

We developed an Indoor Maps Editor to parse architectural CAD drawings and extract the spatial geometric information of the environment. The indoor geographic information, such as walls, room text labels and doors, is then encoded in an SQLite database for ISANA, as illustrated in Fig. 2. From the parsed geographic layers we further retrieve the occupancy grid map and the topological connections between room labels and doors (Li et al.). We call the result the Semantic Map; it contains the information needed to support BVI navigation and location-based services. We extend the Semantic Map SQLite tables with a hybrid raster model (as shown in Tab. 1) and a vector symbolic model. The raster model supports metric-level path planning, sensor perception updating and semantic alignment between the raster map and the VPS; the vector symbolic model provides high-level semantic topological connections between semantic landmarks and across multiple floors within a building.
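As a concrete illustration of how the raster portion of the Semantic Map could be stored, the sketch below creates and fills an SQLite table with the fields listed in Tab. 1. This is only a minimal sketch, not the authors' released schema; the database file name, the table name grid_map and the example values are assumptions made for illustration.

import sqlite3

# Hypothetical database file; the paper only states that the semantic
# map is encoded in an SQLite database.
conn = sqlite3.connect("semantic_map.sqlite")

# Raster (grid-map) table mirroring the fields of Tab. 1: each sub-map
# stores its alignment (rotation + translation) and the occupancy grid
# image as a BLOB.
conn.execute("""
    CREATE TABLE IF NOT EXISTS grid_map (
        id        INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        addressid INTEGER,  -- index to map address
        submapid  INTEGER,  -- sub-map id within the floor
        rotation  REAL,     -- alignment rotation angle
        trans_x   REAL,     -- alignment translation x
        trans_y   REAL,     -- alignment translation y
        img       BLOB      -- occupancy raster grid map
    )
""")

# Example insertion of one sub-map: the occupancy grid is serialized to
# bytes (here a hypothetical PNG file) before being stored as a BLOB.
with open("floor2_submap0.png", "rb") as f:
    conn.execute(
        "INSERT INTO grid_map (addressid, submapid, rotation, trans_x, trans_y, img) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (1, 0, 0.0, 12.5, -3.2, f.read()),
    )
conn.commit()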

Figure 1: System architecture. The Tango visual positioning service (fisheye camera + IMU) and the semantic maps derived from the CAD model provide location awareness ("Where am I?"), map queries (context-awareness) and path guidance via waypoints; cognitive visual perception applies a CNN network to RGB/RGB-D imagery for cognitive object tracking and a CNN+LSTM network for cognitive Scene Tell; the user interacts through a multi-modal (speech-audio and vibration) HMI with a user profile.

Figure 2: Retrieved geographic layers from architectural CAD drawings.
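The raster layer of the Semantic Map is what supports metric-level path planning, but the paper does not detail the planner itself. The following is therefore only a generic sketch of grid-based A* search over a binary occupancy grid (0 = free, 1 = occupied); the function name plan_on_grid and the 4-connected unit-cost model are assumptions of this sketch, not an ISANA API.

import heapq

def plan_on_grid(grid, start, goal):
    """A* search on a 2D occupancy grid (0 = free, 1 = occupied).

    grid  : list of rows of 0/1 values
    start : (row, col) tuple
    goal  : (row, col) tuple
    Returns a list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan-distance heuristic (admissible on a 4-connected grid)
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start, None)]   # (f, g, cell, parent)
    came_from, g_cost = {}, {start: 0}

    while open_set:
        _, g, cell, parent = heapq.heappop(open_set)
        if cell in came_from:                 # already expanded
            continue
        came_from[cell] = parent
        if cell == goal:                      # reconstruct the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), cell))
    return None                               # no free path exists

A call such as plan_on_grid(grid, (0, 0), (5, 7)) returns a cell path that a way-point guidance layer could then follow.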

Table 1: Indoor maps database table for grid map

Field Name   Field Type            Comment
id           INTEGER primary key   Auto increment, not null
addressid    INTEGER               Index to map address
submapid     INTEGER               Sub-map id within the floor
rotation     REAL                  Alignment rotation angle
trans_x      REAL                  Alignment translation x
trans_y      REAL                  Alignment translation y
img          BLOB                  Occupancy raster grid map

Cognitive Object Tracking

In this paper we select the detection and tracking of people as the problem with which to verify real-time cognitive perception and tracking on a mobile phone during assistive navigation. The TinyYOLO CNN model (Redmon and Farhadi) is fine-tuned and deployed using Darknet (Redmon); it runs on the mobile phone alongside the BVI assistive navigation functionalities. The HollywoodHeads dataset of labeled person head regions is used for training and evaluation; it contains 369,846 human heads annotated in 224,740 movie frames (Vu, Osokin, and Laptev). To obtain the customized detection model, we fine-tune the TinyYOLO network for the head-region category on the Cloud Server. The real-time CNN object detection and tracking on the Tango phone is integrated into ISANA. During navigation, the App reads image data from the Tango API and performs multi-person detection at a rate of around 1 Hz, while a Multi-Box tracker provides smooth tracking at the camera frame rate. Fig. 3 shows an ISANA App screenshot with multi-person tracking during navigation; evaluation demos of the ISANA system can be seen in the demo video [1], and the real-time cognitive people tracking in a second demo video [2].

[1] http://tinyurl.com/ccnyisana
[2] https://youtu.be/eb Yxr93Tmc

Cognitive Scene Tell

One of the most challenging and promising tasks in assistive navigation is to give visually impaired people a complete sense of the scene in their field of view. Recent CNN and RNN techniques, originally proposed for translation models, enable basic image captioning for most everyday scenarios. Building on the CNN, we propose to use CNN+LSTM visual captioning techniques (Fang et al.; Xu et al.) for scene understanding, that is, the capability to tell the BVI user "what is going on in front?". The CNN+LSTM framework uses a CNN model to extract image features, which are fed to a recurrent neural network (RNN) based translator whose decoder generates the caption. Furthermore, we propose a 3D-annotated caption on top of the RGB scene captioning to convey spatial relationships. With this implemented on the Cloud Server, the user can interact with ISANA during navigation to request a Scene Tell of the view in front of him/her. The CNN+LSTM network is implemented using TensorFlow, based on the VGGNet recognition model. Results in the testbed building for scene understanding and telling are shown in Fig. 4.
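The paper does not include the captioning network code. Purely as an indicative sketch, a simplified "merge"-style captioner built with TensorFlow/Keras, combining VGGNet image features with an LSTM language model, could look roughly like the following; the vocabulary size, caption length and hidden dimensions are placeholder assumptions, and the actual ISANA model (following Fang et al. and Xu et al.) differs in detail, for example this sketch omits visual attention.

from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Placeholder sizes; the paper does not report vocabulary size,
# caption length or hidden dimensions.
VOCAB_SIZE, MAX_LEN, EMBED_DIM, HIDDEN_DIM = 10000, 20, 256, 512

# Encoder: frozen VGGNet features for the input image.
cnn = VGG16(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False
image_in = layers.Input(shape=(224, 224, 3))
img_feat = layers.Dense(HIDDEN_DIM, activation="relu")(cnn(image_in))

# Decoder: LSTM language model over the words generated so far.
caption_in = layers.Input(shape=(MAX_LEN,), dtype="int32")
word_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
lstm_out = layers.LSTM(HIDDEN_DIM)(word_emb)

# Merge image and language features and predict the next word.
merged = layers.add([img_feat, lstm_out])
hidden = layers.Dense(HIDDEN_DIM, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

captioner = Model(inputs=[image_in, caption_in], outputs=next_word)
captioner.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
captioner.summary()

At inference time the decoder is run word by word, feeding each predicted word back into caption_in until an end-of-sentence token is produced.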

Figure 3: ISANA App screenshot of real-time cognitive detection of moving people, with tracking indicated by different colors. The overlay shows the semantic map, destination candidates, the navigation path, the current pose and guidance direction, and the tracked people with their detection confidences.

Figure 4: Automatically generated scene descriptions from the CNN+LSTM network for our testbed building scenes. Example captions include "a man sitting at a table with a laptop", "a living room with a couch and a table", "a room with a bed and a window", "a wooden bench sitting in front of a tree" and "a bathroom with a sink and a sink"; the figure groups the results from accurate scene tells through minor and big errors to captions unrelated to the scene.
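Figure 3 reflects the interplay described under Cognitive Object Tracking: TinyYOLO detections arrive at roughly 1 Hz, while the Multi-Box tracker keeps the boxes updated at camera frame rate in between. The loop below is only a schematic of that pattern; get_tango_frame, detect_heads and the tracker object are hypothetical stand-ins, not Tango or Darknet APIs.

import time

DETECT_PERIOD_S = 1.0   # run the CNN detector about once per second (~1 Hz)

def run_tracking_loop(get_tango_frame, detect_heads, tracker):
    """Schematic detect-and-track loop.

    get_tango_frame() -> RGB frame         (hypothetical camera source)
    detect_heads(frame) -> list of boxes   (stand-in for the TinyYOLO model)
    tracker                                (stand-in multi-box tracker with
                                            .reset(boxes) and .update(frame))
    """
    last_detection = 0.0
    while True:
        frame = get_tango_frame()          # camera frame rate, e.g. ~30 fps
        now = time.monotonic()

        if now - last_detection >= DETECT_PERIOD_S:
            # Slow path: fresh CNN detections re-initialize the tracker.
            boxes = detect_heads(frame)
            tracker.reset(boxes)
            last_detection = now
        else:
            # Fast path: propagate existing boxes on every frame so the
            # on-screen overlay stays smooth between detections.
            boxes = tracker.update(frame)

        yield frame, boxes                 # e.g. draw the overlay, announce to the user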

Works Cited

Fang, Hao, et al. "From Captions to Visual Concepts and Back." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1473-1482. Print.

Li, Bing, et al. "ISANA: Wearable Context-Aware Indoor Assistive Navigation with Obstacle Avoidance for the Blind." European Conference on Computer Vision, Springer, 2016, pp. 448-462. Print.

Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017. Print.

Vu, Tuan-Hung, Anton Osokin, and Ivan Laptev. "Context-Aware CNNs for Person Head Detection." International Conference on Computer Vision (ICCV), 2015. Print.

Xu, Kelvin, et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." International Conference on Machine Learning, 2015, pp. 2048-2057. Print.