Persistent Autonomous Exploration, Mapping and Localization. Roxana Mata


Persistent Autonomous Exploration, Mapping and Localization

by Roxana Mata

S.B., Massachusetts Institute of Technology (2015)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February 2017.

© Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, February 3, 2017
Certified by: John Leonard, Samuel C. Collins Professor of Mechanical and Ocean Engineering, Thesis Supervisor
Accepted by: Christopher J. Terman, Professor of Electrical Engineering and Computer Science, Chairman, Master of Engineering Thesis Committee


Persistent Autonomous Exploration, Mapping and Localization
by Roxana Mata

Submitted to the Department of Electrical Engineering and Computer Science on February 3, 2017, in partial fulfillment of the requirements for the degree of Master of Engineering

Abstract

In this thesis, we investigate methods for exploration, persistent autonomy, and simultaneous localization and mapping for an autonomous mobile robot with battery constraints. First, we present modifications to baseline frontier exploration on an occupancy grid that make the robot's frontier exploration more efficient. Second, we describe a new software structure and recovery behavior that let an autonomous robot navigate to its dock despite uncertainty errors in its map. Third, we implement a landmark-based topological mapping method using a state-of-the-art toolbox that maps the environment with visually unique tags, for comparison with metric mapping methods. Our analysis shows that the robot explores its environment more efficiently using our method than with previous frontier exploration methods, and that graph-based mapping outperforms metric mapping in ground-truth accuracy tests.

Thesis Supervisor: John Leonard
Title: Samuel C. Collins Professor of Mechanical and Ocean Engineering


Acknowledgments

I would like to thank Professor John Leonard, Sudeep Pillai, Dehann Fourie, and everyone in the Marine Robotics Group for the extensive help and encouragement. Also many thanks to the entire robotics community, in particular the Open Source Robotics Foundation for having basically everything ready to play with, and the Georgia Tech Borg Lab for their freely available SLAM implementation, perfect for any lost robot. And last but never least, my family, for multiple readings of so many pages and for their love and support.


Contents

1 Introduction
    1.1 Simultaneous Localization And Mapping
        1.1.1 Probabilistic formulation of SLAM
        1.1.2 SLAM algorithms
    1.2 Exploration
        1.2.1 Mobile robot exploration
    1.3 Long-Term Autonomy
        1.3.1 Autonomy in the literature
    1.4 Project Overview
    1.5 Thesis Outline

2 Efficient Exploration
    2.1 Problem Statement
    2.2 Occupancy Grid Mapping
    2.3 Frontiers
        2.3.1 Exploration history
    2.4 Efficient Frontier Exploration
        2.4.1 Computing the frontier
        2.4.2 Choosing a goal
        2.4.3 Exploration constraints
    2.5 Integration of exploration on the TurtleBot

3 Long-Term Persistence
    3.1 Problem Statement
    3.2 Related Work
    3.3 High-Level State Machine
        Docked: Charging behind the scenes
        Undocking: Creating an initial map
        Exploring: Efficient map expansion
        MapHome: Reliably getting home
        Autodocking: Reliable, autonomous docking
        Error: When anything goes wrong

4 Mapping
    Problem Statement
    SLAM Background
        The Map
        Mapping History
        State of the Art
        Pose Graph Mapping
    Coordinate Systems
    Localization
    SLAM: Simultaneous
    GTSAM and iSAM: Back End Processing
    Motivation: Bootstrapping SLAM with Vision
    Vision in the Context of Mapping
    AprilTags as Features

5 Results
    5.1 Experiment Setup
        TurtleBot Structure
    5.2 Experiments
        5.2.1 Exploration Efficiency
        5.2.2 Reliable Redocking
        5.2.3 Superior SLAM

6 Discussion and Conclusion
    Discussion of Results
        Exploration
        Long-Term Persistence
        AprilTag Mapping
    Summary of Contributions
    Future Work


List of Figures

1-1 An example of an occupancy grid
2-1 Frontiers on an occupancy grid
2-2 Occupancy grid: first stage of computing the frontier
2-3 Nearest frontier exploration, nearest point chosen as goal
2-4 Largest frontier exploration, random point as goal, without smoothing
2-5 Largest frontier exploration, nearest point chosen as goal
2-6 Largest frontier exploration, random point chosen as goal
2-7 Global vs local map
3-1 TurtleBot state machine
4-1 Data association on a factor graph
5-1 Picture of the real TurtleBot
5-2 Contours of the frontier, computed from the occupancy grid
5-3 Landmark maps of GTSAM versus Tango/ground truth for log A
5-4 Landmark maps of grid mapping versus Tango/ground truth for log A
5-5 Landmark maps of GTSAM versus Tango/ground truth for log B
5-6 Landmark maps of grid mapping versus Tango/ground truth for log B
5-7 Landmark maps of GTSAM versus Tango/ground truth for log C
5-8 Landmark maps of grid mapping versus Tango/ground truth for log C
5-9 Left: non-optimized pure odometry trajectory of the TurtleBot, with the tag positions also labeled. Right: GTSAM-optimized trajectory of the TurtleBot, using the same metric data. In the data, the robot began at and returned to its dock in front of tag 585. Only the right figure captures the correct return location of the robot at its dock.

List of Tables

5.1 Exploration efficiency comparison
5.2 Multiple recharging success rates
5.3 Mapping results: Euclidean error from ground truth


List of Acronyms

SLAM: Simultaneous Localization And Mapping
RGB-D: Red, Green, Blue and Depth
IMU: Inertial Measurement Unit
GPU: Graphics Processing Unit
ROS: Robot Operating System
LIDAR: Light Detection And Ranging
IR: Infrared Radiation
ORB: Oriented FAST and Rotated BRIEF
FAST: Features from Accelerated Segment Test
BRIEF: Binary Robust Independent Elementary Features


Chapter 1

Introduction

Mobile robotics serves an increasing range of needs in today's society. A mobile robot is a machine that can move around in its environment, usually to carry out some specific task. That task might be driving around a house to search for a specific object, flying through a city to deliver a package, or swimming in deep and cold waters to survey the ocean floor. These examples in particular are problems that scientists are currently working to make autonomous, meaning that the robot should be able to carry out its task without human supervision or guidance.

Teaching an autonomous mobile robot how to navigate through its environment is one of the largest open problems in research today. One simple answer to the navigation problem is to have the robot move randomly; however, random motion is too simple a strategy if the robot is to accomplish a complex task. Navigation determines the robot's motion planning so that its task can be accomplished. Navigation in non-trivial environments, where there are obstacles or dangerous areas, requires planning in advance, a purpose for which a map is critical. A mobile robot inside a house must understand what a doorway is and where the doorways are in order to travel between rooms. A map provides information about the relative positions of landmark objects and obstacles in the environment; it has to describe the objects relevant to the robot's navigation to the extent that the robot needs. Landmarks are pieces of information in the environment that the robot can use to orient itself; obstacles indicate places where the robot cannot go.

Besides placing the landmarks and obstacles on a map, the robot needs to determine where it currently is on its map with high accuracy in order to decide where to go next according to its task; this problem is called localization. While a robot navigates, it uses its map and augments it with new information, localizing within it as it travels. Fortunately, mapping and localization are joint problems that are solved together; the combined problem is referred to as the Simultaneous Localization And Mapping (SLAM) problem in the literature [2, 7]. The standard SLAM approach is for a robot to move around in its environment, make observations of recognized landmarks, and approximate its own motion, position, and orientation relative to them.

The work described in this thesis began with the desire to improve mapping accuracy on a TurtleBot robot [15] using the open source Robot Operating System (ROS) [29] software framework. The motivating idea was that a robot could use information about the objects it recognizes to improve the accuracy of its map. Combining object detection with SLAM, the robot could use the positions and orientations of objects detected over a long period of data collection to improve its object classifier, in turn improving the accuracy of the map. Motivated by this idea, a system was built to explore its environment and recharge autonomously without human intervention. The purpose was to collect a long history of robot data to train the object detector and support real-time object classification and mapping.

This thesis addresses the problems of expanding the map over a long period of time via efficient exploration under finite battery constraints, and of mapping the environment with a state-of-the-art landmark-based SLAM method. The metric by which the TurtleBot system successfully explores is efficiency, defined as the area of the environment explored versus the amount of time spent exploring.

Navigation on the TurtleBot is done based on an occupancy grid. This grid contains discrete cells, each labeled as occupied, unoccupied, or unknown. The portion of the map between unknown and unoccupied space is called the frontier, toward which the robot can travel to expand its map. For the TurtleBot to create a large map autonomously, we implemented an exploration algorithm that enables the robot to explore efficiently. The original frontier exploration algorithm [42] chooses a cell in the vicinity of the frontier nearest the robot as the navigation goal in order to expand the map. As the robot travels to its goal, it expands the known area of the map, recalculating the frontiers. After reaching its goal, the robot computes another frontier-based goal, and so on. We modified this method to include smoothing and to send the robot to the largest frontier segment instead, the idea being that a greater amount of new information would be acquired. Indeed, a comparison of the methods in Table 5.1 shows that this method is more efficient.

Another limitation of a real robot is its battery. For the TurtleBot, unsupervised autonomy required that it be able to return to its recharging dock despite inconsistency errors arising from the grid map. To persist with its exploration and re-docking function, we developed a finite state machine to start and stop processes autonomously, enabling the robot to repeatedly dock and undock. The ROS-implemented modules for docking and exploration were unreliable for our purposes, and required a supplementary layer of behavior to ensure reliable docking and recovery from failure as well.

The mapping result from the occupancy grid proved too unreliable for the TurtleBot's future navigation, as evidenced by the recovery behavior it made necessary. While exploration is well-defined on an occupancy grid, the mapping method's accuracy can be quite poor, and it scales poorly to higher-resolution maps. There are many SLAM solutions meant to handle uncertainty in measurements by representing locations as nodes in a graph with a probability distribution [36]. A graph structure was implemented to make use of the TurtleBot testbed and create a map of recognizable features in the environment. The accuracy of the map of landmark positions created using the graphical method was compared with a ground truth dataset and with the occupancy grid map, showing that on average the graphical method performs much better.

1.1 Simultaneous Localization And Mapping

The localization and mapping problems, combined under the acronym SLAM, can be solved together. One important process in SLAM is loop closure, in which the robot must be able to realize that it is in a place it has been before. Once previous positions are correctly computed, the chain of positions is turned into a loop that better matches the actual shape of the room. Odometry is the robot's belief of how it is moving. Building a map of the environment is difficult without both odometry of the robot and observations of the environment. Observations must be correctly linked to the positions from which they were made to form a full and accurate representation of the environment.

Another difficult step in SLAM is data association, in which the robot must realize it is observing landmarks that it has seen before. In the real world, observations of the same object are not easily grouped without some knowledge of where the objects are in the world. Multiple observations may correspond to multiple viewpoints of a single object, or to several views of multiple objects. Determining which is the case in the robot's current world is a job for the probabilistic processing inherent to graphical mapping, where data is incorporated and filtered to build a probability distribution over all possible cases. The highest-weighted case indicates the robot's most likely belief, the most probable state of the world, which is eventually integrated into the map that the robot will use.

One may ask why undertaking a complex SLAM method is necessary, given that there exists technology to pinpoint global locations with fairly high accuracy, for instance the Global Positioning System (GPS), which anyone can use to attain around 2-meter accuracy on consumer devices. However, there are many environments where GPS localization is not easily accessible or where its resolution is especially poor, for instance inside buildings and underwater. In these cases, a SLAM method can provide the map and localization that a robot needs. As such, much development is being made for these environments in particular, as well as for combining GPS measurements with SLAM methods to form a more accurate map than GPS alone. This thesis, however, focuses on the task of navigating inside a complex indoor environment, for which the robot must be able to detect and map characteristics of the environment at scales smaller than 2 meters.

1.1.1 Probabilistic formulation of SLAM

If the robot's measurements had no uncertainty, then the robot's route could be computed exactly. A robot's odometry, the process by which the robot estimates its own motion, influences its ability to localize objects correctly. Without uncertainty, SLAM would be purely a mapping problem, since localization would already be solved. Mapping would involve placing the robot's observations relative to the robot's locations along its route, so observations could be exactly positioned as well. Since all sensors incur some amount of measurement error, current state-of-the-art SLAM algorithms represent each robot or landmark location as a random variable. This incorporation of uncertainty into the map structure allows the robot's belief to change given new information. Whenever the map needs to be built, the robot can compute the most likely current positions of landmarks, obstacles, and itself.

Optimization involves finding the values of the probabilistic variables for which the likelihood of the received measurements is highest. For SLAM algorithms, the structure of the relations between variables is a Bayesian graphical model. In such a graph, the nodes each represent locations in the environment. The relation between location variables is computed from the values of related observations with a Bayesian inference algorithm. An inference algorithm, such as belief propagation, can be run on a graphical model to find the most likely values for a set of variables [5].
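Concretely, the inference just described is usually written as maximum a posteriori (MAP) estimation over the robot poses X and landmark locations L, given odometry U and measurements Z. The factorization below is the standard textbook formulation (see, e.g., [36]), not an equation reproduced from this thesis:

\[
X^{*}, L^{*} \;=\; \operatorname*{arg\,max}_{X,\,L}\; p(X, L \mid Z, U)
\;=\; \operatorname*{arg\,max}_{X,\,L}\; \prod_{t} p(x_{t} \mid x_{t-1}, u_{t}) \;\prod_{k} p(z_{k} \mid x_{t_k}, l_{j_k})
\]

Each odometry term and each landmark observation becomes a factor in the graphical model; under Gaussian noise assumptions, maximizing this product reduces to a nonlinear least-squares problem over the pose and landmark variables.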

1.1.2 SLAM algorithms

There are several high-performance SLAM algorithms, each of which requires different kinds of sensor measurements organized in a different structure. In general, the more data the robot can take from the environment, the richer a map it can build. The task at hand determines which algorithm to use; having sufficient detail in the map trades off against the fidelity of the sensors and the computational complexity of the algorithm. For instance, mapping a set of objects on a table for the purpose of object grasping might use high-resolution point clouds to image the details of objects, requiring sensitive imaging sensors and the extraction and correlation of millions of point features; whereas a computationally and monetarily cheap IR sensor can provide measurements to walls and obstacles good enough for a mobile robot to navigate through a series of rooms. These sensors are typically complemented by motion sensors, for instance inertial measurement units (IMUs) and wheel encoders, that provide the robot with estimates of relative changes in position. In general, the features most important for the task at hand must be extracted from the environment so that their information can be incorporated into the map.

One example of a structure used for variable optimization is a factor graph, a Bayesian graphical model of variable nodes representing locations of the robot and landmarks in the environment, with factor nodes between pairs of variable nodes [5]. After sensors measure landmarks relative to the position of the robot, the measured distance is inserted into a factor node between the measured landmark location and the current robot position. The factors compile the information and are used in the optimization to compute values for the locations.

Dense-mapping algorithms require measurements of all possible correspondences to insert as features in the robot's map. The map consists of point locations equivalent to points in the real world. State-of-the-art dense mapping [40] uses both visual and depth data to estimate odometry and to more accurately color and remove visual artifacts from a resulting 3D model of the scanned area, which could serve as an accurate map. However, this algorithm's uses are limited by its requirements for GPU computing power: mobile robots would require advanced processing hardware to produce a high-resolution map from the tens of millions of measurements computed from a point measurement sensor, and further processing to correlate its relative position to each of these. A dense map might also not be necessary for the task, in which case there is no need to waste computing power.

Using sparser features of the environment, such as objects, a method like SLAM++ [33] can bypass the computations involved in matching large sets of image features. The main requirement of these methods is a way to accurately localize given a view of a recognizable object. The approach the authors take in SLAM++ is to pre-compute object scans into meshes for efficient detection in real time [33]. The object detector depends on this pre-processed data to accurately estimate object locations relative to the camera. Finally, the detected objects and positional constraints are added as nodes and edges into a graph representing the positions of all objects in the environment as well as the robot. Object-level mapping offers the most concise graph for a graphically represented map, as there are considerably fewer relations between objects in the world, each of which conveys more information in the context of the map. This method of mapping also has the highest location accuracy among state-of-the-art algorithms so far. The object detector is also more robust to noise than a visual feature detector, leading to fewer errors in relating locations of important objects in the map. However, the pre-processing necessary for this method becomes computationally expensive as environments and objects grow more complex.

Finally, feature-based algorithms are a middle ground between dense and sparse detection; they associate visual features extracted from monocular camera images to build a map of those features located in the environment. One recent high-performing example is ORB-SLAM [30]. It requires simple feature recognition via ORB visual features, for instance corners, and an efficient mechanism to prune the memory structure of the map. The algorithm relies on a particularly quick method for recognizing features: Oriented FAST and Rotated BRIEF (ORB) features can be efficiently matched throughout a series of image frames [32] more quickly than traditional scale-invariant feature detection mechanisms [23]. Feature-based pose graph mapping methods must also incorporate a method to extract reference positions from the robot's position history, called keyframes. Keeping track of all possible positions of the robot, as well as the observations made from each position, is impractical: there are too many observations to compute relative positions against, and too many nodes whose distributions must be maintained in a graphical map. One large drawback of these methods, therefore, is that they require complicated schemes for choosing which keyframes to keep in order to maintain good localization performance. After this culling, they rely on error-prone feature detection provided by motion estimation.

In this thesis, a comparison was made between the accuracy of landmark positions using a mapping method with location optimization and a method without optimization. A flexible framework called GTSAM [6] was used to relate distinctly labeled, easy-to-observe AprilTag landmarks [27] with the TurtleBot's odometry in a graphical model. The accuracy of this method in measuring the AprilTag features was compared with the features as measured by the robot running a metric mapping algorithm, which does not attempt to optimize or re-compute the locations of the observed AprilTags. Results are in Chapter 5. Finally, the accuracy of both of these methods was compared against a ground truth dataset, created by a Tango device with more reliable odometry than the robot.

1.2 Exploration

Although graphical models of objects in the environment are built to incorporate uncertainty, this representation is often mapped onto a discrete domain for navigation and other uses. Every time the graphical map is altered, a new discrete grid map can be created anew and the robot's control values modified. The exploration algorithm developed in this thesis was based on a rectangular grid cut into discrete cells, with each cell having an integer value of either unknown (−1) or a value between 0 (unoccupied) and 100 (occupied) indicating its likelihood of being occupied.

Figure 1-1: An example of an occupancy grid, as created by the ROS gmapping module [10]. The gray cells indicate unknown area, while the black and white cells indicate occupied and unoccupied cells, respectively. This map has a resolution of 0.025 m² per cell.

The method of exploration developed was to cover as much new area as possible in the shortest amount of time. The exploration algorithm described in Chapter 2 involves the repeated computation of frontiers, a method pioneered by Yamauchi [42]. Frontiers comprise detached sections of the boundary between robot-accessible and unknown-occupancy cells on the map, and are used to determine where to direct the robot next. The robot's direction was provided by goals chosen from regions surrounding the frontier: by computing the center of mass of the frontier cells, by choosing the point nearest to the robot, or by choosing a random point.
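Looking ahead to the landmark-based mapping of Chapter 4, the comparison described at the end of Section 1.1.2, in which GTSAM relates AprilTag observations to odometry, reduces to optimizing a small factor graph. Below is a minimal sketch using GTSAM's Python bindings; the poses, observation values, and noise parameters are invented for illustration (only tag 585, the dock tag of Figure 5-9, comes from the thesis), so this is not the thesis's actual configuration:

    import numpy as np
    import gtsam
    from gtsam.symbol_shorthand import L, X

    # Noise models for the prior, odometry, and tag observations (assumed values).
    prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01, 0.01, 0.01]))
    odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.02]))
    meas_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.03, 0.10]))

    graph = gtsam.NonlinearFactorGraph()

    # Anchor the first pose (the dock) at the origin.
    graph.add(gtsam.PriorFactorPose2(X(0), gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))

    # Odometry factor: the robot believes it drove 1 m straight ahead.
    graph.add(gtsam.BetweenFactorPose2(X(0), X(1), gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))

    # Bearing-range observations of AprilTag 585 from both poses (invented values).
    graph.add(gtsam.BearingRangeFactor2D(X(0), L(585), gtsam.Rot2.fromDegrees(30), 2.0, meas_noise))
    graph.add(gtsam.BearingRangeFactor2D(X(1), L(585), gtsam.Rot2.fromDegrees(75), 1.5, meas_noise))

    # Rough initial guesses; the optimizer refines them jointly.
    initial = gtsam.Values()
    initial.insert(X(0), gtsam.Pose2(0.1, -0.1, 0.05))
    initial.insert(X(1), gtsam.Pose2(1.1, 0.1, -0.05))
    initial.insert(L(585), gtsam.Point2(1.8, 1.0))

    result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
    print(result.atPose2(X(1)), result.atPoint2(L(585)))

Omitting the two bearing-range factors leaves pure odometry, which is essentially the non-optimized baseline compared against the optimized trajectory in Figure 5-9.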

1.2.1 Mobile robot exploration

Frontier exploration on a grid map is defined as follows. The frontier is the region on the boundary between uncharted territory and known empty cells. In basic frontier exploration [42], the goal is to push the robot to the nearest frontier to keep it exploring. This algorithm does not take into account the utility of collecting the nearest frontier's data, so there is no guarantee that other frontiers will not yield more information about the environment. Yamauchi's work is among the most widely used grid-based autonomous exploration algorithms for mobile robots.

An exploring robot should also balance local goals with global goals so that it does not waste time exploring uninteresting parts of the map. A certain area of the environment might be worth exploring more than another, balancing learning the details of a certain area against exploring the entire map. Keeping track of global goals outside the robot's close range is also an important factor in navigating unknown territory, increasing data throughput and reducing the chance of getting lost. Autonomous feature exploration establishes directions for the robot to drive to investigate known features as much as to search for new ones [26]. It also creates a hierarchy for exploration, looking first for new portions of the map to travel to, and then looking to discover finer-grained details in an already known portion of the map. This balances the global goal of traversing the environment with the local goals of exhausting all the unexplored space in the robot's vicinity. The authors in [26] describe an exploration algorithm based on creating a graphical map, allowing the importance of features to guide its creation. This is a different metric of exploration from growing the map area by the frontier, as there might be a high density of features worth exploring in one direction. Thus this algorithm might persist in exploring a small area rather than expand its map.

In this thesis, there are constraints on the robot's battery that keep it coming back to a certain location to charge autonomously. The aim of improving frontier exploration was to improve the robot's map expansion over a shorter period of time. A more efficient frontier exploration was implemented by modifying the metric by which the robot chooses the next frontier: namely, the robot travels to the largest frontier instead of the nearest. The results are presented in Chapter 5. To overcome the navigational difficulties of small obstacles appearing in the planned path due to sensor noise, a random point on the largest frontier was chosen as the robot's direct destination, preventing invalid paths and dangerous obstacle collisions.

1.3 Long-Term Autonomy

The larger goal for the robot was to use large amounts of data, so we built a mechanism for the robot to explore its environment efficiently. The robot's ability to continue under the localization and navigation failures stemming from a suboptimal map was critical to this task. The existing implementations for localization and navigation were unreliable, but rather than re-implement the entire modules, safeguards like memory clearing and repeated task cycling were integrated as workarounds. False information about the environment often does become part of the map, impacting navigation and forcing sudden obstacles to be handled immediately. Some machines allow a human to intervene, supervise, or take over functionality for a while; a good example is semi-autonomous driving functions such as cruise control and parking, functions that cannot be relied upon in all cases, but when they go wrong a human is there to prevent an accident [21]. Fully autonomous mobile robots cannot depend on human intervention to correct inaccurate information, either because their environments are too perilous for humans, as in deep-sea robotics or mining, or because no one wants to maintain them, as in many indoor experiments. There are two methods of dealing with map errors in particular: (1) ignore the failure and rely on reactive behavior to continue with the task, or (2) attempt to correct the failure by removing and replacing offending information in the map.
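To make the recovery discussion concrete, the high-level behavior developed in Chapter 3 cycles through states like the ones below; the state names are taken from that chapter's section headings, but the transition logic shown here is a simplified sketch, not the thesis's actual implementation:

    from enum import Enum, auto

    class State(Enum):
        DOCKED = auto()       # charging behind the scenes
        UNDOCKING = auto()    # leaving the dock, creating an initial map
        EXPLORING = auto()    # efficient map expansion
        MAP_HOME = auto()     # navigating back toward the dock
        AUTODOCKING = auto()  # final docking maneuver
        ERROR = auto()        # recovery when anything goes wrong

    def next_state(state, battery_low, near_dock, on_dock, failure):
        # Simplified transitions; the real system layers recovery behaviors
        # (memory clearing, task cycling) on top of a structure like this.
        if failure:
            return State.ERROR
        if state == State.DOCKED and not battery_low:
            return State.UNDOCKING
        if state == State.UNDOCKING:
            return State.EXPLORING
        if state == State.EXPLORING and battery_low:
            return State.MAP_HOME
        if state == State.MAP_HOME and near_dock:
            return State.AUTODOCKING
        if state == State.AUTODOCKING and on_dock:
            return State.DOCKED
        return state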

1.3.1 Autonomy in the literature

Handling change in the environment is one of the most important aspects of robotic mapping in a real environment, because most real environments are dynamic: objects move around. The robot could treat a change in the map as a feature rather than as a sensor failure that needs to be corrected; the difference between an error and a change need not be distinguished as long as the most recent change in the environment is reflected in the map. There are methods to modify the SLAM position-and-orientation (pose) graph after sufficient measurements of the environment indicate that a change occurred. In particular, in [39] the authors introduce a modifiable pose graph representation of the environment, the Dynamic Pose Graph (DPG-SLAM), for running SLAM. In this method, once a set of observations from the past is perceived to be incompatible with current observations, the previous observations and their corresponding variables are simply removed from the graph. This removes conflicting variables that could cause incorrect mapping results. However, incorrect removals could permanently affect the structure of the map. If obstacles are erased from the robot's representation, it might have an incorrect map, putting obstacles in positions where there shouldn't be any, or removing obstacles that are still there. This data structure could be helpful in future long-term SLAM experiments, but it needs an additional mechanism to quantify errors when detecting change in the environment, since not all changes are permanent.

Another approach is to treat changes in the environment not as errors but as recurring events. In [22], the authors built a mobile robot to regularly map a dynamic environment on an occupancy grid. On this map, the state of each cell, occupied or unoccupied, is represented as a function of time over which the Fourier transform can be computed. The idea is to consider the changes of cell occupation in an entirely different representation over the course of several timescales. However, this approach might not convert easily to graphical representations of a map, where there could be infinitely many time-varying positions to describe. Moreover, the experiments were carried out on a semi-autonomous robot that required occasional intervention; thus, in [22], the robot's use of this map was limited by its autonomy abilities.

1.4 Project Overview

A more efficient frontier exploration algorithm for a mobile robot was designed first for a simulated version of the TurtleBot. Given no prior knowledge about its environment, the robot had to navigate to goals and drive around in order to expand the known portion of its map. The map on which intermediate exploration goals were computed was a discrete occupancy grid created from scan matching. The TurtleBot [15] robot and its ROS-based software [29], simulated in Gazebo [20], had a 2D scanner, a single camera, and depth sensors simulated with high fidelity, with practically no errors. While in simulation, different navigational components of the open source ROS software had to be integrated to work in conjunction with the exploration algorithm, much like the integration of the system in real life.

With satisfactory exploration performance in simulation, the method was ready to be tested on an actual TurtleBot in a real, cluttered lab environment. Working on a real robot introduced many problems with the hardware and sensors, and the integration was not exactly the same. Objects measured by the sensors in simulation were detected entirely correctly except for artificial sensor noise, while in real life many more errors arose from incorrect sensor positioning, adjusted by extrinsic calibration, or from internal sensor bias. The performance of the sensors in the real world was thus a surprise, and the exploration algorithm had to cope with significantly worse mapping performance. As a consequence, we applied smoothing to the map before a goal was selected for the robot, preventing the goal from being placed somewhere inaccessible.

After demonstrating reliable autonomous exploration behavior, the TurtleBot was programmed to collect large amounts of data autonomously. To do so, it became necessary to build a robust system that could autonomously switch between docking and exploring. Much of this work involved understanding the robot's behavior as programmed and how to deal with potential errors. Moreover, having the robot far from its dock for a long period of time gives it the chance to lose its own position in the map. This required methods to handle potential errors in its map in order to navigate home correctly when necessary, and also motivated the need for a better mapping mechanism.

The mapping abilities of the robot began with the ROS gmapping module [10, 29] and, once charging became reliable, were extended with the more current SLAM method of GTSAM [6]. The occupancy grid as a mapping method does not support maintaining the large uncertainty resulting from a growing error in the robot's localization estimate; a discrete occupation status must be chosen. As the robot moves, sensor noise compounds, increasing the estimate error over time. To combat uncertainty by maintaining and converging position estimates, a graphical representation of the map can be used, based on uniquely identifiable landmarks against which the robot can localize.

Finally, the exploration method was progressively improved on the real robot to increase the efficiency of space visited per charge. The robot's battery did not last long enough for it to explore the entire lab space on a single charge, so on each charge its goal was always to increase its known mapped area.

1.5 Thesis Outline

In Chapter 2, we focus on improvements to the method of frontier exploration that make the TurtleBot more efficient. In Chapter 3, we describe the underlying structure of the TurtleBot software and the recovery behavior that enabled it to continue exploring and docking despite mapping errors. In Chapter 4, we review the description and implementation of a current method for landmark-based mapping. The results of efficiency experiments, reliability experiments, and mapping experiments are described in detail in Chapter 5, and the discussion of the results, contributions, and possible future work is in Chapter 6.

Chapter 2

Efficient Exploration

An autonomous mobile robot can have surprisingly complex tasks despite limiting constraints. A mobile robot placed in unknown and unfamiliar territory faces the greatest constraint of all: not knowing where it is in the world. This problem can be addressed with exploration, where from some unknown starting point the robot travels, collecting data to add to its map. This way a robot comes to understand its environment and builds a map with which it can complete further tasks. A further constraint on autonomous mobile robots is their battery life. Without a constant source of energy, the robot cannot recharge its battery at will. Outdoor day-traveling robots, for instance NASA's rovers on Mars, carry power systems that recharge the battery autonomously, but most indoor robots have no such system. There is no viable method to prolong the battery life of the robot other than plugging in at a fixed location. This problem is addressed by introducing efficiency into the methods by which the robot carries out its tasks, defined by the robot's performance of its high-level task over a period of time. Efficient exploration for an indoor robot was implemented to improve the mapping ability of the robot on limited battery life. This chapter describes the situation of the TurtleBot indoor mobile robot and its method of autonomous exploration, as well as successive improvements that make the method more efficient.

2.1 Problem Statement

The exploration problem for a robot is the problem of devising a method by which the robot chooses its goals and updates its trajectories autonomously. The TurtleBot [15] is a made-for-research indoor mobile robot with many sensors and finite battery life. Specifically, in order to conduct mapping and exploration, it has a 2D RoboPeak A1 LIDAR [34] and an RGB-D color and depth camera. Its onboard processing was an Acer netbook with 4 GB of memory. The mapping conducted through exploration was occupancy grid mapping, which used LIDAR and depth data. This type of map is 2D and contains discrete cells with a discrete occupancy label from among occupied, unoccupied, and unknown. For the robot to construct a map at all, it had to explore, or its map would be very small and very local. The robot's goal was to produce as large a map as possible on a single battery charge, so efficient exploration was desired. A discrete representation of the world, such as a grid map, was used to conduct frontier exploration. Given a robot with limited battery, the goal was to construct an accurate map, and the problem is to improve the area-per-time efficiency of this type of exploration.

2.2 Occupancy Grid Mapping

An occupancy grid is a 2-dimensional map representation consisting of discrete cells, where each cell can take one of three values: occupied (1), unoccupied (0), or unknown (−1). The resolution of the map relates its size to the area it represents; a single cell's value corresponds to a proportional square area in space having that occupation value [36]. The robot's location and landmark locations correspond to single cells. An observation of an obstacle at a certain distance from the robot marks the corresponding cell as occupied. Before uncertainty enters the picture, a cell's occupation status can change with the next reading; every cell's value is the last known value.
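The index arithmetic implied by the last paragraph, where one range-bearing reading marks a single cell occupied, can be sketched as follows. The function name, argument names, and world-to-grid convention are assumptions for illustration; a real implementation would also ray-trace the free cells along the beam:

    import math

    def mark_occupied(grid, resolution, origin_xy, robot_xy, robot_yaw, bearing, rng):
        """Mark the cell hit by one range reading as occupied (value 100)."""
        # World coordinates of the detected obstacle.
        wx = robot_xy[0] + rng * math.cos(robot_yaw + bearing)
        wy = robot_xy[1] + rng * math.sin(robot_yaw + bearing)
        # Convert world coordinates to grid indices relative to the map origin.
        col = int((wx - origin_xy[0]) / resolution)
        row = int((wy - origin_xy[1]) / resolution)
        if 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]:
            grid[row, col] = 100  # last-known-value semantics, as described above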

Particle filter methods such as [13] introduce uncertainty to a discrete representation. There are a finite number of weighted particles, pieces of information that indicate a potential trajectory of the robot and that each contain a separate map, which together maintain the current information of the map as a distribution. This type of mapping relies on accurate sensor information and reliable scan matching. As the robot moves, it samples the particles according to their weights, projecting its old map into the future with newly obtained information. At any moment in time, the metric grid map can be computed by projecting the most likely information from each particle and placing it on a grid. The resulting map is a discrete set of cells, each with some probability of obstacle occupation. Thus, values in a cell of a particle-filtered occupancy grid range from −1 (unknown occupation) to 0 (unoccupied, free cell) up to 100 (from unlikely to be occupied to likely occupied).

The type of particle filter used in the ROS gmapping module [10] is a Rao-Blackwellized Particle Filter. This type of filter maintains a compact representation and does not require heavy-duty feature recognition or other processes on a separate layer to build a map [13]. However, any errors made by the sensors tend to persist after being incorporated into the particle sample and confuse the robot once they enter the map. An error might present itself as an incorrect modification of the map based on erroneous sensor or motion models [36]. If the likelihood of incorrect information remains too high despite a few passes of particle sampling, the mistake cannot be removed and the robot's localization suffers.

However, discrete grid mapping has an intuitive simplicity. Path planning and feasibility checking, where the robot determines that a path between two points is possible, are simpler to confirm on a discrete map, as obstacles either occupy or do not occupy a certain cell. For instance, a path planner such as Dijkstra's [4] requires discrete representations for positions as well as an inventory of occupied and unoccupied space to determine where the robot can or cannot go. On a graphical map, instead of grid cells there are features or objects, and the distinction between occupied and unoccupied continuous space is not so clear; often, a graphical map is projected to a metric map for these purposes. Keeping the robot on a 2D grid was a straightforward solution to the navigation and exploration problem.

2.3 Frontiers

Using the built-in ROS package gmapping [10] for grid mapping started the project more quickly, which is why exploration remained based on a discrete metric map. There are significant disadvantages to a discrete map. In particular, the resolution must be low so that computation on the map is fast enough for reactive navigation. However, a higher map resolution might be required for more complicated tasks in complex, high-detail environments; simply increasing the resolution while using a discrete map is an incomplete and imperfect solution. The efficient exploration algorithm discussed in Section 2.4 addresses the need to circumvent occupancy grid mapping issues. As an overview, the exploration strategy that we used is as follows:

1. Compute the frontier from the current occupancy grid.
2. Send the robot to a point determined in various ways related to the best frontier.
3. Repeat this cycle either until the map is fully explored (that is, there are no more frontiers left) or the robot must complete other tasks.

2.3.1 Exploration history

The first step in an exploration algorithm is finding a suitable path for the robot to grow the known territory of the map. Yamauchi enabled an autonomous robot to explore the unknown territory of its environment using frontiers [42]. The frontier of an occupancy grid is defined as the boundary between explored, known-occupancy grid cells and unknown cells. The frontier is typically split into multiple smaller frontier segments by obstacles in the environment creating occupied space; this leads to a decision problem for a robot faced with multiple frontiers. Namely, among the frontiers to which it can travel to explore, which is best? An example of the frontiers in a sample occupancy grid is illustrated in Figure 2-1, where the areas of the yellow ellipses correspond to the relative sizes of the frontiers. The boundary between black occupied cells and unknown area is not considered a frontier, since the robot cannot expand its map by going there.

Figure 2-1: An example of the occupancy grid, with the frontiers highlighted. The yellow circles indicate areas where the unknown portion of the map, in gray, abuts the accessible, unoccupied portions of the map, in white.

The problem of wasting time and resources, namely inefficiency, is never addressed in [42]. There, for each frontier segment in the current occupancy grid, the robot computes a regional point that is accessible to the robot. From there, the best path for the robot to take is the shortest path to the nearest frontier region point. This method does not ensure that the robot sees the most new area in a certain period of time. Moreover, in the paper and online examples for the baseline frontier exploration work in [42], the number of frontiers seems to continually increase. The sizes of the frontiers are not thresholded to remove extraneous computations of the frontier region, so many frontiers are computed. Over time, these frontiers can get very small and often are not places worth traveling to for new map information. For instance, if a robot cannot see into the empty corner of a room closest to it, the nearest-frontier-region algorithm will still consider moving to the corner instead of considering more distant, yet more fruitful, points of exploration. After choosing which frontier is best to travel to and explore, the robot sets the goal of navigating to that frontier and takes in new occupancy information there and along the way.

Grid-based exploration such as the frontier algorithms outlined above quantifies the advantage of each possible move the robot can make in terms of the number of cells that could become known. However, the core idea of the frontier was defined on a grid representation of a map, in discrete coordinates refined only by the resolution of the map. This representation for the frontier is limiting, as state-of-the-art mapping techniques produce a graphical map [36] that relates locations of features in the environment. In a map consisting purely of features, it is difficult to define the boundary between unknown and known data, and often even occupancy data; unknown data is simply not stored in a feature-based map representation.

The next development in exploration sprang from the widespread use of pose-graph SLAM. In the case of frontier exploration, the frontier's quantification on a metric grid constrained it to applying directly to a grid representation of the world. Given a highly discretized world, such as a grid with finer and finer resolution resulting in exponentially more cells, computation of the entire frontier becomes an accordingly exponentially large problem. In feature-based exploration, the robot considers local exploration goals that put its mapping and feature collection mechanisms at an advantage, instead of computing the frontier across the whole map. This saves computation, maintains a flexible representation of the world, and quantifies a metric for successful exploration, namely how many features are collected in the map. In [26], the authors describe a method of planning robot moves that push the robot in the direction of new feature discovery while maintaining sight of known features. This algorithm improves the robotic exploration system in several ways. Establishing a new goal is achieved by determining the nearby features that can best augment the robot's map. The features that successfully augment the map depend on the task of the robot: if the robot's task is exploration, it may seek new features and avoid known ones; if it is mapping, it may revisit features in order to close loops. Unless a certain area offers several points of interest for the robot and increases the utility of traveling there by offering new information, the robot will not go there. This diverges from the core idea of frontier exploration, where the entire global map is processed to produce swaths of interesting area. However, this feature discovery method introduces the need for global and local goal balancing; the utility of traveling to a certain area is often composed of multiple aspects of the robot's function, namely, in [26], increasing the quantity of explored features and the desire to find new ones.

2.4 Efficient Frontier Exploration

The robot's exploration strategy is described in detail in the sections below. The TurtleBot's built-in controller and path planning mechanisms often constrained the exploration strategies by preventing dangerous situations, i.e. the robot driving too close to obstacles. Thus, there were mechanisms in place to make navigation more reliable.

2.4.1 Computing the frontier

Frontier segments [42] are a basic starting point for exploration for a mobile robot that travels only within a 2D plane. The frontier of the map is the boundary between unoccupied (termed accessible in [42]) and unknown space.

Frontier Segments

There are multiple segments of the frontier, interspersed among obstacles beyond which the robot's sensors cannot see. To find the individual cells composing the frontier segments, a threshold was applied to the occupancy grid, creating a binary image of occupation: white where the space is unoccupied, and black where it is occupied. Another mask was taken to compute the frontier, keeping only the unoccupied cells adjacent to unknown cells so as to remove any occupied cells from the potential goal cells. This resulted in a map like Figure 2-2, which displays accessible areas and occupied points on the robot's map approximately at the time it starts exploring.

Algorithm 1: Pseudocode for computing the best goals for the robot to drive to, derived from frontier segments.

    Result: List of top goals {g_k} given a new occupancy grid M
    Compute binary occupation masks u, o, a (unknown, occupied, unoccupied)
    Dilate the occupied mask o
    Compute the distance transform of u AND o to get d
    Compute the set of equal-valued contours C = {c_i} in d
    for each c_i in C do
        if size(c_i) < threshold then
            skip c_i
        else
            take goal g_k from c_i
        end
    end

The binary occupation masks are adjusted with dilation and a distance transform. A set of contours is derived, and invalid goal cells are removed from the contours to ensure that the goal is accessible. The top goals of the largest frontiers are presented to the goal-choosing algorithm.
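A sketch of how Algorithm 1 might be realized with OpenCV primitives follows; cv2.dilate and cv2.findContours are real OpenCV calls, but the occupancy threshold, kernel sizes, the substitution of a neighborhood dilation for the distance-transform step, and the random-point goal selection (anticipating Section 2.4.2) are assumptions of this illustration, not the thesis's exact implementation:

    import random
    import cv2
    import numpy as np

    def top_frontier_goals(grid, occ_thresh=65, min_size=20, safety_px=3):
        """Return one candidate goal cell (row, col) per large frontier segment,
        ordered largest frontier first."""
        unknown = (grid == -1).astype(np.uint8)
        occupied = (grid >= occ_thresh).astype(np.uint8)
        unoccupied = ((grid >= 0) & (grid < occ_thresh)).astype(np.uint8)

        # Inflate obstacles so goals keep a safety margin from occupied cells.
        kernel = np.ones((2 * safety_px + 1, 2 * safety_px + 1), np.uint8)
        inflated = cv2.dilate(occupied, kernel)

        # Frontier mask: free cells adjacent to unknown cells, away from obstacles.
        near_unknown = cv2.dilate(unknown, np.ones((3, 3), np.uint8))
        frontier = (unoccupied & near_unknown & (1 - inflated)).astype(np.uint8)

        # Group frontier cells into segments; drop segments below the size threshold.
        contours, _ = cv2.findContours(frontier, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        segments = [c.reshape(-1, 2) for c in contours if len(c) >= min_size]
        segments.sort(key=len, reverse=True)

        # A random accessible point per segment; goals[0] targets the largest frontier.
        return [tuple(random.choice(seg)[::-1]) for seg in segments]

Filtering by segment size mirrors the thresholding step of Algorithm 1, and sorting by size implements the largest-frontier preference evaluated in Chapter 5.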

Figure 2-2: This occupancy grid is the first stage of computing the frontier, without the smoothing steps. The red cells indicate occupied points. This is an example of the robot's starting map.

The next step, smoothing the frontier, was required to overcome the limitations of the robot's sensors. The jagged-corners problem was solved by dilating the frontier and taking a distance transform from the unoccupied cells of the map. The robot's scanning sensors, for instance its LIDAR, emit discrete, linear rays that pass obstacles with sometimes surprisingly large errors, producing thin spikes of occupation readings in a new area. The measured readings are lines from the robot, creating accessible areas that are concave, meaning that a line between two points on the known map would intersect unknown, and therefore inaccessible, areas. Figures 2-2 and 2-4 show how pointed the map, and consequently the frontier segments, can be if the frontier is not smoothed. This concave accessible area produces sharp angles in the accessible portion of the map, leading to paths surrounded by obstacles and often causing infeasibility. If the scan readings are sparse enough, the robot would interpret the readings as frontiers and attempt to drive there. However, it might not be able to fit on a path established between a sparse few readings, as its body cannot fit through the few spaces marked as unoccupied by the scan. The unknown space around it often prevents the robot from traveling within the free ray, which is the reason for actively smoothing the map when creating frontier segments. Any goal chosen inside the smoothed frontier would not be in danger of invalidating the path.

Weighted Frontier Segments

Every frontier section on the map has an associated weight: the number of unoccupied cells adjacent to the frontier where the goal can be placed. This weight indicates the size of the frontier in map cells; if the robot goes where the frontier is largest, it has the opportunity to gain the most new information. Pseudocode for computing and smoothing the frontier is in Algorithm 1.

2.4.2 Choosing a goal

After determining a smooth set of frontier segments, the robot must choose where to go in order to learn the newest information.

Goal Setting

The main idea of frontier exploration is to direct the TurtleBot to un-mapped areas to determine their occupation status. Thus, at every moment the TurtleBot was exploring, it was either computing a new goal to expand its map or moving towards a goal. The goal was a point in the map provided by the exploration mechanism while it was in exploration mode, as directed by the system-wide state machine (see Chapter 3). As soon as the robot's position was within a certain radius r of the goal, the robot would re-process its current map to compute a new goal. The nearness radius r, which in experiments was about the diameter of the robot, must be small enough that the robot's sensors can make decent headway in processing a new area of the map.

Final Goal Selection: Random Sample of the Accessible Frontier

A point for the goal is sampled from the accessible space adjacent to the frontier. If the goal point is not in accessible space, the planner cannot plan a path there. This is the simplest method of choosing an intermediate goal point for the robot.

Several methods of selecting the next goal for the robot were tested. The first method was choosing the nearest point of the nearest frontier, similar to how [42] computed frontier regions. However, because of the artificial obstacle inflation of the robot's local map, this method might place the goal within the free but unsafe cells of the map. The robot's controller prohibits it from veering too close to obstacles, so these paths are usually marked as invalid.

Another method of choosing the goal would be the centroid of the frontier, the average location of all the cells in the frontier. However, the centroid of the accessible frontier is not guaranteed to be accessible space for a goal with uninhibited space for a path; the robot could run into the same problem as above, where the goal point is too close to an obstacle or in the middle of unknown space to have a viable path planned to it. In fact, because of the concavity of the free space given by sensor readings, the centroid tends to be an occupied or unknown cell. It might also be too computationally expensive to iterate through all the points on the frontier to find a suitable point for the robot to travel to.

Finally, taking a random unoccupied point within the frontier can serve to propel the robot towards the largest frontier. If an occupied cell were assigned as the next goal, the path planner would not be able to find a path to it, as the goal itself would be inaccessible. These different methods of choosing the goal are compared and evaluated more fully in the results section, with discussion to follow. A visual comparison may be made among Figures 2-3, 2-4, 2-5, and 2-6. The top frontiers are colored differently using each method and shown for visual comparison. In each figure, the eight most relevant frontiers are shown, as are the robot's position in the map and its chosen goal. The colors indicate unique frontiers, and the larger points indicate possible goals. The robot is near the center of each figure, a point in bright red not connected to any frontier. Each figure also displays the goal that the robot has chosen under that algorithm as a bright green dot, whether it is on the nearest frontier or the largest frontier, and whether it was chosen randomly or by distance.

2.4.3 Exploration constraints

First, the robot's path to a specific goal is found using a Dijkstra path planner as provided by the ROS infrastructure for the TurtleBot [15, 29]. The costs to the goal are computed using a least-squares potential function, iterating outwards from the starting point to construct the graph. The software plans a path by running a Dijkstra lowest-cost path search from the goal location to the robot's current location. After a path is found, the robot's controller takes over. The controller is a Dynamic Window Approach (DWA) planner, incorporating into its decisions only the costs of the local area currently surrounding the robot (the local map) [29]. This local map is computed directly from the most recent set of observations, artificially inflating the obstacles, as illustrated by the bright colors in Figure 2-7.

Figure 2-3: A map of the most relevant frontiers closest to the robot, in different colors, computed after processing the occupancy grid. Yamauchi's frontier exploration method chose as the goal the nearest point on the nearest frontier. The red point indicates the location of the robot, close by, and the green point on the blue frontier indicates the best goal point, which happens to be the nearest point on the nearest frontier. The other dots on the frontiers are random points that might have been chosen as goals using a different method.

Figure 2-4: The contours in this image are the outlines of the largest frontiers. The robot's chosen goal is the bright green dot at the bottom of the light green contour. There are many candidate goal points where the robot could get physically stuck in the process of traveling there.

Figure 2-5: Frontier exploration via the nearest point on the largest frontier. The goal is the green point closest to the red point marking the robot's position.

Figure 2-6: Frontier exploration via a random point on the largest frontier. The goal point is the green point on the large green frontier, and the robot is the red point not on a frontier.

First, the planner discretely samples the robot's potential control space, the velocities in the x and y dimensions and the change in yaw θ, as a (dx, dy, dθ) triple. For every velocity triple, the DWA planner performs a forward simulation with that velocity to find the future predicted position. The future position in the occupancy grid is scored by how close to obstacles it drives, by proximity to the goal, and by acceptable speed. The lowest-cost trajectory is chosen at each time-step, and the velocity that spawns it is sent to the mobile base.

This method is simple and highly reactive. At every point in time, the computation of the control step ensures that the robot is traveling through a valid space. In particular, the path planner will not be able to plan a path between two closely positioned obstacles because the size of the robot is taken into consideration; this is desired behavior, and works well for navigation on the 2D grid map. The circular shape and size restrictions of the TurtleBot come in especially handy here: simply fattening the path and checking that it does not intersect any occupied cells in the grid map is good enough to determine path feasibility.

However, this method of planning and control is greedy and does not ensure that the robot is following the optimal path. The occupancy map could incur inflation errors, where it has artificially inflated the occupied regions too far. The method's inflexibility in certain situations becomes particularly obvious when considering the path as a whole. Because the rankings of paths are computed incrementally, a sub-optimal path that might temporarily traverse unknown space is ranked extremely low; because it involves some risk, the ideal path is marked as invalid. The way errors become permanent in a grid map means that exploration requires some extra thought beyond where to set the goal point, and in general this problem has motivated the community to use a graphical map, as we'll see in Chapter 4.
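For concreteness, the scoring loop can be sketched as follows. This is an illustrative reconstruction of the DWA idea, not the ROS dwa_local_planner: the weights, horizon, and the costmap helper methods (is_occupied, obstacle_distance) are assumptions.

```python
import numpy as np

def dwa_step(pose, goal, costmap, v_samples, w_samples,
             dt=0.1, horizon=1.5, w_obs=1.0, w_goal=1.0, w_speed=0.1):
    """Pick the (v, w) command whose forward-simulated trajectory scores
    best on obstacle clearance, goal proximity, and speed."""
    best_cmd, best_cost = (0.0, 0.0), float("inf")
    for v in v_samples:
        for w in w_samples:
            x, y, th = pose
            cost = 0.0
            for _ in range(int(horizon / dt)):   # forward simulation
                x += v * np.cos(th) * dt
                y += v * np.sin(th) * dt
                th += w * dt
                if costmap.is_occupied(x, y):    # trajectory is invalid
                    cost = float("inf")
                    break
                # Penalize driving close to obstacles.
                cost += w_obs / (costmap.obstacle_distance(x, y) + 1e-3)
            if np.isfinite(cost):
                cost += w_goal * np.hypot(goal[0] - x, goal[1] - y)
                cost -= w_speed * v              # prefer faster motion
            if cost < best_cost:
                best_cost, best_cmd = cost, (v, w)
    return best_cmd
```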

Figure 2-7: The local map, in neon colors of yellow, light blue, and pink, shown overlaid on the occupancy grid. It shows how much room for error the TurtleBot needs to navigate.

2.5 Integration of exploration on the TurtleBot

The above methods describe a modified mode of frontier exploration for a battery-constrained mobile robot. While the map could grow and could allow for interesting analysis of the environment, exploration was built as a service module in ROS so that a higher-level task structure could be carried out. The next chapter describes the underlying system upon which the robot was built.


Chapter 3

Long-Term Persistence

There are many problems involved in building and ensuring reliable autonomous recharging for a TurtleBot mobile robot, in particular ensuring repeatable autonomous docking behavior and overcoming mapping issues. In this chapter we review the relevant published history of indoor autonomous robots used for mapping and navigation experiments, and the solutions we created to overcome reliability issues with the TurtleBot.

3.1 Problem Statement

Persistence of a mobile robot is the ability of the robot to overcome errors while continuing a task. Persistent navigation often requires coping with changes in the environment: a robot placed in a dynamic environment with moving objects must handle mistakes in its own map if it relies on one. A robot's navigation comprises the motions it must make to achieve its task; persistence is the ability to continue its task and navigation despite processing and information errors. For example, a robot navigating an office must be able to ignore people walking around it when creating its map, since sufficiently dynamic objects are not necessary in its representation of the environment. It must also be able to navigate around chairs and desks, whose locations may shift slightly over the course of a day.

A mobile robot carrying out a long-term task that requires successful navigation must autonomously recharge several times. The TurtleBot's software was built to autonomously back out of its dock, carry out a higher-level task, find its way back to the dock to recharge despite map errors, and then reliably dock again and restart the whole process. The problem discussed in this chapter is the ability of the robot to complete its task through several recharges, independent of the navigation-dependent high-level efficient exploration and mapping. In particular, the robot accumulated many discrepancies in its map on account of unreliable sensors degrading the metric mapping behavior required for navigation. The goal of the work was to overcome these discrepancies autonomously in order to get back home, and to repeat this behavior reliably.

Robot reliability is not often discussed in the literature. These challenges are often highly specific to the platform, their solutions possibly tailored to a certain robot and its sensors. Uncovering the limitations of current open source software is not glamorous work, and the problem is often solved by simply buying better hardware instead of finding a software solution. However, understanding the minute details and pitfalls of autonomous behavior is critical when solving the issues that keep it from performing reliably. A solution that drastically improves mapping or task performance despite poor sensing capability is preferable to a solution that requires expensive sensors.

The persistence of the robot was important to the project's goal of building an autonomous data collection system. Such a test-bed would be useful for implementing new research ideas while accumulating relevant data for offline testing. The idea of bootstrapping the mapping system with an object detector required an autonomous and continuous data collection system that could feed the detector thousands of training examples, with the data correctly positioned on a map. This chapter outlines the design of the mechanisms that keep the TurtleBot running autonomously over multiple recharges in the face of navigation errors.

3.2 Related Work

Long-term mapping and persistence are important abilities for an indoor mobile robot. The bane of robotics is the sporadic failure of hardware. In the TurtleBot's case, many sensors have the appropriate resolution to discern necessary obstacles, but still cannot be absolutely relied upon because of their potential to fail. One of the sensors on the TurtleBot used in this thesis, for instance, was the RoboPeak LIDAR, a laser scanner with a 6-meter maximum range and a distance resolution of 0.5 millimeters. Alone, it should technically be able to provide the robot with enough obstacle information to navigate. In reality, on an occupancy grid, its performance degrades because of the compounding failures of sensor hardware: noise in the sensors enters the map as valid data. This unreliability of sensors is why mapping remains an interesting problem despite the existence of high-throughput sensors.

Long-term persistence is a desired quality in many works [1, 22, 38]. However, the problem of maintenance, and of resistance against hardware failure in particular, is never fully addressed in any of them. In [22] in particular, logs collected over long periods of time are analyzed by their frequency spectrum to understand the dynamic changes in the environment. Their experiments involved a mobile robot patrolling a specific room and set of locations at different times of day. The authors mention that their mobile robot experiments were done autonomously with supervision, stepping in when the robot announced that it had a problem. They unfortunately do not specify how many times the robot failed, and in the end their experiment did not depend on it. Interestingly, they might have mentioned the robot's reporting mechanism because at some point it did fail.

Another work with relevant persistent experiments explores the ability of a robot to successfully prune and adjust the topological SLAM graph of mobile landmark locations when their observed positions and orientations change [39]. The mapping technique is run in part on data that the authors collected, as well as on a separate university patrolling dataset [1] in which the robot was occasionally operated remotely when necessary. It is unclear how much of the new data was collected autonomously, or if

the robot was driven part of the way and supervised.

Thus, autonomy is not independently addressed in the literature, although long-term persistence is, for several reasons. First, many autonomy problems might be regarded as inaccurate-sensor problems, solved by processing more data and increasing sampling for more accurate readings. Second, there is no clear-cut solution to autonomy: many methods apply only to a highly specific task (see, for instance, Autonomous Feature Exploration for mapping in [26]), and general techniques need more work. Third, full autonomy isn't always necessary, and many experiments do not require or test full autonomy right away. Full autonomy, where a robot would be able to diagnose itself, has many applications, for instance in a barren environment like space, where maintenance is difficult and costly for a human to perform, or for highly complex systems like cars, where the typical user doesn't know enough to diagnose every problem.

3.3 High-Level State Machine

The motivation for the structure of the little TurtleBot's life-cycle was to allow higher-level experiments to be flexible, repeatable, and easily integrated. The TurtleBot's software needed to respect the limits of its hardware while still completing all the necessary tasks. A finite state machine was chosen as the software structure, in particular because the data collection has to repeat recharging and exploring cycles, and because the mobile robot can only complete one task at a time. The state machine is shown in Figure 3-1, and its transitions are discussed in each state description.

3.3.1 Docked: Charging behind the scenes

The TurtleBot typically begins in this state. The dock sits perpendicularly against the wall, as in Figure 5-1, with the TurtleBot's base snug against it. It charges the robot with the Kobuki adapter through spring-loaded metal contacts that connect

Figure 3-1: The TurtleBot's autonomous state machine diagram.

to contacts on the underside of its base. Docking precision has to ensure good contact, or the robot's charging rate slows dramatically. While docked, the TurtleBot charges. Once the battery's charging rate starts to slow and the battery charge reaches a certain percentage, the robot begins the exploration cycle, switching to the Undocking state. A sketch of this transition check follows below.

3.3.2 Undocking: Creating an initial map

The undocking state is an initial backout and rotation. In this state, the sub-processes that the robot requires to navigate its environment are begun, in particular the grid mapping ROS module. In addition to mapping for navigation, the path planning and landmark detection processes are started. These different processes might initiate superfluous turning in place over the top of the dock. The undocking process requires a one-meter-radius clear space in front of the dock, as the robot does not make any obstacle checks before it backs up. The robot rotates in order to accumulate enough data to begin creating a map, which it will grow once it begins exploring. Once finished with the undocking process, it switches directly into the exploring state.
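The Docked-to-Undocking decision mentioned above can be driven from the Kobuki base's sensor stream. The following is a hedged sketch: it assumes the standard kobuki_msgs/SensorState message on the default /mobile_base/sensors/core topic, an illustrative voltage threshold, and a hypothetical StateMachine object; the actual thresholds and wiring are not reproduced here.

```python
import rospy
from kobuki_msgs.msg import SensorState

FULL_CHARGE_V = 16.5   # illustrative threshold, in volts

def sensor_cb(msg, machine):
    # SensorState.battery reports battery voltage in units of 0.1 V.
    voltage = msg.battery * 0.1
    charging = msg.charger != SensorState.DISCHARGING
    if machine.state == "Docked" and charging and voltage >= FULL_CHARGE_V:
        machine.transition("Undocking")

rospy.init_node("docked_monitor")
machine = StateMachine()  # hypothetical object (see the sketch in Section 3.4)
rospy.Subscriber("/mobile_base/sensors/core", SensorState, sensor_cb,
                 callback_args=machine)
rospy.spin()
```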

3.3.3 Exploring: Efficient map expansion

The method of exploration was described in Chapter 2. While in the exploring state, the robot receives regular updates to its map and keeps on hand a potential set of goals. The explorer interacts with the path planner of the robot as a service: once the state machine determines the robot has reached its current destination, it queries the Explorer for a new one in order to further explore the map. These processes run independently, so modifying the Explorer in any way does not impact the robot's autonomy. Once the robot's battery hits a minimum, or an exploring-session timeout is reached, or, more rarely, the robot has determined it cannot find any more frontier goals to navigate to, it switches to the MapHome state.

3.3.4 MapHome: Reliably getting home

In this state, the robot sets a goal for its previously visited home location one meter in front of the dock, a comfortable distance for the robot to begin autodocking. Driving to the dock from anywhere in the map does not happen perfectly, however, because the occupancy grid that navigation depends on could have errors in obstacle alignment. Thus, the robot proceeds cautiously, with the need to single out the errors that may arise in order to overcome them. One specific error occurs when the dock is no longer located at the robot's starting position of (0, 0, 0), since drift in the odometry of the robot (compounded measurement errors) might have caused obstacles to move to incorrect positions in the map. The TurtleBot's mapping method, gmapping implemented in ROS [10], aligns scans using a particle filtering method. To ensure that the robot will be able to start autodocking from a good position, the AprilTag detection process [27, 41] is turned on. AprilTags are an artificial landmark system that can accurately localize the camera's reference frame despite viewing angle and occlusions [27]. Once the robot has reached its updated goal, it switches its state to AutoDocking.
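The home-goal correction can be expressed as a composition of homogeneous transforms: given the tag's pose in the camera frame and the robot's pose in the map, place a goal a fixed standoff in front of the tag. This sketch is illustrative; the frame names, the convention that the tag's +z axis is its outward normal, and the one-meter standoff are all assumptions.

```python
import numpy as np

def goal_from_tag(T_map_robot, T_robot_cam, T_cam_tag, standoff=1.0):
    """Place a goal `standoff` meters in front of the detected home tag,
    expressed in the map frame, by chaining 4x4 homogeneous transforms."""
    T_map_tag = T_map_robot @ T_robot_cam @ T_cam_tag
    # A point on the tag's outward normal (here taken as tag +z).
    offset = np.array([0.0, 0.0, standoff, 1.0])
    goal_xyz = (T_map_tag @ offset)[:3]
    return goal_xyz
```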

3.3.5 Autodocking: Reliable, autonomous docking

Once positioned in front of the dock, the TurtleBot uses an open source autodocking module [16] to move forward and align the metal contacts on its underside with the metal contacts of the dock. This method of alignment relies on the dock's three IR emitters and the Kobuki base's three matching IR receivers. The dock's emitters inform the receiver which of the three regions (left, central, and right) it is receiving from, and whether it is near to or far from the dock. The base will turn to one side to receive the center beam of the dock, move perpendicular to it, and then rotate to align its front-facing receiver with the central region of the dock. The autodocking mechanism is susceptible to inaccuracies, however, so a number of additional software catches were implemented to make this system robust.

The location from which the robot begins autodocking is of critical importance to the trajectory of the docking. The autodocking code documents that a two-by-five-meter space directly in front of the dock is sufficient for the Kobuki base's autonomous docking. Our own repeated tests show that this space is sufficient, but not 100% reliable, as shown in the docking accuracy experiments in Chapter 5. Thus, a unique AprilTag [27] was attached to the wall above the dock for the robot to visually servo to, finding its position relative to the dock as necessary. The robot then servos to a point about a meter perpendicular from the dock, with less than a meter of horizontal displacement, tightening the bounds of the area from which the robot can centrally align with the dock.

The method of visual servoing is straightforward. Once the robot receives a tag detection, it immediately computes its relative pose. Using its current odometry, it computes the relative location of the tag and creates a new goal relative to the tag. Once the home AprilTag comes into view of the camera, the robot re-localizes using a more accurate estimate of where it is relative to the tag, and sets a new goal. This re-positioning improves the chance of successful docking. After servoing, it finally runs the autonomous docking process.

After an attempt to autodock, the robot might still end up docked incorrectly by under two degrees of rotation, or about three millimeters of sideways displacement along the dock. With a positioning error this small, there would be no way to know at what point the TurtleBot would need to give up, back up, and servo to its MapHome location to try again. In addition, the robot would usually be in the AutoDocking state because its battery was running low. Thus, a timeout for docking is set from when the automatic docking client is called, to ensure that the robot reaches the dock in a finite amount of time. The robot also backs up once it fails to dock, to avoid scratching its underside on the dock's electrical contacts with superfluous rotations over the dock. After successful electrical contact, the Kobuki begins charging, at which point the state machine transitions, returning to the Docked state.

3.4 Error: When anything goes wrong

From any other state, when anything goes wrong (the robot is lost, or it is repeatedly throwing an error), the robot is sent to this state. Most problems that the robot cannot handle on its own stem from its incorrect conclusion that it is positioned on top of an obstacle, no longer able to navigate on its own. Usually when the robot determines that it has driven over an obstacle, its belief cannot be changed right away; only its local map can be cleared. The robot has a global map, built constantly by gmapping [10] while exploring, and a local map that it updates constantly from sensor readings for local planning. The global planner determines the global path that the robot should take, and the local planner modifies the controls of the robot to stay on the global path. Being stuck on an obstacle is the fault of the local map and the artificial object inflation of the path planner. Inflation occurs to prevent the robot from navigating too close to obstacles, but the robot's observation errors compound over time, causing over-inflation of the map. Once the robot hits the Error state, it halts all other processes to clear the local map, rebuilds it with fresh sensor readings, and then restarts the other processes, picking up where it left off.
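In ROS, the local-map reset can be done through move_base's costmap-clearing service. A minimal sketch follows; the service name is the move_base default, and the surrounding halt/restart logic is omitted.

```python
import rospy
from std_srvs.srv import Empty

def clear_local_map():
    # move_base exposes a service that wipes its costmaps so they can be
    # rebuilt from fresh sensor readings.
    rospy.wait_for_service("/move_base/clear_costmaps")
    clear = rospy.ServiceProxy("/move_base/clear_costmaps", Empty)
    clear()
```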

The robot's state after exiting the Error state is the same as the one it was in before entering. The high-level state machine is thus able to continue, providing the TurtleBot with recovery behavior that enables it to persist autonomously for multiple recharges.
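Putting the states together, the life-cycle reduces to a small transition loop. The sketch below is a simplified rendering of Figure 3-1 rather than the actual implementation; the handler bodies are stubs standing in for the behaviors described above.

```python
# Simplified sketch of the high-level state machine of Figure 3-1.
# Each handler runs one state's behavior and returns the next state.
def run_docked():      return "Undocking"    # charge until battery threshold
def run_undocking():   return "Exploring"    # back out, rotate, seed the map
def run_exploring():   return "MapHome"      # frontier exploration (Chapter 2)
def run_map_home():    return "AutoDocking"  # navigate to 1 m before the dock
def run_autodocking(): return "Docked"       # IR docking with tag servoing

def recover_in_error_state():
    pass  # clear the local costmap and restart halted sub-processes

HANDLERS = {
    "Docked": run_docked,
    "Undocking": run_undocking,
    "Exploring": run_exploring,
    "MapHome": run_map_home,
    "AutoDocking": run_autodocking,
}

def run(state="Docked"):
    while True:
        try:
            state = HANDLERS[state]()  # next state on success
        except Exception:
            # Error state: recover, then resume in the state that was
            # active before the failure.
            recover_in_error_state()
```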


Chapter 4

Mapping

Exploration and long-term persistence of an autonomous mobile robot are the baseline for unsupervised experiments conducted over the course of several days. In particular, the goal was to implement pose graph mapping as a Simultaneous Localization And Mapping (SLAM) solution and analyze its performance over a longer period of time.

4.1 Problem Statement

Mapping for a robot is the problem of accurately locating sensor observations relative to the robot's position. Applications of mobile robotics require accurate navigation, for which a map is typically necessary [36]. There are many ways to map an unknown environment using sensor readings. In conjunction with improving the exploration ability and high-level task capability of the TurtleBot, the mapping goal for the robot was to build a graphical map from a pose graph representation. A rectangular, discrete grid map can be produced from the pose graph by projecting every location onto a 2D grid, so that the two representations can be compared. The goal of this section is to compare discrete maps against graphically represented ones. The pose graph method GTSAM [6] and the discrete grid mapping methods are implemented for comparison with each other and against a ground-truth dataset, created using a Tango visual odometry device, which has similar optical capabilities

as the TurtleBot, with improved odometry [11].

4.2 SLAM Background

This section provides an overview of the history of robot mapping and current state-of-the-art methods.

4.2.1 The Map

First, mapping depends on what a map generally is for a mobile robot. Most map types are divided into discrete metric maps and pose graph maps. Metric maps model the world as a high-resolution grid of cells in two dimensions [13] or cubes in three dimensions [14]. They are easy to use and understand, particularly in the context of navigation, but their size in memory and the computation they require grow quickly as the discretization gets finer. Topological maps contain the relative locations in the environment of important features relevant to the robot's task. These features could be objects, image features like corners and edges of objects, or single specific points. Graphical maps are a sparse representation that can be smoothed and corrected, but have the disadvantage of extra computation required to determine the occupancy of a particular position in the environment.

In all types of maps, the data available in the map depends on what information a sensor can produce from the environment. For instance, an underwater vehicle relies on sonar readings much more than on the visual aspects of its environment. For different environments, a mobile robot maps with different features. An ideal mapping framework is flexible enough to support any kind of measurement or observation as input to map creation; we discuss the flexibility of a topological pose graph as a map representation in later sections.

Incorporating sensor measurements to produce a single global set of relative locations is a non-trivial task. The main problem is consolidating and quantifying the error of the various sensors producing the map. If the error of the visual and odometry

sensors on board a robot is ignored, then computing relative locations of features is straightforward, though it would produce a jumbled map. Imagine a world where a robot places a measurement made by an onboard sensor on a coordinate grid map right away, according to its last recorded location measurement. As the robot continues spotting the same feature, it continues to place it on the map. If measurements were perfect, all the measurements of a certain stationary feature would land on the single point where the feature is really located. However, this is practically never the case. Assuming no errors occur in sensing the environment is a strong assumption for even the most accurate and expensive sensors available. For a mobile application in a finely detailed environment, armed with relatively cheap sensors, the experiments in Chapter 5 show that a graphical representation of the environment suits the purpose of mapping better than a metric representation.

4.2.2 Mapping History

Grid Map Computations and Drawbacks

In a seminal paper published in 1986 [3], Smith and Cheeseman introduce probabilistic distributions over positions, measuring relative locations from any reference frame and computing relative poses of observations and the robot itself through time as it makes discrete motions. The distributions can be related, and their error estimated. The resulting map is a set of local reference frames connected by the robot's approximate motion.

At one position in space, a robot may make a number of observations, and each can be related to its current position by a relative pose. After making observations, the robot moves, its change in position reflecting an approximate transformation [35]. From its new position, the robot makes more observations. After reaching some final position, the robot is interested in (1) compounding positional transformations, in order to determine positional relations among objects in the world as well as itself, together with the error bound that determines how good its guesses are, and (2) merging the multiple observations it has made along its positions, to bound the error in its

observational and positional beliefs and actually compute final poses. More generally, this was a break from local mapping, where a robot's current sensor range directly augmented its map. In [35], the authors do not specify a particular map representation and instead implicitly discuss a global one, where local relations can be related on a global scale.

The occupancy grid presents a traditional metric representation of the world [9] as a finite grid, assigning each location a probability. Each cell in an occupancy grid represents a d × d cell in the world, and each cell is related to a binary random variable representing its occupation, where adjacent cells are not independent. Such a grid representation can be seen in Figure 1-1. In general, the robot would use each cell's most probable occupancy status as its value to do navigation and other computational tasks on the map. An advantage of this type of grid is that it incorporates some measure of uncertainty in the sensor measurements: if some set of measurements that came back to the robot were incorrect, the map could remain unaffected and ignore the unlikely measurements. This also means, however, that an incorrectly grounded belief is unlikely to change despite many new measurements that say otherwise. Because only a small set of cells are dependent on each other, measurements of these cells that register in a different part of the map are not recognized as related, so the map grows without the ability to fix a string of related errors, even when they are inconsistent with more current observations.

Another disadvantage of the grid method of mapping is that it does not incorporate the robot's uncertainty in localization, which intuitively must affect the observations made. If the robot navigates, making observations, but completely loses track of where it is, it cannot relate previous observations to new ones. If the robot is unsure of its position, it must also be unsure of the measurements it made from that position. For this reason, pose graph mapping has risen as the method of choice for state-of-the-art mapping techniques.
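For reference, the per-cell update behind such a grid is the standard log-odds fusion rule; the sketch below uses illustrative sensor-model constants rather than values from this work ([9, 13] give the derivation).

```python
import numpy as np

L_OCC, L_FREE = 0.85, -0.4   # illustrative log-odds increments
L_MIN, L_MAX = -2.0, 3.5     # clamping limits

def update_cell(logodds, hit):
    """Fuse one beam reading into a cell's log-odds occupancy."""
    logodds += L_OCC if hit else L_FREE
    return np.clip(logodds, L_MIN, L_MAX)

def occupancy_probability(logodds):
    # Recover P(occupied) from the log-odds representation.
    return 1.0 - 1.0 / (1.0 + np.exp(logodds))
```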

4.2.3 State of the Art Pose Graph Mapping

For most graphical maps, robot locations are represented as relevant points with an inherent structure [2]. The most relevant techniques for this thesis use factor graph optimization, outlined in section 4.2.7, although there are also methods of mapping that compose a scan to be used as a map.

Current dense mapping relies on matching points in the data to points in the real world [40]. Depth data comes out of an RGB-D sensor at a certain resolution; the points in the image can be matched with each other in the world, and directly to colored points in the image, producing a 3D model of the scanned area as well as estimated odometry for the camera in the world. This type of mapping deviates slightly from graphical mapping in that the map produced is more like a scan fused from the data points; the optimization necessary is rather the minimization of error from matching corresponding depth points in the data to the world. The method outlined in [40] in particular requires a Graphics Processing Unit (GPU) to parallelize the processing in real time, because the dataset size is immense: over the course of tens of minutes, a high-resolution depth camera can collect hundreds of thousands of point readings, and analyzing and storing them in memory is computationally expensive.

Moreover, the map-building might not need high-resolution scans. In an office environment, for instance, the general outline of a wall is all that's necessary, rather than a point cloud of it. Dealing with a large amount of data increases the computation time for processing it, for instance to determine whether the robot fits on a specific path. Relevant properties of the environment can be described concisely in terms of features with known characteristics, so a more concise representation of the environment is desirable.

A sparsely featured graphical representation is produced by landmark-based algorithms, where point features are characterized by their actual position in the environment. The relevant information in this map is the relation between all the point locations. There are many such algorithms [6, 30, 33, 36, 40], varying in the type of feature used and the method to prune which features and frames to use in the

map. Features range from binary pixel features to slightly more complex identified collections of gradients in the image; typically these methods run on purely RGB images, perhaps aided by a different sensing method as well. In particular, in terms of combined speed and accuracy with constrained monocular sensors and computation, the highest performing algorithm is ORB-SLAM [30], which uses simply recognized features and efficiently culls relevant frames from the processing to speed up the optimization process. This method has few sensor requirements and can run in real time. However, there is still the problem of association of features, as when a feature in one frame is marked equal to the feature in the next. This false-positive occurrence does not greatly impact the performance of the algorithm, but it does indicate a waste of computing time.

The sparsest mapping algorithms treat objects themselves as landmarks, and in turn use a semantic understanding of objects in the environment to do loop closure. Object detections in SLAM++ [33] rely on pre-computed object meshes for every identifiable object in the environment. Once these are computed, real-time detections are possible, where a detection also marks the approximate size and distance of an object relative to the camera. Detections and positional constraints are added into the map structure, and the map constructed can either be treated as a sparse feature graph, with objects as features, or as a scan model, where the points of the objects are projected into the map space from their approximate locations to create a 3D model of objects in the world. This method in general has the advantage of relatively straightforward data association, as objects are sparse enough to be distinguished by their positions. But it has the disadvantage of requiring high initial computation to store an efficient object representation in memory to speed things up later.

4.2.4 Coordinate Systems

Understanding the coordinate system of the map is critical to the mapping process.

3D

Most of the mapping done by the TurtleBot occurs in the three-dimensional real space $\mathbb{R}^3$. There are many coordinate systems in three dimensions to keep track of, for instance the robot coordinate frame x, the camera coordinate frame c, and the world coordinate frame w. All relevant transformations between the different coordinate systems consist of a rotation, a translation, or, more typically, a combination of both; they comprise SE(3) poses. Consider a location L as having some pose in the world $p_{wL}$, with some rotation and translation relative to the world origin. It saves some time and space to refer to all possible poses and transformations as a 4×4 matrix

$$T = \begin{bmatrix} R & t \\ \mathbf{0} & 1 \end{bmatrix}$$

where the 3×3 matrix R represents a pure rotation in three dimensions, and the 3×1 vector t represents a translation in three dimensions. Whenever a transformation matrix is referred to, this pose representation encapsulates the entire possible transformation between two coordinate systems. One important note is that there are 6 degrees of freedom in the motion of an object, meaning there are 6 degrees of freedom in an SE(3) pose: an object in three dimensions can rotate about three unique axes, or sit at a different three-coordinate point in space. One may also represent a rotation as a quaternion, which has four dimensions but is unit-constrained, so there are still only three free dimensions.

2D

Often, for the simplicity of determining the position of many mobile robots, the relevant position of the robot is constrained to a flat surface, with a z-dimension value of 0. For these cases, the robot has only 3 degrees of freedom: its translation relative to the map origin, termed its position (x, y), and its yaw angle, namely its bearing relative to the origin, θ.
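These 4×4 homogeneous transforms compose by matrix multiplication, which the following numpy sketch makes concrete; the frame names are illustrative.

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def compose(T_wx, T_xc):
    # world<-camera = (world<-robot) composed with (robot<-camera).
    return T_wx @ T_xc

def invert(T):
    # The inverse transform swaps the direction of the mapping.
    R, t = T[:3, :3], T[:3, 3]
    return se3(R.T, -R.T @ t)
```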

3D Coordinate System Example

Suppose the robot and environmental landmarks must be placed together on the map in order to run a simple navigational task, say along the lines of avoiding observed landmarks. Several poses must be computed: $p_{wl_i}$, the poses of landmarks $l_i$ in the coordinate system of the world w, as well as $p_{wx}$, the pose of the robot x in the world system. These are relatively straightforward to compute from precise measurements of the landmarks and robot position. A robot with a camera at $p_{xc}$ (the pose of the camera in the coordinate frame of the robot) will have made observations of landmarks $l_i$ through its camera sensor c, at positions $p_{cl_i}$. Combining the relative rotations involves multiplying the rotation matrices and adding the translation vectors in the appropriate order if the goal is to determine the position of a landmark in the world. One can write

$$p_{wl_i} = p_{wx} \, p_{xc} \, p_{cl_i}$$

to compute the proper pose. Thus, given an observation of a landmark by a camera on a robot, it is simple to compute the relative pose of the landmark with respect to the world coordinate system origin. Converting all poses to the world system ensures that the map can be used later, given an initial world reference point. The next step in abstracting this relative-pose model for the real world is adding uncertainty in the observations of the world.

4.2.5 Localization

With a background in coordinate systems, a robot requires more kinds of information to determine its relative pose in the world. Given a map, the robot can make observations, narrowing down its possible location options with every new observation. Dropped in unknown territory among the unique streets of Manhattan, armed with a map of landmarks such as storefront locations and street names, a human might use the same technique of making observations to localize.

Without any semblance of a map, a robot does a similar thing, making observations

and narrowing down locations. It is building these relative locations from the moment it can sense. No longer does it have an exact state from which to localize; it can only assume its very first woken moment is a special, arbitrary point in the world, typically the origin of its internal map. Then, as mobile robots do, it moves around its environment.

The treatment of localization has not entirely changed since Smith and Cheeseman's relative-frames approach [35]. In that work, they assume that the robot's world-frame poses are all random variables, and seek to compute the covariance and mean of its position relative to a specific past frame. After a long period of executing motions and achieving new positions, the uncertainty between any two positions is bounded by the transformations between them.

Particle Filter Localization

A particle filter is a method to reduce uncertainty by computing prospective views from multiple locations to identify the true location of the robot [25]. Particles represent possible locations of the robot on a known map, and are weighted according to the likelihood that they are the actual position of the robot. The particles often start uniformly distributed across the known map if the robot has no prior information. Then, as the robot makes observations, the combination of measurements is contrasted with the view that each particle would receive, so the weights of particles that are more likely are increased, while the weights of particles that are unlikely are decreased. Finally, the particles are re-sampled, with some randomness thrown in to avoid the robot getting stuck in local solutions: non-global solutions that satisfy a recent set of observations are not always the correct solution, so some random selection over the rest of the known map gives the robot a chance to perturb an incorrect convergence. A sketch of one such update cycle follows.
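The following sketch shows one predict-weight-resample cycle of this filter. The motion and measurement models are placeholders for the robot's odometry and sensor models, and the uniform re-injection bounds stand in for the known map's extent.

```python
import numpy as np

def particle_filter_step(particles, weights, motion, observation,
                         motion_model, measurement_likelihood, rng):
    """One cycle of Monte Carlo localization over an (N, d) particle set."""
    # Predict: propagate each particle through the (noisy) motion model.
    particles = np.array([motion_model(p, motion, rng) for p in particles])
    # Weight: score each particle by how well its expected view matches
    # the actual observation.
    weights = np.array([measurement_likelihood(observation, p)
                        for p in particles])
    weights /= weights.sum()
    # Resample in proportion to weight, then scatter a few particles
    # uniformly (placeholder map bounds) to escape incorrect convergence.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    n_random = max(1, len(particles) // 100)
    particles[:n_random] = rng.uniform(-10.0, 10.0,
                                       size=(n_random, particles.shape[1]))
    return particles, np.full(len(particles), 1.0 / len(particles))
```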

Current localization techniques

Current localization methods in graph-based SLAM assume a belief distribution over all non-definite parts of the system. These include the implicit variables in an execution of a motion, namely the robot's odometry, as well as the position variables of the robot and landmarks. Once the map needs to be used, maximum a posteriori estimates over all the included variables are computed and the relations among them are determined, often in real time as the robot moves and collects data [12]. The unknown variables and their relations are stored in a graphical model structure that constrains and relates them in a way amenable to optimization. The penultimate step of using the graphical model is a maximization over the combined likelihood of the variables, to assign final values. Finally, a discrete grid map could be created at each time step in which a new measurement is taken, to show the relation between all the variables through time. Localization in a graphical context is the maximum-likelihood computation for the node representing the most current location.

4.2.6 SLAM: Simultaneous

Doing mapping and localization simultaneously amounts to adding constraints between the final distributions of the approximate positions obtained. Considering landmarks and self-position simultaneously simplifies and exploits the underlying optimization processes, while providing a framework for more robust mapping. Moreover, the actual process of doing both mapping and localization at the same time is exactly what a robot dropped in an unknown environment has to do. These two processes are not independent, and should not be treated as such.

The advantage of running a pose-graph-optimized SLAM method on any piece of hardware is that it minimizes the required reliability of the sensing mechanism used. This means that robots can still perform mapping well despite poor sensors. While general sensors are becoming inexpensive to manufacture, developing rigorous methods for obtaining precision given a larger set of inputs, while allowing for a sensor to fail, is critical to performing tasks in increasingly complex environments.

4.2.7 GTSAM and iSAM: Back End Processing

To run landmark-based SLAM on the TurtleBot, the graphical mapping toolbox GTSAM [6], equipped with an optimization toolbox [18], was used to constrain a set of probabilistically defined locations on a map.

GTSAM

GTSAM (Georgia Tech Smoothing And Mapping) is the toolbox used for creating a map based on observed landmarks in the environment [6]. The general approach is to collect all the positions of the robot, as well as the positions of unique landmarks, and represent them as random variables with presumed distributions, stored in a factor graph (a type of graphical model). At any point in time, an inference method can be run over the factor graph to constrain the distributions of the locations and lower their covariances. The goal of constraining these variables is to eventually extract the most likely value of every distribution as the final value in the creation of a grid. The more accurate the distributions, the more accurate the final values.

Location distributions are the variable nodes added to this graph. Between every two variable nodes is a factor node, containing the information between them in the form of a probabilistic factor φ, some set of samples of the relations between the two variables extracted from robot data. Between arbitrary adjacent variable nodes $x_i$ and $x_{i+1}$, representing the position of the robot at times i and i+1 respectively, the factor node contains the perceived control sequence that moved the robot between those positions. Another similar type of factor node is that between a robot position node $x_j$ and the k-th landmark position node $l_k$ observed from position $x_j$; this factor node contains the measurement details of the landmark at that position. A small example of assembling and optimizing such a graph follows below.

One important part of the front end of GTSAM is the process of data association. The toolbox itself is not concerned with managing the number and types of the various nodes that get added into the graph; the process feeding landmark measurements into the graph needs to ensure spurious duplicates are not added.
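A minimal sketch of such a graph, written against the modern GTSAM Python bindings rather than the custom wrapper used in this work; the noise sigmas and geometry are illustrative.

```python
import numpy as np
import gtsam

# A tiny 2D factor graph: a prior on the first pose, one odometry
# (between) factor, and a bearing-range factor to a single landmark.
graph = gtsam.NonlinearFactorGraph()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))
meas_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.2]))

x0, x1 = gtsam.symbol('x', 0), gtsam.symbol('x', 1)
l0 = gtsam.symbol('l', 0)

graph.add(gtsam.PriorFactorPose2(x0, gtsam.Pose2(0, 0, 0), prior_noise))
graph.add(gtsam.BetweenFactorPose2(x0, x1, gtsam.Pose2(1.0, 0, 0), odom_noise))
graph.add(gtsam.BearingRangeFactor2D(x1, l0, gtsam.Rot2(0.0), 2.0, meas_noise))

# Initial estimates are deliberately perturbed; optimization corrects them.
initial = gtsam.Values()
initial.insert(x0, gtsam.Pose2(0.1, -0.1, 0.02))
initial.insert(x1, gtsam.Pose2(1.1, 0.1, -0.03))
initial.insert(l0, gtsam.Point2(3.1, 0.1))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
```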

Every relevant change in position must be added to the factor graph to constrain the robot's position, but adding landmark repeats does not help constrain any variables. Thus, one needs a method of collapsing the observations of a single stationary landmark into a single landmark position node, as shown in Figure 4-1. Over time, more and more edges are added from robot positions $x_i$ to a certain landmark $l_k$, increasing the connectivity of the graph.

Typically, inference over the variables in the graph is run after a certain number of new variables have been added. This is the optimization step, made whenever a robot wants specific highest-likelihood values. To understand the optimization, a deeper description of the underlying representations and of the concept of smoothing is necessary.

iSAM

iSAM (incremental Smoothing And Mapping) [17, 18] is the back-end optimizer that efficiently computes the factorization of the joint probability distribution over all the desired variables. This distribution has several hundreds of variables, leading to thousands of terms in the covariance matrix. Removing directions and ignoring factor nodes in the factor graph model creates a Bayesian belief network of all nodes. The goal is to factorize the joint probability distribution $P(x_i, l_j, u_{i,i+k}, z_{i,j})$ over robot positions $x_i$, landmark positions $l_j$, control measurements $u_{i,i+k}$ (the random variable of odometry measurements between times i and i+k), and landmark measurements $z_{i,j}$ of landmark j from position i.

iSAM begins by assuming Gaussian models for the odometry process model and the measurement model. The motion process of the robot is rarely measured exactly, so it is treated as a probabilistic variable in the environmental model. The model for the robot's sensors is also inexact, due to calibration errors and hardware biases that are very difficult to quantify precisely before use. Thus one can write the robot's position as some function of its previously recorded position and the control input:

$$x_{i+k} = f_i(x_i, u_{i,i+k}) + w_i$$

Figure 4-1: Graphical depiction of data association on a factor graph. From every yellow odometry node $x_i$, observations to landmarks $l_k$ are made. The information is stored in the factors, the black nodes. Landmarks $l_3$ and $l_4$ are equated in the data association process, and consequently joined. This causes the addition of a constraint between the two odometry nodes $x_1$ and $x_2$, increasing the connectivity of the graph.

And similarly for the sensor measurement that the robot makes from a position $x_i$:

$$z_{i,j} = h_{i,j}(x_i, l_j) + v_{i,j}$$

with $w_i$ and $v_{i,j}$ denoting zero-mean process and measurement noise with separate covariance matrices $\Lambda_i$ and $\Gamma_{i,j}$. The precise parameters of the process models $f_i$ and the measurement models $h_{i,j}$ are unknown. The Maximum A Posteriori (MAP) estimate, the most likely set of positions that satisfy the constraints made through observations, gives the positions to map. The solution to the problem is the minimization of the squared Mahalanobis distances between the model outputs and the positions, over the set of possible positions [31]:

$$X^{opt}, L^{opt} = \arg\min_{X,L} \; \sum_{i=1}^{M} \left\| f_i(x_{i-k}, u_{i-k,i}) - x_i \right\|^2_{\Lambda_i} + \sum_{i,j} \left\| h_{i,j}(x_i, l_j) - z_{i,j} \right\|^2_{\Gamma_{i,j}}$$

Depending on the approximation of the process and measurement models, this turns into a linear or nonlinear optimization problem to solve for the positions of the robot and landmarks.

4.3 Motivation: Bootstrapping SLAM with Vision

Using the TurtleBot was inspired by an idea to bootstrap mapping with an object detector. In general, the mapping process and the visual detection process can help each other: in one direction, the three-dimensional location of a detected object can be validated against the history of data locations that the map presents, potentially removing erroneous object detections from a set of proposed detections [28]; in turn, the construction of the map can directly benefit from improved detection as well, through the elimination of incorrectly detected features and associations.

4.3.1 Vision in the Context of Mapping

Section 4.2.2 gave a brief overview of the different types of mapping, from dense maps, where a point in the pose graph is a point in the world, to semi-dense maps, where a point in the map is a feature in the world, to sparse maps, where a point in the map indicates a larger feature or object in the world.

The object-level mapping method also matches human intuition about location semantics and where objects are typically found: certain places have certain objects in certain positions. For instance, monitors are typically found on top of desks, and doorknobs at least three feet above the ground. Using this information, one way to distinguish between different offices across a uniformly outfitted office building is to consider the arrangement of immobile furnishings and other unique elements among the offices.

Object detector

To recognize all these objects, one needs an advanced object detector. One relevant example of such a network is SegNet [19], a deep learning classifier for segmenting objects in images. Deep-learning methods of this kind require training, and their output is typically a classifier. There are several proposed modifications to the classification algorithm to build a general feature extractor for a landmark-based SLAM method.

First, SegNet's training typically requires many sets of images for each object it needs to classify, with both positive and negative training samples. Positive samples must have a bounding box around the location of the object along with its associated label, and negative images cannot contain the object. For this classifier to run effectively on an autonomously driving robot, it must be able to incrementally train on new images and labels: as the robot collects new data, the classifier should use it to incrementally improve the object detector.

Second, instead of acting as a classifier, SegNet can be treated as a feature extractor. The job of a deep learning algorithm is to characterize nonlinear parameters in the relations between the image pixels of the input, to output a final feature vector

before clustering it with other classes of objects. The construction of this feature vector is essentially the same as in a visual feature extractor, except that the process of analyzing the image is much more complex than the binary descriptors that characterize typical high-speed features such as ORB [32].

For many SLAM methods, running an expensive object detector isn't necessary. Point features serve the purpose of identifying features just as well, if not faster, in conjunction with real-time nonlinear optimization for pose graph mapping. However, object recognition characterizes the environment in a semantic way and presents a much sparser representation of the map, one that can be augmented by smaller features as well as more semantic ones. It also opens the door for semantic constructs of the environment to aid both mapping and detection.

Bootstrapping mechanism

With object recognition as the primary visual feature extractor, the construction of the map stands to benefit from improved object detections, eliminating incorrectly detected features used in the map. Intuitively this is the same as having wildly increased sensor resolution or decreased false-positive detections: a better sensor would come with better detection, improving mapping performance.

Conversely, the object detector stands to gain something from more correct detections on the map as well. In [28], the authors use SLAM to improve object recognition by using location as additional context for detecting candidate object proposal areas. In every frame of an input video, proposals are made before objects are detected, namely areas in the image where a more localized object detection is made, to minimize false-positive detections. By incorporating the relative position of the camera and previously detected objects, the computation is narrowed to only the likely positions of objects, and attention is also brought to objects that are expected to be seen but might be occluded from view. This method uses information from previous frames instead of treating frames as independent recognition problems, strengthening the detector as a whole.

The current work, involving an autonomously exploring robot that revisits a certain area with more or less consistent objects, would serve as an excellent testbed for this bootstrapping mechanism. The robot could incrementally improve its map creation after every explorative journey away from its base. After incorporating new viewpoints, images, and data from its exploration, it could improve the visual classifier, which in turn could improve the mapping algorithm as well.

4.3.2 AprilTags as Features

Special landmarks had to be chosen to run SLAM with GTSAM and build a graphical map during repeated exploration of a specific area. Landmark observations that the robot makes should be easy to find and distinguish. Landmarks that get observed by the robot must also be fairly common in a particular environment to help the robot localize the most; for instance, in an office environment, the best landmarks are walls, monitors, and chairs. Moreover, landmark observations should be distinguishable; in the same office environment, distinguishing between different chairs and monitor configurations presents unambiguous data that the robot can use. Once recognized, a landmark cannot be interpreted as something else.

One abstraction for data association that serves these purposes particularly well is fiducial markers that, when observed, provide the robot with a high-accuracy estimate of its location. In this work, AprilTags [27] served as robustly detected and unique landmarks for the mapping system. AprilTags are 2D printed fiducial tags of adjustable size that can be detected in a wide variety of lighting conditions, and from which a 6-degree-of-freedom relative pose can be determined. These tags are simple to use and detect with an off-the-shelf open source detector, and they provide a relative pose very quickly even with a low-resolution camera.
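For illustration, tag detection and relative-pose recovery can be run in a few lines with the pupil-apriltags Python bindings (one of several available detectors; the system described here used the ROS wrapper [41] instead). The camera intrinsics, tag size, and file name below are placeholders.

```python
import cv2
from pupil_apriltags import Detector

# Camera intrinsics (fx, fy, cx, cy) and the printed tag edge length in
# meters; these values are placeholders for a calibrated camera.
camera_params = (525.0, 525.0, 319.5, 239.5)
tag_size = 0.16

detector = Detector(families="tag36h11")
gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)

for det in detector.detect(gray, estimate_tag_pose=True,
                           camera_params=camera_params, tag_size=tag_size):
    # det.pose_R (3x3) and det.pose_t (3x1) give the tag's 6-DoF pose
    # in the camera frame; det.tag_id disambiguates landmarks.
    print(det.tag_id, det.pose_t.ravel())
```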


Chapter 5

Results

5.1 Experiment Setup

5.1.1 TurtleBot Structure

Hardware

The TurtleBot 2 is a low-cost, open source indoor mobile robot for education and research [15] (Figure 5-1). Its individual components are a mobile Kobuki robot base, a multitude of sensors, a dock, and an on-board netbook.

The Yujin Kobuki robot base has a replaceable battery and electrical contacts on its underside through which the TurtleBot charges via the dock. In shape, size, and very nearly function, its specifications are similar to a Roomba's. It has two drive wheels, one on each side, and can reach a maximum linear speed of 0.65 m/s. Around the front half are three bump sensors, so there are three different directions in which the robot can pick up a bump. It also has three IR receivers for automatic docking. There are wheel encoders on both wheels, as well as a gyroscope and accelerometers. These proprioceptive sensors help the robot determine its own orientation relative to the ground. In our robot's situation and environment, it has little chance to deviate from driving on a flat floor, so for the most part the encoders provide the odometry for our experiments.

The TurtleBot also has a selection of 2D and 3D sensors onboard. First, attached

Figure 5-1: The TurtleBot in its natural habitat, docked.

the furthest distance from the base, onto the topmost plate, is an Asus Xtion Pro RGB-D camera, with a resolution of 640 × 480 pixels for both color and depth images and a capture rate of 30 frames per second. The robot also has a RoboPeak A1 LIDAR [24, 34] mounted to the underside of the top plate. This is essentially a small laser scanner at a height of 0.5 meters, taking a horizontal scan at 5.5 Hz. It has a 6-meter maximum range, an angular resolution of 1 degree, and a distance resolution of 0.5 millimeters.

The docking station for the robot has two electrical contacts that spring into the robot once it is in place to charge, an indicator light, and three IR emitters. The robot uses the emitters to understand in which direction to turn in order to dock reliably. The dock is typically fixed against a hard surface such as a wall, and its power supply is the TurtleBot's normal charger.

Finally, the TurtleBot software runs on an Acer netbook that typically comes with the robot, with open source Ubuntu and ROS installed. The netbook has modest specifications: 4 GB of memory, a 500 GB hard drive, and an Intel Core i3 processor. It is usually connected to the local laboratory Wi-Fi, available via ssh for troubleshooting and running experiments.

Software

The software of the robot was integrated using the ROS publish/subscribe interface [29]. All of the robot's firmware came fully installed, and many additional drivers or controllers can be found on the internet. In particular, the ROS Dijkstra-method path planner and controller (integral components of ROS on a mobile robot [8, 29]) were used off the shelf, as were the firmware for the sensors (the RoboPeak LIDAR [24, 34] in particular, fused with the RGB-D Asus Xtion Pro [37]), the AprilTag detector [41], and the grid mapping algorithm [10, 13]. These were integrated with a new exploration algorithm and a novel state machine for autonomous continuous behavior. The offline pose graph mapping was implemented with a Python wrapper for the open source SLAM software GTSAM [6], developed by Sudeep Pillai, a doctoral candidate in Computer Science at MIT.

There are several notable details about the software structure of the robot. First, the explorative behavior of the robot and the persistent state machine could be developed separately and in parallel, which led to separate development of each of these software modules. Second, for the mapping experiments, grid mapping was run in real time onboard the TurtleBot for the purpose of exploration; the graphical method using AprilTags [27] was run offline on data collected during exploration. As the laptop's computational power was limited, the amount of software that could run in real time was correspondingly limited. And third, the method by which all the data was incorporated to produce an occupancy grid for the exploration mapping required 2D scans. Several sensors generated this type of data, namely the LIDAR and the depth scanner. The combined scan used for grid mapping consisted of the minimum detected depth per vertical line in the image, projected to the height of the LIDAR halfway above the base, so as to act like a single sensor. The required input for graphical mapping was the set of distinguishable tag landmarks, so an AprilTag detector had to be run during data collection as well.

Environment: Lab and Simulation

Before a TurtleBot and its dock were acquired, the exploration algorithm was originally developed for a high-fidelity simulated robot running on the Gazebo-ROS platform [20]. Notably, the simulated data had practically no noise compared to the real experiments. Moreover, the simulated TurtleBot's sensors resembled the real sensors, but the 2D scanner was range-limited, covering only the front 270 degrees.

The final environment of the robot consisted of a lab within MIT's Ray and Maria Stata Center, fenced off from the corridor through the rest of the second floor by a set of double doors that are locked after hours. Typical objects in this lab are desks, chairs, boxes, and filing cabinets. At some point several small kayaks were moved in, narrowing a corridor that the robot typically traverses to explore the other side of the lab. The identifiable AprilTags were placed on objects and walls of the lab, some of which were primarily stationary, others of which could be moved. The dock of the TurtleBot sits in a relatively wide strip of bare floor between office doors

and desks, giving the robot enough room to reliably find and move into the dock. The lighting varies between day and night, and people move in and out of view during the day. Between midnight and 8 AM there were fewer moving obstacles.

5.2 Experiments

5.2.1 Exploration Efficiency

Three different types of goal-directed frontier exploration were implemented to compare their efficiency in terms of area explored over time, along with a simplified algorithm to determine any minor advantages of smoothing the calculation. We describe the empirical method and results here, and discuss the qualitative differences in the next chapter.

The baseline for comparison among these experiments was the original frontier algorithm, where the goal for the robot is the center of the nearest frontier [42]. Once the robot is close enough to its goal, the next one is computed. Nearness in the experiments was determined by the distance to the centroid of the frontier cells on the occupancy grid.

The second experiment was a modified version of the nearest frontier: in evaluating the set of frontiers, instead of choosing the nearest one, the robot would choose the largest. This should steer the robot towards the place where it can gather the most information; its goal is the centroid of the largest frontier.

This exploration algorithm was further modified to let the robot bypass its possible biases, such as being prevented from planning a path to a point too close to an obstacle or occupied area. A random point on the largest frontier was chosen as the goal, instead of sending the robot to a fixed point on the frontier.

Finally, the frontier mask pre-processing was removed, in an attempt to determine whether the processing made exploration any more effective. The image dilation and distance transform of the frontier mask were erased from the frontier goal computation, and the robot let loose.

We present the following statistics for exploration:

Efficiency: how much area was discovered per unit of time spent exploring.

Maximum distance away from home: the farthest the robot traveled from home over the course of the exploration.

Path length: the length of the robot's total path after starting exploration, approximated by summing linear segments of the robot's optimized post-SLAM odometry.

Explored area: the average total area that the robot explored.

In this section we also offer a set of images of the different frontier algorithms for a visual comparison of successful and failed situations during the robot's exploration. For each exploration algorithm, the robot was set to undock, explore for 30 minutes, then return to its dock. The robot also logged its own behavior so that its travel distances and times could be studied. The results are listed in Table 5.1.

    Strategy                                     Efficiency (m²/s)   FD (m)   PL (m)   AE (m²)
    Nearest frontier, closest point (Yamauchi)
    Largest frontier, closest point
    Largest frontier, random point
    Pre-smoothing frontier, random point

Table 5.1: Exploration efficiency. Comparison of the exploration methods over the same amount of time, about 30 minutes. FD: farthest distance reached. PL: total path length taken by the robot. AE: map area explored.

The resulting visuals of the frontier-processing steps for the different methods can be compared in Figures 2-3, 2-4, 2-5, and 2-6. Largest-frontier navigation achieved higher efficiency than the nearest-frontier method. These results are discussed in Chapter 6.
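To illustrate how these statistics can be computed from a logged trajectory, the sketch below derives efficiency, farthest distance from home, and path length from a list of timestamped poses. The pose-log format and variable names are assumptions for illustration, not the robot's actual logging code.

    import math

    def exploration_stats(poses, explored_area_m2):
        # poses: list of (t, x, y) from the optimized post-SLAM
        #        trajectory; the first pose is taken as home (the dock).
        # explored_area_m2: total map area discovered, in square meters.
        t0, x0, y0 = poses[0]
        # Path length: sum of linear segments between consecutive poses.
        path_len = sum(math.hypot(x2 - x1, y2 - y1)
                       for (_, x1, y1), (_, x2, y2) in zip(poses, poses[1:]))
        # Farthest straight-line distance from home reached at any time.
        farthest = max(math.hypot(x - x0, y - y0) for _, x, y in poses)
        # Efficiency: area discovered per second of exploration.
        duration = poses[-1][0] - t0
        return explored_area_m2 / duration, farthest, path_len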

Figure 5-2: The computed frontier is made up of several disconnected contours, which must be separated from the single contour of occupied space. The robot's position is the red point near the middle height of the image.

5.2.2 Reliable Redocking

This section compares the effectiveness of our long-term recovery behavior against having no auxiliary behavior when the robot becomes stuck. Specifically, the state machine's handling of inactivity and navigation problems was disabled to compute baseline statistics. The inactivity behavior consisted of rotating in place and erasing the artificially inflated local map in order to repopulate it with more recent data; a sketch of this recovery routine appears below. Finally, a few critical elements of the autodocking behavior were also disabled to determine how much they improved the robot's performance.

Several metrics were analyzed regarding the TurtleBot's persistent behavior, namely:

Time to arrive at dock: the average time it takes the robot to drive to the dock area and recognize its home AprilTag.

Arrival at dock area success rate (%): the rate at which the robot succeeds at driving home to its dock once it decides to go home, from MapHome to Autodocking.
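To make the recovery behavior concrete, here is a minimal sketch of an inactivity-recovery routine in rospy, assuming a standard ROS navigation stack. The velocity topic name and rotation parameters are assumptions, and move_base's clear_costmaps service is the generic mechanism for erasing the inflated local map, not necessarily the exact call used on this robot.

    import rospy
    from geometry_msgs.msg import Twist
    from std_srvs.srv import Empty

    def recover_from_inactivity(cmd_vel_topic='/cmd_vel'):
        # Rotate in place, then clear the inflated local costmap so it
        # is repopulated with fresh sensor data.
        pub = rospy.Publisher(cmd_vel_topic, Twist, queue_size=1)
        spin = Twist()
        spin.angular.z = 0.5          # rad/s; one slow full turn (~12.6 s)
        rate = rospy.Rate(10)
        end = rospy.Time.now() + rospy.Duration(12.6)
        while rospy.Time.now() < end and not rospy.is_shutdown():
            pub.publish(spin)         # keep commanding the rotation
            rate.sleep()
        pub.publish(Twist())          # stop the robot

        # Erase the (artificially inflated) costmaps so stale obstacles
        # are dropped and the map refills with recent scans.
        rospy.wait_for_service('/move_base/clear_costmaps')
        rospy.ServiceProxy('/move_base/clear_costmaps', Empty)()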
