An Autonomous Mobile Robot Architecture Using Belief Networks and Neural Networks

Mehran Sahami, John Lilly and Bryan Rollins
Computer Science Department
Stanford University
Stanford, CA 94305
{sahami,lilly,rollins}@cs.stanford.edu

March 16, 1995

Abstract

This paper introduces a novel mobile robot architecture based on a Situated Belief Network, a belief network that is dynamically updated as a consequence of its current environment. We initially show that it is possible to employ connectionist mechanisms to learn high-level features of the environment from the low-level (sonar) inputs of the robot. These high-level features can then be reasoned with using a belief network that is dynamically modified at run-time to produce different behaviors. We present experimental results for behaviors implemented using this architecture on an Erratic mobile robot platform in terms of both behavioral efficacy and programming efficiency. We conclude that this architecture is both effective and efficient for the control of a mobile robot.

1 Introduction

While research in the field of mobile robotics has been on-going for a number of years [Fikes & Nilsson, 1971; Brooks, 1986; Saffiotti et al, 1993], there has been very little consensus reached as to what constitutes a desirable mobile robot architecture. While we by no means make an attempt to settle this debate, it is our contention that for a robot control architecture to be effective it must incorporate ways to deal with a number of issues raised by previous researchers. These issues include comprehensibility, modularity, scalability, and ease of programming. We outline these issues below and present them as features of the architecture we propose later in this paper.

In order for a robot architecture to be comprehensible it must be understandable at an abstract level that divorces intention from implementation. In other words, the behaviors

that the architecture is to implement should be specified at some level higher than the explicit writing of code, whether this be through the use of production rules, state space operators, fuzzy control rules, or the assignment of probabilities to actions. Such comprehensibility is desirable since it not only allows humans to easily understand the abstract behavior of a robot, but also provides the ability for another agent to reason about the behavior of our robot when coordination in a distributed environment is desired.

Another important aspect of any computer system is modularity. In this respect, a system (or in our case, a mobile robot architecture) needs to allow for pieces to be easily decomposed and replaced. This allows separate modules of the architecture to be upgraded or analyzed independently of the rest of the components of the system, a crucial feature of any system that needs to be maintained over time.

Scalability is a facet of robot architectures which is both essential and, unfortunately, difficult to achieve. For an architecture to be truly scalable it needs to allow for new perceptual capabilities, actuator mechanisms and behaviors to be added without having to abandon the architecture and begin anew. While some major changes may need to be made to an existing system to realize such additional capabilities, the architecture should not have inherent limitations which make it unreasonable to apply it to a large variety of tasks and situations.

Finally, an often overlooked issue in designing a robot architecture is ease of programming. If we are to design control systems that can exhibit complex behavior while at the same time being adaptable to a variety of tasks, we should not be required to generate extensive programs to do so. Moreover, we should be able to utilize as much of the existing code in the system as possible.

With these goals in mind, we propose an integrated mobile robot control architecture based on belief networks and neural networks. This architecture is presented in the following section. Sections 3 and 4 discuss experimental results in implementing this

architecture on a mobile robot platform and the lessons learned from this endeavor. Finally, we present our conclusions and directions for future work.

2 Architecture

The robot architecture we propose integrates several diverse elements. First, we utilize connectionist learning mechanisms to learn high-level features about the world from the low-level features of the robot (i.e., sonars). We then integrate the results of this learning with a belief network to create a Situated Belief Network [Sahami, 1995] which can be used to reason about a rapidly changing world. Finally, we add an administrative component to this architecture to plan and sequence high-level actions that should be taken by the robot to achieve its goals.

2.1 Learning Mechanisms

One of the main focuses of this research is harnessing learning techniques to abstract the low-level sonar inputs which the robot produces into some high-level feature of the world which can be reasoned about. We refer to making this transition as the Robot Viewpoint Problem. Essentially, this problem refers to the fact that it is very difficult for humans to reason coherently about the sonar input received from the robot, and thus using this data effectively becomes a difficult reasoning and coding task for the human. Alternatively, it is much easier for humans to reason about abstract features, such as the degree to which Front-Is-Clear holds, when trying to decide on some action to take. Thus, we are confronted with the problem of translating the robot's viewpoint (sonars) into the human's viewpoint (abstract features).

To address this problem we used neural network methods [Rumelhart et al, 1986] to cast the viewpoint translation task into a supervised machine learning task. Specifically, we attempted to learn the features Front-Is-Clear, Right-Is-Clear, and Left-Is-Clear by using a single artificial neuron for each feature. We collected data for the task by simply

having the robot collect sonar vectors from its environment while varying the position of obstacles relative to the robot. The human supervision for the task was merely a Boolean decision as to whether a given world configuration satisfied one of the features we were trying to learn (e.g., Front-Is-Clear). This forces the human to make only a simple yes/no decision based on their own viewpoint rather than by inspecting the actual sonar vector values.

In our study, we collected nearly 2000 sonar vectors in real time (approximately 1 hour). Data collection was very efficient since we simply set the robot to write the sonar vectors it collected to a file while we moved obstacles around the robot, ensuring, for example, that Front-Is-Clear was maintained. This file was then labeled as positive instances for the learning task. We then had the robot write another file of sonar vectors, again moving obstacles around the robot, but this time ensuring that there was always an obstacle somewhere in front of the robot to preclude Front-Is-Clear being true. This provided negative examples for our learning task. Similar methods were employed for collecting data regarding the other high-level features to be learned.

We then ran the data through a gradient-descent weight learning algorithm (backpropagation applied to a single neuron, also known as the delta rule) to produce a function, f, that maps seven-dimensional sonar vectors into a real value in [0, 1], indicating the degree to which some high-level feature is true. The weight updating rule is given by:

    W_new = W_old + (d - f(X, W_old)) f(X, W_old) (1 - f(X, W_old)) X

where W is the weight vector, X is the sonar vector, d ∈ {0, 1} is the desired response, and f(X, W) is the sigmoid function:

    f(X, W) = 1 / (1 + e^-(X · W))

Interestingly enough, we found that the data collected for each feature were nearly linearly separable, indicating that, from the robot's viewpoint, these were easily recognized features.
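As a concrete illustration of this training step, the sketch below applies the delta rule above to a single sigmoid unit over seven-element sonar vectors. This is not the authors' original code: the learning rate, epoch count, zero-weight initialization, and function names are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, f(X, W) = 1 / (1 + e^-(X . W))."""
    return 1.0 / (1.0 + np.exp(-z))

def train_feature_unit(X, d, epochs=50, lr=0.1):
    """Delta-rule training of a single sigmoid unit for one high-level feature.

    X  : (n_samples, 7) array of sonar vectors
    d  : (n_samples,) array of 0/1 labels (e.g., Front-Is-Clear holds or not)
    lr : learning rate (an assumption; the rule in the text omits it)
    Returns the learned weight vector W.
    """
    n_samples, n_inputs = X.shape
    W = np.zeros(n_inputs)                 # initialization is an assumption
    for _ in range(epochs):
        for x, target in zip(X, d):
            f = sigmoid(np.dot(x, W))      # current output in [0, 1]
            # W_new = W_old + (d - f) * f * (1 - f) * X, scaled by lr
            W += lr * (target - f) * f * (1.0 - f) * x
    return W

def feature_degree(sonar_vector, W):
    """Degree to which the learned high-level feature holds for one reading."""
    return float(sigmoid(np.dot(np.asarray(sonar_vector, dtype=float), W)))
```

The same routine would simply be run once per feature (Front-Is-Clear, Right-Is-Clear, Left-Is-Clear), each on its own positively and negatively labeled files of sonar vectors.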

These results were further validated by testing the learned functions on an artificially generated clean dataset. While a complete discussion of those results is beyond the scope of this paper, we found the learned functions to be extremely accurate (on the order of 1% error). Moreover, the learned functions now provide us with a means to gauge the truth of some high-level feature given previously unseen data. We are now ready to reason about these high-level features.

2.2 Situated Belief Networks

The mechanism we employ for reasoning about our high-level features is a Belief Network [Pearl, 1988] (also referred to as a Probabilistic Network, Bayes Network, or Influence Diagram in different literatures). While such methods have become popular recently in the "reasoning under uncertainty" community, they have still come under scrutiny as they rely on the subjective setting of prior probabilities to reason with. We circumvent this problem by using the learned functions for high-level features to provide dynamic prior probabilities for some high-level feature being true. Hence, our belief network is situated in that the information which begins the reasoning process (the prior probabilities) changes dynamically with the environment every time the robot gets a new set of sonar readings. Our belief network topology is shown below.

[Figure: the feature nodes Front-Is-Clear, Right-Is-Clear, and Left-Is-Clear feed the action nodes Move Forward, Turnright, Turnleft, and Backup, which drive the actuators.]
Figure 1. Belief network topology.
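As a small illustration of how the priors become "situated", the sketch below (a hypothetical helper, not the authors' code) recomputes the root-node priors from each fresh sonar vector using weight vectors learned as in Section 2.1.

```python
import numpy as np

def situated_priors(sonar_vector, feature_weights):
    """Map one seven-element sonar vector to a dynamic prior per root node.

    feature_weights : dict of feature name -> learned weight vector, e.g.
                      {"Front-Is-Clear": w_f, "Right-Is-Clear": w_r, ...}
    Returns a dict of feature name -> prior probability in [0, 1].
    """
    x = np.asarray(sonar_vector, dtype=float)
    return {name: float(1.0 / (1.0 + np.exp(-np.dot(x, w))))
            for name, w in feature_weights.items()}
```

These priors would be recomputed on every sonar cycle, which is what makes the network situated in the sense described above.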

Note that the nodes labeled X-Is-Clear each contain a function learned during the learning phase described in the previous section. The conditional probabilities in the belief network still need to be set by hand, although these conditional probability tables are small and relatively easy for a person to create. The conditional probabilities are then multiplied by the dynamic prior probabilities to produce a set of posterior probabilities for each action the robot can take. The resultant posterior probabilities are then sent to the actuators of the robot, indicating the degree to which each action should be taken.

Thus the conditional probability tables actually determine a behavior for the robot by providing the mechanism by which features in the world are translated into actions taken by the robot. In order to change the behavior the robot is employing at a given point, we only need to change the values in the conditional probability tables of our network.

2.3 Administrator

To allow the behavior of the robot to change at run-time, we employ an administrator mechanism that is essentially a combination of a planner and a recognizer. The planner simply produces a very high-level sequence of actions (e.g., move-down-corridor, turnright-at-intersection, move-down-corridor, etc.) that the robot is to take to bring about a goal. The recognizer merely recognizes when one subgoal has been attained (e.g., reaching the end of the corridor) and indicates that it is time to perform the next subgoal. As the recognizer moves from one subgoal to the next, it simply updates the conditional probability tables for our actions to reflect the change in behavior the robot is to undertake. The complete architecture is shown in Figure 2.

[Figure: as in Figure 1, the feature nodes Front-Is-Clear, Right-Is-Clear, and Left-Is-Clear feed the action nodes Move Forward, Turnright, Turnleft, and Backup, which drive the actuators; an additional Admin. node also feeds the action nodes.]
Figure 2. Complete control architecture topology.

Mathematically, the addition of the administrator node to the belief network is equivalent to conditioning the action nodes on an additional prior probability. We can think of this prior as a selector variable that selects the conditional probabilities associated with a given behavior out of a larger conditional probability table. We graphically differentiate the administrator in Figure 2 to show its importance as a modular component of this architecture. This figure thus represents a complete instance of a situated belief network that is applicable to the domain of robotic control.
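To make the computation implied by Figure 2 concrete, the sketch below combines the dynamic feature priors with the conditional probability table of the currently selected behavior to produce a posterior for each action. It treats the three feature priors as independent root nodes, which matches the topology shown, but the exact propagation scheme, data structures, and names are assumptions for illustration rather than the authors' implementation.

```python
from itertools import product

FEATURES = ["Front-Is-Clear", "Right-Is-Clear", "Left-Is-Clear"]

def action_posteriors(priors, cpts, behavior):
    """Posterior probability for each action under the current behavior.

    priors   : dict feature name -> P(feature is true), e.g. the outputs of
               the learned functions for the latest sonar vector
    cpts     : dict behavior name -> {action: {(F, R, L): P(action | F, R, L)}}
               where F, R, L are booleans; the behavior name plays the role
               of the administrator's selector variable
    behavior : name of the currently active behavior
    """
    table = cpts[behavior]
    posteriors = {}
    for action, cond in table.items():
        total = 0.0
        # Marginalize over the eight truth assignments of the three features.
        for assignment in product([True, False], repeat=3):
            weight = 1.0
            for name, value in zip(FEATURES, assignment):
                p = priors[name]
                weight *= p if value else (1.0 - p)
            total += cond[assignment] * weight
        posteriors[action] = total
    return posteriors
```

The resulting values could then be sent to the actuators as the degree to which each action should be taken, with the administrator switching the active behavior name (and hence the table in use) as subgoals are completed.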

3 Experimental Results

The architecture described above was implemented on an Erratic mobile robot. Initially, we chose to forgo use of an administrator and simply implemented an obstacle avoidance behavior by setting conditional probabilities that produce the intended behavior. This initial phase was intended to test the viability of the architecture and also the effectiveness of our learned function mappings from sonars to high-level features. The obstacle avoidance behavior was achieved using the following conditional probability tables (note that F refers to Front-Is-Clear, and likewise R and L refer to Right-Is-Clear and Left-Is-Clear):

                  F       ~F
    Move         0.9     0.1

               F R L   F R ~L   F ~R L   F ~R ~L   ~F R L   ~F R ~L   ~F ~R L   ~F ~R ~L
    Turnright   0.1      0.4      0.0      0.0       0.9      0.9       0.0       0.1
    Turnleft    0.0      0.0      0.4      0.0       0.1      0.0       0.9       0.1
    Backup      0.0      0.0      0.0      0.0       0.0      0.0       0.0       0.8

Table 1. Conditional probability tables for obstacle avoidance behavior.

While a cursory glance at these tables may render them incomprehensible, looking at the world states represented by the conditioning variables clearly indicates the stimuli in the world that the robot will react to given these conditional probability tables. In fact, the robot did show robust obstacle avoidance while using these conditional probabilities and the learned prior probability functions. In comparison to other methods for obstacle avoidance that we had implemented previously (fuzzy control rules [Saffiotti et al, 1993] and certainty grid-based methods [Elfes, 1990; Borenstein & Koren, 1989]), this architecture provided the most consistently robust behavior.

To test the full power of this architecture, at least in a limited scope, we added an administrative module to our architecture. We hard-coded a plan into the module which was simply for the robot to move down a corridor to an intersection using the obstacle avoidance behavior, turn right at the intersection, and continue to move down the corridor to the next intersection. Consequently, we also added a recognizer to change the behavior of the robot when it had reached the intersection and then change it again once it had completed its turn. To make full use of the resources we had at our disposal, the recognizer employed the learned high-level features in making its decision as to when the intersection had been reached and when the robot had completed its turn. The intersection recognition process checked to see if the robot's front and left were blocked while its right was still clear. Turn completion recognition was based simply on the robot's front becoming clear.
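The recognition tests just described lend themselves to a very small amount of code. The sketch below thresholds the learned feature degrees and steps through a hard-coded plan; the 0.5 threshold, the plan encoding, and all names are assumptions made for illustration, not details given in the text.

```python
def intersection_reached(priors, threshold=0.5):
    """Front and left blocked while the right is still clear."""
    return (priors["Front-Is-Clear"] < threshold and
            priors["Left-Is-Clear"] < threshold and
            priors["Right-Is-Clear"] >= threshold)

def turn_completed(priors, threshold=0.5):
    """Turn completion is recognized once the front becomes clear."""
    return priors["Front-Is-Clear"] >= threshold

# A hard-coded plan as (behavior, subgoal-completion test) pairs; when a test
# fires, the administrator installs the next behavior's conditional
# probability tables.
PLAN = [
    ("follow-corridor", intersection_reached),
    ("turn-right",      turn_completed),
    ("follow-corridor", lambda priors: False),  # final step: no exit test
]

def administrator_step(step_index, priors):
    """Advance to the next plan step if the current subgoal has been reached."""
    _, subgoal_done = PLAN[step_index]
    if subgoal_done(priors) and step_index + 1 < len(PLAN):
        return step_index + 1
    return step_index
```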

Note that the architecture does not require the recognition module to make use of the learned features, but it does provide them for free if they are deemed useful.

Running the robot using the complete architecture showed a promising coincidence of theory and reality. In our constructed environment (cardboard boxes), the robot repeatedly was able to successfully move down the corridor, recognize when it had reached the intersection, switch behaviors to turn to the right, and then continue to move down the next corridor. While we believe that some fine tuning could have made the behavior virtually flawless, the results were nevertheless very impressive.

4 Discussion

If robust robotic control were the only goal of our architecture, our initial results would already be quite encouraging. Beyond this, in terms of the criteria outlined in Section 1, our architecture provides many compelling advantages.

The architecture presented here is comprehensible to the extent that we need not reason with low-level input from the robot, but can instead deal with high-level features, thereby solving the Robot Viewpoint Problem. Moreover, the explicit topology of the belief network makes it readily apparent what we are computing, how to compute it, and the values that are necessary to perform the computation. The integration of the administrator unit is also clearly motivated, and its role in the overall architecture is easily understood.

Perhaps an even bigger advantage of this architecture is its modularity. As explained above, the various components of the architecture (planner, recognizer, belief network connections, mapping functions for prior probabilities, etc.) can easily be modified without having to re-implement the entire controller. Moreover, a variety of learning, planning, or

recognizing methods can be employed to achieve the desired results as parts of the whole framework. Other methods can even be incorporated within this structure (e.g., using fuzzy rules to determine prior probabilities, certainty-grid-based methods for recognition, etc.).

The scalability of this architecture has not yet been thoroughly explored. However, there appears to be no reason why an extensive variety of behaviors could not be achieved within this framework. Clearly this is a direction for further research.

Finally, the ease of programming within this architecture was far greater than even we anticipated at the outset of this project. For example, implementing the obstacle avoidance behavior by simply specifying a set of conditional probabilities took nearly an order of magnitude less time than attempting to achieve the same behavior with fuzzy control rules. We believe this stems primarily from the fact that we could reason about high-level features rather than working at the level of the sonars. Moreover, we could virtually forget about the raw sonar values, as the learning mechanism provided an effective mapping to high-level features. We even tried an experiment to see if we could write an effective mapping function by hand. We discovered that our hand-written function performed worse than the learned function, again reinforcing the importance of learning as a way to deal with the Robot Viewpoint Problem.

Nevertheless, we are not claiming that other robot control architectures do not offer some similar advantages, but only that these are important issues to weigh when considering the relative worth of an agent architecture.

5 Conclusions and Future Work

We have provided a new architecture for the control of a mobile robot which combines elements of machine learning and probabilistic reasoning to provide a robust control mechanism. While the initial results appear encouraging, there are still a number of issues to be explored in this work. Most notably, additional experiments need to be conducted to

measure the true scalability of this framework. This includes the implementation of additional behaviors, and consequently the use of more complex planning and goal recognition mechanisms. Moreover, as the perceptual capabilities of robots expand, new avenues for learning high-level features and their integration into the belief network structure need to be explored. In the long term, we hope to use this same general architecture to address issues in domains entirely unrelated to robotics.

Acknowledgments

The authors are grateful for the guidance of Kurt Konolige, who provided the format, encouragement, and the mobile robots to conduct this research. This work has also benefitted from discussions with Nils Nilsson, Pat Langley, Jeff Shrager, and Ross Shachter. Additional thanks go to our robot, The Hoon, for providing late-night inspiration. The first author is supported by a Fred Gellert ARCS Foundation scholarship.

References

[Borenstein & Koren, 1989] Borenstein, J., & Koren, Y., 1989. Real-time Obstacle Avoidance for Fast Mobile Robots. IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 5, pp. 1179-1187.

[Brooks, 1986] Brooks, R.A., 1986. A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pp. 14-23.

[Elfes, 1990] Elfes, A., 1990. Occupancy Grids: A Stochastic Spatial Representation of Active Robot Perception. Proceedings of the Sixth Conference on Uncertainty in AI, pp. 60-70.

[Fikes & Nilsson, 1971] Fikes, R.E., & Nilsson, N.J., 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2, pp. 189-208.

[Pearl, 1988] Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.

[Rumelhart et al, 1986] Rumelhart, D.E., Hinton, G.E., & Williams, R.J., 1986. Learning Representations by Back-Propagating Errors. Nature 323, pp. 533-536.

[Saffiotti et al, 1993] Saffiotti, A., Ruspini, E.H., & Konolige, K., 1993. A Fuzzy Controller for Flakey, an Autonomous Mobile Robot. SRI International Technical Note No. 529, March 1993.

[Sahami, 1995] Sahami, M., 1995. Situated Belief Networks: An Integrated Agent Architecture Using Belief Networks and Neural Networks. Working paper, Department of Computer Science, Stanford University.