Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, Kobe, Japan, 16-20 July 2003

GA-based Learning in Behaviour Based Robotics

Dongbing Gu, Huosheng Hu, Jeff Reynolds, Edward Tsang
Department of Computer Science, University of Essex
Wivenhoe Park, Colchester CO4 3SQ, UK
Email: {dgu, hhu, reynt, edward}@essex.ac.uk

Abstract: This paper presents a Genetic Algorithm (GA) approach to evolving robot behaviours. We use fuzzy logic controllers (FLCs) to design robot behaviours. The antecedents of the FLCs are pre-designed, while their consequences are learned using a GA. Sony quadruped robots are used to evaluate the proposed approaches in the robotic football domain. Two behaviours, ball-chasing and position-reaching, are studied and implemented. An embodied evolution scheme is adopted, by which the robot autonomously evolves its behaviours based on a layered control architecture. The results show that robot behaviours can be automatically acquired through the GA-based learning of FLCs.

Keywords: Genetic Algorithms, Evolutionary Robotics, Fuzzy Control, Behaviour-based Robots.

1. Introduction

A control system for an autonomous robot has to cope with uncertainty in sensory readings and actuator execution, as well as handle dynamic changes in the environment. The traditional robot software architecture uses deliberative reasoning in the form of sensing, planning and action. It is difficult to accommodate sensory uncertainty and environmental dynamics in such an architecture [4]. Reactive or behaviour-based architectures are better able to handle the problems inherent in the deliberative architecture. The basic component in such an architecture is a group of behaviours. Behaviours directly map sensory information into motor actions without complex reasoning. This mapping enables robots to respond to environmental changes promptly.
Behaviours can also operate concurrently to produce emergent behaviours for unknown environments [1]. When designing control strategies for mobile robots, it is impossible to predict all the potential situations robots may encounter and to specify all robot behaviours optimally in advance. Predefined control strategies consume large amounts of design time and are usually brittle in practice due to noise and the unpredictable nature of the real world. Robots have to learn from, and adapt to, changes in their operating environment. Evolutionary robotics provides an alternative way to design the control system for mobile robots. Many successful paradigms have been demonstrated using neural networks [10], classifier systems [5] and reinforcement learning [11][13][14]. Though evolutionary robotics has been criticised for being potentially slow [12], some examples of embodied evolution have been explored [18].

In this paper, we present a Genetic Algorithm (GA) approach to learning robot behaviours. Behaviours for soccer playing are evolved for a Sony legged robot. Two behaviours, ball-chasing and position-reaching, are studied and implemented. Our behaviour control uses Fuzzy Logic Controllers (FLCs) to implement the mapping from visual information to actions. An FLC represents uncertainty by fuzzy sets, and an action is generated cooperatively by several rules that are triggered to some degree, producing smooth and robust control outputs [2][3][16]. In our design, the FLC antecedents are predefined, including the selection of inputs and the definition of their membership functions. Therefore, the number of rules is fixed (it is the product of the numbers of input fuzzy sets). The FLC consequences are defined as fuzzy singletons, which are basic motion commands for the mobile robot. The GA is employed in the selection of the output fuzzy singletons. Evolving FLCs for robot behaviours has been explored by many researchers, as reported in [3][7][15][17].
However, some of these approaches were tested only in simulation, or the learning was conducted in simulation first (off-line learning) and then tested on real robots. On-line learning was claimed in [15], which adapted on-board sensors to provide the fitness. The approach proposed in this paper is an on-line version, and it integrates an external assessment into the learning system to improve the learning efficiency. Furthermore, a finite state machine is employed to coordinate the behaviours to achieve embodied learning.

The rest of this paper is organised as follows. Section 2 describes the learning setting for this research. The learning algorithm is formulated in section 3. Section 4 presents the simulation and experimental results to show the feasibility of the proposed learning algorithm. Finally, section 5 provides a brief conclusion and future work.
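To make the FLC mechanism concrete before the formal treatment in section 3, the sketch below shows how several partially triggered rules cooperatively produce one crisp command via centre-of-area defuzzification over singleton consequences. It is an illustration only, not the authors' code: the triangular membership functions, input ranges and numeric command values are invented placeholders.

```python
# Illustrative FLC with predefined antecedents and singleton consequences.
# The membership functions, input ranges (mm, degrees) and numeric command
# values are invented for this sketch, not taken from the paper.

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x == b:
        return 1.0
    if x < b:
        return 0.0 if a == b or x <= a else (x - a) / (b - a)
    return 0.0 if b == c or x >= c else (c - x) / (c - b)

# Three fuzzy sets per input: ball angle (deg) and ball distance (mm).
ANGLE = {"LEFT": (-180, -90, 0), "AHEAD": (-90, 0, 90), "RIGHT": (0, 90, 180)}
DIST = {"NEAR": (0, 0, 1500), "MID": (0, 1500, 3000), "FAR": (1500, 3000, 3000)}

# Walking commands as singleton consequences, encoded here as turn rates.
COMMANDS = {"LEFT_TURN": -30.0, "FORWARD": 0.0, "RIGHT_TURN": 30.0}

# One singleton per rule -- in the paper this assignment is what the GA evolves.
RULES = {
    ("LEFT", "NEAR"): "LEFT_TURN", ("LEFT", "MID"): "LEFT_TURN",
    ("LEFT", "FAR"): "LEFT_TURN", ("AHEAD", "NEAR"): "FORWARD",
    ("AHEAD", "MID"): "FORWARD", ("AHEAD", "FAR"): "FORWARD",
    ("RIGHT", "NEAR"): "RIGHT_TURN", ("RIGHT", "MID"): "RIGHT_TURN",
    ("RIGHT", "FAR"): "RIGHT_TURN",
}

def flc_output(angle, dist):
    """Centre-of-area over singletons: a = sum(mu_m * c_km) / sum(mu_m)."""
    num = den = 0.0
    for (aset, dset), cmd in RULES.items():
        mu = min(tri(angle, *ANGLE[aset]), tri(dist, *DIST[dset]))  # AND = min
        num += mu * COMMANDS[cmd]
        den += mu
    return num / den if den > 0 else 0.0

print(flc_output(-45.0, 800.0))  # blend of LEFT and AHEAD rules: negative turn rate
```

With the ball 45 degrees to the left, both the LEFT and AHEAD rules fire partially, and the output is a blended leftward turn rate rather than a hard switch between commands.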
2. Learning Setting

2.1 The robot and its playing field

Sony legged robots are quadruped walking robots that resemble the basic behaviour of dogs. They are controlled by an embedded R4000 microprocessor with over 100 MIPS performance. A Sony robot has 20 motors for action. The neck and the four legs each have three degrees of freedom (DOFs) for looking and walking. The other five DOFs are used for the tail (2 DOFs), the mouth (1 DOF), and the two ears (1 DOF each). The main sensors include 20 encoders for motion control of the twenty motors, a colour CCD camera, an infrared range sensor, three gyros for posture measurement (roll, pitch, yaw), and touch sensors. Additionally, there is a stereo microphone and a loudspeaker for communication [6].

The environment for the Sony Legged Robot League is a playing field 4m in length and 3m in width. Figure 1 shows a top view of the playing field from the overhead camera. The goals are centred on both ends of the field, and are 60cm wide and 30cm high. Six uniquely coloured landmarks are placed around the edges of the field, with one at each corner and one on each side of the halfway line. Each landmark is painted with two colours, of which pink is at the top of the landmarks on one side of the field and at the bottom of the landmarks on the other side. These landmarks are used by the robots to localise themselves within the field. The ball, walls, goals, landmarks and robot uniforms are painted with eight different colours distributed in the colour space so that a robot can easily distinguish them.

Fig. 1 The top view of the playing field from an overhead camera

2.2 Learning environment

Although the onboard sensors can provide the results of the interaction between a robot and its environment, perception aliasing is severe because of the large perception space and noisy sensors. To evaluate the performance of robot behaviours, the environment should provide payoffs to robots with a certain accuracy to improve the learning efficiency. A global monitor, which includes an overhead camera, a desktop computer, and visual tracking software, is set up in our laboratory (see figure 2) to provide an external judgement of the interaction between the robot and its environment. The function of the monitor is to feed the position information of the robot and the ball to the robot. The robot can then autonomously test its control strategies and evaluate the results. The monitor recognises the robot and the ball according to their colours. Through image processing, the monitor updates their positions continuously. The robot can ask for this information at any time during its learning process. The communication is achieved through the Internet, where the monitor system acts as a server and the robot acts as a client. The server provides the global information when the client makes a request. Since the robot only evaluates its performance at the end of one run, the communication has no significant effect on the learning process.

Fig. 2 The learning environment

2.3 The control architecture

To control the robots we employ a layered architecture: a walking layer, a behaviour layer and a cognition layer, as shown in figure 3 [8][9]. The walking layer is at the bottom of the architecture. Its task is to implement basic walking operations. It can respond to a number of walking commands issued by the middle layer, i.e. the behaviour layer. The walking commands are defined as a set C = {c_k, k = 1, …, K}, including MOVE FORWARD, LEFT FORWARD, RIGHT FORWARD, LEFT TURN, RIGHT TURN, and STOP. The walking layer generates the discrete walking commands in terms of the motor encoder readings. The vertical connection between the walking layer and the behaviour layer is represented by selection (the arrows in figure 3). The state space of the walking layer shown in figure 3 consists of the encoder readings. The grids in the layer are divided along multiple dimensions and represent the walking commands.

The middle layer is the behaviour layer, which provides a group of behaviours to the top layer, i.e. the cognition layer. The behaviours include ball-chasing, obstacle-avoiding, position-reaching, ball-dribbling, ball-kicking, etc. The feature states extracted by the robot's local camera constitute the state space of the behaviour layer shown in figure 3. These features include the robot's position and the relative angle and distance from the robot to the ball and the goal. A grid denotes a behaviour in figure 3. Different grids may have different sizes or different dimensions.

At the top, the cognition layer co-ordinates these behaviours to achieve a given task. The feature states in the cognition layer are more abstract predicated states. These predicated states are denoted by binary values, for instance, whether or not the ball is found. A grid represents a combination of the predicated states. A discrete event system model can be used to formulate the behaviour coordination.

Fig. 3 A layered architecture

The learning in this research occurs within the behaviour layer, where individual behaviours need to be designed to map the noisy feature states to imperfect walking actions. The walking layer provides a substrate for the entire system. The cognition layer is used in this research to provide a mechanism that co-ordinates the behaviours. It also provides an opportunity for the robot to learn different behaviours continuously without intervention. For example, the robot can start a run of the position-reaching behaviour after a run of the ball-chasing behaviour without being repositioned by operators.

3. Learning Algorithms

3.1 The FLC

A behaviour is a mapping from sensory data, or an environment state vector S, to a walking action a. It can be expressed as a = B(S), where B is the mapping function. An FLC can be used to implement the function B [6]. Assume there are N feature states for a behaviour B, i.e. there are N input state variables s_i (i = 1, …, N) in the state vector S. For each input state variable s_i, L_i fuzzy sets are defined. The total number of fuzzy rules is denoted as M, where M = L_1 × L_2 × … × L_N.

There is only one output variable in each of the behaviours, which corresponds to a walking command. There are K walking commands, denoted by c_k (k = 1, …, K), which can be used for the output. K fuzzy singletons are defined as the fuzzy output. The mth fuzzy rule (m = 1, …, M) is denoted as:

R_m: IF s_1 AND s_2 AND … AND s_N, THEN a is c_km

where c_km is the kth fuzzy singleton used in the mth rule. The crisp output a, stimulated by an input state S after fuzzy reasoning, is calculated by the centre of area (COA) method, i.e.

a = (Σ_m μ_m(S) c_km) / (Σ_m μ_m(S)), m = 1, …, M

where μ_m(S) is the firing strength of the mth rule for the input state S.

3.2 The GA

In this paper, an FLC, or a behaviour, is viewed as an individual. A population includes a group of FLCs. Running the robot with the FLCs is the evaluation process. As the antecedents of an FLC are pre-defined, only the FLC consequences are encoded as chromosomes. There are M rules in one FLC, meaning there are M fuzzy consequences in one FLC. Therefore, one chromosome has M genes, the first gene corresponding to the first rule's consequence, and so on. Each gene can be one of the K fuzzy singletons c_k, as illustrated in figure 4 (a chromosome is the string c_k1 c_k2 … c_kM).

Fig. 4 One chromosome

The operations used in the GA include:

Initialisation: The first generation is initialised randomly. Each gene in each chromosome is chosen uniformly from the K fuzzy singletons.

Elitism: The best individual in the current generation is automatically copied into the next generation.

Selection: Individuals are copied into the next generation as offspring according to their fitness values. Individuals with higher fitness values have more offspring than those with lower fitness values.

Crossover: Crossover happens between two offspring individuals with the crossover probability p_c. One-point crossover is used to exchange the genes.

Mutation: Mutation is applied to one gene of an offspring with the mutation probability p_m. The operator randomly chooses one fuzzy singleton from the allowed set to replace the current gene.

4.
Experiments

4.1 The behaviour models

A simple version of the robot architecture shown in figure 3 includes four behaviours: ball-chasing, position-reaching, obstacle-avoiding, and ball-searching. The
obstacle-avoiding behaviour simply avoids the edges and the goals of the playing field. The ball-searching behaviour scans the playing field to find the ball. Some heuristic rules are used to design these two behaviours; they are only used to help learn the other two behaviours and are not themselves learned in this paper.

In the ball-chasing behaviour, the ball distance, the ball angle, and the goal angle relative to the robot's heading are chosen as the feature states. Three fuzzy sets are defined for each of them. In the position-reaching behaviour, the target distance and angle relative to the robot are chosen as the feature states. These two states are calculated from the target co-ordinates, the robot's co-ordinates, and the robot's heading. Again, three fuzzy sets are defined for both of them.

The predicated states in the cognition layer are p1 (the ball is found), p2 (obstacles are found), p3 (the ball is near enough), p4 (the target is near enough), p5 (the ball-chasing behaviour has timed out), and p6 (the position-reaching behaviour has timed out). The transitions between behaviours are expressed as a finite state machine (see figure 5).

Fig. 5 A behaviour transition model

The initial behaviour is ball-searching. When the ball is found, the system starts to learn the ball-chasing behaviour. After one run, the system transitions to learning the position-reaching behaviour. Learning then alternates between these two behaviours in order to keep learning continuous without external intervention. If obstacles are found, the robot suspends the current learning individual and avoids the obstacles until no obstacles are found.
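The behaviour transition model of figure 5 can be sketched as a small state machine. The transition table below is partly inferred from the figure and the surrounding text (in particular, the state resumed after obstacle-avoiding is an assumption), so treat it as an approximation rather than the authors' exact model.

```python
# Sketch of the behaviour-coordination finite state machine of figure 5.
# The transition table is partly inferred from the figure and text -- in
# particular the state resumed after obstacle-avoiding is an assumption.

TRANSITIONS = {
    # (current behaviour, predicate) -> next behaviour
    ("ball_searching", "p1"): "ball_chasing",          # p1: ball found
    ("ball_chasing", "p3"): "position_reaching",       # p3: ball near enough
    ("ball_chasing", "p5"): "position_reaching",       # p5: chasing timed out
    ("position_reaching", "p4"): "ball_chasing",       # p4: target near enough
    ("position_reaching", "p6"): "ball_chasing",       # p6: reaching timed out
    ("ball_chasing", "p2"): "obstacle_avoiding",       # p2: obstacles found
    ("position_reaching", "p2"): "obstacle_avoiding",
    ("obstacle_avoiding", "clear"): "ball_searching",  # assumed resume point
}

def step(state, predicate):
    """Fire one transition; stay in the current behaviour if none matches."""
    return TRANSITIONS.get((state, predicate), state)

# The alternating learning cycle: search -> chase -> reach -> chase -> ...
state = "ball_searching"
for predicate in ["p1", "p3", "p4", "p5"]:
    state = step(state, predicate)
print(state)  # position_reaching
```

The alternation between ball-chasing and position-reaching is what lets learning runs follow each other without an operator repositioning the robot.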
4.2 Fitness functions

The fitness functions f(t) are defined with respect to the robot behaviours.

Ball-chasing behaviour:

f(t) = (1 - distance/3000) * (1 - angle/180) * (1 - time/maximum time)

Three terms appear in the fitness function: the final distance between the robot and the ball, the final angle between the robot's heading and the line connecting the robot and the ball, and the time spent on the ball-chasing behaviour. All three terms are normalised to 1.

Position-reaching behaviour:

f(t) = (1 - distance/3000) * (1 - angle/180) * (1 - time/maximum time)

Three terms appear in the fitness function: the final distance between the robot and the desired position, the final angle between the robot's heading and the line connecting the robot and the position, and the time spent on the position-reaching behaviour. All three terms are normalised to 1.

4.3 Results

A simulator of the experimental environment was also developed in order to verify the algorithms and decrease learning time. The simulator is constructed from statistical samples of real robot motion.

Ball-chasing behaviour: In simulation, the population size is 10, the crossover probability is 0.2, and the mutation probability is 0.1. After 30 generations, the average fitness values and the standard deviations are shown in figure 6. The fitness values gradually increase and finally converge to a high value. The dips in the middle of the curve indicate the exploration of solutions by the mutation and crossover operators. The decrease of the standard deviations reflects that the ten individuals finally tend to have the same genes, so their behaviours tend to be the same. The best FLC was picked from the last generation for testing. Figure 7 shows that the behaviour was successfully evolved: the robot can move to a ball placed in the middle of the playing field.

Fig. 6 Evolving ball-chasing behaviour in simulation
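A minimal rendering of the fitness function defined in section 4.2, assuming distances are in millimetres (normalised by 3000) and angles in degrees (normalised by 180), as the constants in the text suggest:

```python
# The fitness function of section 4.2, as given in the text. Units are
# assumed to be millimetres (distance, normalised by 3000) and degrees
# (angle, normalised by 180).

def fitness(distance, angle, time, max_time):
    """f = (1 - distance/3000) * (1 - angle/180) * (1 - time/max_time)."""
    return ((1.0 - distance / 3000.0)
            * (1.0 - abs(angle) / 180.0)
            * (1.0 - time / max_time))

# A run ending 300 mm from the ball, 18 degrees off heading, in half the time:
print(round(fitness(300, 18, 15, 30), 3))  # 0.405
```

Because the three factors are multiplied, a run that ends far away, badly oriented or slowly is penalised on all fronts; ending exactly on target with no time used gives the maximum fitness of 1.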
Fig. 7 The ball-chasing behaviour test in simulation

On the real robot, the GA parameters are the same as in simulation. After 20 generations, the average fitness values and the standard deviations are shown in figure 8. The average fitness values again gradually increase and converge to a high value, and the exploration of the solution space is visible in the dips and recoveries of the curve. The standard deviations did not converge to zero, but they do not diverge. This was caused by many factors in the real situation: for example, the vision-tracking algorithm could fail to track the ball, the robot could slip on the pitch, or the monitor could provide an inaccurate position. Nevertheless, the GA can still find a good FLC that moves the robot to a ball. Figure 9 shows a test of the ball-chasing behaviour on a real robot with the best FLC in the last generation. The behaviour was successfully acquired.

Position-reaching behaviour: The GA and its parameters for this behaviour are the same as for the ball-chasing behaviour. Figure 10 shows the average fitness values and the standard deviations. The average fitness values climb to a high value and the standard deviations converge to small values. Figure 11 shows a test result using the FLC picked from the last generation with the highest fitness value. The robot faced the right side at the beginning; it managed to turn left and move to the target (denoted by a cross in the figure).

Fig. 8 Evolving ball-chasing behaviour in a real robot

Fig. 9 The ball-chasing behaviour test in a real robot

Fig. 10 The position-reaching behaviour evolving in simulation

Fig. 11 The position-reaching behaviour test in simulation

Fig. 12 The position-reaching behaviour evolving in a real robot
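The GA runs behind these figures (section 3.2; population 10, crossover probability 0.2, mutation probability 0.1, elitism, fitness-proportionate selection, one-point crossover, single-gene mutation) can be sketched as below. The toy fitness function is a stand-in for running a behaviour on the robot, and M = 27 assumes three inputs with three fuzzy sets each; both are illustrative assumptions, not the authors' code.

```python
import random

# Sketch of the GA over FLC consequences described in section 3.2.
# A chromosome is a list of M genes, each an index into the K fuzzy
# singletons (walking commands). M = 27 and the toy fitness below are
# illustrative assumptions.

K = 6                         # number of walking commands (fuzzy singletons)
M = 27                        # number of rules, e.g. 3 x 3 x 3 input fuzzy sets
POP, PC, PM = 10, 0.2, 0.1    # population size, crossover and mutation probabilities

def random_chromosome(rng):
    return [rng.randrange(K) for _ in range(M)]

def select(pop, fits, rng):
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    r = rng.uniform(0, sum(fits))
    acc = 0.0
    for chrom, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return chrom
    return pop[-1]

def crossover(a, b, rng):
    """One-point crossover with probability PC."""
    if rng.random() < PC:
        cut = rng.randrange(1, M)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, rng):
    """With probability PM, replace one randomly chosen gene."""
    c = chrom[:]
    if rng.random() < PM:
        c[rng.randrange(M)] = rng.randrange(K)
    return c

def next_generation(pop, fitness, rng):
    fits = [fitness(c) for c in pop]
    best = pop[max(range(len(pop)), key=fits.__getitem__)]
    new = [best[:]]                       # elitism: keep the best individual
    while len(new) < len(pop):
        c1, c2 = crossover(select(pop, fits, rng), select(pop, fits, rng), rng)
        for child in (mutate(c1, rng), mutate(c2, rng)):
            if len(new) < len(pop):
                new.append(child)
    return new

# Toy demo: evolve toward the all-zero chromosome (strictly positive fitness).
rng = random.Random(0)
toy_fitness = lambda c: 1.0 + sum(g == 0 for g in c)
pop = [random_chromosome(rng) for _ in range(POP)]
best_history = []
for _ in range(30):
    pop = next_generation(pop, toy_fitness, rng)
    best_history.append(max(toy_fitness(c) for c in pop))
print(best_history[-1])  # best toy fitness in the final generation
```

Thanks to elitism, the best fitness in the population never decreases from one generation to the next, which is consistent with the gradually rising fitness curves reported above.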
Fig. 13 The position-reaching behaviour test in a real robot

The evolution on the real robot is shown in figure 12, where the same convergence result was obtained. Although the standard deviations are large due to un-modelled uncertainty, the FLC selected from the last generation is still successful and effective. The testing result is shown in figure 13: the target is denoted by a cross, and the robot can move to this target.

5. Conclusions and Future Work

We believe our results show that it is feasible to use GA learning in behaviour-based robot control, because the learning task can be decomposed into the learning of individual behaviours. Each robot behaviour can be defined as an FLC, and we have shown how a GA can be used to evolve the FLCs. The antecedents of these FLCs were pre-defined, and their consequences were left for automatic acquisition. The learning scheme addressed in this paper focused on embodied evolution, which involves both acquiring the payoffs from on-board and external sensors and learning different behaviours continuously without external intervention. The experiments in both simulation and on real robots showed that the behaviours could be acquired efficiently through this evolutionary procedure.

Our future work will focus on how to transfer the results evolved in simulation to real robots in order to speed up the learning process. The ability to learn or refine antecedents is also needed.

References

[1] R. C. Arkin, Behaviour-based Robotics, The MIT Press, 1998.

[2] H. R. Beom and H. S. Cho, A Sensor-based Navigation for a Mobile Robot Using Fuzzy Logic and Reinforcement Learning, IEEE Trans. on SMC, Vol. 25, No. 3, pages 464-477, 1995.

[3] A. Bonarini, Evolutionary Learning of Fuzzy Rules: Competition and Cooperation, in Pedrycz, W. (ed.), Fuzzy Modelling: Paradigms and Practice, Kluwer Academic Press, Norwell, MA, pages 265-284, 1997.

[4] R.
Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pages 14-23, 1986.

[5] M. Dorigo and M. Colombetti, Robot Shaping: An Experiment in Behaviour Engineering, The MIT Press, 1998.

[6] M. Fujita, Development of an Autonomous Quadruped Robot for Robot Entertainment, Autonomous Robots, Vol. 7, pages 7-20, 1998.

[7] J. Grefenstette and A. Schultz, An Evolutionary Approach to Learning in Robots, Machine Learning Workshop on Robot Learning, New Brunswick, NJ, 1994.

[8] D. Gu and H. Hu, Evolving Fuzzy Logic Controllers for Sony Legged Robots, Proceedings of the RoboCup 2001 International Symposium, Seattle, Washington, 4-10 August 2001.

[9] H. Hu and D. Gu, Reactive Behaviours and Agent Architecture for Sony Legged Robots to Play Football, International Journal of Industrial Robot, Vol. 28, No. 1, ISSN 0143-991X, pages 45-53, 2001.

[10] P. Husbands and I. Harvey, Evolution versus Design: Controlling Autonomous Robots, in Integrating Perception, Planning and Action: Proceedings of the 3rd Annual Conference on Artificial Intelligence, Simulation and Planning, IEEE Press, pages 139-146, 1992.

[11] S. Mahadevan and J. Connell, Automatic Programming of Behaviour-based Robots Using Reinforcement Learning, Artificial Intelligence, Vol. 55, pages 311-365, 1991.

[12] M. Mataric and D. Cliff, Challenges in Evolving Controllers for Physical Robots, Robotics and Autonomous Systems, Special Issue on Evolutionary Robotics, Vol. 19, No. 1, pages 67-83, 1996.

[13] D. E. Moriarty, A. C. Schultz and J. J. Grefenstette, Evolutionary Algorithms for Reinforcement Learning, Journal of Artificial Intelligence Research, Vol. 11, pages 241-276, 1999.

[14] S. Nolfi and D. Floreano, Learning and Evolution, Autonomous Robots, Vol. 7, No. 1, pages 89-113, 1999.

[15] A. Ram, R. Arkin, G. Boone, and M. Pearce, Using Genetic Algorithms to Learn Reactive Control Parameters for Autonomous Robotic Navigation, Adaptive Behaviour, Vol. 2, No.
3, pages 277-303, 1994.

[16] A. Saffiotti, E. H. Ruspini, and K. Konolige, Using Fuzzy Logic for Mobile Robot Control, in Zimmermann, H. J. (ed.), Practical Applications of Fuzzy Technologies, Kluwer Academic Publishers, pages 185-206, 1999.

[17] L. Steels, Emergent Functionality in Robotic Agents through On-line Evolution, in Brooks, R. and Maes, P. (eds), Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, The MIT Press, Cambridge, MA, pages 8-14, 1994.

[18] R. Watson, S. Ficici and J. Pollack, Embodied Evolution: Embodying an Evolutionary Algorithm in a Population of Robots, in Michalewicz, Schoenauer, Yao, and Zalzala (eds), Proceedings of the Congress on Evolutionary Computation, 1999.