GA-based Learning in Behaviour Based Robotics


Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, Kobe, Japan, 16-20 July 2003

GA-based Learning in Behaviour Based Robotics

Dongbing Gu, Huosheng Hu, Jeff Reynolds, Edward Tsang
Department of Computer Science, University of Essex
Wivenhoe Park, Colchester CO4 3SQ, UK
Email: {dgu, hhu, reynt, edward}@essex.ac.uk

Abstract: This paper presents a Genetic Algorithm (GA) approach to evolving robot behaviours. We use fuzzy logic controllers (FLCs) to design robot behaviours. The antecedents of the FLCs are pre-designed, while their consequences are learned using a GA. Sony quadruped robots are used to evaluate the proposed approach in the robotic football domain. Two behaviours, ball-chasing and position-reaching, are studied and implemented. An embodied evolution scheme is adopted, by which the robot autonomously evolves its behaviours based on a layered control architecture. The results show that robot behaviours can be acquired automatically through GA-based learning of FLCs.

Keywords: Genetic Algorithms, Evolutionary Robotics, Fuzzy Control, Behaviour-based Robots.

1. Introduction

A control system for an autonomous robot has to cope with uncertainty in sensory readings and actuator execution, as well as handle dynamic changes in the environment. The traditional robot software architecture uses deliberative reasoning in the form of sensing, planning and action. It is difficult to accommodate sensory uncertainty and environmental dynamics in such an architecture [4]. Reactive or behaviour-based architectures are better able to handle these problems. The basic component of such an architecture is a group of behaviours. Behaviours directly map sensory information into motor actions without complex reasoning; this mapping enables robots to respond promptly to environmental changes. Behaviours can also operate concurrently to produce emergent behaviours in unknown environments [1].

When designing control strategies for mobile robots, it is impossible to predict all the potential situations robots may encounter and to specify all robot behaviours optimally in advance. Predefined control strategies consume large amounts of design time and are usually brittle in practice due to noise and the unpredictable nature of the real world. Robots have to learn from, and adapt to, changes in their operating environment. Evolutionary robotics provides an alternative way to design the control system for mobile robots. Many successful paradigms have been demonstrated using neural networks [10], classifier systems [5] and reinforcement learning [11][13][14]. Though evolutionary robotics has been criticised for being potentially slow [12], some examples of embodied evolution have been explored [18].

In this paper, we present a Genetic Algorithm (GA) approach to learning robot behaviours. Behaviours for soccer playing are evolved for a Sony legged robot. Two behaviours, ball-chasing and position-reaching, are studied and implemented. Our behaviour control uses Fuzzy Logic Controllers (FLCs) to implement the mapping from visual information to actions. An FLC represents uncertainty by fuzzy sets, and an action is generated cooperatively by several rules that are triggered to some degree, producing smooth and robust control outputs [2][3][16]. In our design, the FLC antecedents are predefined, including the selection of inputs and the definition of their membership functions.
Therefore, the number of rules is fixed (it is the product of the numbers of input fuzzy sets). The FLC consequences are defined as fuzzy singletons, which are basic motion commands for the mobile robot. The GA is employed to select the output fuzzy singletons.

Evolving an FLC for a robot behaviour has been explored by many researchers, as reported in [3][7][15][17]. However, some of this work was tested only in simulation, or the learning was conducted in simulation first (off-line learning) and then tested on real robots. On-line learning was claimed in [15], which used on-board sensors to provide the fitness. The approach proposed in this paper is an on-line version; it integrates external assessment into the learning system to improve learning efficiency. Furthermore, a finite state machine is employed to coordinate the behaviours so as to achieve embodied learning.

The rest of this paper is organised as follows. Section 2 describes the learning setting for this research. The learning algorithm is formulated in section 3. Section 4 presents simulation and experimental results that show the feasibility of the proposed learning algorithm. Finally, section 5 provides a brief conclusion and future work.

2. Learning Setting

2.1 The robot and its playing field

Sony legged robots are quadruped walking robots that resemble the basic behaviour of dogs. They are controlled by an embedded R4000 microprocessor with over 100 MIPS performance. There are 20 motors in a Sony robot. The neck and the four legs each have three degrees of freedom (DOFs) for looking and walking; the remaining five DOFs are used for the tail (2 DOFs), the mouth (1 DOF) and the two ears (1 DOF each). The main sensors include 20 encoders for motion control of the twenty motors, a colour CCD camera, an infrared range sensor, three gyros for posture measurement (roll, pitch, yaw), and touch sensors. Additionally, there is a stereo microphone and a loudspeaker for communication [6].

The environment is the playing field of the Sony Legged Robot League, 4m in length and 3m in width. Figure 1 shows a top view of the playing field from the overhead camera. The goals are centred on both ends of the field and are 60cm wide and 30cm high. Six uniquely coloured landmarks are placed around the edges of the field, one at each corner and one on each side of the halfway line. Each landmark is painted with two colours, with pink either at the top (landmarks on one side of the field) or at the bottom (landmarks on the other side). These landmarks are used by the robots to localise themselves within the field. The ball, walls, goals, landmarks and robot uniforms are painted with eight different colours, distributed in the colour space so that a robot can easily distinguish them.

Fig. 1 The top view of the playing field from an overhead camera

2.2 Learning environment

Although the on-board sensors can provide the results of the interaction between a robot and its environment, perception aliasing is severe given the large perception space and highly noisy sensors. To evaluate the performance of robot behaviours, the environment should provide payoffs to the robot with a certain accuracy, improving learning efficiency. A global monitor was set up in our laboratory (see figure 2), consisting of an overhead camera, a desktop computer and visual tracking software, to provide external judgement of the interaction between the robot and its environment. The function of the monitor is to feed the positions of the robot and the ball to the robot, so that the robot can autonomously test its control strategies and evaluate the results. The monitor recognises the robot and the ball by their colours. Through image processing, the monitor updates their positions continuously. The robot can ask for this information at any time during its learning process. The communication is achieved over the Internet, where the monitor system acts as a server and the robot acts as a client. The server provides the global information whenever the client makes a request. Since the robot only evaluates its performance at the end of one run, the communication has no significant effect on the learning process.

Fig. 2 The learning environment
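Neither the wire protocol nor the message format of this monitor server is given in the paper; purely as an illustration, the following Python sketch assumes a hypothetical line-based text protocol in which the client sends a GET request and receives the robot pose and ball position:

    import socket

    def query_monitor(host, port):
        # Ask the global monitor for the current robot and ball positions.
        # Hypothetical protocol: send "GET\n", receive "rx ry rtheta bx by\n"
        # (robot pose and ball position); the actual format used by the
        # monitor is not specified in the paper.
        with socket.create_connection((host, port), timeout=2.0) as sock:
            sock.sendall(b"GET\n")
            reply = sock.makefile().readline().split()
            rx, ry, rtheta, bx, by = map(float, reply)
        return (rx, ry, rtheta), (bx, by)

    # Example: poll once at the end of a run to compute the fitness payoff.
    # robot_pose, ball_pos = query_monitor("monitor.local", 9000)  # hypothetical host/port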
2.3 The control architecture

To control the robots we employ a layered architecture: a walking layer, a behaviour layer and a cognition layer, as shown in figure 3 [8][9]. The walking layer is at the bottom of the architecture. Its task is to implement basic walking operations, and it responds to a number of walking commands issued by the middle layer, i.e. the behaviour layer. The walking commands are defined as a set C = {c_k, k = 1, ..., K}, including MOVE FORWARD, LEFT FORWARD, RIGHT FORWARD, LEFT TURN, RIGHT TURN and STOP. The walking layer generates the discrete walking commands in terms of the motor encoder readings. The vertical connection between the walking layer and the behaviour layer is represented by selection (the arrows in figure 3). The state space of the walking layer shown in figure 3 consists of the encoder readings; the grids in the layer are separated along multiple dimensions and represent the walking commands.
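As a minimal Python sketch, the command set C can be represented as an enumeration; the identifiers below mirror the six commands named in the text, while the dispatch function is only a placeholder for the gait generation that the paper does not describe:

    from enum import Enum, auto

    class WalkCommand(Enum):
        # The discrete command set C = {c_1, ..., c_K}, here with K = 6.
        MOVE_FORWARD = auto()
        LEFT_FORWARD = auto()
        RIGHT_FORWARD = auto()
        LEFT_TURN = auto()
        RIGHT_TURN = auto()
        STOP = auto()

    def walking_layer(command):
        # Placeholder: translate the command selected by the behaviour
        # layer into joint-level leg trajectories (not described here).
        ...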

The middle layer is the behaviour layer, which provides a group of behaviours to the top layer, i.e. the cognition layer. The behaviours include ball-chasing, obstacle-avoiding, position-reaching, ball-dribbling, ball-kicking, etc. The feature states extracted by the robot's local camera constitute the state space of the behaviour layer shown in figure 3. These features include the robot's position and the relative angle and distance from the robot to the ball and to the goal. A grid denotes a behaviour in figure 3; different grids may have different sizes or different dimensions.

At the top, the cognition layer co-ordinates these behaviours to achieve a given task. The feature states in the cognition layer are more abstract predicated states, denoted by binary values, for instance whether or not the ball is found. A grid represents a combination of the predicated states. A discrete event system model can be used to formulate the behaviour coordination.

Fig. 3 A layered architecture (cognition, behaviour and walking layers, with abstraction of states and actions between them)

The learning in this research occurs within the behaviour layer, where individual behaviours need to be designed to map the noisy feature states to imperfect walking actions. The walking layer provides a substrate for the entire system. The cognition layer provides a mechanism that co-ordinates the behaviours; it also gives the robot the opportunity to learn different behaviours continuously without intervention. For example, the robot can start a run of the position-reaching behaviour after a run of the ball-chasing behaviour without being repositioned by an operator.

3. Learning Algorithms

3.1 The FLC

A behaviour is a mapping from sensory data, i.e. an environment state vector S, to a walking action a. It can be expressed as a = B(S), where B is the mapping function. An FLC can be used to implement the function B [6]. Assume there are N feature states for a behaviour B, i.e. N input state variables s_i (i = 1, ..., N) in the state vector S. For each input state variable s_i, L_i fuzzy sets are defined. The total number of fuzzy rules is denoted as M. There is only one output variable in each behaviour, which corresponds to a walking command. There are K walking commands, denoted by c_k (k = 1, ..., K), which can be used for the output, and K fuzzy singletons are defined as the fuzzy output. The m-th fuzzy rule (m = 1, ..., M) is denoted as:

R_m: IF s_1 AND s_2 AND ... AND s_N, THEN a is c_{k_m}

where c_{k_m} is the k-th fuzzy singleton selected by the m-th rule. The crisp output a, stimulated by an input state S after fuzzy reasoning, is calculated by the centre of area (COA) method, i.e.

a = ( Σ_{m=1}^{M} μ_m(S) · c_{k_m} ) / ( Σ_{m=1}^{M} μ_m(S) )

where μ_m(S) is the firing strength of rule R_m.
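To make the inference concrete, here is a minimal sketch of such a singleton-consequent FLC in Python. The triangular membership shape and the product t-norm for the firing strengths are illustrative assumptions; the paper pre-defines the membership functions but does not give their shapes or the conjunction operator:

    def tri(a, b, c):
        # Triangular membership function peaking at b (illustrative shape).
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x < b else (c - x) / (c - b)
        return mu

    def flc_output(state, rules, memberships):
        # Singleton-consequent FLC with centre-of-area defuzzification.
        #   state:       input vector S = (s_1, ..., s_N)
        #   rules:       list of (antecedent, c_km) pairs, where antecedent[i]
        #                indexes the fuzzy set applied to input i, and c_km is
        #                the rule's singleton consequence (a command value)
        #   memberships: memberships[i][j](s_i) -> degree of the j-th fuzzy
        #                set defined on the i-th input
        # Firing strengths use the product t-norm (an assumption).
        num = den = 0.0
        for antecedent, c_km in rules:
            mu = 1.0
            for i, j in enumerate(antecedent):
                mu *= memberships[i][j](state[i])
            num += mu * c_km
            den += mu
        return num / den if den > 0.0 else 0.0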
3.2 The GA

In this paper, an FLC, i.e. a behaviour, is viewed as an individual, and a population is a group of FLCs. Running the robot with an FLC constitutes the evaluation process. As the antecedents of an FLC are pre-defined, only the FLC consequences are encoded in the chromosome. There are M rules in one FLC, and therefore M fuzzy consequences, so one chromosome has M genes; the first gene corresponds to the first rule's consequence, and so on. Each gene can be any one of the K fuzzy singletons c_k, as illustrated in figure 4.

Fig. 4 One chromosome (the gene string c_{k_1}, c_{k_2}, ..., c_{k_M})

The operations used in the GA include:

Initialisation: The first generation is initialised randomly; each gene in each chromosome is chosen uniformly from the K fuzzy singletons.

Elitism: The best individual in the current generation is copied unchanged into the next generation.

Selection: Individuals are copied into the next generation as offspring according to their fitness values; individuals with higher fitness values have more offspring than those with lower fitness values.

Crossover: Crossover is applied to pairs of offspring with crossover probability p_c. One-point crossover is used to exchange genes.

Mutation: Mutation is applied to one gene of an offspring with mutation probability p_m. The operator randomly chooses one fuzzy singleton from the allowed set to replace the current gene.
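A compact sketch of this evolutionary loop in Python. Fitness-proportionate (roulette) selection is one concrete reading of the selection rule above (the paper only says that fitter individuals get more offspring), and evaluate is a placeholder for one physical run of the robot with the decoded FLC:

    import random

    K = 6                          # number of walking-command singletons
    M = 27                         # rules per FLC, e.g. 3 fuzzy sets on each of 3 inputs
    POP, P_C, P_M = 10, 0.2, 0.1   # population size and rates reported in section 4.3

    def evaluate(chromosome):
        # Placeholder: run the robot (or simulator) with the FLC encoded
        # by this chromosome and return the fitness of section 4.2.
        raise NotImplementedError

    def evolve(generations):
        pop = [[random.randrange(K) for _ in range(M)] for _ in range(POP)]
        for _ in range(generations):
            fits = [evaluate(ind) for ind in pop]
            elite = pop[fits.index(max(fits))][:]          # elitism
            # fitness-proportionate (roulette) selection of offspring
            offspring = [random.choices(pop, weights=fits)[0][:]
                         for _ in range(POP - 1)]
            for i in range(0, len(offspring) - 1, 2):      # one-point crossover
                if random.random() < P_C:
                    cut = random.randrange(1, M)
                    offspring[i][cut:], offspring[i + 1][cut:] = \
                        offspring[i + 1][cut:], offspring[i][cut:]
            for ind in offspring:                          # gene replacement mutation
                if random.random() < P_M:
                    ind[random.randrange(M)] = random.randrange(K)
            pop = [elite] + offspring
        fits = [evaluate(ind) for ind in pop]
        return pop[fits.index(max(fits))]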

4. Experiments

4.1 The behaviour models

A simple version of the robot architecture shown in figure 3 includes four behaviours: ball-chasing, position-reaching, obstacle-avoiding and ball-searching. The obstacle-avoiding behaviour simply avoids the edges and goals of the playing field; the ball-searching behaviour scans the playing field to find the ball. These two behaviours are designed with heuristic rules: they only exist to support the learning of the other two behaviours and are not themselves learned in this paper.

In the ball-chasing behaviour, the ball distance, the ball angle and the goal angle relative to the robot's heading are chosen as the feature states, and three fuzzy sets are defined for each of them. In the position-reaching behaviour, the target distance and angle relative to the robot are chosen as the feature states; these two states are calculated from the target co-ordinates, the robot's co-ordinates and the robot's heading. Again, three fuzzy sets are defined for each of them.

The predicated states in the cognition layer are p_1 (the ball is found), p_2 (obstacles are found), p_3 (the ball is near enough), p_4 (the target is near enough), p_5 (the ball-chasing behaviour has timed out) and p_6 (the position-reaching behaviour has timed out). The transitions between behaviours are expressed as a finite state machine (see figure 5). The initial behaviour is ball-searching. When the ball is found, the system starts to learn the ball-chasing behaviour. After one run, the system transitions to learning the position-reaching behaviour; learning then alternates between these two behaviours so that it continues without external intervention. If obstacles are found, the robot ends the evaluation of the current individual and avoids the obstacles until no obstacles are found.

Fig. 5 A behaviour transition model (a finite state machine over ball-searching, ball-chasing, position-reaching and obstacle-avoiding, with transitions triggered by p_1, ..., p_6)
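As a sketch, the transition model of figure 5 can be written as a small state machine in Python; the exact priority and set of transitions are inferred from the prose above rather than read off the figure, so they are approximate:

    def next_behaviour(current, p):
        # Hedged sketch of the figure-5 transition model; p maps each
        # predicated state name ("p1" ... "p6") to a boolean.
        if p["p2"]:                                   # obstacles found: interrupt
            return "obstacle-avoiding"
        if current == "obstacle-avoiding":            # obstacles cleared
            return "ball-searching"
        if current == "ball-searching" and p["p1"]:   # ball found
            return "ball-chasing"
        if current == "ball-chasing" and (p["p3"] or p["p5"]):
            return "position-reaching"                # run over: near ball or timed out
        if current == "position-reaching" and (p["p4"] or p["p6"]):
            return "ball-chasing"                     # alternate back for the next run
        return current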
4.2 Fitness functions

The fitness functions f(t) are defined with respect to the robot behaviours.

Ball-chasing behaviour:

f(t) = (1 - distance/3000) * (1 - angle/180) * (1 - time/maximum time)

Three terms are defined in the fitness function: the final distance between the robot and the ball, the final angle between the robot's heading and the line connecting the robot and the ball, and the time spent on the ball-chasing behaviour. All three terms are normalised to 1.

Position-reaching behaviour:

f(t) = (1 - distance/3000) * (1 - angle/180) * (1 - time/maximum time)

Three terms are defined in the fitness function: the final distance between the robot and the desired position, the final angle between the robot's heading and the line connecting the robot and the position, and the time spent on the position-reaching behaviour. All three terms are normalised to 1.
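Both behaviours share the same product form, so a single Python transcription covers them; the 3000 (mm) distance bound and 180 (degree) angle bound come from the formula above, while clamping each term to [0, 1] is an added safeguard not stated in the paper:

    def fitness(distance, angle, time, maximum_time):
        # Product fitness of section 4.2; all three terms are normalised to 1.
        #   distance: final distance to the ball / target (field scale 3000)
        #   angle:    final heading error to the ball / target, in degrees
        #   time:     time spent on the behaviour run
        clamp = lambda x: max(0.0, min(1.0, x))
        return (clamp(1 - distance / 3000)
                * clamp(1 - angle / 180)
                * clamp(1 - time / maximum_time))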

4.3 Results

A simulator of the experimental environment was also developed, in order to verify the algorithms and decrease learning time. The simulator is constructed from statistical samples of real robot motion.

Ball-chasing behaviour. In simulation, the population size is 10, the crossover probability is 0.2 and the mutation probability is 0.1. The average fitness values and standard deviations over 30 generations are shown in figure 6. The fitness values increase gradually and finally converge to a high value. The dips in the middle of the curve indicate exploration of the solution space by the mutation and crossover operators. The decrease of the standard deviations reflects that the ten individuals finally tend to have the same genes, so their behaviours tend to be the same. The best FLC was picked from the last generation for testing; figure 7 shows that the behaviour was successfully evolved, with the robot moving to a ball placed in the middle of the playing field.

Fig. 6 Evolving the ball-chasing behaviour in simulation

Fig. 7 The ball-chasing behaviour test in simulation

On the real robot, the GA parameters are the same as in simulation. The average fitness values and standard deviations over 20 generations are shown in figure 8. The average fitness values again increase gradually and converge to a high value, with the exploration of the solution space visible as dips and recoveries in the curve. The standard deviations did not converge to zero, but neither did they diverge. This was caused by several factors in the real setting: the vision-tracking algorithm could fail to track the ball, the robot could slip on the pitch, the monitor could provide an inaccurate position, and so on. Nevertheless, the GA still found a good FLC that moves the robot to the ball. Figure 9 shows a test of the ball-chasing behaviour on a real robot with the best FLC of the last generation; the behaviour was successfully acquired.

Fig. 8 Evolving the ball-chasing behaviour on a real robot

Fig. 9 The ball-chasing behaviour test on a real robot

Position-reaching behaviour. The GA and its parameters for this behaviour are the same as for the ball-chasing behaviour. Figure 10 shows the average fitness values and standard deviations in simulation: the average fitness values climb to a high value and the standard deviations converge to small values. Figure 11 shows a test using the FLC with the highest fitness value in the last generation. The robot faced the right side at the beginning; it managed to turn left and move to the target (denoted by a cross in the figure).

Fig. 10 Evolving the position-reaching behaviour in simulation

Fig. 11 The position-reaching behaviour test in simulation

Fig. 12 Evolving the position-reaching behaviour on a real robot

On the real robot, the evolution of the position-reaching behaviour is shown in figure 12, where the same convergence result was obtained. Although the standard deviations are large due to un-modelled uncertainty, the FLC selected from the last generation is still successful and effective. The test result is shown in figure 13: the target is denoted by a cross, and the robot moves to this target.

Fig. 13 The position-reaching behaviour test on a real robot

5. Conclusions and Future Work

Our results show that it is feasible to use GA learning in behaviour-based robot control, because the learning task can be decomposed into the learning of individual behaviours. Each robot behaviour can be defined as an FLC, and we have shown how a GA can be used to evolve the FLCs. The antecedents of these FLCs were pre-defined, and their consequences were left for automatic acquisition. The learning scheme addressed in this paper focused on embodied evolution, which involves both acquiring payoffs from on-board and external sensors and learning different behaviours continuously without external intervention. The experiments in both simulation and on real robots showed that the behaviours could be acquired efficiently through this evolutionary procedure.

Our future work will focus on transferring results evolved in simulation to real robots in order to speed up the learning process. The ability to learn or refine antecedents is also needed.

References

[1] R. C. Arkin, Behaviour-based Robotics, The MIT Press, 1998.
[2] H. R. Beom and H. S. Cho, A Sensor-based Navigation for a Mobile Robot Using Fuzzy Logic and Reinforcement Learning, IEEE Trans. on SMC, Vol. 25, No. 3, pages 464-477, 1995.
[3] A. Bonarini, Evolutionary Learning of Fuzzy Rules: Competition and Cooperation, in W. Pedrycz (ed.), Fuzzy Modelling: Paradigms and Practice, Kluwer Academic Press, Norwell, MA, pages 265-284, 1997.
[4] R. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pages 14-23, 1986.
[5] M. Dorigo and M. Colombetti, Robot Shaping: An Experiment in Behaviour Engineering, The MIT Press, 1998.
[6] M. Fujita, Development of an Autonomous Quadruped Robot for Robot Entertainment, Autonomous Robots, Vol. 7, pages 7-20, 1998.
[7] J. Grefenstette and A. Schultz, An Evolutionary Approach to Learning in Robots, Machine Learning Workshop on Robot Learning, New Brunswick, NJ, 1994.
[8] D. Gu and H. Hu, Evolving Fuzzy Logic Controllers for Sony Legged Robots, Proceedings of the RoboCup 2001 International Symposium, Seattle, Washington, 4-10 August 2001.
[9] H. Hu and D. Gu, Reactive Behaviours and Agent Architecture for Sony Legged Robots to Play Football, International Journal of Industrial Robot, Vol. 28, No. 1, ISSN 0143-991X, pages 45-53, 2001.
[10] P. Husbands and I. Harvey, Evolution versus Design: Controlling Autonomous Robots, in Integrating Perception, Planning and Action: Proceedings of the 3rd Annual Conference on Artificial Intelligence, Simulation and Planning, IEEE Press, pages 139-146, 1992.
[11] S. Mahadevan and J. Connell, Automatic Programming of Behaviour-based Robots Using Reinforcement Learning, Artificial Intelligence, Vol. 55, pages 311-365, 1991.
[12] M. Mataric and D. Cliff, Challenges in Evolving Controllers for Physical Robots, Robotics and Autonomous Systems, Special Issue on Evolutionary Robotics, Vol. 19, No. 1, pages 67-83, 1996.
[13] D. E. Moriarty, A. C. Schultz and J. J. Grefenstette, Evolutionary Algorithms for Reinforcement Learning, Journal of Artificial Intelligence Research, Vol. 11, pages 241-276, 1999.
[14] S. Nolfi and D. Floreano, Learning and Evolution, Autonomous Robots, Vol. 7, No. 1, pages 89-113, 1999.
[15] A. Ram, R. Arkin, G. Boone and M. Pearce, Using Genetic Algorithms to Learn Reactive Control Parameters for Autonomous Robotic Navigation, Adaptive Behaviour, Vol. 2, No. 3, pages 277-303, 1994.
[16] A. Saffiotti, E. H. Ruspini and K. Konolige, Using Fuzzy Logic for Mobile Robot Control, in H. J. Zimmermann (ed.), Practical Applications of Fuzzy Technologies, Kluwer Academic Publishers, pages 185-206, 1999.
[17] L. Steels, Emergent Functionality in Robotic Agents through On-line Evolution, in R. Brooks and P. Maes (eds.), Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, The MIT Press, Cambridge, MA, pages 8-14, 1994.
[18] R. Watson, S. Ficici and J. Pollack, Embodied Evolution: Embodying an Evolutionary Algorithm in a Population of Robots, in Michalewicz, Schoenauer, Yao and Zalzala (eds.), Proceedings of the Congress on Evolutionary Computation, 1999.