Swarm Intelligence W7: Application of Machine-Learning Techniques to Automatic Control Design and Optimization
Outline
- Learning to avoid obstacles
  - Problem encoding using GA and ANN
  - The Floreano and Mondada experiment
- Dealing with noise and comparison with PSO
  - Noise-resistant GA
  - Noise-resistant PSO
  - The Pugh et al. systematic study
- Moving beyond obstacle avoidance
  - Learning of more complex behaviors
  - HW & SW co-design
  - Specific learning issues in collective systems
Learning to Avoid Obstacles by Shaping a Neural Network Controller using Genetic Algorithms
Evolving a Neural Controller

Neuron $N_i$ with sigmoid transfer function $f(x)$:

$O_i = f(x_i)$, with $x_i = \sum_{j=1}^{m} w_{ij} I_j + I_0$ and $f(x) = \dfrac{2}{1 + e^{-x}} - 1$

where $I_j$ are the inputs (proximity sensors $S_1, \dots, S_8$), $w_{ij}$ the synaptic weights, $I_0$ a bias input, and $O_i$ the output of neuron $i$ driving the motors $M_1$, $M_2$; connections can be inhibitory or excitatory.

Note: in our case we evolve the synaptic weights, but Hebbian rules for dynamic change of the weights, transfer function parameters, etc. can also be evolved (see Floreano course).
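To make the encoding concrete, here is a minimal Python sketch of such a single-layer controller: 8 proximity-sensor inputs plus a bias mapped through evolved synaptic weights to 2 motor neurons with the sigmoid above. The class and variable names are illustrative, and the lateral/recurrent connections of the original architecture are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    # Transfer function f(x) = 2 / (1 + exp(-x)) - 1, output in (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

class NeuralController:
    """Single-layer perceptron: 8 proximity sensors -> 2 motor neurons.

    The genome is the flattened weight matrix plus one bias weight per
    motor neuron: 2 * (8 + 1) = 18 parameters tuned by the GA.
    """
    N_SENSORS, N_MOTORS = 8, 2

    def __init__(self, genome):
        genome = np.asarray(genome, dtype=float)
        assert genome.size == self.N_MOTORS * (self.N_SENSORS + 1)
        w = genome.reshape(self.N_MOTORS, self.N_SENSORS + 1)
        self.weights, self.bias = w[:, :-1], w[:, -1]

    def act(self, sensor_values):
        # O_i = f(sum_j w_ij * I_j + I_0); returns wheel commands in (-1, 1)
        x = self.weights @ np.asarray(sensor_values, dtype=float) + self.bias
        return sigmoid(x)

# Example: a random genome, as a GA would produce at initialization
rng = np.random.default_rng(0)
controller = NeuralController(rng.uniform(-1, 1, 18))
print(controller.act(rng.uniform(0, 1, 8)))  # two wheel commands
```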
Evolving Obstacle Avoidance (Floreano and Mondada 1996)

Defining performance (fitness function):

$\Phi = V (1 - \sqrt{\Delta v})(1 - i)$

- $V$ = mean speed of the wheels, $0 \le V \le 1$
- $\Delta v$ = absolute value of the algebraic difference between the wheel speeds, $0 \le \Delta v \le 1$
- $i$ = activation value of the proximity sensor with the highest activity, $0 \le i \le 1$

Note: fitness is accumulated during the evaluation span and normalized over the number of control loops (actions).
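A minimal sketch of how this fitness could be accumulated over an evaluation span, assuming wheel speeds normalized to [-1, 1] and sensor activations to [0, 1]; the robot API calls are hypothetical stand-ins, not the original experimental code.

```python
import math

def step_fitness(v_left, v_right, max_sensor_activation):
    """Phi = V * (1 - sqrt(dv)) * (1 - i) for one control step.

    v_left, v_right: normalized wheel speeds in [-1, 1]
    max_sensor_activation: activation i of the most active proximity
    sensor, in [0, 1]
    """
    V = (abs(v_left) + abs(v_right)) / 2.0   # mean speed, 0 <= V <= 1
    dv = abs(v_left - v_right) / 2.0         # speed difference, 0 <= dv <= 1
    i = max_sensor_activation
    return V * (1.0 - math.sqrt(dv)) * (1.0 - i)

def evaluate(controller, robot, n_steps=80):
    # Accumulate fitness over the evaluation span, normalized over the
    # number of control loops (actions). `robot` is a hypothetical API.
    total = 0.0
    for _ in range(n_steps):
        sensors = robot.read_proximity_sensors()
        v_left, v_right = controller.act(sensors)
        robot.set_wheel_speeds(v_left, v_right)
        total += step_fitness(v_left, v_right, max(sensors))
    return total / n_steps
```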
Evolving Robot Controllers
Note: the controller architecture can be of any type, but it is worth using GA/PSO only if the number of parameters to be tuned is large.
Evolving Obstacle Avoidance
[Figures: evolved path; fitness evolution]
Evolved Obstacle Avoidance Behavior
Generation 100; on-line, off-board (PC-hosted) evolution.
Note: the direction of motion is NOT encoded in the fitness function: the GA automatically discovers the asymmetry in the sensory system configuration (six proximity sensors in the front, two in the back).
Noise-Resistant GA and PSO for Design and Optimization of Obstacle Avoidance
Noisy Optimization
- Multiple evaluations at the same point in the search space yield different results.
- Depending on the optimization problem, the evaluation of a candidate solution can be more or less expensive in terms of time.
- Noise causes decreased convergence speed and residual error.
- Noisy optimization has received little exploration in evolutionary algorithms, and very little in PSO.
Key Ideas
- Better information about a candidate solution can be obtained by combining multiple noisy evaluations.
- We could systematically evaluate each candidate solution a fixed number of times, but this is not smart from a computational point of view, in particular for long evaluation spans.
- We want to dedicate more computational power/time to evaluating promising solutions and to eliminate "lucky" ones as quickly as possible; as a consequence, candidate solutions may have been evaluated a different number of times when they are compared.
- In GA, good and robust candidate solutions survive over generations; in PSO, they survive in the individual memory (personal best).
- Use aggregation functions to combine multiple evaluations: e.g., minimum or average.
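A minimal sketch of the re-evaluation-and-aggregation idea, here in its PSO flavor: at each iteration a particle's personal best is re-evaluated and all of its noisy fitness samples are aggregated (average shown; minimum is the conservative alternative). The class names and the exact re-evaluation schedule are illustrative assumptions, not the precise algorithm from the lecture.

```python
import statistics

class NoisyRecord:
    """Keeps all noisy fitness samples of one candidate solution."""
    def __init__(self):
        self.samples = []

    def add(self, fitness):
        self.samples.append(fitness)

    def aggregate(self, how="mean"):
        # Candidates compared later may hold different numbers of samples.
        if how == "mean":
            return statistics.mean(self.samples)
        return min(self.samples)  # pessimistic: punishes "lucky" evaluations

def pso_iteration(particles, evaluate):
    """One noise-resistant PSO step: re-test personal bests each iteration.

    `particles` carry .position, .best_position and .best_record
    (a NoisyRecord); `evaluate` is the noisy fitness function.
    Velocity/position updates are omitted for brevity.
    """
    for p in particles:
        # Fresh (single) evaluation of the current position
        current = NoisyRecord()
        current.add(evaluate(p.position))
        # Re-evaluate the remembered personal best: robust solutions keep
        # surviving in memory, lucky ones degrade and get replaced.
        p.best_record.add(evaluate(p.best_position))
        if current.aggregate() > p.best_record.aggregate():
            p.best_position, p.best_record = p.position, current
```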
[Diagrams: noise-resistant GA (left) and noise-resistant PSO (right)]
Example: Gaussian Additive Noise on the Generalized Rosenbrock Function
Fair test: same number of evaluations of candidate solutions for all algorithms (i.e., n generations/iterations of the standard versions compared with n/2 of the noise-resistant ones).
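A short sketch of the benchmark: the generalized Rosenbrock function with additive Gaussian noise, so that repeated evaluations of the same point differ. The noise standard deviation is an assumed parameter, not a value from the study.

```python
import numpy as np

def rosenbrock(x):
    # Generalized Rosenbrock: sum_i [100*(x_{i+1} - x_i^2)^2 + (x_i - 1)^2]
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)

def noisy_rosenbrock(x, sigma=10.0, rng=np.random.default_rng()):
    # Additive Gaussian noise on the deterministic objective
    return rosenbrock(x) + rng.normal(0.0, sigma)
```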
A Systematic Study on Obstacle Avoidance

Three different scenarios:
- Scenario 1: one robot learning obstacle avoidance
- Scenario 2: one robot learning obstacle avoidance, one robot running pre-evolved obstacle avoidance
- Scenario 3: two robots co-learning obstacle avoidance

[Figure: PSO, 50 iterations, scenario 3]

Idea: more robots means more noise (as perceived by an individual robot); there is no standard communication between the robots, but in scenario 3 information is shared through the population manager!
Results: Best Controllers
Fair test: same number of evaluations of candidate solutions for all algorithms (i.e., n generations/iterations of the standard versions compared with n/2 of the noise-resistant ones).

Results: Average of Final Population
Fair test: same as on the previous slide.

Results: Scenario 1, Population Fitness Evolution
Fair test: same as on the previous slide.
Not only Obstacle Avoidance: Evolving More Complex Behaviors
Evolving Homing Behavior (Floreano and Mondada 1996)
[Figures: experimental set-up; the robot's sensors]
Evolving Homing Behavior

Fitness function:

$\Phi = V (1 - i)$

- $V$ = mean speed of the wheels, $0 \le V \le 1$
- $i$ = activation value of the proximity sensor with the highest activity, $0 \le i \le 1$

Fitness is accumulated during the life span and normalized over the maximal number (150) of control loops (actions). There is no explicit expression of battery level/duration in the fitness function (it is implicit).

Chromosome length: 102 parameters (real-to-real encoding). Generations: 240, i.e., 10 days of embedded evolution on the Khepera.

[Figure: controller]
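A minimal sketch of how the battery pressure enters implicitly: the evaluation stops when the battery is empty or after 150 loops, and fitness is normalized by the maximal span, so a robot that recharges lives longer and can accumulate more fitness even though energy never appears in $\Phi$. The robot API is hypothetical.

```python
def evaluate_homing(controller, robot, max_steps=150):
    """Accumulate Phi = V * (1 - i) until the battery dies or 150 loops pass.

    Normalizing by the *maximal* number of loops makes battery recharging
    pay off implicitly: returning to the nest extends the life span and
    thus the attainable fitness. `robot` is a hypothetical API.
    """
    total, steps = 0.0, 0
    while steps < max_steps and robot.battery_level() > 0.0:
        sensors = robot.read_proximity_sensors()
        v_left, v_right = controller.act(sensors)
        robot.set_wheel_speeds(v_left, v_right)
        V = (abs(v_left) + abs(v_right)) / 2.0   # mean wheel speed
        i = max(sensors)                          # most active sensor
        total += V * (1.0 - i)
        steps += 1
    return total / max_steps  # normalized over the maximal evaluation span
```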
Evolving Homing Behavior
[Figures: fitness evolution; evolution of the number of control loops per evaluation span; battery recharging vs. motion patterns (battery energy, left wheel activation, right wheel activation)]
Observed behavior: reach the nest -> battery recharging -> turn on the spot -> leave the nest.
Evolved Homing Behavior
Not only Control Shaping: Off-line Automatic Hardware-Software Co-Design and Optimization
Moving Beyond Controller-Only Evolution
- Evidence: nature evolves hardware and software at the same time.
- Faithful, realistic simulators make it possible to explore design solutions that encompass off-line co-evolution (co-design) of control and morphological characteristics (body shape, number of sensors, placement of sensors, etc.).
- GA (and PSO?) are powerful enough for this job, and the methodology remains the same; only the encoding changes.
Evolving Control and Robot Morphology (Lipson and Pollack, 2000)
http://www.mae.cornell.edu/ccsl/research/golem/index.html
- Arbitrary recurrent ANN
- Passive and active (linear actuators) links
- Fitness function: net distance traveled by the centre of mass in a fixed duration
[Figure: example of an evolutionary sequence]
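A minimal sketch of that fitness measure, assuming the simulator logs the positions of all body parts: only the straight-line displacement of the centre of mass between the first and last time steps counts, not the path length. The function name and array layout are illustrative.

```python
import numpy as np

def net_displacement_fitness(positions, masses):
    """Fitness = net distance traveled by the centre of mass.

    positions: array of shape (T, N, 3), trajectories of N body parts
    over T simulated time steps; masses: array of shape (N,).
    """
    masses = np.asarray(masses, dtype=float)
    # Centre of mass at every time step: mass-weighted mean of parts
    com = (positions * masses[None, :, None]).sum(axis=1) / masses.sum()
    # Net (straight-line) displacement over the fixed duration
    return float(np.linalg.norm(com[-1] - com[0]))
```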
Examples of Evolved Machines
Problem: the simulator is not realistic enough (performance is higher in simulation because the simulated friction is not accurate enough; e.g., for the arrow configuration, 59.6 cm in simulation vs. 22.5 cm in reality).
From Single to Multi-Unit Systems: Co-Learning in a Shared World
Evolution in Collective Scenarios
Collective setting: the fitness becomes noisy due to partial perception and independent, parallel actions.
Credit Assignment Problem
With limited communication, no communication at all, or partial perception: how should the credit (or blame) for the observed group performance be assigned to the actions of the individual units?
Co-Learning Collaborative Behavior
Three orthogonal axes to consider (extreme or balanced solutions are possible; a sketch contrasting two extreme combinations follows):
- Individual vs. group fitness
- Private (no sharing of parameters) vs. public (parameter sharing) policies
- Homogeneous vs. heterogeneous systems
[Figure: example with binary encoding of candidate solutions]
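A minimal sketch contrasting two extremes on these axes: a homogeneous & public & group setting (one shared genome, one team fitness) vs. a heterogeneous & private & individual setting (one genome and one fitness per robot). All names are illustrative assumptions.

```python
def evaluate_homogeneous(genome, robots, group_fitness):
    """Homogeneous & public & group: all robots run copies of one genome
    and the whole team receives a single, shared fitness value."""
    for r in robots:
        r.load_controller(genome)      # hypothetical robot API
    return group_fitness(robots)       # e.g., average team performance

def evaluate_heterogeneous(genomes, robots, individual_fitness):
    """Heterogeneous & private & individual: each robot runs (and is
    credited for) its own genome; parameters are not shared."""
    for g, r in zip(genomes, robots):
        r.load_controller(g)
    return [individual_fitness(r) for r in robots]
```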
Co-Learning Competitive Behavior
[Figure: two co-adapting populations, with fitness f1 and fitness f2]
Co-Learning in a Competitive Framework
Co-Evolution of Competitive Behavior
No credit assignment problem: individual fitness!
Example: co-evolution of the prey's and the predator's controllers:

$\Phi_{prey} = T$, $\Phi_{predator} = 1 - T$

where $T$ = normalized survival time of the prey (before it is caught).
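A minimal sketch of one competitive evaluation under these definitions: run a pursuit-evasion episode and derive both fitnesses from the prey's survival time. The `arena` object and its methods are hypothetical stand-ins for the simulation loop.

```python
def competitive_episode(prey, predator, arena, max_steps=500):
    """One pursuit-evasion episode: Phi_prey = T, Phi_predator = 1 - T.

    T is the prey's survival time normalized by the episode length, so
    each individual gets its own fitness and there is no credit
    assignment problem. `arena.step` / `arena.caught` are hypothetical.
    """
    for step in range(max_steps):
        arena.step(prey, predator)      # both controllers act in parallel
        if arena.caught():              # predator touched the prey
            break
    T = (step + 1) / max_steps          # normalized survival time
    return T, 1.0 - T                   # (Phi_prey, Phi_predator)
```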
Prey-Predator Experiment (Nolfi and Floreano, 1998)
Prey-Predator Experiment
[Figures: fitness co-evolution in simulation; fitness co-evolution on real robots]
Co-learning in a Collaborative Framework
Learning to Aggregate (see Lab + Hwk 5)
- ANN architecture: 2 neurons; the parameter space is larger: range-and-bearing information is added, but only the average over all detected robots within a given range is fed to the ANN.
- Number of parameters:
  - obstacle avoidance (proximity-to-motors + lateral + recurrent + bias): 16 + 2 + 2 + 2 = 22 weights
  - aggregation (obstacle avoidance + range-and-bearing-to-motors): 22 + 4 = 26 weights
- Individual fitness: number of robots around robot i; group fitness: average over all the measurements taken by the individual robots (see the sketch below).
- Compared settings: individual & public & heterogeneous vs. group & public & homogeneous.
- Preliminary results:
  - heterogeneous PSO/GA is faster (it exploits the multi-robot parallel platform)
  - homogeneous PSO/GA can potentially reach higher fitness in the end, but is slower
  - no major benefit of enforced homogeneity, since in this case individual and group fitness are very much aligned and only a limited number of iterations/generations was considered
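A minimal sketch of the two fitness measures for aggregation, assuming the positions of all robots are known to the evaluator; the sensing range value and function names are illustrative.

```python
import numpy as np

def individual_aggregation_fitness(positions, i, sensing_range=1.0):
    """Individual fitness of robot i: number of robots detected around it
    (within `sensing_range`; the range value is assumed for illustration)."""
    positions = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(positions - positions[i], axis=1)
    # Count neighbors within range, excluding robot i itself
    return int(np.sum((dists > 0.0) & (dists <= sensing_range)))

def group_aggregation_fitness(positions, sensing_range=1.0):
    """Group fitness: average of the individual measurements taken by all
    robots in the team."""
    n = len(positions)
    return sum(individual_aggregation_fitness(positions, i, sensing_range)
               for i in range(n)) / n
```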
Learning to Pull Sticks
- Homogeneous and heterogeneous learning
- Diversity & specialization
- A simple in-line adaptive learning algorithm
- All applied to the stick-pulling case study
See next week, after the lecture on multi-level modeling!
Additional Literature – Week 7

Books:
- Nolfi S. and Floreano D., Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press, 2004.
- Sutton R. S. and Barto A. G., Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

Papers:
- Lipson H. and Pollack J. B., "Automatic Design and Manufacture of Artificial Lifeforms", Nature, 406: 974-978, 2000.
- Murciano A. and Millán J. del R., "Specialization in Multi-Agent Systems Through Learning", Biological Cybernetics, 76: 375-382, 1997.
- Dorigo M., Trianni V., Sahin E., Groß R., Labella T., Nolfi S., Baldassarre G., Deneubourg J.-L., Mondada F., Floreano D., and Gambardella L., "Evolving Self-Organising Behaviours for a Swarm-bot", Autonomous Robots, 17: 223-245, 2004.
- Matarić M. J., "Learning in behavior-based multi-robot systems: Policies, models, and other agents", Cognitive Systems Research (Special Issue on Multi-disciplinary Studies of Multi-agent Learning, Ron Sun, editor), 2(1): 81-93, 2001.
- Nolfi S. and Floreano D., "Co-evolving predator and prey robots: Do 'arms races' arise in artificial evolution?", Artificial Life, 4(4): 311-335, 1998.
- Antonsson E. K., Zhang Y., and Martinoli A., "Evolving Engineering Design Trade-Offs", Proc. of the ASME Fifteenth Int. Conf. on Design Theory and Methodology, September 2003, Chicago, IL, USA, paper No. DETC2003/DTM-48676.