Creating autonomous agents for playing Super Mario Bros game by means of evolutionary finite state machines

A. M. Mora, J. J. Merelo, P. García-Sánchez, P. A. Castillo, M. S. Rodríguez-Domingo, R. M. Hidalgo-Bermúdez


Input of environment
Two 19x19 matrices are given, with Mario in the center:
One containing the enemies
One containing the area
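
As a rough illustration of the two observation grids, the sketch below models each one as a 19x19 list of lists with Mario at index (9, 9). The cell code `1` for an enemy and the helper `cells_ahead` are assumptions made for this example, not the competition's actual encoding.

```python
GRID_SIZE = 19
CENTER = GRID_SIZE // 2  # Mario sits at row 9, column 9 of both grids


def empty_grid():
    """Build a 19x19 grid filled with 0 (hypothetical code for 'nothing here')."""
    return [[0] * GRID_SIZE for _ in range(GRID_SIZE)]


def cells_ahead(grid, distance):
    """Return the codes in Mario's row for the next `distance` cells to his right."""
    row = grid[CENTER]
    return row[CENTER + 1 : CENTER + 1 + distance]


enemies = empty_grid()  # grid containing enemies
area = empty_grid()     # grid containing the area around Mario
enemies[CENTER][CENTER + 2] = 1  # hypothetical code: 1 = an enemy two cells ahead
```

An agent's sensors could then be built from small queries such as `cells_ahead(enemies, 3)` rather than from the raw 361 cells of each grid.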

Mario Modes
Small: In this mode Mario is smaller than in the others. If an enemy hits him, Mario dies. He cannot crouch.
Big: This is Mario's intermediate mode. Mario reaches it if he is in Fire mode and an enemy touches him, or if he eats a mushroom in Small mode.
Fire: In this mode Mario can shoot fireballs. Mario reaches it by taking a fire flower (which only appears if Mario is in Big mode).
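
The mode changes described on this slide can be read as a small transition table. The sketch below is one way to write it down. The `DEAD` pseudo-mode and the event names are introduced only for this example, and the Big-to-Small transition on an enemy hit is an assumption: the slide does not state it.

```python
from enum import Enum


class Mode(Enum):
    SMALL = 0
    BIG = 1
    FIRE = 2
    DEAD = 3  # not a mode on the slide; added here to model losing


# (current mode, event) -> next mode
TRANSITIONS = {
    (Mode.SMALL, "mushroom"): Mode.BIG,
    (Mode.BIG, "fire_flower"): Mode.FIRE,
    (Mode.FIRE, "enemy_hit"): Mode.BIG,
    (Mode.SMALL, "enemy_hit"): Mode.DEAD,
    (Mode.BIG, "enemy_hit"): Mode.SMALL,  # assumption, not stated on the slide
}


def next_mode(mode, event):
    """Return the mode after `event`; unknown pairs leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```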

Ways to lose
Fall off a cliff
Get hit by an enemy in Small mode
Run out of time

Enemies
Enemies that die from a fireball, from Mario jumping on them and crushing them, or from a koopa shell carried and thrown by Mario: Goomba, Winged Goomba, Green Koopa, Red Koopa, Winged Green Koopa, Winged Red Koopa.
Enemies that only die from a fireball: the carnivorous flowers that appear in the pipes.
Enemies that only die when Mario jumps on them and crushes them: the cannon balls.
Enemies that only die when Mario launches a koopa shell at them: Spiky and Winged Spiky.
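
The four enemy groups amount to a lookup from enemy type to the attacks that can kill it. A minimal sketch of that table follows. The attack labels and the `can_kill` helper are names chosen for this example only.

```python
FIREBALL, STOMP, SHELL = "fireball", "stomp", "shell"

# Enemy type -> set of attacks that kill it, as listed on the slide.
VULNERABLE_TO = {
    "Goomba": {FIREBALL, STOMP, SHELL},
    "Winged Goomba": {FIREBALL, STOMP, SHELL},
    "Green Koopa": {FIREBALL, STOMP, SHELL},
    "Red Koopa": {FIREBALL, STOMP, SHELL},
    "Winged Green Koopa": {FIREBALL, STOMP, SHELL},
    "Winged Red Koopa": {FIREBALL, STOMP, SHELL},
    "Carnivorous Flower": {FIREBALL},
    "Cannon Ball": {STOMP},
    "Spiky": {SHELL},
    "Winged Spiky": {SHELL},
}


def can_kill(enemy, attack):
    """True if `attack` kills `enemy` according to the table above."""
    return attack in VULNERABLE_TO[enemy]
```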

Output
Output is any combination of: left, right, down, fire/speed and jump
Must be given within 40 milliseconds (hard limit)
Learning is done offline
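
Since the output is any combination of five buttons, there are 2^5 = 32 possible actions per time step. The sketch below enumerates them as tuples of booleans. The tuple encoding and the `describe` helper are assumptions for illustration.

```python
from itertools import product

BUTTONS = ("left", "right", "down", "fire/speed", "jump")

# Every on/off combination of the five buttons.
ALL_ACTIONS = list(product((False, True), repeat=len(BUTTONS)))


def describe(action):
    """List the names of the buttons pressed in `action`."""
    return [name for name, pressed in zip(BUTTONS, action) if pressed]
```

For example, `describe((False, True, False, True, True))` names the "run right and jump" combination: right, fire/speed and jump.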

GA with a Finite State Machine
The FSM represents states and the connections between them
Based on sensors (receive inputs) and actuators (perform the response action)
Evolved using a genetic algorithm

Representation
An individual is an FSM
Each state has a table of input-output pairs
The inputs are the situations Mario can encounter
The output is a set of Mario actions
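
One possible reading of this representation is sketched below: each state holds a table that maps a situation to an action and a next state. The slides do not spell out the exact encoding, so the class names, the situation labels, and the idea of storing the next state alongside the action are all assumptions.

```python
from dataclasses import dataclass, field

RUN_RIGHT = ("right", "fire/speed")
JUMP_RIGHT = ("right", "jump")
BACK_OFF = ("left",)


@dataclass
class State:
    # situation -> (action, index of next state); hypothetical layout
    table: dict = field(default_factory=dict)


@dataclass
class FSMAgent:
    states: list
    current: int = 0

    def act(self, situation):
        """Look up the current state's response and move to the next state."""
        action, next_state = self.states[self.current].table[situation]
        self.current = next_state
        return action


agent = FSMAgent(states=[
    State({"clear": (RUN_RIGHT, 0), "enemy_ahead": (JUMP_RIGHT, 1)}),
    State({"clear": (RUN_RIGHT, 0), "enemy_ahead": (BACK_OFF, 1)}),
])
```

With this layout an individual is just the list of tables, which is what a genetic algorithm would recombine and mutate.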

Input problems
The number of possible inputs is large
Transform inputs into fuzzy values
Still large? Output states for new entries are set randomly
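
The two ideas on this slide can be sketched as follows: a raw value is collapsed into a few coarse labels (one simple reading of "fuzzy values"), and a table entry that has never been seen is filled in at random the first time it is needed. The thresholds, label names, and function names are hypothetical.

```python
import random


def fuzzify(distance):
    """Collapse a distance in cells into a coarse label (thresholds are made up)."""
    if distance <= 2:
        return "near"
    if distance <= 6:
        return "medium"
    return "far"


def lookup(table, key, n_actions, n_states, rng):
    """Return the entry for `key`, creating a random one if it does not exist yet."""
    if key not in table:
        table[key] = (rng.randrange(n_actions), rng.randrange(n_states))
    return table[key]
```

Filling entries lazily keeps the table limited to situations the agent has actually met, instead of allocating every possible input up front.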

Fitness function
Mono-seed: a single stage per individual
Multi-seed: 30 stages per individual
Fitness function of the competition:
Fitness = win * 1024 + stat * 32 + enems * 42 + coins * 16 + remtime * 8 + cells
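
The competition formula translates directly into a function. In the sketch below the parameter names follow the slide; the comments on what each one measures are assumptions based on the names, since the slide does not define them.

```python
def competition_fitness(win, stat, enems, coins, remtime, cells):
    """Weighted sum from the slide.

    Assumed meanings: win = 1 if the stage was completed, stat = Mario's
    final mode, enems = enemies killed, coins = coins collected,
    remtime = remaining time, cells = cells advanced.
    """
    return (win * 1024 + stat * 32 + enems * 42
            + coins * 16 + remtime * 8 + cells)
```

For instance, `competition_fitness(1, 2, 3, 10, 50, 200)` evaluates to 1024 + 64 + 126 + 160 + 400 + 200 = 1974.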

Generic Fitness function
If (win == 1)
    Fitness = win * 200 + stat * 300 + (powers - numcollis) * 200 + (enems / numenems) * 300 + (coins / numcoins) * 50 + (remtime / tottime) * 50
Else
    Fitness = MIN_FITNESS + (cells / totcells) * 600 + ((cells / totcells) * (spetime / tottime)) * 200 + (enems / numenems) * 150 + (coins / numcoins)
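
A sketch of the generic fitness as a function, keeping the slide's variable names. The value of the minimum-fitness constant is not given on the slide, so it is passed in as a parameter here, and the totals are assumed to be non-zero.

```python
def generic_fitness(win, stat, powers, numcollis, enems, numenems,
                    coins, numcoins, remtime, spetime, tottime,
                    cells, totcells, min_fitness):
    """Two-branch fitness from the slide; min_fitness is a caller-chosen constant."""
    if win == 1:
        # Stage completed: reward state, power-ups kept, kills, coins, time left.
        return (win * 200 + stat * 300 + (powers - numcollis) * 200
                + (enems / numenems) * 300 + (coins / numcoins) * 50
                + (remtime / tottime) * 50)
    # Stage not completed: reward mostly the progress made through the stage.
    progress = cells / totcells
    return (min_fitness + progress * 600
            + progress * (spetime / tottime) * 200
            + (enems / numenems) * 150 + (coins / numcoins))
```

Unlike the competition formula, most terms here are ratios, so the score does not depend on how many enemies or coins a particular stage happens to contain.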

Selection
In the multi-seed approach the ordering is hierarchical: % of stages lost by running out of time or falling, then average % of stages completed, then generic fitness
Parent pool: the best individual plus a percentage of the best individuals, selected by tournament selection
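
A hierarchical ordering like this one can be expressed as a tuple sort key, compared left to right. The sketch below assumes that a lower failure percentage is better and that higher completion and fitness are better; the dictionary field names and the tournament helper are hypothetical.

```python
import random


def rank_key(ind):
    """Sort key: fewer failures first, then more completion, then more fitness."""
    return (ind["pct_failed"], -ind["avg_pct_completed"], -ind["fitness"])


def rank(population):
    """Return the population ordered from best to worst."""
    return sorted(population, key=rank_key)


def tournament(population, k, rng):
    """Pick k individuals at random and return the best of them."""
    contestants = rng.sample(population, k)
    return min(contestants, key=rank_key)


population = [
    {"name": "a", "pct_failed": 10, "avg_pct_completed": 80, "fitness": 500},
    {"name": "b", "pct_failed": 0, "avg_pct_completed": 70, "fitness": 100},
    {"name": "c", "pct_failed": 0, "avg_pct_completed": 70, "fitness": 300},
]
```

Here individual "a" has the highest fitness but still ranks last, because the failure percentage is compared before anything else.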

Crossover and Mutation
Crossover is applied to the best individual and one other parent with positive fitness. They generate multiple children, each by uniform crossover.
Mutation: a certain percentage of individuals is selected at random, and several output states in each of them are changed at random.
The new population consists of the best solution, the offspring and random individuals.
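
The two operators can be sketched on a flattened genome, assuming an individual's output entries are stored as a list of integer codes of equal length in both parents. That encoding, the 50/50 gene choice, and the function names are assumptions, not details given on the slide.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Build a child by taking each gene from either parent with equal chance."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]


def mutate(genome, n_changes, n_outputs, rng):
    """Return a copy with `n_changes` randomly chosen genes re-drawn at random."""
    child = list(genome)
    for i in rng.sample(range(len(child)), n_changes):
        child[i] = rng.randrange(n_outputs)
    return child
```

Because every gene is chosen independently, uniform crossover mixes the parents far more than one-point crossover would, which is the disruption concern raised later in the discussion.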

Experiments
A single run takes 40-50 s
25 min for a single evaluation in multi-seed (30 stages at up to 50 s each)
The FSM is often too big for memory

Experiments (mono-seed)
Changed the FSM implementation because of memory issues
Mono-seed agents are good for one specific stage


Experiments (multi-seed)
Better CPU and more memory
Reduced number of states
Competent agent for difficulties 0-4
The more stages evaluated, the higher the adaptation
Training time grows exponentially with difficulty


Conclusion
Mono-seed can solve all difficulty levels, but is only good at one stage
Multi-seed is more adaptive, but requires too much memory to be trained beyond difficulty 4

Discussion
A couple of unfinished sentences
Not all method choices are justified
No explanation of optimal parameter selection
Adding random solutions to each population
No comparison with other approaches

Discussion
Unexplained runtime results
Representation not completely explained
Uniform crossover may be too disruptive
No fitness results for multi-seed