Improving AI for simulated cars using Neuroevolution


Improving AI for simulated cars using Neuroevolution

Adam Pace, School of Computing and Mathematics, University of Derby, Derby, UK. Email: a.pace1@derby.ac.uk

Abstract: Many games rely on very rigid Artificial Intelligence techniques, such as finite-state machines, that become overly complex as the game worlds themselves do. This often leads to very static AI behaviour that is identical across a number of characters. Through the use of neural networks, this paper aims to show how these rigid controllers can be replaced with a more intelligent solution that gives better visual behaviour and can provide greater variety.

Keywords: Neuroevolution, Genetic Algorithms, Neural Networks

I. INTRODUCTION

Creating believable Artificial Intelligence for games is becoming a more important task as games seek further realism; not only is it becoming more important, it is also becoming much harder. As the complexity of our game environments increases, for example through improved physics and larger environments, the AI becomes more complex too, as it needs to factor in these new variables. A lot of games within the industry rely on relatively simple approaches to their AI, using finite state machines mixed with smoke-and-mirror techniques to create the illusion of intelligence. There are notable exceptions that use more advanced techniques, such as F.E.A.R., which utilised STRIPS planning [1], or the Halo series with its use of behaviour trees [2] to create varied behaviours for multiple characters. Still, unlike in traditional AI, it is acceptable for game AI to cheat or use workarounds to create an illusion of greater intelligence for the player, as both of these titles do.
Even with a relatively simple domain, consisting of a top-down view of a grid road system, the addition of realistic, wheel-driven physics means that creating AI simply to navigate from point A to point B already starts to get more complex when the cars can only accelerate, brake or steer; trying to avoid collisions and add more varied behaviour becomes increasingly difficult and time consuming. What if we could get the machine to learn how to drive a car along a pre-defined path, react to different circumstances and the physics in play in the environment, or even work out its own path as well? This would give us much more power to create believable AI that deals with complex environments. This is what we wish to investigate: how we can use Evolutionary Algorithms to evolve Artificial Neural Networks (ANNs) to act as the drivers of these cars.

II. TECHNICAL BACKGROUND

A. Neural Networks

An Artificial Neural Network is a programmer's attempt at recreating a real brain and how it processes information, but in a much smaller, simplified way. Like a brain, a neural network is made up of a number of neurons and connections; the network takes a series of input values which are passed along the connections and through the network to create a set of outputs. Neural networks are very good at pattern recognition and can be applied to a number of fields such as data processing, robotics and Artificial Intelligence [3].

Fig. 1. Simple Feed-Forward Neural Network design with just one hidden layer.

As Figure 1 shows, a neural network consists of a layer of input neurons, any number of hidden layers containing any number of neurons (in this case just one layer of 3 neurons) and finally an output layer. The network in the diagram is fully connected, meaning every node is connected to every node in the next layer, though this doesn't always have to be the case.
The connections are all weighted; each neuron sums the values that each connection gives it, after modification by the connection's weight, before passing this value through its activation function and on to all of its connected neurons. The output layer does exactly the same, but holds on to its value, which forms the set of outputs used by some form of controller, or whatever is making use of the network, to react. As mentioned, each of these neurons has an activation function that its output must pass through. This activation function is meant to represent the firing mechanism of an actual neuron, and these can take two forms. They can either respond or not respond, i.e. always firing either 0 or 1, or they can react with a gradient, where a stronger input causes a stronger signal to be fired; these can then fire any value between 0 and 1, for example. The latter has many benefits for use in artificial neural networks, providing how much a controller should react rather than just a react/don't-react output. A typical activation function is a normalized sigmoid [4], which outputs a value between 0 and 1; plotted, its output forms a simple S shape, not firing while its input is low, escalating quickly at first before easing into constantly firing. This function has its downfalls for certain networks, especially that it doesn't output negative values. The other most common function is the hyperbolic tangent, tanh [5], which can either output between -1 and 1 or be translated to the 0 to 1 range. One typical way a neural network is trained is by observing its performance against a set of data and using the known correct result to feed the error back into the network to try and correct it. This is known as supervised learning. The other way, with which we have more experience and which we will be using as the basis for this paper, is through the use of Evolutionary Algorithms.

B. Evolutionary Algorithms

Evolutionary Algorithms (EAs) are a subset of Evolutionary Computation that use selection and modification methods inspired by real biological concepts. What is now referred to as evolutionary algorithms emerged in the 50s and 60s; however, it was Holland's 1975 work [6] where the idea of Genetic Algorithms (GAs) originated. A Genetic Algorithm borrows from biological evolution to find solutions to a given problem through a number of techniques. A GA consists of a population of chromosomes, each of which is a potential solution to the problem being solved.
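The fully connected feed-forward pass with a tanh activation described above can be sketched in a few lines. The layer sizes and weights here are purely illustrative and are not the paper's actual network:

```python
import math
import random

def forward(inputs, layers):
    """Propagate inputs through fully connected layers.
    `layers` is a list of weight matrices; layers[k][j] holds the incoming
    weights for neuron j of layer k (one weight per input, plus a bias as
    the last element). tanh squashes each neuron's sum into the -1..1 range."""
    values = list(inputs)
    for weights in layers:
        values = [math.tanh(sum(w * v for w, v in zip(ws, values)) + ws[-1])
                  for ws in weights]
    return values

# Illustrative 2-3-1 network with random weights: 2 inputs, 3 hidden, 1 output.
random.seed(0)
net = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)],  # hidden: 2 weights + bias each
       [[random.uniform(-1, 1) for _ in range(4)] for _ in range(1)]]  # output: 3 weights + bias
out = forward([0.5, -0.2], net)
```

Evolving such a network then amounts to treating the flattened weight lists as a chromosome, which is the approach taken in this paper.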
In this instance we could use a chromosome to represent the weights of a neural network's connections, but chromosomes could represent any number of things. A GA operates over a number of cycles referred to as generations, once again borrowing the biological terminology. At the beginning of each generation a new population of chromosomes is created; the way this is done can differ, but it essentially involves using chromosomes from the previous generation as parents for the new generation. How these parents are selected, and how they create the offspring, varies across techniques. So evolutionary algorithms rely on modifying this ever-changing population of potential solutions, but how do they do this? Firstly, a key concept is the fitness score: each individual solution is evaluated on how well it solves the problem through a fitness function, and its performance graded. This fitness function might be as simple as grading how close to a target the individual manages to get, or it may be more complex and take into account a number of different factors, such as how many collisions occurred and how long it took. How fitness is calculated can have a very large impact on the resulting chromosomes, and the algorithm will often come up with individuals that, whilst scoring highly according to the given fitness function, don't behave as the designer intends! The fitness is used to influence the selection of parents for the next generation; a popular method is known as roulette selection. Unlike methods where, say, only the top 5 percent of individuals are chosen, this gives all chromosomes a chance of being chosen, in proportion to each chromosome's fitness. A random number is generated between 0 and the total fitness of the population, then the population is iterated over, accumulating fitness as it goes.
The chromosome at which the accumulated fitness exceeds the randomly generated value is the one selected. This is repeated until the new generation is filled [7]. This technique is good in that weaker individuals, which might still have something to offer, have a chance of being selected, so some variety is kept. Some parents may also be preserved unchanged in the new generation; known as elitism, this was presented by De Jong [8] as a way to prevent the destruction of high-fitness individuals. Once the parents are selected, we need to make them reproduce. This is done through genetic operators, which are once again based on how reproduction works in biology. The main approach is to use crossover between two parent chromosomes. For crossover, a point is randomly selected somewhere along the length of the chromosomes; all the values up to this point are taken from Parent A and all the values after it from Parent B to form Offspring A. The opposite is done, using the same crossover point, for Offspring B. This is known as single-point crossover and is a simplification of how we are created as a combination of our parents; it is possible to use any number of crossover points, though single-point is the most common form. This technique alone wouldn't get us far; we also need to add some variance to the population, else we will just keep mixing the same values. This is done through mutation. For each new chromosome there is a small chance that each of its values might be mutated a small amount, often achieved by applying a small Gaussian perturbation to the value being mutated, very slightly altering it. Through this we create more variance within the population and help discover new and potentially better solutions.

III. PREVIOUS WORK

Creating intelligent controllers for cars has been a popular topic among researchers, both for simulated and RC vehicles.
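The three genetic operators just described, roulette selection, single-point crossover and Gaussian mutation, can be sketched as follows. Population size, chromosome length, mutation rate and sigma are illustrative values, not those used in the paper:

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = random.uniform(0, sum(fitnesses))
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating-point shortfall

def single_point_crossover(parent_a, parent_b):
    """Swap tails at one random point, producing two offspring."""
    point = random.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate=0.05, sigma=0.1):
    """Each gene has a small chance of a slight Gaussian perturbation."""
    return [g + random.gauss(0, sigma) if random.random() < rate else g
            for g in chromosome]

# One reproduction step over an illustrative random population.
pop = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
fits = [random.random() for _ in pop]
a, b = roulette_select(pop, fits), roulette_select(pop, fits)
child, _ = single_point_crossover(a, b)
child = mutate(child)
```

Repeating this step until a new population is filled, optionally copying a few elite parents over unchanged, gives one generation of the GA.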
A lot of work has been done in the racing domain, where the emphasis is on speed: either how many laps a controller can manage in a given time, or simply how fast a lap it can manage. Numerous competitions have been hosted with this in mind [9]. Work by Togelius and Lucas [10] explored using different evolutionary controllers for simulated car racing, concluding that a sensor-based neural network gave the most promising results. Further work [11] explored these ideas in greater depth and managed to evolve generalized racing controllers. The controllers they used were equipped with sensors that could read the environment around the car and work out distances to walls or the edge of the track; in some tests these sensors could themselves be evolved alongside the network, allowing for cars with further-reaching sensors at the loss of fidelity, or differently angled sensors. The controllers were also given the speed, waypoint angle and a bias in order to navigate. To evolve generalised controllers that could race well on tracks they had not yet encountered, they needed to be trained on a selection of different tracks. The interesting way this was approached was to slowly introduce new, harder tracks as the fittest individual gained proficiency on the tracks already in the training set (initially one). A good example of a game that actually puts this to use is Forza. Forza's Drivatar [12] AI system uses neural networks to control its AI drivers. Each driver uses a different network, and as such each driver has slightly different skill levels and driving tendencies, such as how they take a particular corner or how aggressively they try to overtake.

IV. THE GOAL

As we've already discussed, we wish to use neural networks to improve our AI. We have a very simple domain: a top-down view of a section of streets laid out in a grid-like pattern, i.e. all corners are right angles. In previous work we created cars that simulated real physics in order to move, creating a force that drove the rear wheels to move the car forward, as well as calculating accurate angular velocity in order to steer. As such it becomes much harder for AI, as well as players, to drive these cars, as they now have to take into account a number of new factors, such as speed and resistance. Originally a Hierarchical Finite State Machine (HFSM) was used to control the navigation of these cars, and whilst it performed relatively well, managing to control 50 individual cars on the map at once whilst maintaining a good frame rate, it certainly had a number of drawbacks.
For one, it was already starting to get cumbersome with around 6 or 7 states, and it didn't manage to avoid car-on-car collisions, only attempting to slow down if another car was too close in front, though with more work this could have been added. What was more of a problem was that every single car drove in exactly the same way, and they could only really handle the exact physics they were built for. If the roads' resistances changed slightly, or the cars' speed changed, then the map turned into a mess of cars all over the place. The goal here is to swap out the HFSM the cars currently use for an artificial neural network that we can hopefully train to be much better. For one, the networks should be able to deal with varying physics, and provide better, more human-like variety.

V. RESULTS

A. Initial Evolution

We began our investigations small, aiming to evolve a car that could navigate a small section of our map; this section was rather short, comprising a single right turn and two left turns. All of our neural networks have just two outputs: the first relates to acceleration and braking, using positive and negative values, and the second to steering, where a negative value means turn left and a positive value means turn right. These outputs are passed directly into the control functions, so they directly control how much to steer/accelerate. Following Togelius and Lucas, we used a sensor-based approach for our controllers. The network we had most success with comprised 6 inputs and 4 hidden layers of 5 neurons each. Our 6 inputs were the angle difference from the car's rotation to the next waypoint, the angle to the waypoint after that, the car's speed, and three collision sensors. The three sensors measured the distance to a blocked tile up to 25 pixels away. Two sensors started at the front corners of the car, one on the left and one on the right, travelling along a 45 degree angle, with the third travelling directly forward.
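One control tick, assembling the six inputs just listed and mapping the two network outputs onto the car, might look like the sketch below. All attribute and method names (`angle_to`, `sensor_left`, `accelerate`, etc.) and the `forward_pass` callable are hypothetical stand-ins, not the paper's actual API:

```python
def control_step(car, path, forward_pass):
    """Build the six inputs described above and apply the two network
    outputs to the car's controls. Names here are illustrative only."""
    inputs = [
        car.angle_to(path.next_waypoint),        # angle difference to next waypoint
        car.angle_to(path.waypoint_after_next),  # look-ahead angle for wider corners
        car.speed,
        car.sensor_left,                         # 45-degree left ray, capped at 25 px
        car.sensor_front,                        # forward ray
        car.sensor_right,                        # 45-degree right ray
    ]
    throttle, steering = forward_pass(inputs)
    if throttle >= 0:
        car.accelerate(throttle)  # positive first output accelerates...
    else:
        car.brake(-throttle)      # ...negative first output brakes
    car.steer(steering)           # negative steers left, positive right
```

Because the outputs feed the control functions directly, the magnitude of each output decides how hard the car steers or accelerates, rather than a simple on/off decision.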
These inputs varied greatly until we settled upon this combination, the second angle only being added when cars often failed at wider corners, due to not being able to see far enough ahead and not reaching the waypoint quickly enough. It makes sense that the car should be able to see more than directly in front of it. Our controllers are given a path to the destination using an A* pathfinding algorithm, and the controller must work its way along the path, where every tile is a waypoint. It is possible for the controller to count being next to the appropriate tile as having reached a waypoint, so travel doesn't need to be that precise. Waypoints can also be skipped altogether: reaching a further waypoint moves the progress along to that point, regardless of other points reached. This is particularly useful if the car loses control; if it comes round a corner too fast and misses a number of waypoints near the corner, it can use later waypoints and still receive useful information, rather than needing to turn around.

B. Perfecting the Fitness

After a number of failed attempts at just using the progress along the defined route as an indication of fitness, we realised we needed a much more specific function. Early attempts saw cars just driving in one large circle, disregarding the roads, in order to reach the destination. We added collisions to the edges of the roads to force the car to stay on the road, though we were aware that steps would need to be taken to prevent the usual wall-hugging behaviour often exhibited by AI-controlled robots trying to find their way. This is when we improved the fitness function to minimise collisions whilst en route to the destination. Foolishly, we ended up with cars that didn't move, instantly being graded 0.5 for not having any collisions. We changed to a two-part fitness function as used in [13]: if the car failed to reach the destination, its fitness would be graded solely on how far it managed to get, on a 0 to 0.5 scale.
If it succeeded in reaching the target, it would instantly be given a fitness of 0.5, with the other 0.5 based upon how many collisions it encountered. Using this fitness function we managed to evolve a range of cars that reached the target in our scenario without colliding with the walls at all, quite a difficult task in fact given our physics model and the tightness of some of the corners. However, they managed this at a snail's pace, never gaining much momentum, stopping all acceleration and turning hard into corners to drift their way round, very slowly. It worked, but it wasn't the human-like behaviour we were hoping for. This led to another change to the way we graded the chromosomes, the first 50
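The two-part scheme adapted from [13] can be sketched as below. The paper does not state how the collision count is normalised into the upper 0.5, so the `max_collisions` cap here is an assumption for illustration:

```python
def fitness(progress, reached, collisions, max_collisions=20):
    """Two-part fitness: route progress alone (0..0.5) if the target was
    never reached; otherwise 0.5 plus a collision-based score (0..0.5).
    `max_collisions` is an illustrative cap, not a value from the paper."""
    if not reached:
        return 0.5 * max(0.0, min(1.0, progress))  # progress as a 0..1 fraction
    penalty = min(collisions, max_collisions) / max_collisions
    return 0.5 + 0.5 * (1.0 - penalty)
```

Under this scheme a stationary car scores 0, any car that reaches the target outscores every car that does not, and a collision-free run scores a full 1.0.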

This was good progress, but we'd made a specialist. Though there are only really two types of corners in the map, tight right-hand turns and wider left-hand turns, there is much more that needs to be taken into consideration. The car didn't perform quite so well on other routes; some it managed okay, but it wasn't as good as on the route it was trained on. This was partly due to flaws with the inputs used and partly due to how it was trained.

C. Gradual Evolution

Togelius, Lucas and De Nardi [11] took an interesting approach that proved successful in their work on evolving control networks for simulated cars. They would evolve the population on a certain track and, when the population was deemed fit enough, introduce another track into the training sample. Our domain has a limited set of problems, in that we only really have two different styles of corner, the tight right turn and the wide left turn, which reduces the need for different maps to train on. We do have different corners in the sense of corners at the end of a road versus corners at a junction, which differ slightly too. Giving the network a fairly long route to learn was quite a big task to tackle at once and took quite a long time to produce anything of much worth. We decided to adapt the gradual learning approach and break the route down. To begin with, the networks would train on a simple section with just one corner; once they got good at this, another section was added to the route, taking it up to two corners, which covered both types of corner. This continued until we had covered the four sections that composed the initial route and we had a number of individuals that could tackle the whole route reasonably well. This led to some pretty good results: cars could get through the route with a minimal number of collisions, the best being around 5 individual collisions.
It was, however, impossible given the current setup to improve past this; the cars in fact seemed to rely on these few collisions to get round the corners, almost slamming into the opposite side of some of the tight corners.

D. Perfecting the Behaviour

It might have been possible to devise a fitness function that penalised collisions based on the level of impact; for example, a slight scrape would not be penalised as harshly as a high-velocity side-on collision. Although we never experimented with this, we felt that, whilst it might work in at least reducing how much the cars used the walls, it would be quite a slow process. Instead we decided to continue the gradual learning approach and treat collisions almost like training wheels used to get the cars going in the right direction. In fact we never wanted collisions in the first place, so this helped serve that desire too. The plan was to remove collisions once the cars had become moderately good at completing the full route and, rather than grade them on the number of collisions encountered, score them dependent on the amount of time they spent off-road. A further addition was to allow the Genetic Algorithm to also evolve the sensor lengths individually. In previous tests they had always been fixed at a distance of 25 pixels. An individual distance was allowed for each sensor; this is something that was once again presented in [10] with the sensor-based racing model. There the sensor distances as well as angles could be evolved, and later the networks could even decide how many wall sensors and how many car sensors they had. Their work required all of this sensory data as it was much less reliant upon a provided route, only being given a handful of waypoints along a racing track. We were simply curious as to what effect, if any, giving away some control of these sensors would have on the resulting chromosomes.
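The gradual scheme, extending the route section by section and carrying extra sensor-length genes on the chromosome, could be sketched as below. The helper callables, population size, chromosome length and the gene-to-pixels mapping are all illustrative assumptions, not the paper's actual values:

```python
import random

def evolve_gradually(sections, evolve_one_generation, good_enough, max_gens=500):
    """Curriculum-style loop: start with one route section and append the
    next each time the best individual is deemed proficient.
    `evolve_one_generation(pop, active)` -> (new_pop, best_fitness) and
    `good_enough(best_fitness)` -> bool are stand-ins for the GA above."""
    population = [[random.uniform(-1, 1) for _ in range(20)] for _ in range(30)]
    active = [sections[0]]
    for _ in range(max_gens):
        population, best = evolve_one_generation(population, active)
        if good_enough(best) and len(active) < len(sections):
            active.append(sections[len(active)])  # unlock the next section
        elif good_enough(best) and len(active) == len(sections):
            return population  # proficient on the whole route
    return population

def split_chromosome(chromosome, n_sensor_genes=3):
    """When sensor lengths are evolved too, the last few genes can encode
    them; here each gene is scaled from -1..1 to a 0..25 pixel length."""
    weights = chromosome[:-n_sensor_genes]
    lengths = [12.5 * (g + 1) for g in chromosome[-n_sensor_genes:]]
    return weights, lengths
```

Because the sensor genes ride along on the same chromosome, crossover and mutation shape them with no changes to the GA itself.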
The results after these changes were quite good, in fact quite surprising in some regards. The sensor distances were the most interesting: the forward sensor stayed relatively long, which makes sense. The two side sensors, however, became much shorter, with the right-hand one seemingly always slightly shorter still. This seems to make sense: the car is trying to stick to the right-hand side of the road and the roads are quite narrow. The sensors only ever really need to reach the edge of the road, and the shorter they are, the better granularity they provide. The difference in side sensor lengths also allows for better positioning. From observing the networks in action, they seem to position themselves using these sensors; by having a shorter right sensor they position themselves further to the right-hand side, which is where the A* path runs. These cars also had much improved cornering, being able to take the corners much tighter, especially now they didn't have a wall to help them round. We were able to evolve individuals that scored greater than 0.97 and only ever slightly drifted off the road with the rear tyres, a much greater result than the previous chromosomes that were trained using collisions only. The resulting chromosomes were solid AI controllers, at least on par with, if not better than, the previous HFSM implementation. The neural networks managed to push the car to its limits, navigating without the need to brake for corners; this was never possible with the HFSM implementation, which drove slower and needed to brake quite early. The physics felt like a hindrance to those early controllers, whereas the neural network based ones took advantage of it wherever they could.

VI. FUTURE WORK

The controllers developed for this paper were quite basic in their behaviours, completing the task of navigating a route and not much else.
In the future it would be good to enhance these behaviours to include things such as the avoidance of other cars, or to increase the challenge of the environment so that the cars need to brake to stand a chance of getting round corners. The behaviour exhibited wasn't quite as human-like as we would have wished either. It is possible that in future work these networks could be trained against a player model in order to learn more human-like tendencies. These could include braking, as we've mentioned, or much more cautious driving; real drivers don't hit a corner at full momentum or constantly alter their steering direction. In fact gradual evolution could be used here again, where already successful controllers are refined against a player model in order to try and eradicate some of the wilder behaviour exhibited.

VII. CONCLUSION

For a first exploration into implementing such a control system in this domain, we feel the results were definitely promising. The behaviour of the final chromosomes is arguably much better than the initial HFSM implementation, and by using a random assortment of different neural networks we would be able to easily create a range of slightly different NPC behaviours with minimal effort. In fact, rather interesting behaviours appeared that we did not expect. For example, if the car's front sensor detected a collision too close, such as with the early route that ended in front of a wall, the car would begin to turn away to prevent itself crashing. Obviously there are downsides to this approach; proper testing would need to be done to ensure that no unexpected behaviour was possible before adding it to a final game. Whilst these networks are computationally quick at processing, they do have a larger memory requirement. However, because these networks have no context other than the inputs passed in, it would be possible for one network to power as many cars as needed, potentially saving memory even when compared to FSMs or other rigid AI controllers. Personally, we feel we have also learnt a lot of useful techniques for evolving neural networks which can be applied to other projects or future work in this domain. The use of gradual complexity increases in the training methods greatly helped both the generation of good chromosomes and the time it took to find them.

REFERENCES

[1] J. Orkin, "Three states and a plan: The A.I. of F.E.A.R.," 2006.
[2] D. Isla, "Handling complexity in the Halo 2 AI," 2005.
[3] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 1st ed. New Jersey: Prentice-Hall, Inc., 1995.
[4] The MathWorks, Inc., "Hyperbolic tangent sigmoid transfer function - MATLAB tansig," 2013. [Online]. Available: http://www.mathworks.co.uk/help/nnet/ref/tansig.html
[5] The MathWorks, Inc., "Hyperbolic tangent - MATLAB tanh," 2013. [Online]. Available: http://www.mathworks.co.uk/help/matlab/ref/tanh.html
[6] J. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[7] M. Mitchell, An Introduction to Genetic Algorithms (Complex Adaptive Systems), new ed. MIT Press, 1998.
[8] K. De Jong, "An analysis of the behavior of a class of genetic adaptive systems," Ph.D. dissertation, University of Michigan, 1975.
[9] D. Loiacono, P. L. Lanzi, J. Togelius, and E. Onieva, "The 2009 simulated car racing championship," IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 1, pp. 131-147, 2010.
[10] J. Togelius and S. M. Lucas, "Evolving controllers for simulated car racing," in Proceedings of the 2005 IEEE Congress on Evolutionary Computation, vol. 2, 2005, pp. 1906-1913.
[11] J. Togelius, S. Lucas, and R. De Nardi, "Computational intelligence in racing games," in Advanced Intelligent Paradigms in Computer Games. Springer Berlin, 2007, pp. 39-69.
[12] Microsoft Research, Applied Group, "Drivatar theory," 2008. [Online]. Available: http://research.microsoft.com/en-us/projects/drivatar/theory.aspx
[13] T. Thompson and J. Levine, "Scaling-up behaviours in EvoTanks: Applying subsumption principles to artificial neural networks," in IEEE Symposium on Computational Intelligence and Games, 2008, pp. 159-166.