Ensemble Evolution of Checkers Players with Knowledge of Opening, Middle and Endgame

Kyung-Joong Kim and Sung-Bae Cho
Department of Computer Science, Yonsei University
134 Shinchon-dong, Sudaemoon-ku, Seoul 120-749, South Korea
{kjkim, sbcho}@cs.yonsei.ac.kr

Abstract. In this paper, we argue that inserting domain knowledge into an ensemble of diverse evolutionary checkers players can produce improved strategies and reduce evolution time by restricting the search space. The evolutionary approach to games differs from the traditional one, which exploits knowledge of the opening, middle, and endgame stages. Because it is based purely on bottom-up construction, it is sometimes inefficient at evolving simple heuristics that humans find easily. We propose the systematic insertion of opening knowledge and an endgame database into the framework of evolutionary checkers. In addition, the common knowledge that a combination of diverse strategies is better than the single best one is inserted into the middle stage and is implemented using a crowding algorithm and a strategy combination scheme. Experimental results show that the proposed method is promising for generating better strategies.

1 Introduction

Incorporating a priori knowledge, such as expert knowledge, meta-heuristics, human preferences, and most importantly domain knowledge discovered during evolutionary search, into evolutionary algorithms has gained increasing interest in recent years [1]. In this paper, we propose a method for systematically inserting expert knowledge into an evolutionary checkers framework at the opening, middle, and endgame stages. In the opening stage, openings defined by the American Checkers Federation (ACF) are used. In previous work, we used speciation techniques to search for diverse strategies that embody different styles of game play and combined them using voting for higher performance [2]. This idea comes from the common knowledge that a combination of diverse well-playing strategies can defeat the best one because they can complement each other.
Finally, we use an endgame database from Chinook, the first man-machine checkers champion. Figure 1 explains the conceptual framework of the proposed method. The most important idea is the systematic integration of three kinds of domain knowledge (opening DB, middle-stage knowledge, and endgame DB). The middle-stage knowledge comes from a Korean event in the game of Go. In 2003, the Internet site TYGEM (http://www.tygem.co.kr) held a many-to-one game between Hoon Hyun Cho, one of the greatest Go players, and 3000 amateur players. The winner of the game was Cho. After the game, he said that it was a very difficult game because the amateur players made no obvious mistakes. A speciation algorithm for evolutionary checkers is adopted to implement this knowledge.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 950-954, 2006. Springer-Verlag Berlin Heidelberg 2006

Fig. 1. Conceptual diagram of the proposed method. Panels: speciated neural network evolution; checkers game playing. Labeled components: opening DB, generating neural network population, genetic operation, two similar neural networks, game organizer, generating game tree, board evaluation of leaf nodes using NN, next generation, applying crowding algorithm, winner neural network, endgame DB, decision of next move, min-max search.

2 Incorporating Knowledge into Evolutionary Checkers

2.1 Opening Stage

The opening move is the most important opportunity to defeat an expert player because trivial mistakes in the opening can lead to an early loss. The first move in checkers is played by red, and there are seven choices (9-13, 9-14, 10-14, 10-15, 11-15, 11-16, and 12-16). Usually, 11-15 is the best move for red, but there are many other alternatives. The seven choices are known by specific names: Edinburgh, Double Corner, Denny, Kelso, Old Faithful, Bristol, and Dundee, respectively. For each choice, there are many well-established move sequences, which range in length from 2 to 10. The longest sequence is the White Doctor: 11-16, 22-18, 10-14, 25-22, 8-11, 24-20, 16-19, 23-16, 14-23, 26-19. Careful analysis over decades of tournament play has proven the usefulness or fairness of the opening sequences. Initial moves are decided by the opening book until play leaves the book. Each player chooses its opening randomly, and the seven first choices have the same probability of being selected.

2.2 Evolutionary Speciated Checkers

Following Fogel [3], a checkers board is represented by a vector of length 32 whose components take values in {-K, -1, 0, +1, +K}, where K is the value assigned for a king, 1 is the value for a regular checker, and 0 represents an empty square. To reflect spatial features of the board configuration, sub-boards of the board are used as input. One board has 36 3×3 sub-boards, 25 4×4 sub-boards, 16 5×5 sub-boards, 9 6×6 sub-boards, 4 7×7 sub-boards, and 1 8×8 sub-board, so 91 sub-boards in total are used as input to the feed-forward neural network. The sign of the value indicates whether the piece belongs to the player or the opponent. The closer the output of the network is to 1.0, the better the position; similarly, the closer the output is to -1.0, the worse the position.

The architecture of the network is fixed, and only the weights can be adjusted by evolution. Each individual in the population represents a neural network (weights and biases) that is used to evaluate the quality of the board configuration. Additionally, each neural network has its own value of K and self-adaptive parameters for the weights and biases. An offspring $P'_i$, $i = 1, \ldots, p$, for each parent $P_i$, $i = 1, \ldots, p$, is created by

$$\sigma'_i(j) = \sigma_i(j)\,\exp(\tau N_j(0,1)), \quad j = 1, \ldots, N_w$$

$$w'_i(j) = w_i(j) + \sigma'_i(j)\,N_j(0,1), \quad j = 1, \ldots, N_w$$

where $N_w$ is the number of weights and biases in the neural network (here 5046), $\tau = 1/\sqrt{2\sqrt{N_w}} = 0.0839$, and $N_j(0,1)$ is a standard Gaussian random variable resampled for every $j$.

In fitness evaluation, each individual chooses five opponents from the population pool and plays a game against each of them. A win increases a player's fitness by 1, while the losing opponent's fitness decreases by 2. In a draw, the fitness values of both players remain the same. After all the games are played, the fitness values of all players are determined.

In this paper, we utilize a crowding algorithm [4], a popular form of speciation algorithm, to search for diverse neural networks.
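The self-adaptive mutation defined by the two update equations of Section 2.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `mutate`, `N_W`, and `TAU` are hypothetical, the two Gaussian terms are assumed to be independent draws, and the mutation of the king value K is left out because the text does not specify it.

```python
import math
import random

N_W = 5046  # number of weights and biases reported in the text
TAU = 1.0 / math.sqrt(2.0 * math.sqrt(N_W))  # approximately 0.0839


def mutate(weights, sigmas, rng=random, tau=None):
    """Create one offspring (weights, sigmas) from one parent.

    Assumption: the Gaussian sample in the sigma update and the one in
    the weight update are drawn independently for every index j.
    """
    if len(weights) != len(sigmas):
        raise ValueError("weights and sigmas must have the same length")
    if tau is None:
        # Same formula as TAU, but for the actual vector length.
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(len(weights)))
    # sigma'(j) = sigma(j) * exp(tau * N_j(0, 1))
    child_sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigmas]
    # w'(j) = w(j) + sigma'(j) * N_j(0, 1)
    child_weights = [
        w + s_new * rng.gauss(0.0, 1.0)
        for w, s_new in zip(weights, child_sigmas)
    ]
    return child_weights, child_sigmas
```

Because each step size is multiplied by a positive log-normal factor, it stays positive, so every weight keeps a nonzero mutation scale throughout evolution.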
In the crowding algorithm, one neural network is selected from two similar individuals based on the result of a game played between them (usually, a crowding algorithm uses their fitness, but here fitness cannot be used because of the dynamic nature of the fitness landscape). Crowding is one of the representative speciation methods that attempt to discover diverse species in a search space. The distance between two neural networks is the Euclidean distance between their weights. To discover clusters of arbitrary shape among the individuals in the population at the last generation, density-based clustering methods are used; DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is one such algorithm [5]. A representative player for each cluster is chosen by a tournament among all players in that cluster. Moves of the combined player are determined by simple voting among the representative players: the move with the greatest number of votes is picked. If there is no clear winner, one of the moves with the most votes is selected randomly.

2.3 Endgame Stage

The estimated quality of the board is calculated using the evolved neural networks to evaluate the leaf nodes of the game tree with the min-max algorithm. If the value of f (the estimated goodness of the next moves) is not reliable, we refer to the domain-specific knowledge and revise f. The decision rule for querying the domain knowledge must be defined in advance, as follows.

IF (f < 0.75 and f > 0.25) or (f < -0.25 and f > -0.75) THEN query the domain knowledge.

3 Experimental Results

The non-speciated evolutionary algorithm uses a population size of 15 and limits the run to 60 generations. The speciated evolutionary algorithm also sets the population size to 15 and the number of generations to 60. The mutation rate is 0.01 and the crossover rate is 1.0. The number of leagues, used to select the best player from each species, is 5: each player selects 5 players from its species at random, and the competition results are used for the selection.

Fig. 2. Comparison of running time (simple evolution). Axes: generation (horizontal), time in hours (vertical). Series: with knowledge, without knowledge.

The Chinook endgame DB (2~6 pieces) is used for revision when the estimated value from the neural network is between 0.25 and 0.75 or between -0.25 and -0.75. Time analysis indicates that evolution with knowledge takes much less time than evolution without knowledge in simple evolution (Figure 2). This means that inserting knowledge within a limited scope can speed up the evolutionary algorithm, because using the endgame DB reduces the computation required to find the optimal endgame sequence. Table 1 summarizes the competition results between the best individual in the evolution with knowledge and the best individual in the evolution without knowledge for each generation. The knowledge incorporation model performs better than the one without knowledge. Table 2 shows the competition results in the speciated evolution. Table 3 shows the effect of the stored knowledge (opening and endgame DB) in speciation.

Table 1. Experimental results on opening and endgame knowledge incorporation (Win/Lose/Draw) for simple evolution. Evolution with the stored knowledge performs better than that without the knowledge. (Op=Opening knowledge, SGA=Simple GA, E=Endgame knowledge).

                      Generations
Op+SGA+E   SGA        1~14      15~29     30~44     45~59     Total
Red        White      5/0/10    3/3/9     3/0/12    5/3/7     16/6/38
White      Red        4/3/8     4/2/9     5/4/6     4/2/9     17/11/32
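For reference, the two decision steps used by the players in these experiments, the endgame-database query rule of Section 2.3 and the majority vote among representative players of Section 2.2, can be sketched in a few lines. The function names and the representation of moves as strings such as "11-15" are assumptions made for illustration only.

```python
import random
from collections import Counter


def needs_endgame_lookup(f):
    """Return True when the network estimate f lies in one of the two
    bands, (0.25, 0.75) or (-0.75, -0.25), where the rule says to
    query the domain knowledge."""
    return (0.25 < f < 0.75) or (-0.75 < f < -0.25)


def vote(moves, rng=random):
    """Pick the move proposed by the most representative players.

    `moves` holds one proposed move per representative player.
    Ties for the highest vote count are broken uniformly at random.
    """
    if not moves:
        raise ValueError("at least one proposed move is required")
    counts = Counter(moves)
    top = max(counts.values())
    tied = sorted(move for move, n in counts.items() if n == top)
    return tied[0] if len(tied) == 1 else rng.choice(tied)
```

For example, `vote(["11-15", "11-15", "9-14"])` returns `"11-15"`, while an estimate of 0.9 or 0.0 would not trigger a database lookup under the stated rule.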

Table 2. Experimental results on opening and endgame knowledge incorporation (Win/Lose/Draw) for speciated evolution. Evolution with the stored knowledge performs better than that without the knowledge. (S=Speciation).

                      Generations
Op+S+E     Speciated  1~14      15~29     30~44     45~59     Total
Red        White      5/1/9     4/3/8     6/0/9     8/2/5     23/6/31
White      Red        7/3/5     5/2/8     8/4/3     6/2/7     26/11/23

Table 3. Competition results between the speciated players using both the opening and endgame DB and the speciated players using only one kind of knowledge (Win/Lose/Draw).

Op+S+E     Op+S       Total
Red        White      6/2/7
White      Red        8/4/3

Op+S+E     S+E        Total
Red        White      5/5/5
White      Red        4/5/6

Op+S       S+E        Total
Red        White      3/6/6
White      Red        2/7/6

4 Conclusion and Future Work

The final conclusion of the experiment is SGA < Speciated < Op+S < S+E Op+S+E (SGA < Speciated is from the results of [2]). The effect of opening knowledge is not large because the opening book contains only a limited number of sequences. This limited opening knowledge can prevent the player from making a big mistake, but it is of little use when the opponent chooses a move that is not included in the opening sequences. Multiple diverse neural networks can perform better than the single best one, but combining them remains a problem, and averaging may not work. As future work, more sophisticated combination methods should be explored for better performance.

References

1. Jin, Y.: Knowledge Incorporation in Evolutionary Computation. Springer (2004)
2. Kim, K.-J. and Cho, S.-B.: Evolving speciated checkers players with crowding algorithm. Proc. of the 2002 Congress on Evolutionary Computation (2002) 407-412
3. Fogel, D.B.: Blondie24: Playing at the Edge of AI. Morgan Kaufmann (2001)
4. Mahfoud, S.W.: Niching methods. Handbook of Evolutionary Computation, C6.1, IOP Publishing and Oxford University Press (1997)
5. Ester, M., Kriegel, H.-P., Sander, J. and Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining (1996) 226-231