IPSJ SIG Technical Report Vol.2010-GI-24 No /6/25

A new UCT search method using position evaluation function and its evaluation by Othello

Shota Maehara,1 Tsuyoshi Hashimoto2 and Yasuyuki Kobayashi1

1 Graduate School of Science and Engineering, University of Shimane
2 Department of Information Engineering, College of Technology of Matsue

Monte Carlo tree search, particularly UCT, has achieved great success and is widely studied as a game tree search method. However, UCT combined with a position evaluation function has hardly been studied, because such a function is difficult to build for the game of Go, the main target of UCT research. In this paper we propose a new method that adds a position evaluation function to the UCB value. The method is implemented for Othello, a game for which a position evaluation function is relatively easy to make, and experiments are performed. The results show the overwhelming strength of the proposed method and verify its effectiveness.

… UCT 2) … M. Buro … Logistello … 1997 … UCB 3) … 4)5) … 1)2)6) … UCT … Amazons 7)8) … LOA 9) … Nested Monte-Carlo … Morpion Solitaire 10)11) …

© 2010 Information

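… UCT … UCB … UCT …

2. UCB

UCB (Upper Confidence Bound) 3) … For move i, with mean reward $X_i$, total number of playouts $n$, number of playouts of move i $n_i$, and constant $c$, the UCB value is

$$X_i + c\sqrt{\frac{2\ln n}{n_i}} \qquad (1)$$

This UCB value is UCB1 3). … (1) … UCT (UCB applied to Trees) 2) … UCT … UCB …

… 13) … Amazons 8) … UCT 13) … MINMAX 7) … MINMAX … Amazons 4)5) … UCB …

4. UCT … UCB …

UCT+ … For move i, an evaluation value $E_i$ is added to the UCB value of (1):

$$(X_i + E_i) + c\sqrt{\frac{2\ln n}{n_i}} \qquad (2)$$

UCT+ uses (2) where UCT uses (1).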
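The two selection values (1) and (2) can be sketched in Python as follows. This is a minimal sketch, assuming the standard UCB1 exploration term c·sqrt(2 ln n / n_i) for (1). The function names, the tuple layout of `children`, and the handling of unvisited moves are illustrative assumptions and are not taken from the paper; the scale of E_i relative to X_i is also left open here.

```python
import math

def ucb1(mean_reward, parent_visits, child_visits, c=1.0):
    """Value (1): mean reward X_i plus c times the UCB1 exploration term."""
    if child_visits == 0:
        # Assumption: unvisited moves are tried first (a common convention).
        return math.inf
    return mean_reward + c * math.sqrt(2.0 * math.log(parent_visits) / child_visits)

def uct_plus(mean_reward, eval_value, parent_visits, child_visits, c=1.0):
    """Value (2): the evaluation value E_i is added to the mean reward X_i."""
    if child_visits == 0:
        return math.inf
    explore = c * math.sqrt(2.0 * math.log(parent_visits) / child_visits)
    return (mean_reward + eval_value) + explore

def select(children, c=1.0):
    """Return the index of the child with the largest value (2).

    children: list of (mean_reward, eval_value, visits) tuples (hypothetical layout).
    """
    n = sum(visits for _, _, visits in children)
    scores = [uct_plus(x, e, n, v, c) for x, e, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With equal mean rewards and visit counts, `select` prefers the move with the larger evaluation value; with every E_i set to 0 the value reduces to (1).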

… Sum1 … Sum2 … (2) … Score … For move i: $Sum1_i$, $Sum2_i$, $Score_i$ … UCT+ … (3)

E = (X_i + Sum1_i Sum2_i, c = Sum1_i Sum2_i) (3)

… (2) … $E_i$ … $c$ … UCT+ … 15) … 14) … Sum1 … Sum2 … Score … (+1) … (-1) … (0) … Sum1 … Sum2 … Score … (2) … $X_i$ … Sum1 … Sum2 … $E_i$ …

5.2 UCT …

… scrzebra … S … S=40 … S=50 …

(g,1), -10
(h,1), 14
(h,2), -10
(h,3), 0
(h,7), 2

… 2 … 5 … (h,1) (h,7) …
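The scrzebra move list above suggests how a right answer might be checked. The following is a minimal sketch, assuming a move counts as right when it is among the k best moves by reference value. That criterion, the function names, and the data layout are assumptions for illustration, since the paper's own definition is not recoverable from the text.

```python
def top_moves(scored_moves, k):
    """Return the k moves with the highest reference values.

    scored_moves: list of (move, value) pairs, e.g. (("h", 1), 14).
    """
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    return [move for move, _ in ranked[:k]]

def right_answer_percentage(chosen, references, k=1):
    """Percentage of positions where the chosen move is among the k best.

    chosen: one move per position; references: one scored move list per position.
    """
    hits = sum(
        1 for move, scored in zip(chosen, references)
        if move in top_moves(scored, k)
    )
    return 100.0 * hits / len(chosen)

# The five scored moves quoted in the text.
reference = [
    (("g", 1), -10), (("h", 1), 14), (("h", 2), -10),
    (("h", 3), 0), (("h", 7), 2),
]
print(top_moves(reference, 2))  # [('h', 1), ('h', 7)]
```

For this position the two best moves are (h,1) and (h,7), which matches the pair quoted at the end of the list.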

Table 3  Right answer percentage by simple method (%).

…                   45.30   46.70
… (Playout=2000)    45.00   57.33

… 1 … 50% …

7. …

Table 1  Right answer percentage by UCT (%).

Playout=10000   47.83   56.33
Playout=30000   50.

Table 2  Right answer percentage by UCT+ (%).

Playout= …
Playout= …

… ( 0.5) …

OS Debian Linux5.0.3, CPU Pentium4 3.2GHz, 512MB, C

… UCB … UCT … UCT+ … c … UCT … UCT … 1 … 3 … UCT … UCT+ … UCB … UCT … Amazons … 2 … UCB …

8. …

… UCB … Logistello …

1) Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy (2006).
2) Kocsis, L. and Szepesvari, C.: Bandit based Monte-Carlo Planning, Proceedings of the 15th European Conference on Machine Learning, pp (2006).
3) Auer, P., Cesa-Bianchi, N. and Fischer, P.: Finite time Analysis of the Multi-armed Bandit Problem, Machine Learning, Vol. 47, pp … 256 (2002).
4) …: …, Proceedings of The 11th Game Programming Workshop, pp (2006).
5) …: …, 13, pp.1-8 (2008).
6) Gelly, S., Wang, Y., Munos, R. and Teytaud, O.: Modifications of UCT with Patterns in Monte-Carlo Go, Technical Report RR-6062, INRIA (2006).
7) Lorentz, R.: Amazons Discover Monte Carlo, Computers and Games, Lecture Notes in Computer Science, Vol. 5131, pp.13-24 (2008).
8) Kloetzer, J., Iida, H. and Bouzy, B.: Playing Amazons Endgames, ICGA Journal, to appear.
9) Winands, M.H.M. and Bjornsson, Y.: Evaluation Function Based Monte-Carlo LOA, Advances in Computer Games (ACG 2009), Lecture Notes in Computer Science (LNCS 6048), pp, Springer, Berlin Heidelberg (2010).
10) Cazenave, T.: Nested Monte-Carlo Search, IJCAI2009, pp (2009).
11) …: Nested Monte-Carlo … AMAF …, Vol.2010, No.7, 2009-GI-23, pp.1-7 (2010).
12) Coulom, R.: Computing Elo Ratings of Move Patterns in the Game of Go, Computer Game Workshop, Amsterdam, The Netherlands (2007).
13) …: …, Vol.2009, No.27, 2009-GI-21, pp (2009).
14) …: …
15) …: …, 17 (2005).
