Handling Search Inconsistencies in MTD(f)


Jan-Jaap van Horssen ¹
February 2018

Abstract

Search inconsistencies (or search instability) caused by the use of a transposition table (TT) are a well-known problem for efficient minimax search algorithms such as MTD(f). This paper presents an effective and efficient solution to the problem of MTD(f) playing occasional blunder moves as a result of search inconsistencies. This makes MTD(f) safe to use, even without clearing the TT.

Introduction

Search inconsistencies caused by the use of a TT are a problem for the most efficient minimax search algorithms, Aspiration PVS/NegaScout and MTD(f), that is not addressed in the textbook versions of these algorithms. Ways to handle search inconsistencies in Aspiration PVS/NegaScout can be found online [1]; they involve re-searches with a widened (α, β) window. However, these solutions do not apply to MTD(f), and to the best of our knowledge no solutions are known for MTD(f). In [2], search inconsistencies are mentioned as an open problem and suggested for future work.

Search Inconsistencies in MTD(f)

MTD(f) [3] was implemented in the author's 10x10 draughts engine MAX. Reproducible situations were found where MTD(f) plays a wrong move, with and without clearing the TT, for instance a drawing move instead of a winning move. These situations can be prevented by

1. clearing the TT before the search, and
2. only allowing the use of TT entries with an exact depth match (not with a greater depth).

Both are needed to avoid all problems, but both are also wasteful. In addition, if a program is to be used in the context of machine learning (ultrafast games), clearing the TT between moves simply takes too much time. So MTD(f) needs a way to handle these situations. To address the problem of search inconsistencies, we first make the move selection mechanism in MTD(f) explicit; see Algorithm 1.
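Before making this explicit, it may help to see the moving parts in isolation. The following is a minimal runnable sketch (in Python; the toy game tree, names and values are ours, not from the paper) of the null-window searches MTD(f) is built on and of its driving loop with root move sorting, in the spirit of Algorithm 1 below. There is no TT here, so the searches are consistent and the loop always terminates with low = upp.

```python
INF = float('inf')

def minimax(node, maximizing=True):
    """Exact minimax value of a nested-tuple game tree (leaves are ints)."""
    if isinstance(node, int):
        return node
    vals = [minimax(c, not maximizing) for c in node]
    return max(vals) if maximizing else min(vals)

def alphabeta(node, alpha, beta, maximizing=True):
    """Fail-soft alphabeta. With window (gamma-1, gamma) this is a
    null-window test: for integer values the result g classifies the
    minimax value v against gamma (g >= gamma iff v >= gamma)."""
    if isinstance(node, int):
        return node
    best = -INF if maximizing else INF
    for c in node:
        v = alphabeta(c, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:          # cutoff
            break
    return best

def rootmt(moves, gamma):
    """Null-window pass over the root moves (cf. rootmt in Algorithm 1):
    every searched move gets a bound value; after a fail-high cutoff the
    remaining, unsearched moves get -INF."""
    g, cut = -INF, False
    for m in moves:
        if cut:
            m['value'] = -INF
            continue
        m['value'] = alphabeta(m['tree'], gamma - 1, gamma, False)
        g = max(g, m['value'])
        if g >= gamma:             # fail high: stop searching
            cut = True
    return g

def mtdf(moves, f):
    """The driving loop of Algorithm 1 (the do-while becomes a while,
    which is equivalent here since low < upp holds initially)."""
    g, low, upp = f, -INF, INF
    while low < upp:
        gamma = g + 1 if g == low else g
        g = rootmt(moves, gamma)
        if g < gamma:
            upp = g
        else:
            low = g
        moves.sort(key=lambda m: m['value'], reverse=True)  # stable sort
    return g

# toy position: three root moves leading to opponent (min) nodes with
# minimax values 3, 7 and 2; the search should settle on B with value 7
moves = [{'name': 'A', 'tree': (3, 5)},
         {'name': 'B', 'tree': (7, 9)},
         {'name': 'C', 'tree': (2, 8)}]
print(mtdf(moves, 0), moves[0]['name'])   # -> 7 B
```

Without a TT the returned bounds can never contradict each other; the inconsistencies discussed next arise only when TT entries from earlier or deeper searches influence the returned bounds.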
The function MT ² is unrolled for the root node, giving rootmt. This function does not use the TT but assigns a value to each searched move. After a cutoff, the remaining moves get a value of −∞. In the context of MTD(f), a search inconsistency arises if a search with window (γ−1, γ) fails high, returning a lower bound g ≥ γ, and a subsequent search with (for instance) window (g, g+1) fails low, returning a value less than g. Or,

¹ E-mail: janjaapvanhorssen@gmail.com
² MT(n, γ) is equivalent to alphabetaTT(n, γ−1, γ).

conversely, if a search with window (γ−1, γ) fails low, returning an upper bound g < γ, and a subsequent search with (for instance) window (g−1, g) fails high, returning a value greater than g.

* // post: best move is first in 'moves'
* int MTDf(Board root, MoveList moves, int f, int d) {
      g = f; low = -INF; upp = +INF;
      do {
          gamma = (g == low) ? g + 1 : g;
*         g = rootmt(root, moves, gamma, d); // post: moves have bound values
          if (g < gamma) upp = g; else low = g;
*         moves.sort(); // stable sort, descending
      } while (low < upp);
      return g;
  }

Algorithm 1. MTD(f) with move selection code (marked by an asterisk).

MTD(f) makes a series of calls to rootmt with an adjusted window for a given depth d, yielding a sequence of values g1, g2, ..., gn. If there are no search inconsistencies, then gn is the minimax value for depth d. The converse is not true: if low = upp, there can still be search inconsistencies. There are two cases:

1. The last pass failed high. Then all preceding passes must have failed low. We have g1 > g2 > ... > g(n−1) and gn ≥ g(n−1).
2. The last pass failed low. Then all preceding passes must have failed high. We have g1 < g2 < ... < g(n−1) and gn ≤ g(n−1).

Case 1. The last pass failed high, so we have a lower bound gn for the minimax value. If, as expected, gn = g(n−1) (and low = upp), then it is assumed that we found the minimax value and the best move is first in the list. If, unexpectedly, gn > g(n−1) (and low > upp: the lower bound overshot the upper bound), then we have a search inconsistency. Let move b be first in the list after sorting. We know from the second-last pass that the values of all moves were at most g(n−1), but now we find that the value of b is gn > g(n−1). Moreover, the search was cut off (failed high) after searching b, so the remaining moves were not searched to find an even better move. In this case it is unlikely that one of the other moves is (much) better than b, because it is likely that their values are still roughly g(n−1) at best. Because we have a lower bound on the value of b (which is better than expected) and we are maximizing, the search inconsistency is not considered harmful in this case.

Case 2. The last pass failed low, so the second-last pass failed high, returning a value g(n−1).
Let move a be first in the list before the last pass. The search window for the last pass was (g(n−1), g(n−1)+1), which fails low. MTD(f) expects a value gn = g(n−1), so that it can conclude that the minimax value is g(n−1). Due to search inconsistencies, however, a value gn < g(n−1) is also possible (giving low > upp: the upper bound undershot the lower bound). In either case, suppose that after sorting the last pass yields a new best move b ≠ a (see Figure 1). This implies that a failed low with a value less than gn (whereas the previous pass established the lower bound g(n−1) for a), which is a search inconsistency. Move b also failed low, but with value gn. So b is preferred over a, even though we only have upper bounds for a and b. This does not make

sense for a maximizing player. If this happens in the last completed iterative deepening iteration (depth d) before we run out of time, MTD(f) will play a wrong move. The minimax value of this move can be anything, so the move can be anything between a good (alternative) move and a blunder. This happens very rarely and is not easy to reproduce, but nevertheless it is the reason why some programmers decide not to use MTD(f) anymore.

Figure 1. Typical case 2 sequence of values in a winning position (chart: best move value and g, roughly between 210 and 240, plotted per pass, 1–18).

Even if gn = g(n−1), a search inconsistency can arise if the best move a after pass n−1 fails low with a value less than gn in pass n.

We propose the following solution. It involves no re-searches and imposes no additional overhead on MTD(f). The above case 2 is easy to detect after the last pass has finished, whether it ends with the expected low = upp or with the search inconsistency low > upp. If the last pass failed low and the best move has changed, we do not trust this move and its value (only an upper bound); instead we return the best move and value of the previous pass (a lower bound after n−1 passes), which we saved beforehand. The version of MTD(f) with the proposed adaptation is called MTDfix; see Algorithm 2.

Performance

MTDfix was implemented in MAX.³ It correctly solves (plays the right move in) the reproducible test cases where MTD(f) plays a wrong move. It also performed 2.7% faster on a benchmark of 49 successive white-to-move positions from a single game, which were searched with iterative deepening to specified depths. This indicates that MTDfix improves move ordering in the root, i.e., the best move is more often first in the list. To further test the algorithm, a match of 988 games (3-move ballots, with both colors) was played between MTDfix and MTD(f), both using 0.1 second per move. Neither program clears the TT before the search.
MAX-MTDFIX vs MAX-MTD(F): 97 wins, 94 losses, 797 draws, score 50.2%

³ MAX is a developing program, currently single-threaded and without forward pruning techniques.
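The fallback just proposed, and spelled out in Algorithm 2 below, can be exercised in a small runnable sketch (Python; rootmt is replaced by a scripted stub that replays a case 2 inconsistency, and all names and values here are invented for the demo, not taken from the paper):

```python
INF = float('inf')

def make_scripted_rootmt(script):
    """Stand-in for rootmt that replays preset pass results.
    script: one (returned g, {move name: bound value}) pair per pass."""
    passes = iter(script)
    def rootmt(moves, gamma):
        g, values = next(passes)
        for m in moves:
            m['value'] = values[m['name']]
        return g
    return rootmt

def mtdfix(moves, f, rootmt):
    """MTD(f) plus the proposed fix: fall back to the previous pass's
    best move when the last pass fails low and the best move changed."""
    g, low, upp = f, -INF, INF
    while low < upp:
        bestmove, bestvalue = moves[0], g      # save previous pass result
        gamma = g + 1 if g == low else g
        g = rootmt(moves, gamma)
        if g < gamma:
            upp = g
        else:
            low = g
        moves.sort(key=lambda m: m['value'], reverse=True)  # stable sort
    if g < gamma and moves[0] is not bestmove:  # case 2 with a changed
        moves.remove(bestmove)                  # best move: do not trust
        moves.insert(0, bestmove)               # it, restore the saved
        g = bestvalue                           # move and lower-bound value
    return g

# scripted case 2 inconsistency: pass 1 fails high (g = 230, best move a);
# pass 2, window (230, 231), fails low with g = 225 AND promotes move b
moves = [{'name': 'a'}, {'name': 'b'}]
script = [(230, {'a': 230, 'b': 210}),
          (225, {'a': 220, 'b': 225})]
g = mtdfix(moves, 230, make_scripted_rootmt(script))
print(g, moves[0]['name'])   # -> 230 a
```

Plain MTD(f) would play b with the untrusted value 225 here; the fix restores a and the pass 1 lower bound 230.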

  // post: best move is first in 'moves'
  int MTDfix(Board root, MoveList moves, int f, int d) {
      g = f; low = -INF; upp = +INF;
      do {
*         bestmove = moves.getfirst();
*         bestvalue = g;
          gamma = (g == low) ? g + 1 : g;
          g = rootmt(root, moves, gamma, d); // post: moves have bound values
          if (g < gamma) upp = g; else low = g;
          moves.sort(); // stable sort, descending
      } while (low < upp);
*     if (g < gamma && bestmove != moves.getfirst()) {
*         moves.putfirst(bestmove);
*         g = bestvalue;
*     }
      return g;
  }

Algorithm 2. MTDfix. The lines added to MTD(f) (see Algorithm 1) are marked with an asterisk.

There is no significant increase in playing strength, but we can study the situations where a fail low in the last pass coincides with a change of the best move in the last completed iteration (depth d), i.e., directly affecting the move played. These situations were also detected, but not fixed, in MTD(f). For MTD(f) this occurred 77 times in 70 (out of 988) games, in which MTD(f) scored 70.0%. For MTDfix it occurred 100 times in 70 games, in which it scored 82.9%. In five games the situation occurred in both programs; in these games MTDfix scored 60%. The latter five games were analyzed, and indeed some blunders by MTD(f) were found. The situation occurred on average in 1 : 627 moves and in 1 : 14 games. Looking at the score in games where the problem occurs, we see that the score is well above 50% even if we don't fix the problem. This indicates that it mainly happens in games where a big advantage is reached, i.e., with increasing search values. The results indicate that MTDfix does better at actually winning most of these games, whereas MTD(f) may spoil a number of games by playing wrong moves. To verify this hypothesis, a series of 158-game matches (2-move ballots, with both colors) was played between different versions of MTD(f), each with 0.1 second per move, against

1. an equal opponent: a version of MAX using PVS (without aspiration);
2. a weaker opponent: BOASE, participant in the 2011 Dutch Open computer tournament (and improved since then); BOASE was given 1 second per move to raise its level a bit;
3. a stronger opponent: SCAN 3.0, winner of the 2017 ICGA Computer Olympiad; SCAN was given 1 second per move to give it an advantage, and was modified to disable time management.

For the (only) version of MTD(f) that clears the TT before every search, the time to clear the TT was not included in the thinking time. For tournament games this time is negligible, but for fast games it is not, and we want to study the effect of a clean TT. The tournament results are shown in Table 1.
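The confidence intervals quoted in the discussion of Table 1 can be reproduced by scoring each game as 1 (win), 0.5 (draw) or 0 (loss) and applying a normal approximation. The paper does not state how its intervals were computed, so the formula below is our assumption, but it matches the quoted figures:

```python
from math import sqrt

def match_ci(wins, losses, draws):
    """95% confidence interval (normal approximation) for a match score,
    scoring each game as 1 (win), 0.5 (draw) or 0 (loss)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n              # average score per game
    var = (wins + 0.25 * draws) / n - mean ** 2  # E[x^2] - (E[x])^2
    half = 1.96 * sqrt(var / n)                  # half-width of the 95% CI
    return 100 * mean, 100 * half                # in percent

print(match_ci(97, 94, 797))  # 988-game match: about (50.2, 1.4)
print(match_ci(143, 0, 15))   # MAX-MTDFIX vs BOASE: about (95.3, 2.3)
print(match_ci(0, 86, 72))    # MAX-MTDFIX vs SCAN: about (22.8, 3.9)
```

Draw-heavy matches have a smaller per-game variance than a win/loss binomial would suggest, which is why the 988-game interval is as tight as ±1.4%.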

program             sec  opponent   sec  type      games  win  loss  draw     %  detected
MAX-MTDFIX          0.1  MAX-PVS    0.1  equal       158   18    15   125  50.9        29
MAX-MTD(F)          0.1  BOASE 5.4  1    weaker      158  119     0    39  87.7        83
MAX-MTD(F) CLEARTT  0.1  BOASE 5.4  1    weaker      158  132     0    26  91.8         7
MAX-MTDFIX          0.1  BOASE 5.4  1    weaker      158  143     0    15  95.3       197
MAX-PVS             0.1  BOASE 5.4  1    weaker      158  116     0    42  86.7         -
MAX-MTD(F)          0.1  SCAN 3.0   1    stronger    158    0    90    68  21.5         6
MAX-MTD(F) CLEARTT  0.1  SCAN 3.0   1    stronger    158    0    94    64  20.3         4
MAX-MTDFIX          0.1  SCAN 3.0   1    stronger    158    0    86    72  22.8         0
MAX-PVS             0.1  SCAN 3.0   1    stronger    158    0    85    73  23.1         -

Table 1. Tournament results.

The results of the 158-game matches should be interpreted with some care, as they have a larger margin of error: for instance 95.3±2.3% and 22.8±3.9%, versus 50.2±1.4% for the 988-game match (95% confidence intervals). Nevertheless, we can draw some conclusions from the results. In the match MAX-MTDFIX vs MAX-PVS, the situation was detected 29 times in 15 out of 158 games, with up to five times in one game. MTDfix scored 14 wins and 1 draw in these games. The 15 lost games were analyzed and no blunders were found.

Conclusion

Not clearing the TT and ignoring the problem increases the probability of a wrong move. Clearing the TT prevents most situations where MTD(f) chooses a wrong move, but not all; in addition, it takes time and useful information is lost. MTDfix outperforms MTD(f) and MTD(f)-clearTT, most clearly against a weaker opponent, where there are many winning opportunities. Here we see a significant increase in playing strength. Against a stronger opponent the situation occurs far less frequently, and there is not much to gain. PVS seems to do relatively better against a stronger opponent, i.e., MTD(f) deals relatively less well with decreasing search values. This behavior has been reported before. We conclude that the problem of MTD(f) playing a wrong move is rare but real.
The weaker the opponent, the higher the probability of reaching a winning position, and the higher the probability of a wrong move. MTDfix is an effective and efficient solution to this problem, which also removes the need to clear the TT before the search. It appears that the problem of occasional blunders by MTD(f) is solved, and that it is safe to use the algorithm in tournaments and machine learning, without clearing the TT.

Future Work

Future work will include more experimental support for the conclusions.

References

[1] https://chessprogramming.wikispaces.com/search+instability
[2] Plaat, A. (1996). Research Re: Search & Re-search. Doctoral dissertation, Thesis Publishers.
[3] https://askeplaat.wordpress.com/534-2/mtdf-algorithm/