Computing Science (CMPUT) 496

1 Computing Science (CMPUT) 496 Search, Knowledge, and Simulations Martin Müller Department of Computing Science University of Alberta Winter 2017

2 Part IV Knowledge

3 496 Today - Mar 9
Announcements:
- Quiz 7 still open
- Quiz 8 simulations, due Mar 13
- Added small clarifications to Assignment 3 specification: list of atari defense moves in policy_moves command vs generate a single move in policy of simulation player
Today's topics:
- Knowledge for Heuristic Search and Simulations
- State Evaluation
- Move Evaluation

4 Using Knowledge in Heuristic Search
- Review: use of knowledge for search and simulations so far
- How can knowledge be used?
- Basic concepts: properties and interpretations of knowledge for heuristic search
- Representing knowledge
- Acquiring knowledge: manual vs machine learning

5 Knowledge for Search and Simulations - Story so Far
- Discussed techniques for heuristic search and simulation
- Many were knowledge-free
  - Blind search, uniform random simulations
- Many others used a black box heuristic evaluation function
  - Goal-distance heuristics in best-first search
  - Admissible heuristics in A*
  - Depth- or time-limited alphabeta search
- We did not discuss much how to build such a function. We will do that now.

6 Story so Far (Continued)
- We also used some knowledge to improve simulation policies
  - 3x3 patterns
  - Move filters
  - Assignment 3: atari capture and atari defense
  - Probabilistic simulation policies (Coulom paper)
- Now, look deeper into knowledge for heuristic search
  - What is knowledge used for?
  - Where does it come from?
  - How is it selected? Constructed? Learned?

7 Knowledge for State and Move Evaluation
- Evaluation function: mapping from state to number - how good is that state? (state evaluation, position evaluation)
- Move evaluation: mapping from move (action) to number - how good is that move? (e.g. probabilities in simulation policy)
- Filter: which moves are bad and should be filtered out (pruned)
- The big two for us now are: state evaluation and move evaluation

8 Other Kinds of Knowledge
- Many other kinds of knowledge in heuristic search
- Examples:
  - Time control, search depth control
  - Game-specific knowledge to reduce size of state space (we discussed DAG vs tree already)
  - Efficient state representation (we discussed)
  - Knowledge about algorithm optimization and tuning
  - ...

9 Using Knowledge Part 1: State Evaluation
- We know exact evaluation in terminal states
  - Games: win, loss, draw, win by 23.5 points, ...
  - Best-first search: distance to goal h(s) = 0
- What about heuristic evaluation in non-terminal states?
- In games, two kinds of evaluation are popular
  - Heuristic evaluation: higher is better
  - Winning probability: higher is better, plus has an interpretation as probability

10 What is State Evaluation Used For?
- Most important: as evaluation function in search
  - Leaf nodes evaluated by this function
  - Interior nodes evaluated by minimax rule
- Heuristic evaluation of interior nodes for move ordering, for leaves of depth-limited searches
- What does an evaluation mean?

11 What does Winning Probability Mean?
- Different interpretations
- Clearest case: game with chance element, e.g. dice rolls
- The winning probability is the minimax score!
- Example - backgammon
  - In some state s I need to roll two sixes to win, otherwise I lose
  - Probability of rolling two sixes = 1/6 × 1/6 = 1/36
  - Value v(s) = 1/36
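The 1/36 value in the backgammon example can be checked by enumerating all outcomes of two fair dice. This is a minimal sketch; the function name `win_probability` is made up for illustration and is not part of the course code.

```python
from fractions import Fraction
from itertools import product


def win_probability(is_win):
    """Exact probability that a roll of two fair dice satisfies is_win."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes
    wins = sum(1 for roll in rolls if is_win(roll))
    return Fraction(wins, len(rolls))


# State s from the slide: only a double six wins, everything else loses.
v_s = win_probability(lambda roll: roll == (6, 6))
print(v_s)  # 1/36
```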

12 Winning Probability in Games with No Chance
- There are no probabilities in the game itself
- A perfect player would always know: winning probability is either 0% or 100%
- Probability comes from either imperfect opponents, or our imperfect understanding of the game
- Example - again simulation-based player
  - Winning probability = winrate in simulation
  - Probabilities come from both players using randomized policies in simulation
- Monte Carlo Tree Search also uses winning probabilities of simulations
  - Main difference: it has a non-random in-tree phase followed by the randomized simulation
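To make "winning probability = winrate in simulation" concrete, here is a small sketch on a toy game that has no chance element. The game (Nim: take 1 to 3 stones, whoever takes the last stone wins) and all function names are assumptions chosen for illustration; they are not the Go players used in the course.

```python
import random


def random_playout(stones, rng):
    """Play uniformly random moves; return True if the player who moves first wins."""
    first_player_to_move = True
    while True:
        take = rng.randint(1, min(3, stones))  # uniform random legal move
        stones -= take
        if stones == 0:
            return first_player_to_move  # whoever took the last stone wins
        first_player_to_move = not first_player_to_move


def simulation_winrate(stones, num_simulations=1000, seed=0):
    """Estimate the winning probability of the player to move as a simulation winrate."""
    rng = random.Random(seed)
    wins = sum(random_playout(stones, rng) for _ in range(num_simulations))
    return wins / num_simulations
```

With 4 stones, perfect play is a loss for the player to move (0%), yet the exact winrate under uniform random play by both sides is 1/3, so `simulation_winrate(4)` should land close to that. The probability comes entirely from the randomized policies, not from the game.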

13 More on Winning Probabilities
Image source: Silver et al, Mastering the game of Go with deep neural networks and tree search, Nature
- Simulations are not the only way to get probabilities
- Can use machine learning to learn win probabilities
  - Example: AlphaGo's value network - deep neural net that maps states to win probabilities
- Can also define rules for estimating winning probabilities
  - Translate heuristic evaluation into a probability (more later)
  - Difficult, not used frequently

14 Heuristic Evaluation Function
- Heuristic evaluation: higher is better
- Possible: estimate of score of game
  - +12 = Black is about 12 points ahead
- Possibly no other meaning, just a number
- Example: material evaluation function in chess
  - Queen = 9, rook = 5, bishop = 3, ...
  - Evaluation = sum of my material's values - sum of opponent's material's values
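The material evaluation can be written down in a few lines. In this sketch the queen, rook and bishop values are the ones on the slide; the knight and pawn values are the conventional ones and are an assumption here, as are the piece letters and function names.

```python
# Queen, rook and bishop values are from the slide; knight (3) and pawn (1)
# are the conventional values and are an assumption in this sketch.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def material(pieces):
    """Sum of the values of a list of piece letters."""
    return sum(PIECE_VALUES[piece] for piece in pieces)


def material_evaluation(my_pieces, opponent_pieces):
    """Evaluation = my material minus the opponent's material; higher is better."""
    return material(my_pieces) - material(opponent_pieces)


# I have queen, rook, pawn (15); the opponent has rook, bishop, pawn (9).
score = material_evaluation(["Q", "R", "P"], ["R", "B", "P"])
print(score)  # 6
```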

15 One Interpretation
- General motto: similar evaluation values for similar states
- (I believe the more precise version below is due to Stuart Russell, author of the AIMA textbook)
- "All states with the same evaluation are equally good" means they have the same (but not known to us) probability of winning
- Any state with a higher evaluation has a higher probability of winning
- Evaluation function partitions set of all states S into subsets S_v, where each state in S_v is equally good for us

16 Relative vs Absolute Evaluation
- Unless numbers in evaluation have a meaning such as probability or score, the numbers themselves do not matter
- Only the ordering given by the numbers matters - it decides the preference or ranking between moves
  - Example 1: multiply all function values by 10
  - Example 2: add 7 to all function values
  - The search will be exactly the same
- Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
- In utility theory this is called ordinal utility
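The two examples on this slide can be tried on a tiny hand-made tree. The sketch below uses plain minimax with explicit max and min levels; the tree and its leaf values are invented for illustration.

```python
def minimax(node, maximizing, evaluate):
    """Plain minimax over a tree given as nested lists; leaves are raw numbers."""
    if not isinstance(node, list):  # leaf: apply the evaluation function
        return evaluate(node)
    values = [minimax(child, not maximizing, evaluate) for child in node]
    return max(values) if maximizing else min(values)


def best_root_move(tree, evaluate):
    """Index of the root move chosen by the maximizing player."""
    values = [minimax(child, False, evaluate) for child in tree]
    return values.index(max(values))


# Root has three moves; each leads to a min node with three leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

original = best_root_move(tree, lambda v: v)
scaled = best_root_move(tree, lambda v: 10 * v)   # Example 1: multiply by 10
shifted = best_root_move(tree, lambda v: v + 7)   # Example 2: add 7
print(original, scaled, shifted)  # 0 0 0
```

All three evaluations pick the same root move, because both transformations preserve the ordering of the leaf values.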

17 Skill-Testing Question
- Everything I said on last slide was true for minimax search
- What about negamax?
- 5 minutes for discussion with neighbor or on chat
- Repeated claim from last slide: Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
  - Is it true? False?
  - Is it true under some conditions? Which?

18 Mixing Exact and Heuristic Evaluation
- We can mix both kinds of evaluations
- If we are careful, we can get true proofs of wins and losses this way
  - Example: win = 10000, highest heuristic score = 5000
  - If alphabeta returns 10000, it is a proven win
- Having a good heuristic can help speed up an exact proof
  - Provides good move ordering for iterative deepening search
  - A better move sorted first means more cuts in the tree search
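A sketch of how the two score ranges can be kept apart in a depth-limited alphabeta search. The constants 10000 and 5000 are from the slide; the toy game (Nim: take 1 to 3 stones, last stone wins), the deliberately crude heuristic, and the function names are assumptions for illustration only.

```python
WIN = 10000            # exact score of a proven win (from the slide)
MAX_HEURISTIC = 5000   # highest heuristic score (from the slide)


def heuristic(stones):
    """Crude illustrative guess, clamped so it can never look like a proof."""
    guess = 100 if stones % 2 == 1 else -100
    return max(-MAX_HEURISTIC, min(MAX_HEURISTIC, guess))


def alphabeta(stones, depth, alpha=-WIN, beta=WIN):
    """Negamax-style alphabeta; the score is from the view of the player to move."""
    if stones == 0:
        return -WIN                 # exact: the previous player took the last stone
    if depth == 0:
        return heuristic(stones)    # heuristic: search cut off before the end
    for take in range(1, min(3, stones) + 1):
        value = -alphabeta(stones - take, depth - 1, -beta, -alpha)
        if value > alpha:
            alpha = value
        if alpha >= beta:
            break                   # cutoff: remaining moves cannot matter
    return alpha


print(alphabeta(5, depth=10) == WIN)    # True: proven win
print(alphabeta(4, depth=10) == -WIN)   # True: proven loss
print(abs(alphabeta(40, depth=2)) <= MAX_HEURISTIC)  # True: only a heuristic value
```

Because heuristic values are clamped strictly inside the range of exact scores, a result of 10000 can only come from terminal states, which is what makes it a proof.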

19 Using Knowledge Part 2: Move Evaluation
- Given a state, and the possible moves from that state
- Put a numeric value on each move
- Main use: action selection in search, in simulation
- Can also be used for move ordering in search
- Again, we can have evaluation both with and without a probabilistic interpretation

20 Move Evaluation as Probability
- Move i with probability p_i:
  - Interpretation 1: p_i is probability that move i is a win
  - Interpretation 2: p_i is probability that move i is the best move
- Both make sense
- Which one you use depends on how you compute or estimate those numbers

21 Move Evaluation as A Number
- No interpretation
- As with state evaluation, bigger is better
- Example: classical Go program Explorer (ca. 1989-1995)
- Next big question: where do evaluations come from?
  - In Explorer, they come from a large number of heuristics for different types of moves

22 Details on Move Generation in Classical Go Program Explorer
- Each move has list of move motives, each with a number
- Sum of numbers = evaluation of the move
- Pure guessing, no check of the state after playing the move
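The "sum of motives" scheme can be sketched as follows. The motive names, values and move coordinates are invented for illustration; they are not taken from Explorer.

```python
def evaluate_move(motives):
    """Evaluation of a move = sum of the numbers of all its motives."""
    return sum(value for _name, value in motives)


# Hypothetical motives and values, invented for illustration only.
candidate_moves = {
    "D4": [("extend", 12), ("attack", 5)],
    "Q16": [("corner", 20)],
    "K10": [("connect", 8), ("defend", 9), ("territory", 2)],
}

evaluations = {move: evaluate_move(motives) for move, motives in candidate_moves.items()}
best_move = max(evaluations, key=evaluations.get)
print(evaluations)  # {'D4': 17, 'Q16': 20, 'K10': 19}
print(best_move)    # Q16
```

Note that nothing here looks at the board after the move is played, which matches the "pure guessing" remark on the slide.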

23 Relation between State and Move Evaluation (1)
- Case 1: we have only state evaluation, but need move evaluation
- Easy - do a 1 ply search
  - Evaluation of move = evaluation of state after making that move
- Example: Go3 and Go4 - simulation-based players
  - Play move, run simulations from state s afterwards
  - Move evaluation = winrate of simulations starting from state s
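A generic sketch of the 1 ply search in Case 1. The function `evaluate_moves` and the toy state, move and evaluation definitions are assumptions for illustration; in a two-player game the state evaluation must be taken from the point of view of the player who just moved.

```python
def evaluate_moves(state, legal_moves, play, evaluate_state):
    """Move evaluation = evaluation of the state reached by playing that move."""
    return {move: evaluate_state(play(state, move)) for move in legal_moves}


# Toy example: a state is a number, a move adds to it, and the made-up
# state evaluation prefers states close to 10.
def play(state, move):
    return state + move


def evaluate_state(state):
    return -abs(10 - state)


scores = evaluate_moves(7, [1, 2, 3, 4], play, evaluate_state)
best = max(scores, key=scores.get)
print(scores)  # {1: -2, 2: -1, 3: 0, 4: -1}
print(best)    # 3
```

For a simulation-based player, `evaluate_state` would be replaced by the winrate of simulations started from the state after the move.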

24 Relation between State and Move Evaluation (2)
- Case 2: we have only move evaluation, but need state evaluation
- No easy solution
- We could try to do greedy rollout by following the sequence of best moves
- Still, in the end we have to evaluate the terminal state to get a value
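The greedy rollout idea from Case 2 can be sketched like this. All names and the single-agent toy problem are invented for illustration; a real game would also need to alternate between the two players.

```python
def greedy_rollout(state, legal_moves, move_value, play, is_terminal, terminal_value):
    """Follow the best-rated move until the end, then evaluate the terminal state."""
    while not is_terminal(state):
        best = max(legal_moves(state), key=lambda m: move_value(state, m))
        state = play(state, best)
    return terminal_value(state)


# Toy single-agent example: count up from 0 in steps of 1 or 2 and stop at 5 or
# more. Landing exactly on 5 is worth 1, overshooting is worth 0.
value = greedy_rollout(
    state=0,
    legal_moves=lambda s: [1, 2],
    move_value=lambda s, m: m,  # made-up move evaluation: prefer bigger steps
    play=lambda s, m: s + m,
    is_terminal=lambda s: s >= 5,
    terminal_value=lambda s: 1 if s == 5 else 0,
)
print(value)  # 0, the greedy sequence 0 -> 2 -> 4 -> 6 overshoots
```

The value obtained this way is only as good as the move evaluation that steers the rollout, and the terminal state still has to be evaluated exactly.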

25 Acquiring Evaluation Knowledge
- Where do evaluations come from?
  - (now) Machine learning
  - (old) Local goal-directed search
  - (old) Handcoded rules
- First, discuss how to represent knowledge in a program

26 Representing Knowledge for Evaluation
- Many ways to represent knowledge
  - Handcoded rules
  - Simple features
  - Pattern databases
  - Neural nets

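27 Handcoded Rules

    def selfatari(board, move, color):
        # Note: this call is cut off on the original slide after "col";
        # it is completed to "color" here, and any further arguments are unknown.
        maxoldliberty = maxliberty(board, move, color)
        # Filter only moves whose neighboring blocks had few liberties before.
        if maxoldliberty > 2:
            return False
        # Try the move on a copy so the real board is not changed.
        cboard = board.copy()
        islegal = cboard.move(move, color)
        if islegal:
            # A self-atari leaves the new block with exactly one liberty.
            newliberty = cboard.liberty(move, color)
            if newliberty == 1:
                return True
        return False

- Most direct way
- Example: move filters and some of the rules in Go4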

28 Simple Features in Fuego

    enum FeBasicFeature {
        FE_PASS_NEW,
        FE_PASS_CONSECUTIVE,
        FE_CAPTURE_ADJ_ATARI,
        ...
        FE_CAPTURE_MULTIPLE,
        FE_EXTENSION_NOT_LADDER,
        FE_EXTENSION_LADDER,
        ...
        FE_TWO_LIB_SAVE_LADDER,
        FE_TWO_LIB_STILL_LADDER,
        ...
        FE_SELFATARI,
        FE_ATARI_LADDER,
        ...
        FE_DOUBLE_ATARI,
        FE_DOUBLE_ATARI_DEFEND,
        FE_LINE_1,
        FE_LINE_2,
        FE_LINE_3,
        ...
    }

- Idea: each feature is a boolean statement about a state, or a move
- Each feature is simple and easy to compute
- With machine learning, we can construct an evaluation function from a combination of many simple features
- Examples: see Remi Coulom's paper for list, Fuego screenshot for examples (on next few slides)
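One simple way to combine boolean features into a move evaluation is a weighted sum, sketched below. The feature names are from the enum on the slide; the additive form and the weight values are assumptions for illustration and are not how Fuego or Coulom's method is actually implemented. In practice the weights would be learned from data.

```python
# Hypothetical weights, invented for illustration; a learning method would set them.
WEIGHTS = {
    "FE_CAPTURE_MULTIPLE": 2.5,
    "FE_SELFATARI": -3.0,
    "FE_LINE_3": 0.5,
}


def move_score(active_features):
    """Score of a move = sum of the weights of the features that are true for it."""
    return sum(WEIGHTS.get(feature, 0.0) for feature in active_features)


print(move_score(["FE_CAPTURE_MULTIPLE", "FE_LINE_3"]))  # 3.0
print(move_score(["FE_SELFATARI"]))                      # -3.0
```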

29 Remi Coulom's Simple Features (1)

30 Remi Coulom's Simple Features (2)
Source: Remi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go

31 Fuego Simple Features
- Simple features in Fuego Go program
- Similar to Coulom's features
- Each legal move will have a (small) set of features

32 Pattern Databases
Image source: Stern et al, Bayesian Pattern Ranking for Move Prediction in the Game of Go
- Large patterns can be learned from master games, if they are frequently used
- In Go, typically we have many different sizes of pattern, from 3x3 to full board
- A main question is how to evaluate such patterns
  - Measure how often the move in the center is played immediately, or later

33 Neural Nets
Image source: ShaneSeungwhanMoon/ how-alphago-works
- Represent knowledge in (large number of) weights of the neural net
- Lower levels have local knowledge (e.g. 3x3, 5x5)
- Higher levels can combine local information for global evaluation
- Much more on nets later in the course

34 Example of Exact Knowledge: Benson's Algorithm
- Benson's algorithm finds stones and territories that are unconditionally alive
  - No matter what the opponent plays, they cannot capture these stones
- A generalization of the two eyes concept
- Can be used as an exact filter in a program - do not generate moves in safe territory

35 Summary
- Many kinds of knowledge
- Used for evaluating states and moves
- Heuristic rules, patterns, neural networks
- Exact knowledge, e.g. safe stones
- Next: details - how to represent knowledge in program
