CS-E4800 Artificial Intelligence Jussi Rintanen Department of Computer Science Aalto University March 9, 2017

Difficulties in Rational Collective Behavior: Individual utility is often in conflict with collective utility. Examples: greenhouse gases, over-population, deforestation, arms races and military build-up. There is no general solution to resolve this conflict. Issue: how to align agents' and collectives' utilities, e.g. by laws/agreements that constrain individual actions (hard to enforce when the utilities at stake are high).

Tragedy of the Commons

Using jointly-owned resource; cost evenly shared:
         0       2       4       6
  0    0,0    -1,1    -2,2    -3,3
  2   1,-1     0,0    -1,1    -2,2
  4   2,-2    1,-1     0,0    -1,1
  6   3,-3    2,-2    1,-1     0,0

Using your own resource:
         0       2       4       6
  0    0,0     0,0     0,0     0,0
  2    0,0     0,0     0,0     0,0
  4    0,0     0,0     0,0     0,0
  6    0,0     0,0     0,0     0,0

Best action: spend the joint resource as much as you can. (Or, under diminishing marginal utility, at least spend it more than you would if you had to pay for it in full.) (Flatmates agree to fill up the fridge every day, divide the cost evenly, and let everybody eat as much as they like. Good idea?)
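As a concrete illustration of the payoff tables above, here is a minimal Python sketch. The model is an assumption: each unit of use yields one unit of benefit and one unit of total cost, split evenly for the joint resource and paid in full for the own resource.

    # Hypothetical illustration: two players each choose how much to use (0, 2, 4 or 6 units).
    levels = [0, 2, 4, 6]

    def payoff(mine, theirs, shared_cost):
        """My net payoff: my benefit minus my share of the cost."""
        if shared_cost:                 # jointly-owned resource, cost split evenly
            return mine - (mine + theirs) / 2
        return mine - mine              # own resource: I pay my full cost, which cancels my benefit here

    for shared in (True, False):
        print("shared cost" if shared else "own resource")
        for mine in levels:
            row = [(payoff(mine, theirs, shared), payoff(theirs, mine, shared))
                   for theirs in levels]
            print(mine, row)

Under the shared cost my payoff is (mine - theirs)/2, so the dominant strategy is to use as much as possible, exactly as the first table shows.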

Games with State: Strategies in normal form games are single-shot, but real-world games typically involve multiple stages. Formalizations: games in extensive form (game theory); multi-agent Markov decision processes (MDPs with actions replaced by normal form games, and payoffs obtained from the values of successor states); game-tree search for zero-sum games (later today). These can be abstractly viewed as normal form games (the reduction to normal form has exponential size), but time-dependent aspects cannot be investigated in normal form.

Challenges in multi-agent systems: 1) players' utilities are opposite (coordination impossible, mixed strategies); 2) conflicting individual and collective utility (coordination difficult, suboptimal collective outcomes); 3) making a decision collectively (measuring utility).

Preference aggregation: Decision between alternatives A, B and C. Agents express their preferences, either as (option 1) some ranking/ordering of A, B, C, or (option 2) numeric values for A, B, C. Preferences need to be aggregated to obtain a joint ordering/valuation of A, B, C. This is difficult! Agents' utilities are generally not publicly known, and the optimal strategy is often to lie about utilities/preferences, leading to suboptimal outcomes.

Aggregation of rankings: a set of candidates (outcomes, alternatives) and a set of agents. Objective: produce an aggregate ordering of all candidates, or a winning candidate.

Aggregation of rankings: A scoring rule assigns a numeric score based on the position in each individual ordering; the aggregate ordering is formed by summing the scores from each individual. Possible rules (for a ranking of 4 candidates): plurality: x > y > z > u mapped to 1, 0, 0, 0 (only the 1st preference counts); veto: x > y > z > u mapped to 1, 1, 1, 0 (or 0, 0, 0, 1) (only the last preference counts); Borda count: x > y > z > u mapped to 3, 2, 1, 0.
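A small Python sketch of these scoring rules, for illustration (the ballot profile and candidate names are made up):

    # Hypothetical profile: each ballot is a ranking from most to least preferred.
    ballots = [["A", "B", "C", "D"],
               ["B", "A", "D", "C"],
               ["A", "C", "B", "D"]]

    def aggregate(ballots, scores):
        """Sum positional scores over all ballots; scores[k] is the score of rank k."""
        total = {}
        for ballot in ballots:
            for rank, candidate in enumerate(ballot):
                total[candidate] = total.get(candidate, 0) + scores[rank]
        return sorted(total.items(), key=lambda kv: -kv[1])

    m = 4                                                  # number of candidates
    print(aggregate(ballots, [1] + [0] * (m - 1)))         # plurality: 1, 0, 0, 0
    print(aggregate(ballots, [1] * (m - 1) + [0]))         # veto: 1, 1, 1, 0
    print(aggregate(ballots, list(range(m - 1, -1, -1))))  # Borda: 3, 2, 1, 0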

Aggregation of rankings: Scoring rules can be combined with runoff procedures. Two-candidate runoff with the plurality rule: (1) eliminate all but the top two candidates based on scores; (2) recalculate the scores, and the winner is the one scoring higher. Single transferable vote with the plurality rule: (1) eliminate the candidate with the lowest plurality score; (2) continue eliminations until one candidate is left.

Aggregation of preferences/ranks: Ordering by pairwise plurality: order x > y if a plurality of agents prefers x to y. This can lead to cycles, e.g. with three candidates A, B, C: agent 1: A > B > C; agent 2: B > C > A; agent 3: C > A > B. A plurality prefers A to B, B to C, and C to A.

Aggregation of preferences/ranks: Ordering by pairwise plurality: how are these cycles possible? The candidates have different property vectors: A = (1,1,0), B = (1,0,1), C = (0,1,1). Even if all agents value all properties positively, uneven weights can lead to cycles. Example weights: agent 1: (3,2,1), agent 2: (1,3,2), agent 3: (2,1,3).
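A short Python check of this example, labelling the candidates A, B, C and computing each agent's utility as the weighted sum of properties; the pairwise plurality ordering then cycles:

    # Candidates' property vectors and agents' (positive but uneven) property weights, as above.
    candidates = {"A": (1, 1, 0), "B": (1, 0, 1), "C": (0, 1, 1)}
    weights = [(3, 2, 1), (1, 3, 2), (2, 1, 3)]     # one weight vector per agent

    def utility(w, props):
        return sum(wi * pi for wi, pi in zip(w, props))

    for a, b in [("A", "B"), ("B", "C"), ("C", "A")]:
        prefer_a = sum(utility(w, candidates[a]) > utility(w, candidates[b]) for w in weights)
        print(f"{prefer_a} of 3 agents prefer {a} to {b}")
    # Two of three agents prefer A to B, B to C, and C to A: a cycle.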

Strategic voting: Expressing preferences incorrectly can be beneficial. Assume plurality voting. An agent's actual preferences are A > B > C, and the agent knows that the other agents' preferences are B > C > A, B > C > A, C > A > B, C > A > B. B and C will be tied if the agent votes A > B > C; B wins if the agent votes B > A > C (a better result!). Other scoring rules (and voting systems in general) are manipulable similarly.

Vickrey-Clarke-Groves mechanism with the Clarke pivot rule (can be viewed as a generalization of second-price sealed-bid auctions). Choice between alternatives in a set X: 1) Agents report their value functions v_i(x), x ∈ X. 2) The best outcome is x_opt = arg max_{x ∈ X} Σ_{i=1}^n v_i(x). 3) Agent i is paid Σ_{j≠i} v_j(x_opt) − max_{x ∈ X} Σ_{j≠i} v_j(x), i.e. the others' value of x_opt minus the others' value of the best alternative without i. The agent's payment + utility is maximized by truthful reporting!
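A minimal Python sketch of the mechanism; the alternatives and value functions are made-up numbers, and in this instance only agent 2 is pivotal and therefore pays a tax:

    # Hypothetical reported value functions v_i(x) over alternatives X = {"a", "b", "c"}.
    values = [{"a": 4, "b": 1, "c": 0},    # agent 0
              {"a": 3, "b": 2, "c": 0},    # agent 1
              {"a": 0, "b": 6, "c": 1}]    # agent 2
    X = ["a", "b", "c"]
    n = len(values)

    def social_value(x, agents):
        return sum(values[i][x] for i in agents)

    x_opt = max(X, key=lambda x: social_value(x, range(n)))
    print("chosen alternative:", x_opt)                    # "b" maximizes the sum of values

    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Clarke pivot payment: others' value at x_opt minus others' best achievable value without i.
        payment = social_value(x_opt, others) - max(social_value(x, others) for x in X)
        print(f"agent {i} is paid {payment}")              # zero or negative, i.e. a tax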

Game tree search: Two-person multi-stage zero-sum games: the player wins and the opponent loses, or vice versa (or it's a draw). Board games: checkers, chess, backgammon, Go. Other applications? (Military operations?) Issue: very large search trees. Issue: focusing the search is difficult.

Basic game tree search by Minimax: Depth-first search of a bounded-depth AND-OR tree. Leaf nodes are evaluated with a heuristic value function (chess: value of pieces, relative positions, mobility, safety of the king, ...). Values of non-leaf nodes are the min or max of their children: AND nodes (opponent) by minimization, OR nodes (player) by maximization. (Special case: the whole game tree is covered, with winning leaves valued 1, losing leaves -1, and draws 0.)
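A minimal Python sketch of depth-bounded Minimax over a generic game interface; the methods is_terminal, successors (yielding (move, state) pairs) and the heuristic evaluate are assumptions, not part of the lecture:

    def minimax(game, state, depth, maximizing):
        """Depth-bounded Minimax: MAX (player) nodes maximize, MIN (opponent) nodes minimize."""
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)            # heuristic value function at the leaves
        values = (minimax(game, s, depth - 1, not maximizing)
                  for _, s in game.successors(state))
        return max(values) if maximizing else min(values)

    def best_move(game, state, depth):
        """Pick the move leading to the successor with the highest Minimax value."""
        return max(game.successors(state),
                   key=lambda ms: minimax(game, ms[1], depth - 1, False))[0]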

Minimax Tree Search: example tree with heuristic leaf values (diagram not reproduced here).

Alpha-Beta Pruning: The idea behind Alpha-Beta pruning: min(x, max(y, z)) = x if x ≤ y (α cuts), and max(x, min(y, z)) = x if x ≥ y (β cuts). In both cases, z is irrelevant.

Alpha-Beta pruning example (walkthrough over five slides; diagrams not reproduced here): the MAX root has three MIN children with leaves (3, 12, 8), (2, ?, ?) and (14, 5, 2). The first MIN child evaluates to 3. In the second child, the leaf 2 already guarantees a value ≤ 2, so its remaining leaves are pruned. The third child's value drops from 14 to 5 to 2. The root's value is 3.
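A Python sketch of Minimax with Alpha-Beta pruning, using the same assumed game interface as in the Minimax sketch above; on a tree like the one in the example it returns 3 for the root without visiting the pruned leaves:

    def alphabeta(game, state, depth, alpha, beta, maximizing):
        """Minimax value of state, skipping subtrees that cannot affect the result."""
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        if maximizing:
            value = float("-inf")
            for _, child in game.successors(state):
                value = max(value, alphabeta(game, child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:        # beta cut: the MIN node above will never allow this
                    break
            return value
        value = float("inf")
        for _, child in game.successors(state):
            value = min(value, alphabeta(game, child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:            # alpha cut: the MAX node above already has something better
                break
        return value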

Heuristics to support Alpha-Beta Pruning: Alpha-Beta prunes more if the best actions are tried first. Promising actions can be determined through iterative deepening: use the score of each action/child from the previous iterative-deepening round.

Issue with depth bounds: the horizon effect. The black bishop is trapped, but its capture could be delayed to depth d + 1, beyond the search depth d (position diagram not reproduced here).

Transposition tables: Depth-first search is used in games like chess because of astronomical state spaces: algorithms that require storing all visited states are not feasible. We need to utilize memory for pruning without exhausting it. DFS can reach a state in multiple ways ⇒ multiple copies of the same subtree. Transposition tables: cache states encountered during DFS and retrieve the value of already-encountered states rather than repeating the search. When the table is full, delete low-importance states.
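A sketch of how a transposition table can be bolted onto the depth-bounded search above: a plain dictionary keyed by (state, remaining depth, player to move); the bounded-size table with deletion of low-importance states is assumed but not shown:

    table = {}   # (state, depth, maximizing) -> value; assumes states are hashable

    def minimax_tt(game, state, depth, maximizing):
        key = (state, depth, maximizing)
        if key in table:                   # state already reached via another move order
            return table[key]
        if depth == 0 or game.is_terminal(state):
            value = game.evaluate(state)
        else:
            values = [minimax_tt(game, s, depth - 1, not maximizing)
                      for _, s in game.successors(state)]
            value = max(values) if maximizing else min(values)
        table[key] = value
        return value

Including the remaining depth in the key matters because, with a heuristic at the depth bound, the same state can get different values when searched to different depths.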

Endgame databases In games with a limited number of simple (late) states/configurations, compute their value by exhaustive game-tree search and store for later use. Another form of caching, constructed once, before game-playing

Endgame databases: In chess, all positions with up to 7 pieces were solved in 2012; the 7-piece database is 140 TB, the 6-piece database 1.2 TB. Black to checkmate in 545 moves (position diagram not reproduced here).

Checkers is solved: Checkers (about 5 × 10^20 states) was shown to be a draw (Schaeffer et al., 2007). The solution consists of an AND-OR tree from the initial state (about 10^7 nodes), with leaf nodes evaluated from an endgame database covering all positions with at most 10 pieces (3.9 × 10^13 states; its computation took from 2001 to 2005).


Monte Carlo methods: DFS does not work well for some types of games: too many states, heuristics don't guide the search well, and information gained during the search is not utilized. Monte Carlo methods sample full game-plays randomly and focus the search according to promising game-plays. This works even without heuristics, e.g. for Go. Similar methods are also used for very large MDPs and POMDPs (e.g. in robotics).

Go (or Baduk or Weiqi): a two-player fully-observable deterministic zero-sum board game. It has been a big challenge for computers.

Rules of Go: Go is played on a 19×19 square grid of points, by players called Black and White. Each point on the grid may be colored black, white or empty. A point P, not colored C, is said to reach C if there is a path of (vertically or horizontally) adjacent points of P's color from P to a point of color C. Clearing a color means emptying all points of that color that don't reach empty. Starting with an empty grid, the players alternate turns, starting with Black. A turn is either a pass, or a move that doesn't repeat an earlier grid coloring. A move consists of coloring an empty point one's own color, then clearing the opponent's color, and then clearing one's own color. The game ends after two consecutive passes. A player's score is the number of points of her color, plus the number of empty points that reach only her color. White gets 6.5 points extra. The player with the higher score at the end of the game is the winner.

Example game of 9×9 Go (sequence of board diagrams not reproduced here).

Why is Go difficult for computers? Go is visual and thus easy for people (one could not show 10 chess moves in a single image). The branching factor is far larger than in chess, evaluation of board configurations is difficult, and the horizon effect is strong (it is easy to delay a capture).

Paradigm shift in 2006: Computer Go was progressing slowly (weak amateur level). In 2006, Monte Carlo methods surpassed traditional tree search. By 2015, all competitive programs used Monte Carlo: 19×19 play was at strong amateur level, 9×9 at professional level, 5×6 was solved, and solving 6×6 was feasible. In 2016, board evaluation by neural networks: AlphaGo beats human champions.

Monte Carlo Search: Try out every possible action with several randomized plays: choose actions randomly and stop only after the game ends. Score each game-play according to who wins; the best action is the one with most wins. Notice: there is no search tree here, only evaluation of the current action alternatives.
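A minimal Python sketch of this flat Monte Carlo action evaluation; the game interface (actions, result, is_terminal, winner) and the trial budget are assumptions:

    import random

    def monte_carlo_action(game, state, player, trials_per_action=100):
        """Evaluate each available action by random playouts; pick the one that wins most."""
        wins = {}
        for action in game.actions(state):
            wins[action] = 0
            for _ in range(trials_per_action):
                s = game.result(state, action)
                while not game.is_terminal(s):            # play randomly to the end of the game
                    s = game.result(s, random.choice(game.actions(s)))
                if game.winner(s) == player:
                    wins[action] += 1
        return max(wins, key=wins.get)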

Monte Carlo Tree Search (MCTS): An extension of simulation/sampling-only Monte Carlo search: generate a search tree, with leaves evaluated by randomized simulation.

Example (single agent), showing the number of wins/trials for each node: over a sequence of slides, the tree grows as simulations end in win, loss, win, win, loss, win, and the root's statistics go from 0/0 to 1/1, 1/2, 2/3, 3/4, 3/5 and finally 4/6 (tree diagrams not reproduced here).

Monte Carlo Tree Search: Which tree node should be chosen for the next expansion or trial? We have incomplete information: the results of previous trials. Choose a node with few trials but high rewards (low confidence), or one with many trials but lower rewards (high confidence)? (This is the exploration-exploitation trade-off, as in Reinforcement Learning.) Approach: Multi-Armed Bandits.

Multi-Armed Bandits: Consider three one-armed bandits (slot machines) with different win distributions, and with the following wins so far: arm 1: 0, 1, 0, 0, 1; arm 2: 5; arm 3: 2, 2, 1. Which arm would you pull next?

Multi-Armed Bandits: µ_i = (initially unknown) expected pay-off of arm i; T_i(t) = how many times arm i has been played in steps 1..t; µ* = max_{i=1..K} µ_i is the optimum pay-off. An optimal way of choosing the arm minimizes the regret (how much below the optimum?) after n steps: n·µ* − Σ_{i=1..K} µ_i · E[T_i(n)].

Multi-Armed Bandits: x̄_i = average reward from arm i in the first n steps. UCB1 formula (Auer et al. 2002): first every arm is played once; after n steps, choose the arm i that maximizes x̄_i + √(2 ln n / T_i(n)).
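A small Python sketch of UCB1 arm selection matching the formula above, applied to the three-bandit example (the rewards are not normalized to [0,1] here, so this is purely illustrative):

    import math

    def ucb1_choice(rewards_per_arm):
        """rewards_per_arm[i] lists the rewards seen from arm i so far (each arm played at least once)."""
        n = sum(len(r) for r in rewards_per_arm)          # total plays so far
        def ucb1(r):
            return sum(r) / len(r) + math.sqrt(2 * math.log(n) / len(r))
        scores = [ucb1(r) for r in rewards_per_arm]
        return scores.index(max(scores))

    history = [[0, 1, 0, 0, 1], [5], [2, 2, 1]]           # the three bandits above
    print(ucb1_choice(history))                           # picks arm index 1: one big win, few trials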

UCT algorithm

Create a root of the tree with the initial state
while within computational budget do
  leaf ← Selection(root)
  terminal ← Simulation(leaf)
  Backpropagation(leaf, Utility(terminal))
end
return arg max_{child ∈ Children(root)} N(child)

UCT algorithm

function Selection(node)
  while NonTerminal(State(node)) do
    action ← arg max_{action ∈ Actions(node)} UCB1(node, action)
    if Child(node, action) exists then
      node ← Child(node, action)
    else
      return Expand(node, action)
    end
  end
  return node

function UCB1(node, action)
  child ← Child(node, action)
  if child exists then
    return SumUtil(child)/N(child) + √(2 ln N(node) / N(child))
  else
    return ∞

UCT algorithm

function Expand(node, action)
  child ← create a new child of node (via action)
  N(child) ← 0
  SumUtil(child) ← 0
  return child

function Backpropagation(node, utility)
  while node exists do
    N(node) ← N(node) + 1
    SumUtil(node) ← SumUtil(node) + utility
    node ← Parent(node)
  end
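A compact Python rendering of the pseudocode above for a single-agent setting; the game interface (actions, result, is_terminal, utility) is an assumption, and a two-player version would negate utilities between plies:

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children = {}        # action -> Node
            self.n = 0                # N(node)
            self.sum_util = 0.0       # SumUtil(node)

    def ucb1(node, child):
        return child.sum_util / child.n + math.sqrt(2 * math.log(node.n) / child.n)

    def select(game, node):
        while not game.is_terminal(node.state):
            # Unexpanded actions get value infinity, so they are tried before re-descending.
            action = max(game.actions(node.state),
                         key=lambda a: ucb1(node, node.children[a])
                                       if a in node.children else float("inf"))
            if action in node.children:
                node = node.children[action]
            else:                                          # Expand
                child = Node(game.result(node.state, action), parent=node)
                node.children[action] = child
                return child
        return node

    def simulate(game, state):
        while not game.is_terminal(state):                # random playout to the end
            state = game.result(state, random.choice(game.actions(state)))
        return game.utility(state)

    def backpropagate(node, utility):
        while node is not None:
            node.n += 1
            node.sum_util += utility
            node = node.parent

    def uct(game, initial_state, budget=1000):
        root = Node(initial_state)
        for _ in range(budget):
            leaf = select(game, root)
            backpropagate(leaf, simulate(game, leaf.state))
        return max(root.children, key=lambda a: root.children[a].n)   # most-visited action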

Properties of the UCT algorithm: The best action is chosen exponentially more often; the algorithm grows an asymmetric tree; the utility estimates converge to the true values; and it is applicable to one or more agents and to deterministic or stochastic systems.