Reinforcement Learning for Penalty Avoiding Policy Making and its Extensions and an Application to the Othello Game

Reinforcement Learning for Penalty Avoiding Policy Making and its Extensions and an Application to the Othello Game

Kazuteru Miyazaki
National Institution for Academic Degrees, Ootsuka, Bunkyo-ku, Tokyo, Japan

Sougo Tsuboi
TOSHIBA, 1 Toshiba, Komukai, Saiwai, Kawasaki, Japan

Shigenobu Kobayashi (kobayasi@dis.titech.ac.jp)
Tokyo Institute of Technology, 4259 Nagatsuta, Midori, Yokohama, Japan

ABSTRACT

The purpose of a reinforcement learning system is, in general, to learn optimal policies. From the engineering point of view, however, it is useful and important to acquire not only optimal policies but also penalty avoiding policies. In this paper, we focus on the formation of penalty avoiding policies based on the Penalty Avoiding Rational Policy Making algorithm [1]. In applying the algorithm to large-scale problems, we are confronted with combinatorial explosion. To suppress it, especially the growth in the number of states, we introduce several ideas and heuristics. We implemented the proposed method as a learning system for an Othello game player. After learning, this player can always defeat the well-known Othello game program KITTY [7].

Keywords: reinforcement learning, reward and penalty, penalty avoiding rational policy, the Othello game, KITTY

1. INTRODUCTION

Reinforcement learning (RL) is a kind of machine learning. It aims to adapt an agent to a given environment with rewards as its only clue. If we tell the agent what it should do (its purpose) and/or what it should not do (its restriction), it can learn how to satisfy them. In RL, how to design rewards is important. In most recent RL systems [5], a positive reward, simply called a reward, is given to the agent when it has achieved a purpose, and a negative one, called a penalty, is given when it has violated a restriction. However, if we set incorrect values for them, the agent will learn unexpected behavior. For example, in a two-player game such as Othello, consider the case where a reward is given to the winner and a penalty is given to the loser. If we design incorrect values for them, the agent may lose the game even though a winning strategy exists. This is because the reward and the penalty are treated on the same scale. Therefore, it is important to distinguish a reward (for the achievement of a purpose) from a penalty (for the violation of a restriction).

We know the Penalty Avoiding Rational Policy Making algorithm [1] as a reinforcement learning system that makes this distinction between a reward and a penalty. Though it can suppress any penalty as stably as possible and can obtain a reward constantly, it has to memorize many state-action pairs, as Q-learning [6] and TD(λ) [4] do. In this paper, we discuss extensions of the Penalty Avoiding Rational Policy Making algorithm for the class of problems in which we have some information about the target environment. We introduce several ideas and heuristics to suppress the combinatorial explosion in large-scale problems. Furthermore, we implement the proposed method as a learning system for an Othello game player.

Section 2 describes the problem, the method, the notations and the Penalty Avoiding Rational Policy Making algorithm. Section 3 describes extensions of the Penalty Avoiding Rational Policy Making algorithm. Section 4 applies it to the Othello game. Section 5 concludes the paper.

2. THE DOMAIN

2.1 Target Environments

Consider an agent in some unknown environment. At each time step, the agent gets information about the environment through its sensors and chooses an action. As a result of some sequence of actions, the agent gets a reward or a penalty from the environment. We assume that target environments are Markov Decision Processes (MDPs). A pair of a sensory input (a state) and an action is called a rule. We denote the rule "if x then a" as xa, where x is a state and a is an action.

Figure 1. An example of penalty rules (xa, ya) and a penalty state (y).

The function that maps states to actions is called a policy. We call a policy rational if and only if its expected reward per action is larger than zero. The function that maps a state (or a rule) to a reward (or a penalty) is a reward function. We call the sequence of rules used between the previous reward (or penalty) and the current one an episode. We call a subsequence of an episode a detour when the state of its first firing rule and the state of its last firing rule are the same although the two rules are different. A rule that does not lie on a detour in some episode is rational; otherwise, the rule is called irrational. We call a rule a penalty rule if and only if it has received a penalty directly or it can transit to a penalty state, that is, a state in which there are no rules other than penalty or irrational rules. For example, in figure 1, xa and ya are penalty rules, and state y is a penalty state. We call a policy that does not contain any penalty rule a penalty avoiding policy. We assume that there is a deterministic rational policy among the penalty avoiding policies. For each sensory input, a deterministic policy always returns the same action, whereas a stochastic policy returns an action stochastically.

2.2 The Penalty Avoiding Rational Policy Making Algorithm [1]

We know the Penalty Avoiding Rational Policy Making algorithm (PARP) [1] as a reinforcement learning system for the environments discussed in section 2.1. To avoid all penalties, PARP suppresses all penalty rules in the current rule set by the Penalty Rule Judgment algorithm (PRJ) shown in figure 2. After suppressing all penalty rules, it makes a rational policy by the Rational Policy Improvement algorithm [1]. Though PARP can learn a stochastic rational policy in the class of problems where there is no deterministic rational policy among the penalty avoiding policies, we do not treat stochastic rational policies here. Though PARP can always learn a deterministic rational policy in the class where such a policy exists among the penalty avoiding policies, PRJ has to memorize all rules that have been experienced, and all descendant states reached through those rules, in order to find all penalty rules. In applying PRJ to large-scale problems, we are therefore confronted with a combinatorial explosion of rules and states. To suppress this problem, especially the growth in the number of states, we introduce several ideas and heuristics into PRJ.

    procedure The Penalty Rule Judgment
    begin
      Set a mark on every rule that has received a penalty directly
      do
        Set a mark on every state in which there is no rational rule,
          or in which every rational rule is already marked
        Set a mark on every rule that can transit to a marked state
      while (there is a new mark on some state)
    end.

Figure 2. The Penalty Rule Judgment algorithm (PRJ) [1]. First, we set a mark on every rule that has received a penalty directly. Second, we set a mark on every state in which there is no rational rule or in which every rational rule is already marked. Last, we set a mark on every rule that can transit to a marked state. We can regard a marked rule as a penalty rule. We can find all penalty rules in the current rule set by repeating this process until there is no new mark.
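
To make the later extensions concrete, the following Python fragment is a minimal sketch of PRJ as described in figure 2. It assumes a tabular record of the experienced rules, their observed successor states, and the rules currently judged rational; all identifiers (prj, rules_in_state, successors, and so on) are illustrative and not taken from [1].

    from typing import Dict, Hashable, Set, Tuple

    State = Hashable
    Rule = Tuple[State, Hashable]          # the rule "if x then a" as the pair (x, a)

    def prj(rules_in_state: Dict[State, Set[Rule]],
            successors: Dict[Rule, Set[State]],
            rational: Set[Rule],
            direct_penalty: Set[Rule]) -> Tuple[Set[Rule], Set[State]]:
        """Mark penalty rules and penalty states in the current rule set."""
        marked_rules: Set[Rule] = set(direct_penalty)   # rules that got a penalty directly
        marked_states: Set[State] = set()
        while True:
            new_state_marked = False
            for state, rules in rules_in_state.items():
                if state in marked_states:
                    continue
                rational_here = rules & rational
                # a penalty state: no rational rule at all, or every rational
                # rule here is already marked as a penalty rule
                if not rational_here or rational_here <= marked_rules:
                    marked_states.add(state)
                    new_state_marked = True
            for rule, succ in successors.items():
                if succ & marked_states:                 # can transit to a penalty state
                    marked_rules.add(rule)
            if not new_state_marked:
                break
        return marked_rules, marked_states

In this sketch, the rules passed in as rational would come from the detour bookkeeping of section 2.1, and the returned marked rules and states are exactly what PARP has to suppress when it builds a penalty avoiding rational policy.
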
3. EXTENSIONS OF THE PENALTY AVOIDING RATIONAL POLICY MAKING ALGORITHM

3.1 The Basic Idea

Though PRJ can find all penalty rules efficiently, it has to memorize all rules that have been experienced and all descendant states reached through those rules. In applying PRJ to large-scale problems, it is therefore important to save memory and to restrict exploration. In section 3.2, we discuss how to save memory. In general, there is no free lunch here; in this paper, we propose to save memory by calculating state transitions in the class of problems where we can know the reward function and the candidate descendant states of each state transition. In section 3.3, we discuss how to restrict exploration, and propose an algorithm that explores the environment by using knowledge.

3.2 How to Save the Memory by Calculation of State Transition

In this paper, we treat the class of problems where we can know the reward function and the candidate descendant states of each state transition; the sketch below illustrates the kind of model interface this assumes.
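
As a minimal illustration of that assumption, the Python interface below describes the information such a problem makes available to the agent; the names (TransitionModel, actions, successors, outcome) are hypothetical and only spell out what the agent is allowed to query.

    from typing import Hashable, Iterable, Protocol

    State = Hashable
    Action = Hashable

    class TransitionModel(Protocol):
        """What is assumed to be available in this class of problems."""

        def actions(self, state: State) -> Iterable[Action]:
            """The actions that can be selected in the given state."""
            ...

        def successors(self, state: State, action: Action) -> Iterable[State]:
            """The candidate descendant states of the state transition."""
            ...

        def outcome(self, state: State) -> float:
            """Immediate reward (> 0), penalty (< 0) or 0 for the given state."""
            ...

A two-player board game satisfies this interface naturally: the legal moves and the resulting positions follow from the rules of the game, and winning or losing defines the reward function.
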

When the agent selects an action a ∈ A_t in the state s_t at time t, we can know the resulting state s_{t+1} at time t+1 and its immediate reward or penalty. This is a natural assumption in two-player games such as Othello, Go, shogi, backgammon and so on. We show an extension of PRJ for this situation. Before selecting an action, the agent finds all penalty rules in the current rule set by calculating all states that can be reached from the current state. After selecting an action, if the agent gets a new penalty, it tries to find new penalty rules again. We use a long term memory and a short term memory to realize this.

Long term memory: if there are new penalty rules or penalty states in short term memory, they are stored in long term memory. They are kept throughout learning.

Short term memory: short term memory stores all states and actions of the current episode. After all states and rules that can be reached from the current state have been calculated, they are stored in short term memory. If some of these states are already in long term memory, new penalty rules are found by PRJ. If there are new penalty rules or penalty states, they are stored in long term memory. Short term memory is initialized for each episode.

Therefore, the agent can find all new penalty rules while keeping only penalty rules and penalty states in long term memory; it does not need to memorize the descendant states of every experienced state transition at action selection time. The state transition and reward functions given for the environment are not necessarily correct. The method is not confused by incomplete information, that is, a penalty or a state that should exist but is not given to the agent. However, it is confused by false information, that is, a penalty or a state that should not exist but is given to the agent.
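
The following Python sketch shows one way the two memories could be organized. It assumes a PRJ-like routine supplied from outside and an expand function that enumerates the candidate transitions of the current state; every name in it is illustrative rather than taken from the paper.

    from typing import Callable, Dict, Hashable, Set, Tuple

    State = Hashable
    Action = Hashable
    Rule = Tuple[State, Action]

    class PenaltyMemory:
        """Long term memory: penalty rules/states kept for the whole learning run.
        Short term memory: the states and rules of the current episode only."""

        def __init__(self, find_penalties: Callable[..., Tuple[Set[Rule], Set[State]]]):
            self.penalty_rules: Set[Rule] = set()        # long term memory
            self.penalty_states: Set[State] = set()      # long term memory
            self.find_penalties = find_penalties         # a PRJ-like routine
            self.reset_episode()

        def reset_episode(self) -> None:
            # Short term memory is initialized for each episode.
            self.episode_rules: Set[Rule] = set()
            self.episode_successors: Dict[Rule, Set[State]] = {}

        def before_action(self, state: State,
                          expand: Callable[[State], Dict[Action, Set[State]]]) -> None:
            """Calculate the candidate transitions from the current state and
            store them in short term memory before an action is selected."""
            for action, next_states in expand(state).items():
                rule = (state, action)
                self.episode_rules.add(rule)
                self.episode_successors.setdefault(rule, set()).update(next_states)

        def after_penalty(self) -> None:
            """A new penalty arrived: rerun the penalty rule judgment on the
            episode and promote new penalty rules/states to long term memory."""
            new_rules, new_states = self.find_penalties(
                self.episode_rules, self.episode_successors,
                self.penalty_rules, self.penalty_states)
            self.penalty_rules |= new_rules
            self.penalty_states |= new_states

Because only penalty rules and penalty states are promoted to long term memory, the stored information grows with the number of penalties actually found rather than with the number of experienced transitions.
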
3.3 How to Restrict Exploration by Knowledge

In applying PRJ to large-scale problems, many trials are needed before penalty rules spread through the state space. This is an especially serious problem when episodes are long. We therefore introduce a way to design, from knowledge, a semi-penalty, which is a broader notion than a penalty: it indicates that an action or a state may lead to a penalty. After finding the penalty rules by PRJ, we use PRJ again to find the semi-penalty rules. We call a rule a semi-penalty rule if and only if it has received a penalty or a semi-penalty, or it can transit to a penalty state or a semi-penalty state, that is, a state in which there are no rules other than semi-penalty, penalty or irrational rules. Since a semi-penalty does not always lead to a penalty, it is possible that all states become semi-penalty states even though a penalty avoiding rational policy exists. This problem can be overcome by the action selector. Usually, we should select a rational rule that is neither a penalty rule nor a semi-penalty rule. If we cannot select any such rule in a semi-penalty state, we should select a rational rule that is not a penalty rule. However, if we define the semi-penalty incorrectly, we need more trials to find the penalty rules than with the original version of PRJ, since exploration is biased.
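
In code, this preference could be sketched as follows. Here penalty_rules would come from PRJ seeded with the rules that actually received penalties, and semi_penalty_rules from a second PRJ pass whose seeds also include the rules and states flagged as semi-penalties by the designer's knowledge; all names are illustrative.

    from typing import Dict, Hashable, Set, Tuple

    State = Hashable
    Rule = Tuple[State, Hashable]

    def preferred_rules(state: State,
                        rational_in: Dict[State, Set[Rule]],
                        penalty_rules: Set[Rule],
                        semi_penalty_rules: Set[Rule]) -> Set[Rule]:
        """Rules worth selecting in this state, in the spirit of section 3.3:
        usually a rational rule that is neither a penalty nor a semi-penalty
        rule; if no such rule remains, fall back to the rational rules that
        are merely not penalty rules."""
        candidates = rational_in.get(state, set())
        not_penalty = candidates - penalty_rules
        best = not_penalty - semi_penalty_rules
        return best if best else not_penalty

The fallback to the merely non-penalty rules is what guarantees that an over-broad semi-penalty definition can bias exploration but can never block every action.
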

4. APPLICATION TO THE OTHELLO GAME

4.1 The Basic Idea

We implemented the proposed method as a learning system for an Othello game player. We use KITTY, by Igor Durdanovic, as the opponent; it is nearly the strongest program among open-source players. We use kitty.ios from KITTY's source code [8], which has an interface to the Internet Othello Server (IOS). We do not give KITTY a learning mechanism, so KITTY's action selection probabilities do not change. Its search depth is set to 4 (the minimum value) or 60 (the maximum value).

4.2 Construction of the Reinforcement Learning Player

Specification. We describe our RL player for the Othello game (see figure 3). It gets the state of the Othello board from IOS and calculates the possible actions from that state. It selects one of them and returns it to IOS. If it cannot take any action, it returns a PASS action to IOS. If it loses the game, it gets a penalty from IOS. Furthermore, we run another experiment in which it also gets a penalty from IOS whenever it cannot win the game. The size of the short term memory is set so that it can store at least one step of state transitions; the player can calculate two or three steps of state transitions in the first stage of a game and one step in the middle stage. Remark that there is no irrational rule in the Othello game.

Figure 3. The experimental environment: the RL player's learning system (long term memory, short term memory, the KIFU database and the action selector) receives the sensory input, the penalty and KITTY's evaluation value from the environment (KITTY through IOS) and returns an action.

Knowledge of the Othello Game. It is important to restrict exploration from the first to the middle stage of a game, since the state space of the middle stage is huge. We use knowledge to realize this. Two types of knowledge are available. One is the KIFU database, which stores the moves of previous famous games. The other is an evaluation function that evaluates game states.

i. KIFU database. We use NEC's KIFU database [9]. It contains about 100,000 games. From it we can get typical state transitions of the first stage, which may help to avoid wasteful exploration there.

ii. Evaluation Function. We use KITTY's evaluation function, which KITTY sends to IOS, as our RL player's evaluation function: KITTY returns the evaluation value of a state to IOS. We define a semi-penalty state as a state whose evaluation value is larger than +1. If our RL player can always win as the first player (the black player), we can regard our method as better than KITTY, since the winner of a KITTY vs. KITTY game is always the second player (the white player).

How to Select an Action. We can use the following information in action selection: a penalty rule (or a penalty state), a semi-penalty rule (or a semi-penalty state), and the KIFU database. The priority of this information is: penalty rule (state) > semi-penalty rule (state) > KIFU database. Based on this priority, we use the action selector of figure 4. The basic strategy of the action selector is to select the action whose number of transition states is the smallest among all actions, which helps to restrict wasteful exploration.

Figure 4. The action selector. The player gets the state s from IOS and matches s against short term memory and the KIFU database. Depending on whether s is a penalty state, a semi-penalty state, on the KIFU database, or the game is over, it suppresses all penalty rules and selects an action by the basic strategy, or suppresses all penalty and semi-penalty rules and selects the most frequently used rule, or suppresses all penalty and semi-penalty rules and selects an action by the basic strategy.

If the total number of black and white cells is larger than 54, our RL player calculates the end of the game exactly. KITTY, on the other hand, calculates it once the number is larger than 50, since it can use min-max search with its evaluation function.
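
To make the priority ordering concrete, the following Python sketch gives one possible reading of the action selector of figure 4. It assumes that each move leads to a single resulting position, that the KIFU database is consulted through per-move usage counts, and that the number of transition states of an action is measured by the branching factor of the resulting position; all names are illustrative rather than taken from the paper.

    from typing import Callable, Dict, Hashable, Set, Tuple

    State = Hashable
    Action = Hashable

    def select_action(state: State,
                      moves: Dict[Action, State],            # legal moves and resulting positions
                      penalty_states: Set[State],
                      semi_penalty_states: Set[State],
                      kifu_counts: Dict[Tuple[State, Action], int],
                      branching: Callable[[State], int]) -> Action:
        """One possible reading of the action selector of figure 4."""
        if not moves:
            raise ValueError("no legal move: the caller should return a PASS action")
        # Penalty rules are always suppressed (fall back only if nothing is left).
        safe = {a: s for a, s in moves.items() if s not in penalty_states} or dict(moves)
        # Semi-penalty rules are suppressed only while something else remains.
        preferred = {a: s for a, s in safe.items() if s not in semi_penalty_states} or safe
        # If the position is on the KIFU database, play the most frequently used move.
        book = {a: kifu_counts.get((state, a), 0) for a in preferred}
        if any(book.values()):
            return max(book, key=book.get)
        # Basic strategy: the action whose number of transition states is the smallest.
        return min(preferred, key=lambda a: branching(preferred[a]))
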

4.3 Results

Table 1 shows the results of games played under the condition that KITTY does not use its opening library.

Table 1. The number of games needed to acquire a penalty avoiding rational policy (our RL player (black) vs. KITTY (white)).

    penalty condition    depth    number of games
    lost
    lost
    lost or even
    lost or even

We can confirm the effectiveness of our method from this table. If KITTY does not use its library, it cannot vary its choice among several actions; therefore, our RL player always wins after acquiring a penalty avoiding rational policy. If KITTY can use its library, it can choose among several actions, and in this case our RL player has to learn several penalty avoiding rational policies. The number of games needed to acquire a penalty avoiding rational policy is about 2000 when KITTY uses its library and the depth is set to 4. In figure 5, we show a sample sequence of acquiring a penalty avoiding rational policy under the latter condition of table 1.

Figure 5. A sample sequence of acquiring a penalty avoiding rational policy, plotted per game number and cell number. N(n): a state with no penalty and no semi-penalty; S: a semi-penalty state; P(p): a penalty state; the lowercase n and p mean that our RL player calculates the end of the game exactly. The numbers before and after N(n) are the numbers of penalty rules and semi-penalty rules, respectively.

A penalty avoiding rational policy is made of the set of all hatched states. We can use the KIFU database before 16 cells. With the original version of PRJ, the frontier of penalty rules reaches 34 cells after 2000 games. With the proposed method, as shown in figure 5, semi-penalty rules can be used at 18 cells, from 26 to 32 cells, and beyond 36 cells after 949 games. This means that the slow spread of penalty rules can be overcome by semi-penalty rules.

5. CONCLUSION

In this paper, we extended the Penalty Avoiding Rational Policy Making algorithm [1] to large-scale MDPs. We implemented our method as a learning system for an Othello game player. Our RL player can always defeat the well-known Othello game program KITTY after learning. In future work, we will compare our method with a version of KITTY that has a learning mechanism. Furthermore, we will extend our method to Partially Observable Markov Decision Processes [2] and multi-agent systems [3].

References

[1] Miyazaki, K. & Kobayashi, S.: Reinforcement Learning for Penalty Avoiding Policy Making, Proc. of the IEEE International Conference on Systems, Man and Cybernetics, 2000.
[2] Miyazaki, K. & Kobayashi, S.: On the Rationality of Profit Sharing in Partially Observable Markov Decision Processes, Proc. of the 5th International Conference on Information Systems Analysis and Synthesis, 1999.
[3] Miyazaki, K., Arai, S. & Kobayashi, S.: Cranes Control Using Multi-agent Profit Sharing, Proc. of the 6th International Conference on Information Systems Analysis and Synthesis, Vol. IX, 2000.
[4] Sutton, R. S.: Learning to Predict by the Method of Temporal Differences, Machine Learning, Vol. 3, pp. 9-44, 1988.
[5] Sutton, R. S. & Barto, A. G.: Reinforcement Learning: An Introduction, A Bradford Book, The MIT Press, 1998.
[6] Watkins, C. J. C. H. & Dayan, P.: Technical Note: Q-learning, Machine Learning, Vol. 8, pp. 55-68, 1992.
[7] learn-game/systems/kitty.html
[8] ftp://ftp.nj.nec.com/pub/igord/othello/kitty/linux kitty.tgz
[9] ftp://ftp.nj.nec.com/pub/igord/othello/misc/database.zip
