Planning & Reinforcement Learning


Planning & Reinforcement Learning
Slides borrowed from Sheila McIlraith, Kate Larson, and David Silver (CSC384, University of Toronto)

Why Planning
- E.g. if we have a robot, we want the robot to decide what to do and how to act to achieve our goals.

Planning vs. Search
- How to change the world to suit our needs.
- Critical issue: we need to reason about what the world will be like after doing a few actions. This aspect of planning is just like Search.

Autonomous Agents for Space Exploration
- Autonomous planning, scheduling, control
- NASA: JPL and Ames
- Remote Agent Experiment (RAX): Deep Space 1
- Mars Exploration Rover (MER)

Example
GOAL: Steven has coffee.
CURRENTLY: robot in mailroom, has no coffee, coffee not made, Steven in office, etc.
TO DO: goto lounge, make coffee, ...

Scheduling with Action Choices & Resource Requirements
- Problems in supply chain management
- HSTS (Hubble Space Telescope scheduler)
- Workflow management

Other Applications (cont.): Air Traffic Control
- Route aircraft between runways and terminals.
- Crafts must be kept safely separated. Safe distance depends on craft and mode of transport.
- Minimize taxi and wait time.

Applications
These applications require more than search. It is not sufficient to simply find a sequence of actions for transforming the world so as to achieve a goal state. These applications involve:
- dealing with uncertainty
- sensing the world, and planning to sense the world, so as to reduce uncertainty
- generating a plan that has high payoff or high expected payoff, rather than simply achieving a fixed goal
- running into problems when executing a plan and having to recover
- etc.

Character Animation
- Generate step-by-step character behaviour from a high-level spec

Plan-based Interfaces
- E.g. NLP to database interfaces
- Plan recognition, Activity Recognition

Planning
- Agent: single agent, or multi-agent
- State: complete or incomplete (logical/probabilistic); state of the world and/or the agent's state of knowledge
- Actions: world-altering and/or knowledge-altering (e.g. sensing); deterministic or non-deterministic (logical/stochastic)
- Goal Condition: satisfying or optimizing; final-state or temporally extended; optimizing for preference/cost/utility
- Reasoning: offline or online (fully observable, partially observable)
- Plans: partial order, sequential, conditional

Simplifying the Planning Problem
We simplify the planning problem as follows:
- Assume complete information about the initial state through the closed world assumption (CWA)
- Assume a finite domain of objects
- Assume action effects are restricted to making conjunctions of atomic formulae true or false. No conditional effects, etc.
- Assume action preconditions are restricted to conjunctions of ground atoms
- Perform Classical Planning. No incomplete or uncertain knowledge.

Classical Planning Assumptions
- Finite System: finitely many states, actions, events
- Fully Observable: the controller always knows the current state
- Deterministic: each action has only one outcome
- Static: changes only occur as a result of controller actions
- Attainment goals: a set of goal states S_g
- Sequential plans: a plan is a linearly ordered sequence of actions (a_1, ..., a_n)
- Implicit time: actions are instantaneous (have no duration)
- Off-line planning: the planner doesn't know the execution status

STRIPS Representation
- STRIPS (Stanford Research Institute Problem Solver)
- A way of representing actions with respect to a CW-KB: a closed world knowledge base representing the state of the world

Sequence of Worlds (figure)

STRIPS Actions
- STRIPS represents actions using 3 lists:
  - a list of preconditions
  - a list of action add effects
  - a list of action delete effects
- These lists contain variables so that we can represent a whole class of actions with one specification.
- Each ground instantiation of the variables yields a specific action.
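To make the three-list representation concrete, here is a minimal Python sketch (ours, not from the slides): a state is a frozenset of ground atoms under the CWA, and pickup shows how grounding the variable X yields a specific action. The tuple spelling of atoms is just an illustrative convention.

from dataclasses import dataclass

# An atom is a tuple such as ("ontable", "a") or ("on", "a", "b");
# a state is a frozenset of atoms (anything absent is false, per the CWA).

@dataclass(frozen=True)
class StripsAction:
    name: str
    pre: frozenset    # preconditions: atoms that must hold
    adds: frozenset   # add effects: atoms made true
    dels: frozenset   # delete effects: atoms made false

def pickup(x):
    # Ground instantiation of the pickup(X) operator for a block x.
    return StripsAction(
        name=f"pickup({x})",
        pre=frozenset({("handempty",), ("clear", x), ("ontable", x)}),
        adds=frozenset({("holding", x)}),
        dels=frozenset({("handempty",), ("clear", x), ("ontable", x)}),
    )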

STRIPS Actions: Example
(Figures on these slides show a robot hand manipulating blocks.)

pickup(X) is called a STRIPS operator:
  Pre:  {handempty, clear(X), ontable(X)}
  Adds: {holding(X)}
  Dels: {handempty, clear(X), ontable(X)}
pickup(a) (a particular instance) is called an action.

putdown(X)
  Pre:  {holding(X)}
  Adds: {clear(X), ontable(X), handempty}
  Dels: {holding(X)}

stack(X, Y)
  Pre:  {holding(X), clear(Y)}
  Adds: {on(X, Y), handempty, clear(X)}
  Dels: {holding(X), clear(Y)}

STRIPS has no Conditional Effects

stack(X, Y)
  Pre:  {holding(X), clear(Y)}
  Adds: {on(X, Y), handempty, clear(X)}
  Dels: {holding(X), clear(Y)}

- Blocks World assumption: the table has infinite space, so it is always clear.
- If we stack something on the table (Y = table), we cannot delete clear(table).
- But if Y is an ordinary block, we must delete clear(Y).

Since STRIPS has no conditional effects, we must sometimes utilize extra actions, one for each type of condition: we embed the condition in the precondition and then alter the effects accordingly.

putdown(X)  (the Y = table case)
  Pre:  {holding(X)}
  Adds: {ontable(X), handempty, clear(X)}
  Dels: {holding(X)}

stack(X, Y)  (the ordinary-block case)
  Pre:  {holding(X), clear(Y)}
  Adds: {on(X, Y), handempty, clear(X)}
  Dels: {holding(X), clear(Y)}

STRIPS Actions: Example

unstack(X, Y)
  Pre:  {clear(X), on(X, Y), handempty}
  Adds: {holding(X), clear(Y)}
  Dels: {clear(X), on(X, Y), handempty}
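Given operators in this form, executing an action is mechanical: check that the preconditions hold, then remove the delete list and add the add list. A small sketch, reusing the StripsAction and pickup helpers from the earlier snippet:

def applicable(state, action):
    # The action can execute when every precondition is in the state
    # (under the CWA, atoms not in the state are false).
    return action.pre <= state

def apply_action(state, action):
    # Progression: successor = (state - delete list) + add list.
    assert applicable(state, action)
    return (state - action.dels) | action.adds

s0 = frozenset({("handempty",), ("clear", "a"), ("ontable", "a")})
s1 = apply_action(s0, pickup("a"))   # -> frozenset({("holding", "a")})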

Planning as a Search Problem
Given:
- a CW-KB representing the initial state,
- a set of STRIPS operators that map a state to a new state,
- goal conditions (a conjunction of facts, or a formula),
the planning problem is to determine a sequence of actions that, when applied to the initial CW-KB, yields an updated CW-KB which satisfies the goal. This is the classical planning task.

Planning as Search
- This is a search problem in which our state space representation is a CW-KB.
- The initial CW-KB is the initial state.
- Actions are operators mapping a state to a new state.
- The goal is satisfied by any state that satisfies the goal. Typically the goal is a conjunction of primitive facts, so we just need to check whether all the facts in the goal are contained in the CW-KB.

Example: a blocks-world search tree expanding actions such as move(b,c), move(c,table), move(c,b), and move(a,b) (figure).
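Since classical planning reduces to search, a blind breadth-first forward planner fits in a few lines. The sketch below is our illustration, building on the applicable and apply_action helpers above; it takes a set of ground actions and returns a shortest action-name sequence reaching a state that contains every goal fact.

from collections import deque

def plan_bfs(init, goal, ground_actions):
    # Breadth-first forward search over CW-KB states.
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:              # goal is a conjunction of facts
            return path
        for act in ground_actions:
            if applicable(state, act):
                nxt = apply_action(state, act)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [act.name]))
    return None                        # no plan exists

Real planners replace this blind search with heuristics that exploit the locality of action effects, which is exactly the structural advantage the next slides point out.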

Problems
- The search tree is generally quite large: randomly reconfiguring 9 blocks takes thousands of CPU seconds.
- But the representation suggests some structure:
  - Each action only affects a small set of facts.
  - Actions depend on each other via their preconditions.
- Planning algorithms are designed to take advantage of the fact that the representation makes the locality of action changes explicit.

Planning Summary
- A model of the environment is known.
- The agent performs computations with its model (without external interaction).
- The agent improves its policy.
- Deliberation, reasoning, introspection, pondering, thought, search.

But what happens if the environment is unknown? How can we inform our agent of what actions to take?
- Assume: the environment is initially unknown.
- Consider using a reward function to guide the agent.
- If the agent doesn't know what actions to take:
  - try an action out
  - see what the reward is for taking that action
- This is Reinforcement Learning.

Reinforcement Learning
- Learning what to do, so as to maximize some reward signal.

Example: Tic Tac Toe
- State: board configuration
- Actions: next move
- Reward: 1 for win, -1 for loss, 0 for draw
- Problem: find π: S → A that maximizes reward

Example: Mobile Robot
- State: location of robot, people
- Actions: motion
- Reward: number of happy faces
- Problem: find π: S → A that maximizes reward

Example: Atari
- State: pixel location of game agents
- Actions: agent movement
- Reward: score
- Problem: find π: S → A that maximizes reward
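All three examples instantiate the same loop: observe the state, choose an action with π, receive a reward. A minimal sketch of that loop, assuming a hypothetical environment object whose reset and step methods we invent here for illustration:

def run_episode(env, policy):
    # Observe state, choose action with pi, collect reward, repeat.
    # Assumes a hypothetical environment with reset() -> state and
    # step(action) -> (next_state, reward, done).
    s = env.reset()
    rewards, done = [], False
    while not done:
        a = policy(s)                  # pi: S -> A
        s, r, done = env.step(a)
        rewards.append(r)
    return rewards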

Further examples: Autonomous Helicopter Flight; Quadruped Robot (video slides).

Reinforcement Learning
- Goal: learn to choose actions that maximize
  REWARD = r_0 + γ r_1 + γ² r_2 + ..., where 0 < γ < 1.

Reward
- A reward R_t is a scalar feedback signal.
- It indicates how well the agent is doing at step t.
- The agent's job is to maximize cumulative reward.

Reward hypothesis: All goals can be described by the maximization of expected cumulative reward.
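The REWARD formula is straightforward to compute for a finite episode; a small sketch (ours), which could be applied directly to the reward list returned by run_episode above:

def discounted_return(rewards, gamma=0.9):
    # REWARD = r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite episode.
    total, discount = 0.0, 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    return total

# A win (reward 1) after three neutral steps, with gamma = 0.9:
# 0 + 0.9*0 + 0.81*0 + 0.729*1 = 0.729
print(discounted_return([0, 0, 0, 1]))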

Sequential Decision Making
- Goal: select actions to maximize total future reward.
- Actions may have long-term consequences.
- Reward may be delayed.
- It may be better to sacrifice immediate reward to gain more long-term reward (exploitation vs. exploration).

Exploration and Exploitation
- Reinforcement learning is like trial-and-error learning.
- The agent should discover a good policy
  - from its experiences of the environment (explore)
  - without losing too much of the reward along the way (exploit).
A sketch of one common way to balance the two follows this section.

Agent's Learning Task
- Execute actions in the world.
- Observe the results.
- Learn a policy π: S → A that maximizes reward from some initial state.

Fully Observable Environment
- Full observability: the agent directly observes the environment state.
- Agent state = environment state = information state.
- Formally, this is a Markov Decision Process (MDP).
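One standard way to balance "try an action out" against exploiting what is already known is ε-greedy selection over estimated action values. The slides don't prescribe an algorithm, so this sketch is illustrative, with Q a hypothetical table of value estimates keyed by (state, action):

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon, explore: try a random action out.
    # Otherwise exploit: take the action with the best estimate so far.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))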

Partially Observable Environment
- Partial observability: the agent indirectly observes the environment.
  - E.g. a robot with camera vision isn't told its absolute location.
  - A trading agent only observes current prices.
  - A poker-playing agent only observes public cards.
- Agent state ≠ environment state.
- Formally, this is a partially observable Markov Decision Process (POMDP).
- The agent must construct its own state representation S_t^a, e.g.:
  - Complete history: S_t^a = H_t
  - Beliefs of environment state: S_t^a = (P[S_t^e = s^1], ..., P[S_t^e = s^n])
  - Recurrent Neural Network: S_t^a = σ(S_{t-1}^a W_s + O_t W_o)

RL Agent
An RL agent may include one or more of these components:
- Policy: the agent's behaviour function
- Value function: how good is each state and/or action
- Model: the agent's representation of the environment

Maze Example
- States: agent's location
- Actions: N, E, S, W
- Rewards: -1 per time-step

Maze Example: Policy
Policy: the agent's behaviour
- A map from state to action
- Deterministic policy: a = π(s)
- Stochastic policy: π(a|s) = P(A_t = a | S_t = s)
(In the figure, each arrow represents the policy π(s) for each state s.)
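The two policy types map directly onto simple data structures; a sketch in which the maze cells and probabilities are made up for illustration:

import random

# Deterministic policy: a table mapping each state to one action, a = pi(s).
det_policy = {(0, 0): "E", (0, 1): "E", (0, 2): "S"}

def act_deterministic(s):
    return det_policy[s]

# Stochastic policy: a distribution over actions per state, pi(a|s).
sto_policy = {(0, 0): {"E": 0.8, "S": 0.2}}

def act_stochastic(s):
    actions, probs = zip(*sto_policy[s].items())
    return random.choices(actions, weights=probs)[0]   # sample a ~ pi(.|s)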

Maze Example: Value Function
Value function:
- A prediction of future reward.
- Used to evaluate the goodness/badness of states.
  v_π(s) = E_π[R_{t+1} + γ R_{t+2} + ... | S_t = s]
(In the figure, numbers represent the value function v_π(s) of each state s.)

Maze Example: Model
Model:
- Predicts what the environment will do next.
- The agent may have an internal model of the environment, which determines
  - how actions change the state, and
  - how much reward should be given for each state.
- The model may be imperfect.

RL Agent
- Model-Based:
  - Policy and/or Value Function
  - Model
- Model-Free:
  - Policy and/or Value Function
  - No Model
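The slides define v_π(s) but not how to learn it. One classic model-free estimator is tabular TD(0), sketched here as an illustration; the step size α and the sample transition are our own choices, not from the slides:

from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    # One tabular TD(0) step: nudge V(s) toward the bootstrapped target
    # r + gamma * V(s'), a sampled version of
    # v_pi(s) = E[R_{t+1} + gamma * v_pi(S_{t+1}) | S_t = s].
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

V = defaultdict(float)
# A hypothetical maze transition: step cost -1 moving from (0, 0) to (0, 1).
td0_update(V, (0, 0), -1.0, (0, 1))

Because TD(0) learns from sampled transitions alone, it needs no model of the environment, which is exactly the model-free case in the taxonomy above.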
