Planning & Reinforcement Learning
- Miranda Hunter
- 6 years ago
Planning & Reinforcement Learning
Slides borrowed from Sheila McIlraith, Kate Larson, and David Silver
CSC384, University of Toronto

Why Planning
- E.g. if we have a robot, we want the robot to decide what to do: how to act to achieve our goals.

Planning vs. Search
- How to change the world to suit our needs.
- Critical issue: we need to reason about what the world will be like after doing a few actions. This aspect of planning is just like search.
- GOAL: Steven has coffee.
- CURRENTLY: robot in mailroom, has no coffee, coffee not made, Steven in office, etc.
- TO DO: goto lounge, make coffee, ...

Autonomous Agents for Space Exploration
- Autonomous planning, scheduling, control
- NASA: JPL and Ames
- Remote Agent Experiment (RAX)
- Deep Space 1
- Mars Exploration Rover (MER)
Scheduling with Action Choices & Resource Requirements
- Problems in supply chain management
- HSTS (Hubble Space Telescope scheduler)
- Workflow management

Other Applications (cont.): Air Traffic Control
- Route aircraft between runways and terminals.
- Aircraft must be kept safely separated.
- Safe distance depends on craft and mode of transport.
- Minimize taxi and wait time.

Applications
These applications require more than search. It is not sufficient to simply find a sequence of actions for transforming the world so as to achieve a goal state:
- these applications involve dealing with uncertainty;
- sensing the world, and planning to sense the world so as to reduce uncertainty;
- generating a plan that has high payoff or high expected payoff, rather than simply achieving a fixed goal;
- running into problems when executing a plan and having to recover;
- etc.

Character Animation
- Generate step-by-step character behaviour from a high-level spec.

Plan-based Interfaces
- E.g. NLP to database interfaces
- Plan recognition, activity recognition
Planning
- Agent: single agent, or multi-agent
- State: complete or incomplete (logical/probabilistic); state of the world and/or the agent's state of knowledge
- Actions: world-altering and/or knowledge-altering (e.g. sensing); deterministic or non-deterministic (logical/stochastic)
- Goal condition: satisfying or optimizing; final-state or temporally extended; optimizing for preference/cost/utility
- Reasoning: offline or online (fully observable, partially observable)
- Plans: partial order, sequential, conditional

Simplifying the Planning Problem
We simplify the planning problem as follows:
- Assume complete information about the initial state, through the closed world assumption (CWA).
- Assume a finite domain of objects.
- Assume action effects are restricted to making conjunctions of atomic formulae true or false. No conditional effects, etc.
- Assume action preconditions are restricted to conjunctions of ground atoms.
- Perform Classical Planning: no incomplete or uncertain knowledge.
Classical Planning Assumptions
- Finite system: finitely many states, actions, events
- Fully observable: the controller always knows the current state
- Deterministic: each action has only one outcome
- Static: changes only occur as a result of controller actions
- Attainment goals: a set of goal states S_g
- Sequential plans: a plan is a linearly ordered sequence of actions (a_1, ..., a_n)
- Implicit time: actions are instantaneous (have no duration)
- Off-line planning: the planner doesn't know execution status

STRIPS Representation
- STRIPS (Stanford Research Institute Problem Solver)
- A way of representing actions with respect to a CW-KB: a closed world knowledge base representing the state of the world.

Sequence of Worlds: STRIPS Actions
- STRIPS represents actions using 3 lists:
  - a list of preconditions,
  - a list of action add effects,
  - a list of action delete effects.
- These lists contain variables, so that we can represent a whole class of actions with one specification.
- Each ground instantiation of the variables yields a specific action.
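The three lists and the variable-grounding step above can be sketched in Python. This is an illustrative encoding, not from the slides; the `Operator` class and its naive string substitution are assumptions made for the example.

```python
# A hypothetical Python encoding of the three STRIPS lists. Atoms are
# strings, with variables written as uppercase names, e.g. "clear(X)".
class Operator:
    def __init__(self, name, pre, adds, dels):
        self.name, self.pre, self.adds, self.dels = name, pre, adds, dels

    def ground(self, **binding):
        """Each ground instantiation of the variables yields an action."""
        def sub(atoms):
            out = set()
            for a in atoms:
                # Naive string substitution; fine for this small example.
                for var, val in binding.items():
                    a = a.replace(var, val)
                out.add(a)
            return out
        inst = self.name
        for var, val in binding.items():
            inst = inst.replace(var, val)
        return inst, sub(self.pre), sub(self.adds), sub(self.dels)

pickup = Operator("pickup(X)",
                  pre={"handempty", "clear(X)", "ontable(X)"},
                  adds={"holding(X)"},
                  dels={"handempty", "clear(X)", "ontable(X)"})

name, pre, adds, dels = pickup.ground(X="a")   # name == "pickup(a)"
```

Grounding `pickup(X)` with `X = a` yields the specific action `pickup(a)` used in the next slide's examples.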
STRIPS Actions: Example
pickup(X) is called a STRIPS operator:
- Pre: {handempty, clear(X), ontable(X)}
- Adds: {holding(X)}
- Dels: {handempty, clear(X), ontable(X)}
pickup(a) (a particular instance) is called an action.

putdown(X)
- Pre: {holding(X)}
- Adds: {clear(X), ontable(X), handempty}
- Dels: {holding(X)}

stack(X,Y)
- Pre: {holding(X), clear(Y)}
- Adds: {on(X,Y), handempty, clear(X)}
- Dels: {holding(X), clear(Y)}
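Applying a ground action to a world state follows directly from the lists: check the preconditions, remove the Dels, add the Adds. A minimal sketch, with the ground action written out by hand (the `progress` helper is an assumption, not slide material):

```python
def progress(state, action):
    """Apply a ground STRIPS action to a world state (a set of atoms):
    remove the Dels, then add the Adds. Preconditions must hold first."""
    name, pre, adds, dels = action
    assert pre <= state, f"{name} not applicable"
    return (state - dels) | adds

state = {"handempty", "clear(a)", "ontable(a)"}

pickup_a = ("pickup(a)",
            {"handempty", "clear(a)", "ontable(a)"},   # Pre
            {"holding(a)"},                            # Adds
            {"handempty", "clear(a)", "ontable(a)"})   # Dels

state = progress(state, pickup_a)    # -> {"holding(a)"}
```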
STRIPS has no Conditional Effects
stack(X,Y)
- Pre: {holding(X), clear(Y)}
- Adds: {on(X,Y), handempty, clear(X)}
- Dels: {holding(X), clear(Y)}

- Blocks World assumption: the table has infinite space, so it is always clear.
- If we stack something on the table (Y = table), we cannot delete clear(table).
- But if Y is an ordinary block, we must delete clear(Y).
- Since STRIPS has no conditional effects, we must sometimes utilize extra actions: one for each type of condition. We embed the condition in the precondition and then alter the effects accordingly:

putdown(X)
- Pre: {holding(X)}
- Adds: {ontable(X), handempty, clear(X)}
- Dels: {holding(X)}

stack(X,Y)
- Pre: {holding(X), clear(Y)}
- Adds: {on(X,Y), handempty, clear(X)}
- Dels: {holding(X), clear(Y)}

STRIPS Actions: Example
unstack(X,Y)
- Pre: {clear(X), on(X,Y), handempty}
- Adds: {holding(X), clear(Y)}
- Dels: {clear(X), on(X,Y), handempty}
Planning as a Search Problem
Given:
- a CW-KB representing the initial state,
- a set of STRIPS operators that map a state to a new state,
- goal conditions (a conjunction of facts, or a formula),
the planning problem is to determine a sequence of actions that, when applied to the initial CW-KB, yields an updated CW-KB which satisfies the goal. This is the classical planning task.

Planning as Search
- This is a search problem, in which our state space representation is a CW-KB.
- The initial CW-KB is the initial state.
- Actions are operators mapping a state to a new state.
- The goal is satisfied by any state that satisfies the goal. Typically the goal is a conjunction of primitive facts, so we just need to check whether all the facts in the goal are contained in the CW-KB.

Example
- move(b,c), move(c,table), move(c,b), move(a,b)
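The formulation above can be sketched as a breadth-first progression search. This is an illustrative implementation under assumed conventions (states as frozensets of atom strings, ground actions as `(name, pre, adds, dels)` tuples), not the algorithm from the slides:

```python
from collections import deque

def plan(init, goal, actions):
    """Breadth-first progression search over CW-KB states.
    States are frozensets of ground atoms; each action is a
    (name, pre, adds, dels) tuple of atom sets."""
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                  # every goal fact is in the CW-KB
            return path
        for name, pre, adds, dels in actions:
            if pre <= state:               # preconditions hold
                nxt = frozenset((state - dels) | adds)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None                            # no plan exists

# Tiny one-block domain, with the ground actions written out by hand:
s0 = {"handempty", "clear(a)", "ontable(a)"}
acts = [
    ("pickup(a)",  {"handempty", "clear(a)", "ontable(a)"},
                   {"holding(a)"},
                   {"handempty", "clear(a)", "ontable(a)"}),
    ("putdown(a)", {"holding(a)"},
                   {"handempty", "clear(a)", "ontable(a)"},
                   {"holding(a)"}),
]
result = plan(s0, {"holding(a)"}, acts)    # -> ["pickup(a)"]
```

Uninformed search like this blows up quickly on real blocks-world instances, which is exactly the scaling problem the next slide raises.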
Problems
- The search tree is generally quite large: randomly reconfiguring 9 blocks takes thousands of CPU seconds.
- But the representation suggests some structure:
  - each action only affects a small set of facts;
  - actions depend on each other via their preconditions.
- Planning algorithms are designed to take advantage of the fact that the representation makes the locality of action changes explicit.

Planning Summary
- The model of the environment is known.
- The agent performs computations with its model (without external interaction).
- The agent improves its policy.
- Deliberation, reasoning, introspection, pondering, thought, search.

But what happens if the environment is unknown? How can we inform our agent of what actions to take?
- Assume: the environment is initially unknown.
- Consider using a reward function to guide the agent.
- If the agent doesn't know what actions to take:
  - try an action out;
  - see what the reward is of taking that action.
- This is Reinforcement Learning.
Reinforcement Learning
- Learning what to do, so as to maximize some reward signal.

Example: Tic Tac Toe
- State: board configuration
- Actions: next move
- Reward: 1 for a win, -1 for a loss, 0 for a draw
- Problem: find π: S → A that maximizes reward

Example: Mobile Robot
- State: location of robot, people
- Actions: motion
- Reward: number of happy faces
- Problem: find π: S → A that maximizes reward

Example: Atari
- State: pixel locations of game agents
- Actions: agent movement
- Reward: score
- Problem: find π: S → A that maximizes reward
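All three examples share the same agent-environment loop: observe a state, choose an action, receive a scalar reward. A minimal sketch of that loop, using a made-up 1-D "walk" environment (the `Walk` class and its reward scheme are assumptions for illustration, not from the slides):

```python
import random

# The agent-environment interaction loop shared by the examples above.
class Walk:
    """Toy environment: positions 0..length; reaching `length` ends the
    episode with reward 1, every other step gives reward 0."""
    def __init__(self, length=5):
        self.length, self.pos = length, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):            # action: -1 (left) or +1 (right)
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done

env = Walk()
s, total, done = env.reset(), 0.0, False
while not done:                        # a random policy, for illustration
    s, r, done = env.step(random.choice([-1, 1]))
    total += r
```

A random policy eventually stumbles to the goal here; the point of RL is to learn a π: S → A that does better than chance.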
Autonomous Helicopter Flight; Quadruped Robot

Reinforcement Learning
- Goal: learn to choose actions that maximize REWARD = r_0 + γ r_1 + γ² r_2 + ..., where 0 < γ < 1.

Reward
- A reward R_t is a scalar feedback signal.
- It indicates how well the agent is doing at step t.
- The agent's job is to maximize cumulative reward.
- Reward hypothesis: all goals can be described by the maximization of expected cumulative reward.
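The discounted sum r_0 + γ r_1 + γ² r_2 + ... can be computed with a simple backward recursion, a sketch of the formula above:

```python
def discounted_return(rewards, gamma):
    """REWARD = r_0 + gamma*r_1 + gamma^2*r_2 + ...  (0 < gamma < 1).
    Computed backwards: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

discounted_return([1.0, 1.0, 1.0], 0.5)   # 1 + 0.5 + 0.25 = 1.75
```

Smaller γ makes the agent myopic (distant rewards count for little); γ near 1 makes it far-sighted.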
Sequential Decision Making
- Goal: select actions to maximize total future reward.
- Actions may have long-term consequences.
- Reward may be delayed.
- It may be better to sacrifice immediate reward to gain more long-term reward (exploitation vs. exploration).

Exploration and Exploitation
- Reinforcement learning is like trial-and-error learning.
- The agent should discover a good policy
  - from its experiences of the environment (explore),
  - without losing too much of the reward along the way (exploit).

Agent's Learning Task
- Execute actions in the world.
- Observe the results.
- Learn a policy π: S → A that maximizes reward from some initial state.

Fully Observable Environment
- Full observability: the agent directly observes the environment state.
- Agent state = environment state = information state.
- Formally, this is a Markov Decision Process (MDP).
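A standard way to trade off exploration and exploitation is ε-greedy action selection: explore with probability ε, otherwise exploit the best-looking action. A minimal sketch (the `Q` table and its contents are made up for illustration):

```python
import random

def epsilon_greedy(Q, state, actions, eps):
    """With probability eps pick a random action (explore); otherwise
    pick the action with the highest estimated value (exploit).
    Q maps (state, action) pairs to estimated values."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s", "left"): 0.2, ("s", "right"): 0.8}
epsilon_greedy(Q, "s", ["left", "right"], eps=0.0)   # pure exploitation
```

With eps=0.0 the agent always exploits; eps=1.0 is pure trial-and-error; values in between balance the two.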
Partially Observable Environment
- Partial observability: the agent indirectly observes the environment.
  - E.g. a robot with camera vision isn't told its absolute location.
  - A trading agent only observes current prices.
  - A poker playing agent only observes public cards.
- Agent state ≠ environment state.
- Formally, this is a partially observable Markov Decision Process (POMDP).
- The agent must construct its own state representation S^a_t:
  - Complete history: S^a_t = H_t
  - Beliefs of environment state: S^a_t = (P[S^e_t = s_1], ..., P[S^e_t = s_n])
  - Recurrent neural network: S^a_t = σ(S^a_{t-1} W_s + O_t W_o)

RL Agent
An RL agent may include one or more of these components:
- Policy: the agent's behaviour function
- Value function: how good is each state and/or action
- Model: the agent's representation of the environment

Maze Example
- States: agent's location
- Actions: N, E, S, W
- Rewards: -1 per time-step

Policy: Agent's Behaviour
- A map from state to action.
- Deterministic policy: a = π(s)
- Stochastic policy: π(a|s) = P(A_t = a | S_t = s)
- (In the maze figure, each arrow represents the policy π(s) for each state s.)
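The two kinds of policy can be sketched concretely: a deterministic policy is a state-to-action map, while a stochastic policy assigns each state a distribution over actions and is sampled from. The maze-style states and probabilities below are illustrative assumptions:

```python
import random

# Deterministic policy: a = pi(s), a plain state -> action map.
pi_det = {"s1": "N", "s2": "E"}

# Stochastic policy: pi(a|s), a distribution over actions per state.
pi_sto = {"s1": {"N": 0.7, "E": 0.3}}

def act(policy, s):
    """Return the action a policy chooses in state s: look it up if
    deterministic, sample from pi(.|s) if stochastic."""
    choice = policy[s]
    if isinstance(choice, str):        # deterministic: a = pi(s)
        return choice
    actions, probs = zip(*choice.items())
    return random.choices(actions, weights=probs)[0]
```

For example, `act(pi_det, "s2")` always returns "E", while `act(pi_sto, "s1")` returns "N" about 70% of the time.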
Value Function
- A prediction of future reward.
- Used to evaluate the goodness/badness of states:
  v_π(s) = E_π[R_{t+1} + γ R_{t+2} + ... | S_t = s]
- (In the maze figure, numbers represent the value function v_π(s) of each state s.)

Model
- Predicts what the environment will do next.
- The agent may have an internal model of the environment, which determines
  - how actions change the state, and
  - how much reward should be given for each state.
- The model may be imperfect.

RL Agent Taxonomy
- Model-based:
  - policy and/or value function
  - model
- Model-free:
  - policy and/or value function
  - no model
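For a known policy, v_π can be computed by repeatedly applying the definition as a backup: v_π(s) = r + γ·v_π(s'). A minimal sketch on a hypothetical 3-state chain with the maze's -1-per-step reward (the chain and its transitions are assumptions for illustration):

```python
# Iterative evaluation of v_pi(s) = E_pi[R_{t+1} + gamma*v_pi(S_{t+1})]
# on a toy chain s0 -> s1 -> s2 (terminal), reward -1 per step,
# mirroring the maze example's -1 per time-step.
gamma = 1.0
next_state = {"s0": "s1", "s1": "s2"}   # where the (fixed) policy moves
v = {"s0": 0.0, "s1": 0.0, "s2": 0.0}   # terminal s2 stays at 0

for _ in range(100):                    # sweep until values converge
    for s in ("s0", "s1"):
        v[s] = -1.0 + gamma * v[next_state[s]]
# v["s1"] == -1.0 (one step from goal), v["s0"] == -2.0 (two steps)
```

The converged numbers are exactly what the maze slide's value annotations show: each state's value is minus its distance-to-goal under the policy.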
HIT3002: Introduction to Artificial Intelligence Intelligent Agents Outline Agents and environments. The vacuum-cleaner world The concept of rational behavior. Environments. Agent structure. Swinburne
More informationDominant and Dominated Strategies
Dominant and Dominated Strategies Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt2@illinois.edu Junel 8th, 2016 C. Hurtado (UIUC - Economics) Game Theory On the
More informationCandyCrush.ai: An AI Agent for Candy Crush
CandyCrush.ai: An AI Agent for Candy Crush Jiwoo Lee, Niranjan Balachandar, Karan Singhal December 16, 2016 1 Introduction Candy Crush, a mobile puzzle game, has become very popular in the past few years.
More informationAnalyzing Games.
Analyzing Games staffan.bjork@chalmers.se Structure of today s lecture Motives for analyzing games With a structural focus General components of games Example from course book Example from Rules of Play
More informationGame Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search
CSE 473: Artificial Intelligence Fall 2017 Adversarial Search Mini, pruning, Expecti Dieter Fox Based on slides adapted Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore
More informationGame Playing for a Variant of Mancala Board Game (Pallanguzhi)
Game Playing for a Variant of Mancala Board Game (Pallanguzhi) Varsha Sankar (SUNet ID: svarsha) 1. INTRODUCTION Game playing is a very interesting area in the field of Artificial Intelligence presently.
More informationUMBC 671 Midterm Exam 19 October 2009
Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.
More informationLearning from Hints: AI for Playing Threes
Learning from Hints: AI for Playing Threes Hao Sheng (haosheng), Chen Guo (cguo2) December 17, 2016 1 Introduction The highly addictive stochastic puzzle game Threes by Sirvo LLC. is Apple Game of the
More informationArtificial Intelligence
Artificial Intelligence CS482, CS682, MW 1 2:15, SEM 201, MS 227 Prerequisites: 302, 365 Instructor: Sushil Louis, sushil@cse.unr.edu, http://www.cse.unr.edu/~sushil Games and game trees Multi-agent systems
More informationTD-Leaf(λ) Giraffe: Using Deep Reinforcement Learning to Play Chess. Stefan Lüttgen
TD-Leaf(λ) Giraffe: Using Deep Reinforcement Learning to Play Chess Stefan Lüttgen Motivation Learn to play chess Computer approach different than human one Humans search more selective: Kasparov (3-5
More informationGame Tree Search 1/6/17
Game Tree Search /6/7 Frameworks for Decision-Making. Goal-directed planning Agents want to accomplish some goal. The agent will use search to devise a plan.. Utility maximization Agents ascribe a utility
More informationLearning and Using Models of Kicking Motions for Legged Robots
Learning and Using Models of Kicking Motions for Legged Robots Sonia Chernova and Manuela Veloso Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 {soniac, mmv}@cs.cmu.edu Abstract
More informationSolving Problems by Searching: Adversarial Search
Course 440 : Introduction To rtificial Intelligence Lecture 5 Solving Problems by Searching: dversarial Search bdeslam Boularias Friday, October 7, 2016 1 / 24 Outline We examine the problems that arise
More informationAdversarial Search and Game Theory. CS 510 Lecture 5 October 26, 2017
Adversarial Search and Game Theory CS 510 Lecture 5 October 26, 2017 Reminders Proposals due today Midterm next week past midterms online Midterm online BBLearn Available Thurs-Sun, ~2 hours Overview Game
More informationCS343 Artificial Intelligence
CS343 Artificial Intelligence Prof: Department of Computer Science The University of Texas at Austin Good Morning, Colleagues Good Morning, Colleagues Are there any questions? Logistics Questions about
More informationCS 730/830: Intro AI. Prof. Wheeler Ruml. TA Bence Cserna. Thinking inside the box. 5 handouts: course info, project info, schedule, slides, asst 1
CS 730/830: Intro AI Prof. Wheeler Ruml TA Bence Cserna Thinking inside the box. 5 handouts: course info, project info, schedule, slides, asst 1 Wheeler Ruml (UNH) Lecture 1, CS 730 1 / 23 My Definition
More informationCS325 Artificial Intelligence Ch. 5, Games!
CS325 Artificial Intelligence Ch. 5, Games! Cengiz Günay, Emory Univ. vs. Spring 2013 Günay Ch. 5, Games! Spring 2013 1 / 19 AI in Games A lot of work is done on it. Why? Günay Ch. 5, Games! Spring 2013
More informationCOMP9414/ 9814/ 3411: Artificial Intelligence. Week 2. Classifying AI Tasks
COMP9414/ 9814/ 3411: Artificial Intelligence Week 2. Classifying AI Tasks Russell & Norvig, Chapter 2. COMP9414/9814/3411 18s1 Tasks & Agent Types 1 Examples of AI Tasks Week 2: Wumpus World, Robocup
More informationReinforcement Learning for CPS Safety Engineering. Sam Green, Çetin Kaya Koç, Jieliang Luo University of California, Santa Barbara
Reinforcement Learning for CPS Safety Engineering Sam Green, Çetin Kaya Koç, Jieliang Luo University of California, Santa Barbara Motivations Safety-critical duties desired by CPS? Autonomous vehicle control:
More informationCOMP9414/ 9814/ 3411: Artificial Intelligence. 2. Environment Types. UNSW c Alan Blair,
COMP9414/ 9814/ 3411: rtificial Intelligence 2. Environment Types COMP9414/9814/3411 16s1 Environments 1 gent Model sensors environment percepts actions? agent actuators COMP9414/9814/3411 16s1 Environments
More informationTemporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks
2015 IEEE Symposium Series on Computational Intelligence Temporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks Michiel van de Steeg Institute of Artificial Intelligence
More informationTRIAL-BASED HEURISTIC TREE SEARCH FOR FINITE HORIZON MDPS. Thomas Keller and Malte Helmert Presented by: Ryan Berryhill
TRIAL-BASED HEURISTIC TREE SEARCH FOR FINITE HORIZON MDPS Thomas Keller and Malte Helmert Presented by: Ryan Berryhill Outline Motivation Background THTS framework THTS algorithms Results Motivation Advances
More informationAn Artificially Intelligent Ludo Player
An Artificially Intelligent Ludo Player Andres Calderon Jaramillo and Deepak Aravindakshan Colorado State University {andrescj, deepakar}@cs.colostate.edu Abstract This project replicates results reported
More information