ECE 517: Reinforcement Learning in Artificial Intelligence
|
|
- Juniper Jackson
- 6 years ago
- Views:
Transcription
1 ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 17: Case Studies and Gradient Policy October 29, 2015 Dr. Itamar Arel College of Engineering Department of Electrical Engineering and Computer Science The University of Tennessee Fall
2 Introduction We ll discuss several case studies of reinforcement learning Illustrate some of the trade-offs and issues that arise in real-world applications For example, we emphasize how domain knowledge is incorporated into the formulation and solution of the problem We also highlight the representation issues that are so often critical to successful applications Applications of RL are still far from routine and typically require as much art as science Making applications easier and more straightforward is one of the goals in current RL research ECE 517: Reinforcement Learning in AI 2
3 TD-Gammon (Tesauro s 1992, 1994, 1995, ) One of the most impressive early applications of RL to date is Gerry Tesauro s (IBM) game of backgammon TD-Gammon, required little backgammon knowledge, yet learned to play extremely well, near the level of the world's strongest grandmasters The learning algorithm was a straightforward combination of the TD(l) algorithm and nonlinear function approximation FA using a FFNN trained by backpropagating TD errors There are probably more professional backgammon players than there are professional chess players BG is in part a game of chance, which can be viewed as a large MDP ECE 517: Reinforcement Learning in AI 3
4 TD-Gammon (cont.) The game is played with 15 white and 15 black pieces on a board of 24 locations, called points Here s a typical position early in the game, seen from the perspective of the white player ECE 517: Reinforcement Learning in AI 4
5 TD-Gammon (cont.) White has just rolled a 5 and a 2, so it can move one of his pieces 5 and one (possibly the same) 2 steps The objective is to advance all pieces to points 19-24, and then off the board Hitting removal of single piece 30 pieces, 24 locations implies enormous number of configurations (state set is ~10 20 ) Effective branching factor of 400, considering that each dice roll has ~20 possibilities ECE 517: Reinforcement Learning in AI 5
6 TD-Gammon - details Although the game is highly stochastic, a complete description of the game's state is available at all times The estimated value of any state was meant to predict the probability of winning starting from that state Reward: 0 at all times except those in which the game is won, when it is 1 Episodic (game = episode), undiscounted Non-linear form of TD(l) using a FF neural network Backpropagation of TD error 4 input units for each point; unary encoding of number of white pieces, plus other features Use of after-states Learning during self-play fully incrementally ECE 517: Reinforcement Learning in AI 6
7 TD-Gammon Neural Network Employed ECE 517: Reinforcement Learning in AI 7
8 Summary of TD-Gammon Results Two players played against each other Each had no prior knowledge of the game Only the rules of the game were imposed Human s learn from machines: TD-Gammon learned to play certain opening positions differently than was the convention among the best human players ECE 517: Reinforcement Learning in AI 8
9 Rebuttal on TD-Gammon For an alternative view, see Why did TD-Gammon Work?, Jordan Pollack and Alan Blair, NIPS (1997) Claim: it was the co-evolutionary training strategy, playing games against itself, which led to the success No need for dealing with exploration/exploitation Any such approach would work with backgammon No sensitivity of state to value - success does not extend to other problems e.g. Tetris, maze-type problems exploration issue comes up ECE 517: Reinforcement Learning in AI 9
10 The Acrobot Robotic application of RL Roughly analogous to a gymnast swinging on a high bar The first joint (corresponding to the hands on the bar) cannot exert torque The second joint (corresponding to the gymnast bending at the waist) can This system has been widely studied by control engineers and machine learning researchers ECE 517: Reinforcement Learning in AI 10
11 The Acrobot (cont.) One objective for controlling the Acrobot is to swing the tip (the "feet") above the first joint by an amount equal to one of the links in minimum time In this task, the torque applied at the second joint is limited to three choices: positive torque of a fixed magnitude, negative torque of the same magnitude, or no torque A reward of 1 is given on all time steps until the goal is reached, which ends the episode. No discounting is used Thus, the optimal value of any state is the minimum time to reach the goal (an integer number of steps) Sutton (1996) addressed the Acrobot swing-up task in an on-line, model-free context ECE 517: Reinforcement Learning in AI 11
12 Acrobot Learning Curves for Sarsa(l) ECE 517: Reinforcement Learning in AI 12
13 Typical Acrobot Learned Behavior ECE 517: Reinforcement Learning in AI 13
14 RL in Robotics Robot motor capabilities were investigated using RL Walking, grabbing and delivering MIT Media Lab Robocup competitions soccer games Sony AIBOs are commonly employed Maze-type problems Balancing themselves on unstable platform Multi-dimensional input streams ECE 517: Reinforcement Learning in AI 14
15 Policy Gradient Methods Assume that our policy, p, has a set of n realvalued parameters, q = {q 1, q 2, q 3,..., q n } Running the policy with a particular q results in a reward, r q r Estimate the reward gradient,, for each q i θ i r θi θi θ i This is another learning rate ECE 517: Reinforcement Learning in AI 15
16 Policy Gradient Methods (cont.) This results in hill-climbing in policy space So, it s subject to all the problems of hill-climbing But, we can also use tricks from search theory, like random starting points and momentum terms This is a good approach if you have a parameterized policy Let s assume we have a reasonable starting policy Typically faster than value-based methods Safe exploration, if you have a good policy Learns locally-best parameters for that policy ECE 517: Reinforcement Learning in AI 16
17 An Example: Learning to Walk RoboCup 4-legged league Walking quickly is a big advantage Historically - tuned manually Robots have a parameterized gait controller 12 parameters Controls step length, height, etc. Robot walk across soccer field and is timed Reward is a function of the time taken They know when to stop (distance measure) ECE 517: Reinforcement Learning in AI 17
18 An Example: Learning to Walk (cont.) Basic idea 1. Pick an initial q = {q 1, q 2,..., q 12 } 2. Generate N testing parameter settings by perturbing q q j = {q 1 + d 1, q 2 + d 2,..., q 12 + d 12 }, d i {-e, 0, e} 3. Test each setting, and observe rewards q j r j 4. For each q i q Calculate q i+, q i0, q i- and set 5. Set q q, and go to 2 θ' i θ i d 0 d if if if θ θ θ i 0 i i largest largest largest Average reward when q n i = q i - d i ECE 517: Reinforcement Learning in AI 18
19 An Example: Learning to Walk (cont.) Initial Final ECE 517: Reinforcement Learning in AI 19
20 Value Function or Policy Gradient? When should I use policy gradient? When there s a parameterized policy When there s a high-dimensional state space When we expect the gradient to be smooth Typically on episodic tasks (e.g. AIBO walking) When should I use a value-based method? When there is no parameterized policy When we have no idea how to solve the problem (i.e. no known structure) ECE 517: Reinforcement Learning in AI 20
21 Summary RL is a powerful tool which can support a wide range of applications There is an art to defining the observations, states, rewards and actions Main goal: formulate as simple as possible representation Policy Gradient methods directly search in policy space ECE 517: Reinforcement Learning in AI 21
Reinforcement Learning
Reinforcement Learning Applications Andrea Bonarini Artificial Intelligence and Robotics Lab Department of Electronics and Information Politecnico di Milano E-mail: bonarini@elet.polimi.it URL:http://www.elet.polimi.it/~bonarini
More informationTD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play
NOTE Communicated by Richard Sutton TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play Gerald Tesauro IBM Thomas 1. Watson Research Center, I? 0. Box 704, Yorktozon Heights, NY 10598
More informationCS 331: Artificial Intelligence Adversarial Search II. Outline
CS 331: Artificial Intelligence Adversarial Search II 1 Outline 1. Evaluation Functions 2. State-of-the-art game playing programs 3. 2 player zero-sum finite stochastic games of perfect information 2 1
More informationReinforcement Learning in Games Autonomous Learning Systems Seminar
Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract
More informationGame Design Verification using Reinforcement Learning
Game Design Verification using Reinforcement Learning Eirini Ntoutsi Dimitris Kalles AHEAD Relationship Mediators S.A., 65 Othonos-Amalias St, 262 21 Patras, Greece and Department of Computer Engineering
More informationRISTO MIIKKULAINEN, SENTIENT (HTTP://VENTUREBEAT.COM/AUTHOR/RISTO-MIIKKULAINEN- SATIENT/) APRIL 3, :23 PM
1,2 Guest Machines are becoming more creative than humans RISTO MIIKKULAINEN, SENTIENT (HTTP://VENTUREBEAT.COM/AUTHOR/RISTO-MIIKKULAINEN- SATIENT/) APRIL 3, 2016 12:23 PM TAGS: ARTIFICIAL INTELLIGENCE
More informationArtificial Intelligence. Minimax and alpha-beta pruning
Artificial Intelligence Minimax and alpha-beta pruning In which we examine the problems that arise when we try to plan ahead to get the best result in a world that includes a hostile agent (other agent
More informationTEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS
TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS Thong B. Trinh, Anwer S. Bashi, Nikhil Deshpande Department of Electrical Engineering University of New Orleans New Orleans, LA 70148 Tel: (504) 280-7383 Fax:
More informationTemporal-Difference Learning in Self-Play Training
Temporal-Difference Learning in Self-Play Training Clifford Kotnik Jugal Kalita University of Colorado at Colorado Springs, Colorado Springs, Colorado 80918 CLKOTNIK@ATT.NET KALITA@EAS.UCCS.EDU Abstract
More informationAn Artificially Intelligent Ludo Player
An Artificially Intelligent Ludo Player Andres Calderon Jaramillo and Deepak Aravindakshan Colorado State University {andrescj, deepakar}@cs.colostate.edu Abstract This project replicates results reported
More informationReinforcement Learning Simulations and Robotics
Reinforcement Learning Simulations and Robotics Models Partially observable noise in sensors Policy search methods rather than value functionbased approaches Isolate key parameters by choosing an appropriate
More informationTraining a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente
Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Valentijn Muijrers 3275183 Valentijn.Muijrers@phil.uu.nl Supervisor: Gerard Vreeswijk 7,5 ECTS
More informationREINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING
REINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING RIKA ANTONOVA ANTONOVA@KTH.SE ALI GHADIRZADEH ALGH@KTH.SE RL: What We Know So Far Formulate the problem as an MDP (or POMDP) State space captures
More informationTUD Poker Challenge Reinforcement Learning with Imperfect Information
TUD Poker Challenge 2008 Reinforcement Learning with Imperfect Information Outline Reinforcement Learning Perfect Information Imperfect Information Lagging Anchor Algorithm Matrix Form Extensive Form Poker
More informationHow AI Won at Go and So What? Garry Kasparov vs. Deep Blue (1997)
How AI Won at Go and So What? Garry Kasparov vs. Deep Blue (1997) Alan Fern School of Electrical Engineering and Computer Science Oregon State University Deep Mind s vs. Lee Sedol (2016) Watson vs. Ken
More informationOutline. Game Playing. Game Problems. Game Problems. Types of games Playing a perfect game. Playing an imperfect game
Outline Game Playing ECE457 Applied Artificial Intelligence Fall 2007 Lecture #5 Types of games Playing a perfect game Minimax search Alpha-beta pruning Playing an imperfect game Real-time Imperfect information
More informationCS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions
CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions Slides by Svetlana Lazebnik, 9/2016 Modified by Mark Hasegawa Johnson, 9/2017 Types of game environments Perfect
More informationCOMP9414/ 9814/ 3411: Artificial Intelligence. Week 2. Classifying AI Tasks
COMP9414/ 9814/ 3411: Artificial Intelligence Week 2. Classifying AI Tasks Russell & Norvig, Chapter 2. COMP9414/9814/3411 18s1 Tasks & Agent Types 1 Examples of AI Tasks Week 2: Wumpus World, Robocup
More informationReinforcement Learning for CPS Safety Engineering. Sam Green, Çetin Kaya Koç, Jieliang Luo University of California, Santa Barbara
Reinforcement Learning for CPS Safety Engineering Sam Green, Çetin Kaya Koç, Jieliang Luo University of California, Santa Barbara Motivations Safety-critical duties desired by CPS? Autonomous vehicle control:
More informationEvolutionary robotics Jørgen Nordmoen
INF3480 Evolutionary robotics Jørgen Nordmoen Slides: Kyrre Glette Today: Evolutionary robotics Why evolutionary robotics Basics of evolutionary optimization INF3490 will discuss algorithms in detail Illustrating
More informationBootstrapping from Game Tree Search
Joel Veness David Silver Will Uther Alan Blair University of New South Wales NICTA University of Alberta December 9, 2009 Presentation Overview Introduction Overview Game Tree Search Evaluation Functions
More informationCS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH Santiago Ontañón so367@drexel.edu Recall: Adversarial Search Idea: When there is only one agent in the world, we can solve problems using DFS, BFS, ID,
More informationLearning a Value Analysis Tool For Agent Evaluation
Learning a Value Analysis Tool For Agent Evaluation Martha White Michael Bowling Department of Computer Science University of Alberta International Joint Conference on Artificial Intelligence, 2009 Motivation:
More informationThe UT Austin Villa 3D Simulation Soccer Team 2008
UT Austin Computer Sciences Technical Report AI09-01, February 2009. The UT Austin Villa 3D Simulation Soccer Team 2008 Shivaram Kalyanakrishnan, Yinon Bentor and Peter Stone Department of Computer Sciences
More informationAlphaGo and Artificial Intelligence GUEST LECTURE IN THE GAME OF GO AND SOCIETY
AlphaGo and Artificial Intelligence HUCK BENNET T (NORTHWESTERN UNIVERSITY) GUEST LECTURE IN THE GAME OF GO AND SOCIETY AT OCCIDENTAL COLLEGE, 10/29/2018 The Game of Go A game for aliens, presidents, and
More informationPresentation Overview. Bootstrapping from Game Tree Search. Game Tree Search. Heuristic Evaluation Function
Presentation Bootstrapping from Joel Veness David Silver Will Uther Alan Blair University of New South Wales NICTA University of Alberta A new algorithm will be presented for learning heuristic evaluation
More informationArtificial Intelligence Search III
Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person
More informationSolving Problems by Searching: Adversarial Search
Course 440 : Introduction To rtificial Intelligence Lecture 5 Solving Problems by Searching: dversarial Search bdeslam Boularias Friday, October 7, 2016 1 / 24 Outline We examine the problems that arise
More informationTutorial of Reinforcement: A Special Focus on Q-Learning
Tutorial of Reinforcement: A Special Focus on Q-Learning TINGWU WANG, MACHINE LEARNING GROUP, UNIVERSITY OF TORONTO Contents 1. Introduction 1. Discrete Domain vs. Continous Domain 2. Model Based vs. Model
More informationAugmenting Self-Learning In Chess Through Expert Imitation
Augmenting Self-Learning In Chess Through Expert Imitation Michael Xie Department of Computer Science Stanford University Stanford, CA 94305 xie@cs.stanford.edu Gene Lewis Department of Computer Science
More informationGoogle DeepMind s AlphaGo vs. world Go champion Lee Sedol
Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides
More informationThe UT Austin Villa 3D Simulation Soccer Team 2007
UT Austin Computer Sciences Technical Report AI07-348, September 2007. The UT Austin Villa 3D Simulation Soccer Team 2007 Shivaram Kalyanakrishnan and Peter Stone Department of Computer Sciences The University
More informationHybrid of Evolution and Reinforcement Learning for Othello Players
Hybrid of Evolution and Reinforcement Learning for Othello Players Kyung-Joong Kim, Heejin Choi and Sung-Bae Cho Dept. of Computer Science, Yonsei University 134 Shinchon-dong, Sudaemoon-ku, Seoul 12-749,
More informationContents. List of Figures
1 Contents 1 Introduction....................................... 3 1.1 Rules of the game............................... 3 1.2 Complexity of the game............................ 4 1.3 History of self-learning
More informationOptic Flow Based Skill Learning for A Humanoid to Trap, Approach to, and Pass a Ball
Optic Flow Based Skill Learning for A Humanoid to Trap, Approach to, and Pass a Ball Masaki Ogino 1, Masaaki Kikuchi 1, Jun ichiro Ooga 1, Masahiro Aono 1 and Minoru Asada 1,2 1 Dept. of Adaptive Machine
More informationLecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1
Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,
More informationHierarchical Controller for Robotic Soccer
Hierarchical Controller for Robotic Soccer Byron Knoll Cognitive Systems 402 April 13, 2008 ABSTRACT RoboCup is an initiative aimed at advancing Artificial Intelligence (AI) and robotics research. This
More informationCreating an Agent of Doom: A Visual Reinforcement Learning Approach
Creating an Agent of Doom: A Visual Reinforcement Learning Approach Michael Lowney Department of Electrical Engineering Stanford University mlowney@stanford.edu Robert Mahieu Department of Electrical Engineering
More informationFive-In-Row with Local Evaluation and Beam Search
Five-In-Row with Local Evaluation and Beam Search Jiun-Hung Chen and Adrienne X. Wang jhchen@cs axwang@cs Abstract This report provides a brief overview of the game of five-in-row, also known as Go-Moku,
More informationThe UPennalizers RoboCup Standard Platform League Team Description Paper 2017
The UPennalizers RoboCup Standard Platform League Team Description Paper 2017 Yongbo Qian, Xiang Deng, Alex Baucom and Daniel D. Lee GRASP Lab, University of Pennsylvania, Philadelphia PA 19104, USA, https://www.grasp.upenn.edu/
More informationAnalysing and Exploiting Transitivity to Coevolve Neural Network Backgammon Players
Analysing and Exploiting Transitivity to Coevolve Neural Network Backgammon Players Mete Çakman Dissertation for Master of Science in Artificial Intelligence and Gaming Universiteit van Amsterdam August
More informationLearning and Using Models of Kicking Motions for Legged Robots
Learning and Using Models of Kicking Motions for Legged Robots Sonia Chernova and Manuela Veloso Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 {soniac, mmv}@cs.cmu.edu Abstract
More informationAbalearn: Efficient Self-Play Learning of the game Abalone
Abalearn: Efficient Self-Play Learning of the game Abalone Pedro Campos and Thibault Langlois INESC-ID, Neural Networks and Signal Processing Group, Lisbon, Portugal {pfpc,tl}@neural.inesc.pt http://neural.inesc.pt/
More informationCS343 Introduction to Artificial Intelligence Spring 2012
CS343 Introduction to Artificial Intelligence Spring 2012 Prof: TA: Daniel Urieli Department of Computer Science The University of Texas at Austin Good Afternoon, Colleagues Welcome to a fun, but challenging
More informationCSCI 4150 Introduction to Artificial Intelligence, Fall 2004 Assignment 7 (135 points), out Monday November 22, due Thursday December 9
CSCI 4150 Introduction to Artificial Intelligence, Fall 2004 Assignment 7 (135 points), out Monday November 22, due Thursday December 9 Learning to play blackjack In this assignment, you will implement
More informationAndrei Behel AC-43И 1
Andrei Behel AC-43И 1 History The game of Go originated in China more than 2,500 years ago. The rules of the game are simple: Players take turns to place black or white stones on a board, trying to capture
More information5.4 Imperfect, Real-Time Decisions
5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the generation
More informationNeural Networks for Real-time Pathfinding in Computer Games
Neural Networks for Real-time Pathfinding in Computer Games Ross Graham 1, Hugh McCabe 1 & Stephen Sheridan 1 1 School of Informatics and Engineering, Institute of Technology at Blanchardstown, Dublin
More informationProf. Sameer Singh CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER April 6, 2017
Prof. Sameer Singh CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER 2017 April 6, 2017 Upcoming Misc. Check out course webpage and schedule Check out Canvas, especially for deadlines Do the survey by tomorrow,
More information6. Games. COMP9414/ 9814/ 3411: Artificial Intelligence. Outline. Mechanical Turk. Origins. origins. motivation. minimax search
COMP9414/9814/3411 16s1 Games 1 COMP9414/ 9814/ 3411: Artificial Intelligence 6. Games Outline origins motivation Russell & Norvig, Chapter 5. minimax search resource limits and heuristic evaluation α-β
More informationAdversarial Search. Soleymani. Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5
Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5 Outline Game
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität
More informationOverview. Origins. Idea of programming computers for "intelligent" behavior. First suggested by Alan Turing, 1950.
Lecture S2: Artificial Intelligence Lecture S2: Artificial Intelligence Overview Origins A whirlwind tour of Artificial Intelligence. Idea of programming computers for "intelligent" behavior. First suggested
More informationCS343 Introduction to Artificial Intelligence Spring 2010
CS343 Introduction to Artificial Intelligence Spring 2010 Prof: TA: Daniel Urieli Department of Computer Science The University of Texas at Austin Good Afternoon, Colleagues Welcome to a fun, but challenging
More informationMore Adversarial Search
More Adversarial Search CS151 David Kauchak Fall 2010 http://xkcd.com/761/ Some material borrowed from : Sara Owsley Sood and others Admin Written 2 posted Machine requirements for mancala Most of the
More informationCSE-571 AI-based Mobile Robotics
CSE-571 AI-based Mobile Robotics Approximation of POMDPs: Active Localization Localization so far: passive integration of sensor information Active Sensing and Reinforcement Learning 19 m 26.5 m Active
More informationTeaching a Neural Network to Play Konane
Teaching a Neural Network to Play Konane Darby Thompson Spring 5 Abstract A common approach to game playing in Artificial Intelligence involves the use of the Minimax algorithm and a static evaluation
More informationCS 188: Artificial Intelligence Spring Game Playing in Practice
CS 188: Artificial Intelligence Spring 2006 Lecture 23: Games 4/18/2006 Dan Klein UC Berkeley Game Playing in Practice Checkers: Chinook ended 40-year-reign of human world champion Marion Tinsley in 1994.
More informationWhy did TD-Gammon Work?
Why did TD-Gammon Work? Jordan B. Pollack & Alan D. Blair Computer Science Department Brandeis University Waltham, MA 02254 {pollack,blair}@cs.brandeis.edu Abstract Although TD-Gammon is one of the major
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Frank Hutter and Bernhard Nebel Albert-Ludwigs-Universität
More informationCOMP219: Artificial Intelligence. Lecture 2: AI Problems and Applications
COMP219: Artificial Intelligence Lecture 2: AI Problems and Applications 1 Introduction Last time General module information Characterisation of AI and what it is about Today Overview of some common AI
More informationLearning of Position Evaluation in the Game of Othello
Learning of Position Evaluation in the Game of Othello Anton Leouski Master's Project: CMPSCI 701 Department of Computer Science University of Massachusetts Amherst, Massachusetts 0100 leouski@cs.umass.edu
More informationOnline Interactive Neuro-evolution
Appears in Neural Processing Letters, 1999. Online Interactive Neuro-evolution Adrian Agogino (agogino@ece.utexas.edu) Kenneth Stanley (kstanley@cs.utexas.edu) Risto Miikkulainen (risto@cs.utexas.edu)
More informationSuccess Stories of Deep RL. David Silver
Success Stories of Deep RL David Silver Reinforcement Learning (RL) RL is a general-purpose framework for decision-making An agent selects actions Its actions influence its future observations Success
More informationRandom Administrivia. In CMC 306 on Monday for LISP lab
Random Administrivia In CMC 306 on Monday for LISP lab Artificial Intelligence: Introduction What IS artificial intelligence? Examples of intelligent behavior: Definitions of AI There are as many definitions
More informationDecision Making in Multiplayer Environments Application in Backgammon Variants
Decision Making in Multiplayer Environments Application in Backgammon Variants PhD Thesis by Nikolaos Papahristou AI researcher Department of Applied Informatics Thessaloniki, Greece Contributions Expert
More informationGames and Adversarial Search
1 Games and Adversarial Search BBM 405 Fundamentals of Artificial Intelligence Pinar Duygulu Hacettepe University Slides are mostly adapted from AIMA, MIT Open Courseware and Svetlana Lazebnik (UIUC) Spring
More informationSPQR RoboCup 2016 Standard Platform League Qualification Report
SPQR RoboCup 2016 Standard Platform League Qualification Report V. Suriani, F. Riccio, L. Iocchi, D. Nardi Dipartimento di Ingegneria Informatica, Automatica e Gestionale Antonio Ruberti Sapienza Università
More informationFU-Fighters. The Soccer Robots of Freie Universität Berlin. Why RoboCup? What is RoboCup?
The Soccer Robots of Freie Universität Berlin We have been building autonomous mobile robots since 1998. Our team, composed of students and researchers from the Mathematics and Computer Science Department,
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Bart Selman Reinforcement Learning R&N Chapter 21 Note: in the next two parts of RL, some of the figure/section numbers refer to an earlier edition of R&N
More informationContents. Foundations of Artificial Intelligence. Problems. Why Board Games?
Contents Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard, Bernhard Nebel, and Martin Riedmiller Albert-Ludwigs-Universität
More informationWhy we need to know what AI is. Overview. Artificial Intelligence is it finally arriving?
Artificial Intelligence is it finally arriving? Artificial Intelligence is it finally arriving? Are we nearly there yet? Leslie Smith Computing Science and Mathematics University of Stirling May 2 2013.
More informationRoboCup: Not Only a Robotics Soccer Game but also a New Market Created for Future
RoboCup: Not Only a Robotics Soccer Game but also a New Market Created for Future Kuo-Yang Tu Institute of Systems and Control Engineering National Kaohsiung First University of Science and Technology
More informationMultiple Agents. Why can t we all just get along? (Rodney King)
Multiple Agents Why can t we all just get along? (Rodney King) Nash Equilibriums........................................ 25 Multiple Nash Equilibriums................................. 26 Prisoners Dilemma.......................................
More informationPrinciples of Computer Game Design and Implementation. Lecture 20
Principles of Computer Game Design and Implementation Lecture 20 utline for today Sense-Think-Act Cycle: Thinking Acting 2 Agents and Virtual Player Agents, no virtual player Shooters, racing, Virtual
More informationCS188: Artificial Intelligence, Fall 2011 Written 2: Games and MDP s
CS88: Artificial Intelligence, Fall 20 Written 2: Games and MDP s Due: 0/5 submitted electronically by :59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators) but must be written
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Slides borrowed from Katerina Fragkiadaki Solving known MDPs: Dynamic Programming Markov Decision Process (MDP)! A Markov Decision Process
More informationCS7032: AI & Agents: Ms Pac-Man vs Ghost League - AI controller project
CS7032: AI & Agents: Ms Pac-Man vs Ghost League - AI controller project TIMOTHY COSTIGAN 12263056 Trinity College Dublin This report discusses various approaches to implementing an AI for the Ms Pac-Man
More informationCSC321 Lecture 23: Go
CSC321 Lecture 23: Go Roger Grosse Roger Grosse CSC321 Lecture 23: Go 1 / 21 Final Exam Friday, April 20, 9am-noon Last names A Y: Clara Benson Building (BN) 2N Last names Z: Clara Benson Building (BN)
More informationOn the Design and Training of Bots to Play Backgammon Variants
On the Design and Training of Bots to Play Backgammon Variants Nikolaos Papahristou, Ioannis Refanidis To cite this version: Nikolaos Papahristou, Ioannis Refanidis. On the Design and Training of Bots
More informationReinforcement Learning of Local Shape in the Game of Go
Reinforcement Learning of Local Shape in the Game of Go David Silver, Richard Sutton, and Martin Müller Department of Computing Science University of Alberta Edmonton, Canada T6G 2E8 {silver, sutton, mmueller}@cs.ualberta.ca
More informationIMPROVING TOWER DEFENSE GAME AI (DIFFERENTIAL EVOLUTION VS EVOLUTIONARY PROGRAMMING) CHEAH KEEI YUAN
IMPROVING TOWER DEFENSE GAME AI (DIFFERENTIAL EVOLUTION VS EVOLUTIONARY PROGRAMMING) CHEAH KEEI YUAN FACULTY OF COMPUTING AND INFORMATICS UNIVERSITY MALAYSIA SABAH 2014 ABSTRACT The use of Artificial Intelligence
More informationCOMP3211 Project. Artificial Intelligence for Tron game. Group 7. Chiu Ka Wa ( ) Chun Wai Wong ( ) Ku Chun Kit ( )
COMP3211 Project Artificial Intelligence for Tron game Group 7 Chiu Ka Wa (20369737) Chun Wai Wong (20265022) Ku Chun Kit (20123470) Abstract Tron is an old and popular game based on a movie of the same
More informationPlaying CHIP-8 Games with Reinforcement Learning
Playing CHIP-8 Games with Reinforcement Learning Niven Achenjang, Patrick DeMichele, Sam Rogers Stanford University Abstract We begin with some background in the history of CHIP-8 games and the use of
More informationPlaying Atari Games with Deep Reinforcement Learning
Playing Atari Games with Deep Reinforcement Learning 1 Playing Atari Games with Deep Reinforcement Learning Varsha Lalwani (varshajn@iitk.ac.in) Masare Akshay Sunil (amasare@iitk.ac.in) IIT Kanpur CS365A
More informationGame AI Challenges: Past, Present, and Future
Game AI Challenges: Past, Present, and Future Professor Michael Buro Computing Science, University of Alberta, Edmonton, Canada www.skatgame.net/cpcc2018.pdf 1/ 35 AI / ML Group @ University of Alberta
More informationBoard Representations for Neural Go Players Learning by Temporal Difference
Board Representations for Neural Go Players Learning by Temporal Difference Helmut A. Mayer Department of Computer Sciences Scientic Computing Unit University of Salzburg, AUSTRIA helmut@cosy.sbg.ac.at
More informationTD-Leaf(λ) Giraffe: Using Deep Reinforcement Learning to Play Chess. Stefan Lüttgen
TD-Leaf(λ) Giraffe: Using Deep Reinforcement Learning to Play Chess Stefan Lüttgen Motivation Learn to play chess Computer approach different than human one Humans search more selective: Kasparov (3-5
More informationRobo-Erectus Tr-2010 TeenSize Team Description Paper.
Robo-Erectus Tr-2010 TeenSize Team Description Paper. Buck Sin Ng, Carlos A. Acosta Calderon, Nguyen The Loan, Guohua Yu, Chin Hock Tey, Pik Kong Yue and Changjiu Zhou. Advanced Robotics and Intelligent
More information5.4 Imperfect, Real-Time Decisions
116 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the
More informationIt s Over 400: Cooperative reinforcement learning through self-play
CIS 520 Spring 2018, Project Report It s Over 400: Cooperative reinforcement learning through self-play Team Members: Hadi Elzayn (PennKey: hads; Email: hads@sas.upenn.edu) Mohammad Fereydounian (PennKey:
More informationCS295-1 Final Project : AIBO
CS295-1 Final Project : AIBO Mert Akdere, Ethan F. Leland December 20, 2005 Abstract This document is the final report for our CS295-1 Sensor Data Management Course Final Project: Project AIBO. The main
More informationThe State of the Art in Robotics: RoboCup, Rescue, Entertainment, and More
22 nd World Gas Conference Tokyo 2003 SPECIAL ADDRESS (SA-3) The State of the Art in Robotics: RoboCup, Rescue, Entertainment, and More Dr. Hiroaki Kitano Project Director, ERATO Kitano Symbiotic Systems
More informationTemporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks
2015 IEEE Symposium Series on Computational Intelligence Temporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks Michiel van de Steeg Institute of Artificial Intelligence
More informationBirth of An Intelligent Humanoid Robot in Singapore
Birth of An Intelligent Humanoid Robot in Singapore Ming Xie Nanyang Technological University Singapore 639798 Email: mmxie@ntu.edu.sg Abstract. Since 1996, we have embarked into the journey of developing
More informationPlayful AI Education. Todd W. Neller Gettysburg College
Playful AI Education Todd W. Neller Gettysburg College Introduction Teachers teach best when sharing from the core of their enjoyment of the material. E.g. Those with enthusiasm for graphics should use
More informationReinforcement Learning in a Generalized Platform Game
Reinforcement Learning in a Generalized Platform Game Master s Thesis Artificial Intelligence Specialization Gaming Gijs Pannebakker Under supervision of Shimon Whiteson Universiteit van Amsterdam June
More informationUNIT 13A AI: Games & Search Strategies. Announcements
UNIT 13A AI: Games & Search Strategies 1 Announcements Do not forget to nominate your favorite CA bu emailing gkesden@gmail.com, No lecture on Friday, no recitation on Thursday No office hours Wednesday,
More informationNao Devils Dortmund. Team Description for RoboCup Matthias Hofmann, Ingmar Schwarz, and Oliver Urbann
Nao Devils Dortmund Team Description for RoboCup 2014 Matthias Hofmann, Ingmar Schwarz, and Oliver Urbann Robotics Research Institute Section Information Technology TU Dortmund University 44221 Dortmund,
More informationSINGLE SENSOR LINE FOLLOWER
SINGLE SENSOR LINE FOLLOWER One Sensor Line Following Sensor on edge of line If sensor is reading White: Robot is too far right and needs to turn left Black: Robot is too far left and needs to turn right
More information