Reinforcement Learning Simulations and Robotics

1 Reinforcement Learning Simulations and Robotics

2 Models
- Partial observability and noise in the sensors
- Policy search methods rather than value function-based approaches
- Isolate key parameters by choosing an appropriate representation for the value function or policy
- Incorporate prior knowledge and transfer knowledge from simulations

3 Safety
- A key issue of the learning process on a real robot; largely not a concern for the rest of the RL community
- Perkins and Barto: RL agents based on Lyapunov functions
- The agent learns by switching between the underlying controllers
- Always safe, and offers basic performance guarantees
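
The Lyapunov-switching idea lends itself to a short sketch. This is a minimal illustration rather than Perkins and Barto's exact construction (there, each base controller is designed so that descent of the Lyapunov function holds by design); here the descent condition is checked with an assumed one-step dynamics function, and every name (safe_controller_set, step, V, Q) is illustrative.

```python
def safe_controller_set(state, controllers, step, V, margin=1e-3):
    """Indices of base controllers whose next state strictly decreases the
    Lyapunov candidate V; restricting the learner to this set keeps any
    switching policy safe."""
    safe = []
    for k, ctrl in enumerate(controllers):
        if V(step(state, ctrl(state))) <= V(state) - margin:  # descent condition
            safe.append(k)
    return safe

def safe_greedy_switch(state, Q, controllers, step, V):
    """Among the safe controllers, switch to the one the learner currently values most."""
    candidates = safe_controller_set(state, controllers, step, V)
    return max(candidates, key=lambda k: Q(state, k))
```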

4 Grid-World-Style Movements
- Classical RL approaches with discrete states and actions
- Well suited to navigational tasks
- Use abstract actions such as "move to the cell to the left"
- A lower-level controller takes care of accelerating, moving, and stopping while ensuring precision
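
As one classical example of this discrete setting, here is a minimal tabular Q-learning sketch on a small grid; the 5x5 grid, goal cell, and learning parameters are all illustrative, and the lower-level controller that would actually execute each cell-to-cell move is abstracted away.

```python
import numpy as np

N = 5                                            # illustrative 5x5 grid
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]     # right, left, down, up
GOAL = (4, 4)
Q = np.zeros((N, N, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Abstract 'move to neighboring cell' action; reward 1 only at the goal."""
    dr, dc = ACTIONS[a]
    nxt = (min(max(s[0] + dr, 0), N - 1), min(max(s[1] + dc, 0), N - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(500):                             # episodes
    s, done = (0, 0), False
    while not done:
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else gamma * np.max(Q[nxt]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = nxt
```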

5 Reward Shaping
- Reward quick success, since real-world experience is costly
- Specifying good reward functions requires domain knowledge and is difficult in practice
- Give intermediate rewards instead of a binary success signal
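
To make the contrast concrete, here is a hedged sketch of a binary task reward versus a shaped one with an intermediate progress term; the distance-to-goal term is an illustrative choice, not taken from the slides.

```python
import numpy as np

def binary_reward(state, goal, tol=0.05):
    """1 only on success; gives the learner almost no signal to follow early on."""
    return 1.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < tol else 0.0

def shaped_reward(state, goal, tol=0.05):
    """Same success bonus plus an intermediate term that rewards getting closer."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return (1.0 if dist < tol else 0.0) - 0.1 * dist
```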

6 Tracking Solutions
- Used to help convergence when the dynamics of the robot change over time
- Sources of change: temperature, wear on gears or motors, and other external factors

7 Building an Accurate Model
- Challenging: requires very many data samples
- Under-modeling errors accumulate, and the simulated robot can quickly diverge from the real-world system
- Transfer requires significant modifications if the model is not accurate

8 Approximate Models
- Verifying and testing algorithms in simulation
- Establishing proximity to the theoretically optimal solution
- Calculating approximate gradients for local policy improvement
- Identifying strategies for collecting more data
- Performing mental rehearsal

9 Mental Rehearsal
- "Practicing" in simulation: the simulated learning step
- Used after learning a forward model from real-world data
- Only the resulting policy is transferred to the robot
- Model-based methods are sample efficient but often require a great deal of memory
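
A minimal sketch of this loop under simple assumptions: fit a forward model from transitions collected on the robot, then roll the policy out entirely inside that model; only the policy parameters would go back to the hardware. The linear model, linear policy, and rollout evaluation are illustrative stand-ins, not the method from the slides.

```python
import numpy as np

def fit_forward_model(S, A, S_next):
    """Least-squares fit of s' ~ W^T [s; a; 1] from real-robot transitions."""
    X = np.hstack([S, A, np.ones((len(S), 1))])
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return lambda s, a: np.hstack([s, a, 1.0]) @ W

def linear_policy(params, s):
    """Illustrative linear policy a = K s."""
    return params @ s

def rehearse(params, model, reward, s0, horizon=50):
    """Roll the current policy out in the learned model and return its return."""
    s, total = np.array(s0, dtype=float), 0.0
    for _ in range(horizon):
        a = linear_policy(params, s)
        s = model(s, a)
        total += reward(s, a)
    return total
```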

10 Mental Rehearsal: Issues
- Simulation biases
- Stochasticity of the real world
- Efficient optimization when sampling from a simulator

11 Mental Rehearsal: Solutions
- Add a stochastic model or distribution to your simulation and average results over model uncertainty
- Artificially add noise to the simulation: this avoids policy over-fitting and smooths out model errors
- Explicitly model uncertainty
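
Two of these fixes are easy to sketch: injecting artificial noise into each simulated transition, and averaging a policy's return over an ensemble of perturbed models. The function names and noise level are illustrative.

```python
import numpy as np

def noisy_step(sim_step, s, a, rng, sigma=0.01):
    """Wrap a deterministic simulator step with additive Gaussian state noise."""
    return sim_step(s, a) + rng.normal(0.0, sigma, size=np.shape(s))

def ensemble_return(policy, models, reward, s0, horizon=100, seed=0):
    """Average return over an ensemble of simulators; a policy that exploits one
    model's quirks scores poorly here, which discourages over-fitting."""
    rng = np.random.default_rng(seed)
    returns = []
    for sim_step in models:
        s, total = np.array(s0, dtype=float), 0.0
        for _ in range(horizon):
            a = policy(s)
            s = noisy_step(sim_step, s, a, rng)
            total += reward(s, a)
        returns.append(total)
    return float(np.mean(returns))
```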

12 Grounded Simulation Learning (GSL)
An iterative optimization framework for speeding up robot learning using an imperfect simulator:
1. Behavior is optimized in simulation
2. Behavior is tested on the robot and compared to the expected results from the simulation
3. The simulator is modified, using a machine-learning approach, to come closer to reality
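
The three steps map directly onto an outer loop. The skeleton below only mirrors the structure described above; the three callables it expects (optimize_in_sim, evaluate_on_robot, ground_simulator) are placeholders supplied by the caller, not an existing API.

```python
def grounded_simulation_learning(sim, robot, params,
                                 optimize_in_sim, evaluate_on_robot, ground_simulator,
                                 iterations=5):
    """Iterate: optimize in simulation, test briefly on the robot, then modify
    the simulator so it better matches what the robot actually did."""
    best_params, best_fitness = params, float("-inf")
    for _ in range(iterations):
        candidate = optimize_in_sim(sim, best_params)                # step 1
        fitness, trajectories = evaluate_on_robot(robot, candidate)  # step 2
        if fitness > best_fitness:
            best_params, best_fitness = candidate, fitness
        sim = ground_simulator(sim, trajectories)                    # step 3
    return best_params
```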

13 GSL: Fitness Sim
- An imperfect simulation of the robot
- Evaluates the parametrized behavior of the robot
- The function must be modifiable, since it is updated to better match the real robot's behavior

14 GSL: Fitness Robot
- Evaluates the fitness of the parametrized behavior on the robot itself
- Limited to a small number of evaluations

15 GSL: Explore Robot
- A small number of exploration runs can be performed on the real robot
- While exploring, collect the states and actions relevant to the current parameterization of the behavior

16 GSL: Learn
- Learns a model of the effects of actions on the state of the real robot
- This model is used to modify Fitness Sim so that it better reflects the behavior of the real robot

17 GSL: Optimize
- Optimization is run in simulation to find better parameters

18 Ball in Cup: Real Robot Example

19 Ball in Cup: Real Robot Results
- Episodes needed to get the ball in the cup
- Episodes needed to be consistent
- Always converged to the maximum after 100 episodes

20 Simulation in Robot RL
- The simulation matched the recorded data very well, yet policies learned in simulation usually missed on the real robot
- Approach: first improve a demonstrated policy in simulation, then perform only the fine-tuning on the real robot
- An importance sampler considers only the n best previous episodes
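
A hedged sketch of the "n best previous episodes" idea: keep a buffer of episode parameters and their returns, and form the next parameter estimate as a return-weighted average over the best n. The soft-max weighting is an illustrative choice rather than the exact sampler used in that work.

```python
import numpy as np

def update_from_best_episodes(episodes, n=10, temperature=1.0):
    """episodes: list of (theta, R) pairs, theta a parameter vector, R its return."""
    best = sorted(episodes, key=lambda e: e[1], reverse=True)[:n]
    thetas = np.array([theta for theta, _ in best])
    returns = np.array([R for _, R in best])
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    return weights @ thetas          # importance-weighted parameter estimate
```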

21 SARSA
- A popular base RL algorithm for robotics
- Compatible with Q-value reuse, which maps the source task into the target task through the Q-value reuse function
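
For reference, a minimal sketch of the tabular SARSA update these systems build on; the table sizes and parameters are illustrative.

```python
import numpy as np

n_states, n_actions = 100, 4                      # illustrative sizes
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def eps_greedy(s):
    """SARSA is on-policy: the same exploratory policy chooses both a and a'."""
    return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

def sarsa_update(s, a, r, s_next, a_next):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]"""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```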

22 Q-Value Reuse
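
A sketch of Q-value reuse in the form commonly given in the transfer-learning literature (for example Taylor and Stone): the frozen source-task table, accessed through inter-task state and action mappings, is added to a target-task table, and SARSA updates only the target table. The mapping functions here are assumed placeholders.

```python
def q_value_reuse(Q_source, Q_target, map_state, map_action, s, a):
    """Composite action value: frozen source-task knowledge plus the learned target table."""
    return Q_target[s, a] + Q_source[map_state(s), map_action(a)]

def sarsa_update_with_reuse(Q_source, Q_target, map_state, map_action,
                            s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """Only the target table is updated; the source table stays fixed."""
    q = q_value_reuse(Q_source, Q_target, map_state, map_action, s, a)
    q_next = q_value_reuse(Q_source, Q_target, map_state, map_action, s_next, a_next)
    Q_target[s, a] += alpha * (r + gamma * q_next - q)
```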

23 Transfer Methods

24

25 Weak transfer: time spent in the source task does not count against the learner in the target task. Strong transfer: source-task time does count.

26

27 Two-Step Transfer
- Learning proceeds sequentially through multiple source tasks
- Uses the Q-value reuse function extended to two steps
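
One plausible form of that two-step extension, stated as an assumption rather than taken from the slides: each earlier source table contributes through the composition of the inter-task mappings, and only the target table is ever updated.

```python
def two_step_q_value_reuse(Q_src1, Q_src2, Q_target,
                           map1_s, map1_a, map2_s, map2_a, s, a):
    """Assumed composition: the second source is reached through its own mapping,
    the first source through both mappings chained; only Q_target is learned."""
    return (Q_target[s, a]
            + Q_src2[map2_s(s), map2_a(a)]
            + Q_src1[map1_s(map2_s(s)), map1_a(map2_a(a))])
```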
