COOPERATIVE STRATEGY BASED ON ADAPTIVE Q-LEARNING FOR ROBOT SOCCER SYSTEMS


Soft Computing. Alfonso Martínez del Hoyo Canterla

Table of contents

1. Introduction
2. Cooperative strategy design
   2.1 Strategy selection design
   2.2 Role assignment
   2.3 Behaviors design
3. Simulation
4. Conclusion

1. Introduction

The objective of the paper is to develop a self-learning cooperative strategy for robot soccer systems: the robots learn from successes and failures to gradually improve their performance. A robot soccer game is a suitable platform for implementing a multiagent system (MAS). Many methods have been proposed for cooperation in MAS: genetic algorithms, neural networks, reinforcement learning, etc. This paper uses reinforcement learning, in which the agent selects the action that gives the highest reward (it maximizes a numerical signal). It is implemented with Q-learning, a temporal-difference (TD) method.

The simplest TD update is

    V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]

and the Q-learning update is

    Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]

The learning in the system is implemented in the strategy selection and in the sidekick behavior.
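The Q-learning update can be sketched as a minimal tabular implementation (the state and action encodings below are illustrative, not the paper's):

```python
# Minimal tabular Q-learning update:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# The state/action encoding below is illustrative, not the paper's.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update to the action-value table Q."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Tiny example: two states, two actions, one rewarded transition.
Q = {0: {"shoot": 0.0, "pass": 0.0}, 1: {"shoot": 0.0, "pass": 0.0}}
q_update(Q, 0, "shoot", 1.0, 1)   # Q[0]["shoot"] becomes 0.1
```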

2. Cooperative strategy design

There are three aspects to the strategy design:
- Strategy selection
- Role assignment
- The behaviors of each role individually

First, the strategy selector observes the state of the environment and, after some time, observes the total reward received to judge whether the decision was good. Then the role arbiter chooses an attacker, the defenders and the sidekicks. Finally, each robot executes its individual task, and a reward is given to each robot individually according to its performance. The complete cooperative strategy architecture:

2.1 Strategy selection design

The strategy selection decides the number of robots assigned to each role. It uses the following information: the current state of the environment and the reward from past actions.

The environment information is fuzzified with a set of membership functions. The two inputs have 3 and 4 fuzzy sets respectively (a 3x4 combination), so 12 inference rules are needed. A fuzzy value for the state (S) is obtained using Mamdani's minimum implication rule.
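A minimal sketch of the Mamdani minimum-implication step over the 3x4 rule grid (the triangular membership functions and rule consequences are placeholders, since the slides' figures are not reproduced here):

```python
# Sketch of Mamdani minimum-implication inference over 3 x 4 = 12 rules.
# The membership functions and crisp rule consequences are placeholders;
# the paper's actual shapes are not reproduced in this summary.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def mamdani_state(x1, x2, mfs1, mfs2, consequences):
    """Fire each rule with strength min(mu1, mu2) (Mamdani min implication)
    and aggregate the clipped consequences with max."""
    S = 0.0
    for i, mf1 in enumerate(mfs1):        # 3 fuzzy sets on input 1
        for j, mf2 in enumerate(mfs2):    # 4 fuzzy sets on input 2
            strength = min(mf1(x1), mf2(x2))
            S = max(S, min(strength, consequences[i][j]))
    return S

# Placeholder fuzzy sets: 3 for the first input, 4 for the second.
mfs1 = [lambda x, p=p: tri(x, p - 1, p, p + 1) for p in (0, 1, 2)]
mfs2 = [lambda x, p=p: tri(x, p - 1, p, p + 1) for p in (0, 1, 2, 3)]
consequences = [[0.25] * 4, [0.5] * 4, [1.0] * 4]   # one value per rule
```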

The reward is calculated as follows:

Given the state and the reward, we select one of the 3 possible actions:
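The summary does not say how exploration is handled when picking among the 3 actions; a common choice with a learned value table is epsilon-greedy, sketched here as an assumption:

```python
import random

# Epsilon-greedy choice among the 3 formation actions. Epsilon-greedy
# exploration is an assumption here; it is not stated in the summary.

def select_action(q_values, epsilon=0.1, rng=random):
    """Return the index of the best action, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```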

2.2 Role assignment

The main purpose is to find a suitable robot to be the attacker, and to make the others defenders or sidekicks.

Finding the attacker: select the robot closest to the ball that is behind it, facing the opponent goal:

If no robot is behind the ball, select the most suitable robot according to its distance to the ball, the direction of the ball's velocity and the possible obstacles (the robot that maximizes Attacker_value is chosen):

Finding the defenders: select the robots closest to our goal.

Finding the sidekicks: select the remaining robots.
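The paper's Attacker_value formula is not reproduced in this summary; the sketch below only illustrates the pick-the-argmax idea, with hypothetical terms and weights:

```python
# Hypothetical Attacker_value: the real terms and weights are not given in
# this summary. Higher is better: close to the ball, aligned with the ball's
# velocity, and few obstacles in the way.

def attacker_value(dist_to_ball, vel_alignment, n_obstacles,
                   w_dist=1.0, w_vel=0.5, w_obs=0.8):
    return -w_dist * dist_to_ball + w_vel * vel_alignment - w_obs * n_obstacles

def pick_attacker(robots):
    """robots: list of (name, dist_to_ball, vel_alignment, n_obstacles)."""
    return max(robots, key=lambda r: attacker_value(r[1], r[2], r[3]))[0]
```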

2.3 Behaviors design

Common behavior (obstacle avoidance): the robots move among charges. If a robot wants to go to the ball, all the other robots are positrons (repulsive charges) and the ball is a negatron (attractive charge). If the robot just wants to go to a position, the ball is also a positron, and the target position is the negatron. Repulsion only acts within an avoidance range D.
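The positron/negatron behavior can be read as an artificial potential field; a sketch under that assumption, with illustrative gains and avoidance range D:

```python
import math

# Potential-field reading of the common obstacle-avoidance behavior:
# positrons (obstacles) repel, the negatron (target) attracts, and
# repulsion only acts inside the range D. Gains k_att, k_rep and D are
# illustrative values, not the paper's.

def field_step(pos, target, obstacles, D=0.5, k_att=1.0, k_rep=0.3):
    """Return a unit-length heading combining attraction and repulsion."""
    fx = k_att * (target[0] - pos[0])
    fy = k_att * (target[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < D:                      # repel only inside range D
            fx += k_rep * (1.0 / d - 1.0 / D) * dx / d
            fy += k_rep * (1.0 / d - 1.0 / D) * dy / d
    norm = math.hypot(fx, fy) or 1.0
    return fx / norm, fy / norm
```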

Attacker behavior: if the ball is close to the goal, the robot tries to shoot; otherwise it passes the ball. The attacker must be behind the ball, so if it cannot kick the ball directly, it uses the obstacle-avoidance behavior to get behind it.

Defender behavior: the number of defenders varies from 1 to 3, and they are assigned to different positions (there are 2 defensive zones). A defender tries to block the ball when the opponent shoots, so it finds a suitable location according to the velocity of the ball.

Sidekick behavior: this is the behavior with learning ability. There can be 1 to 3 sidekicks, and their objective is to find good positions.

- States: the robot considers the angle of the attacker, the angle of the opponent closest to the ball, and whether the attacker is closer to the ball than that opponent.
- Actions: the robot chooses a position, selecting a distance and an angle with respect to the ball from a predefined group of possible positions.
- Rewards: the sidekicks' objective is to find positions from which they can become the attacker, or which prevent the ball from getting closer to our goal. The reward is therefore positive if they often change to the attacker role and negative if the ball is close to our goal. It is obtained with a fuzzy system: the procedure to find the final consequence is the same as before, but now the consequence of each rule is crisp, so a crisp value is obtained directly when the maximum is computed.
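The three state features of the sidekick can be packed into a discrete index for the Q-table; the 4-sector angle binning below is an assumption, since the summary does not give the actual discretization:

```python
import math

# Discretize the sidekick's learning state: attacker angle, closest-opponent
# angle, and a flag for whether the attacker is closer to the ball.
# The 4-sector binning is an assumption, not the paper's granularity.

def angle_bin(angle, n_bins=4):
    """Map an angle in [-pi, pi) to one of n_bins sectors."""
    a = (angle + math.pi) % (2 * math.pi)
    return min(int(a / (2 * math.pi / n_bins)), n_bins - 1)

def sidekick_state(attacker_angle, opponent_angle, attacker_closer, n_bins=4):
    """Pack the three features into a single integer state index."""
    s = angle_bin(attacker_angle, n_bins)
    s = s * n_bins + angle_bin(opponent_angle, n_bins)
    return s * 2 + (1 if attacker_closer else 0)
```

With 4 sectors per angle this gives 4 * 4 * 2 = 32 discrete states, a table small enough for plain tabular Q-learning.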

There is an extra reward of +25 if we kick the ball into the opponent goal, and -25 if the opponent does so. The final reward is the sum of the fuzzy reward and the extra reward.

3. Simulation

The robot team played 100 matches against a team designed with a non-learning strategy. Everything was simulated on a computer.

(Plots: goals scored by our team; goals scored by the opponent.)

(Plot: goal difference.)

Summary of the results:

4. Conclusion

The main role of the fuzzy logic is to evaluate the rewards and the states. It gives flexibility: the membership functions can be changed to modify the way the robots cooperate and coordinate with each other, without big modifications to the strategy architecture. The learning part is carried out by the Q-learning algorithm. The cooperative strategy is effective: the results show a rising tendency, i.e. the team gets gradually better.