Q Learning Behavior on Autonomous Navigation of Physical Robot

The 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI 2011), Nov. 23-26, 2011, Songdo ConventiA, Incheon, Korea

Handy Wicaksono
Department of Electrical Engineering, Petra Christian University, Surabaya, Indonesia (E-mail: handy@petra.ac.id)

Abstract - Behavior based architecture gives a robot fast and reliable action. If a robot has many behaviors, behavior coordination is needed. Subsumption architecture is a behavior coordination method that gives quick and robust responses. A learning mechanism improves the robot's performance in handling uncertainty. Q learning is a popular reinforcement learning method that has been used in robot learning because it is simple, convergent, and off policy. In this paper, Q learning is used as the learning mechanism for the obstacle avoidance behavior in autonomous robot navigation. The learning rate of Q learning affects the robot's performance in the learning phase. As the result, the Q learning algorithm is successfully implemented on a physical robot in its imperfect environment.

Keywords - Q learning, behavior coordination, autonomous navigation, physical robot

1. Introduction

Behavior based architecture is a key concept in creating fast and reliable robots. It replaces the deliberative architecture used by Nilsson in the Shakey robot [1]. A behavior based robot does not need a world model to finish its task; the real environment is the only model the robot needs. Another advantage of this architecture is that all behaviors run in a parallel, simultaneous, and asynchronous way [2].

In behavior based architecture, the robot must have a behavior coordinator. The first approach, suggested by Brooks [2], is the Subsumption Architecture, which can be classified as a competitive method. In this method, only one behavior can be applied in the robot at one time. It is a very simple method and it gives fast performance, but it has the disadvantage of non-smooth responses and inaccuracy in robot movement.

In order to anticipate many uncertain things, a robot should have a learning mechanism. In supervised learning, the robot needs a master to teach it. On the other hand, an unsupervised learning mechanism makes the robot learn by itself. Reinforcement learning is an example of this method: the robot can learn online by accepting rewards from its environment [3].

There are many methods to solve the reinforcement learning problem. One of the most popular is the temporal difference algorithm, especially the Q learning algorithm [4]. Q learning's advantages are its off-policy characteristic and its simple algorithm. It also converges to the optimal policy. However, it can only be used with discrete states and actions, and if the Q table is large enough, the algorithm spends too much time in the learning process [5].

A learning algorithm usually takes more memory space on the robot's controller and adds more program complexity than a non-learning one. That is why some researchers prefer to use such algorithms (including Q learning) in computer simulation only [6-8]. However, implementation on a real robot is very important because there are many differences between computer simulation and real world experiments. The LEGO NXT robot, a low cost and popular robotics kit, is used here as a replacement for more expensive research robotics platforms. This paper describes a Q learning algorithm implementation on a physical robot which navigates autonomously.
Q learning is applied to a single behavior, and all behaviors are coordinated by the Subsumption Architecture method. This is a different approach from Khriji et al. [9], who used Q learning to coordinate several behaviors.

2. Behaviors Coordination Method

In the behavior based robotics approach, a proper method of behavior coordination is significant. The designer needs to know how the robot coordinates its behaviors and takes action in the real world. There are two approaches: competitive and cooperative. In a competitive method, only one behavior is applied in the robot at any time. The first suggestion in this category is the Subsumption Architecture by Brooks [2]. This method divides behaviors into levels, where a higher level behavior has higher priority, so it can subsume the lower level ones. The layered control system is shown in Fig. 1.

Fig. 1. Layered control system [2]

The robot should have the following behaviors to accomplish the autonomous navigation task:
1. Wandering
2. Obstacle avoidance
3. Search target
4. Stop

Those behaviors must be coordinated so they can work synchronously in the robot. The coordination method used in this research is the Subsumption Architecture [2]. Figure 2 shows the robot's behavior coordination structure.

Fig. 2. Subsumption Architecture for the autonomous navigation robot

From the figure, it can be seen that Wandering is the lowest level behavior, so if any other behavior is active, Wandering will not be active. The behavior with the highest priority level is obstacle avoidance (OA).

3. Q Learning

Reinforcement learning is a kind of unsupervised learning method in which the agent learns from its environment. The agent (such as a robot) receives rewards from its environment. This method is simple and effective for online and fast processes in an agent such as a robot. Figure 3 shows the basic reinforcement learning scheme.

Fig. 3. Reinforcement learning basic scheme [3]

Q learning is the most popular reinforcement learning method because it is simple, convergent, and off policy, so it is suitable for real time applications such as robots. The Q learning algorithm is described in Fig. 4.

Fig. 4. General flow chart of Q learning algorithm

The simple Q value update equation used in this algorithm is shown in Eq. (1):

Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]   (1)

where:
Q(s,a) : component of the Q table (state, action)
s : state
s' : next state
a : action
a' : next action
r : reward
α : learning rate
γ : discount factor

The design of states and rewards is important in the Q learning algorithm. The state values of the robot's obstacle avoidance behavior are designed as follows:
0 : the obstacle's distance is less than or equal to 30 cm on both the robot's left and right sides
1 : the obstacle's distance is less than or equal to 30 cm on the robot's left side and more than 30 cm on its right side
2 : the obstacle's distance is less than or equal to 30 cm on the robot's right side and more than 30 cm on its left side
3 : the obstacle is more than 30 cm from the robot's left and right sides

Meanwhile, the reward values of the same behavior are designed as:
-2 : the obstacle's distance is less than or equal to 20 cm on both the robot's left and right sides
-1 : the obstacle's distance is less than or equal to 20 cm on the robot's left side or right side
 2 : the obstacle is more than 20 cm from the robot's left and right sides

In this paper, Q learning is applied to the obstacle avoidance behavior only. Figure 5 shows the Q learning behavior implementation in the robot's subsumption architecture.

Fig. 5. Q learning behavior on robot's subsumption architecture

4. Physical Robot Implementation

The LEGO NXT robot is a famous robotic kit for people of all ages. It is suitable for a developing country like Indonesia because of its affordable price (compared with expensive robotic platforms like Khepera, Pioneer, etc.). Although its main target users are children and teenagers, nowadays the LEGO NXT robot has been used in universities for advanced robotic applications such as environment mapping [10], multi robot systems [11], and robot learning [12].
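Before turning to the NXC-specific implementation details, the state design, the reward design, and the update rule of Eq. (1) can be summarized in a minimal plain C sketch. This is an illustration only, not the paper's actual NXC program: the function names and the three-action set (turn left, go forward, turn right) are assumptions made for the example.

```c
#include <stdio.h>

#define NUM_STATES  4   /* states 0..3 derived from the left/right obstacle distances  */
#define NUM_ACTIONS 3   /* e.g. turn left, go forward, turn right (an assumption here) */

/* Q table: one value per (state, action) pair, zero-initialized */
static float Q[NUM_STATES][NUM_ACTIONS];

/* State design of Section 3: 30 cm threshold on each side */
int discretize_state(float left_cm, float right_cm)
{
    if (left_cm <= 30 && right_cm <= 30) return 0;  /* obstacle on both sides     */
    if (left_cm <= 30)                   return 1;  /* obstacle on the left side  */
    if (right_cm <= 30)                  return 2;  /* obstacle on the right side */
    return 3;                                       /* no obstacle within 30 cm   */
}

/* Reward design of Section 3: 20 cm threshold on each side */
int reward(float left_cm, float right_cm)
{
    if (left_cm <= 20 && right_cm <= 20) return -2;
    if (left_cm <= 20 || right_cm <= 20) return -1;
    return 2;
}

/* Greedy choice: index of the largest Q value in state s */
int best_action(int s)
{
    int a, best = 0;
    for (a = 1; a < NUM_ACTIONS; a++)
        if (Q[s][a] > Q[s][best]) best = a;
    return best;
}

/* One application of Eq. (1): Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)) */
void q_update(int s, int a, int r, int s_next, float alpha, float gamma)
{
    float target = r + gamma * Q[s_next][best_action(s_next)];
    Q[s][a] += alpha * (target - Q[s][a]);
}

int main(void)
{
    int s      = discretize_state(25.0f, 80.0f);  /* state 1: obstacle only on the left side  */
    int a      = 2;                               /* suppose the robot turned right           */
    int s_next = discretize_state(60.0f, 80.0f);  /* state 3: both sides clear afterwards     */
    int r      = reward(60.0f, 80.0f);            /* reward +2: both sides farther than 20 cm */

    q_update(s, a, r, s_next, 0.7f, 0.7f);        /* alpha and gamma as used in Section 5.2   */
    printf("Q[%d][%d] = %.2f\n", s, a, Q[s][a]);  /* first update: 0.7 * 2 = 1.40             */
    return 0;
}
```

On the real robot the two distances would come from the ultrasonic sensors and the chosen action would drive the servo motors; the flow itself follows the general flow chart of Fig. 4.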

This paper describes the implementation of behavior coordination and Q learning on the LEGO NXT robot. NXC (Not eXactly C), an open source C-like language, is used to program the robot as a substitute for NXT-G (the original graphical programming tool for the LEGO NXT). Its text based programming style is suitable for an advanced algorithm like Q learning.

There are some NXC programming techniques in the implementation of the robot's Q learning behavior. The Q learning algorithm needs a two dimensional array to build the Q table of states and actions, so the enhanced NBC/NXC firmware that supports multi dimensional arrays is used here. It is also important to use the float data type for α (learning rate) and γ (discount rate), so their values can be varied between 0 and 1. Experiment data are saved on the NXT brick as a text file and transferred to a PC after all experiments are finished.

The robot used in this research has two ultrasonic sensors (to detect the obstacles), two light sensors (to detect the target) and two servo motors. The NXT brick behaves as the brain or controller of this robot. Figure 6 shows the robot.

Fig. 6. LEGO NXT Robot for autonomous navigation task

The arena used in the experiments has 3 different home positions and 1 target location (using a candle as light source). The general arena is shown in Fig. 7. Beside this arena, some simple structures of obstacles and target are also used in order to show the characteristics of the learning mechanism clearly.

Fig. 7. The arena

5. Result and Discussion

5.1 Experiment on robot's behaviors coordination

The first experiment tests the robot's ability in solving the autonomous navigation task. Given three different home positions, the robot should avoid the obstacles and find the target. The result is shown in Fig. 8.

Fig. 8. Robot's trajectory from home positions 1, 2 and 3

From Fig. 8 it is obvious that the robot with subsumption architecture can avoid the obstacles well. The robot also succeeds in finding the light source as the target from the three different home positions.

5.2 Experiment on Q learning behavior with fixed learning rate

As seen in Fig. 5, Q learning is applied only in the obstacle avoidance behavior. In order to watch the robot's performance, a simple obstacle structure is prepared. The Q learning algorithm applied on the robot uses α = 0.7 and γ = 0.7, and it utilizes a greedy method for the exploration-exploitation policy. The robot's performance at the beginning and the end of the trial is shown in Fig. 9 and Fig. 10.

Fig. 9. Robot's performance at the beginning and the end of trial 1

Fig. 10. Robot's performance at the beginning and the end of trial 2

It can be seen from Fig. 9 and Fig. 10 that the robot's learning result can differ from one experiment to another. The first robot tends to go to the right and the second one chooses the left direction. Both of them succeed in avoiding the obstacle. This can happen because Q learning gives each robot the intelligence to decide which action is best for the robot itself.

The robot's goal, from the Q learning point of view, is to collect as many positive rewards as possible. Graphs of the average reward every ten iterations and of the total reward during the experiment are shown in Fig. 11 and Fig. 12.

Fig. 11. Average reward every tenth iteration

Fig. 12. Total rewards of Q learning obstacle avoidance behavior

From Fig. 11, it can be seen that the average reward received by the robot gets bigger over time. In the learning phase the robot still receives some negative rewards, but after 50 steps it starts to collect positive rewards. Figure 12 shows that the total (accumulated) reward collected by the robot also grows over time. So it can be concluded that the robot can maximize its reward after learning for some time.

5.3 Experiment on Q learning behavior with varying learning rate

In this experiment, different learning rates (α) are given to the robot's Q learning algorithm. The values are 0.25, 0.5, 0.75 and 1. The results are shown in Fig. 13.

Fig. 13. Robot's movement with different learning rate values

From Fig. 13 (a) and (b), it can be seen that the robot with a 0.25 learning rate cannot learn to avoid obstacles because the value is too small. The robot with a 0.5 learning rate sometimes succeeds in learning, but not in every experiment (see Fig. 13 (c) and (d)). The robots with 0.75 and 1 learning rates can learn the obstacle avoidance task well every time (see Fig. 13 (e) to (h)). Before the robot learns, it sometimes bumps into the obstacles because it does not yet understand that this is forbidden. But after it has learned, it can avoid the obstacle (without bumping) successfully.

The difference between the robots with 0.5, 0.75 and 1 learning rates is the time needed to learn and to finish the obstacle avoidance task. Table 1 compares them.

Table 1. Comparison of robots with different learning rates
α      Before learning (seconds)   After learning (seconds)
0.5    15                          7
0.75   9                           5
1      7                           7

From Table 1, it can be seen that increasing the learning rate decreases the time the robot needs to solve the task; in this case, the robot with α = 1 is the fastest before learning. But in the after-learning phase, that robot is not always the fastest one.
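The influence of the learning rate seen in Table 1 can also be read directly from Eq. (1). Assuming, purely as an illustration, that all Q values start at zero and the robot receives the positive reward of +2 from the design in Section 3, the first update of a state-action pair is simply α · 2, so a larger α moves the Q value toward its target in fewer updates. A tiny C sketch of this arithmetic:

```c
#include <stdio.h>

/* First update of Eq. (1) with Q(s,a) = 0, max_a' Q(s',a') = 0 and r = +2: */
/*   Q(s,a) <- 0 + alpha * (2 + gamma * 0 - 0) = 2 * alpha                  */
int main(void)
{
    const float alphas[] = { 0.25f, 0.5f, 0.75f, 1.0f };  /* values tried in Section 5.3  */
    const float r = 2.0f;                                 /* positive reward of Section 3 */

    for (int i = 0; i < 4; i++)
        printf("alpha = %.2f -> Q after the first update = %.2f\n", alphas[i], alphas[i] * r);
    return 0;
}
```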

Beside the time needed to learn and finish the task, the rewards received by robots with different learning rates are also different. A graph of the total rewards collected by these robots is shown in Fig. 14.

Fig. 14. Total rewards collected by the robot's obstacle avoidance behavior (α = 0.25, 0.5, 0.75 and 1)

From the figure, it can be stated that a robot with a bigger learning rate also collects a bigger amount of rewards, which means it learns the task faster than the others. So it can be concluded that for the simple obstacle avoidance task, the best learning rate (α) that can be given to the robot is 1. But this is not always true for every task; in some tasks, when a robot learns too fast, it tends to fall into local minima.

This Q learning behavior has been used in a physical robot that solves the autonomous navigation task, and it succeeds in avoiding the obstacle (after some learning time) and reaching the target (through its combination with the search target behavior). Some problems dealing with the imperfect environment should still be solved to get the best result.

6. Conclusion

It can be concluded from the experiment results that:
1. A physical robot using subsumption architecture as its behavior coordination method can finish the autonomous navigation task well.
2. A physical robot using the Q learning mechanism can learn the obstacle avoidance task well; this is marked by its success in continually collecting positive rewards.
3. The learning rate of the Q learning mechanism affects the robot's learning performance. When the learning rate gets bigger, the learning phase gets faster too. But in some tasks, it can drive the robot to fall into local minima.
4. Q learning experiments on a physical robot give a clearer understanding of the Q learning algorithm itself, although there is disturbance from the imperfect environment.

Acknowledgement

This work is supported by DP2M Directorate General of Higher Education (Indonesia) through the Young Lecturer Research Grant with contract number 26/SP2H-PDM/OO7/KL.1/II/21. The author also thanks Handry Khoswanto for valuable suggestions on the LEGO NXT robot implementation.

References

[1] N. J. Nilsson, "Shakey the Robot," Technical Note 323, AI Center, SRI International, 1984.
[2] R. Brooks, "A Robust Layered Control System for a Mobile Robot," IEEE Journal of Robotics and Automation, Vol. 2, No. 1, pp. 14-23, 1986.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Massachusetts, 1998.
[4] C. Watkins and P. Dayan, "Q-learning," Machine Learning, Vol. 8, pp. 279-292, 1992.
[5] M. C. Perez, "A Proposal of Behavior Based Control Architecture with Reinforcement Learning for an Autonomous Underwater Robot," Ph.D. Dissertation, University of Girona, Girona, 2003.
[6] R. Hafner and M. Riedmiller, "Reinforcement Learning on an Omnidirectional Mobile Robot," Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 1, Las Vegas, 2003, pp. 418-423.
[7] H. Wicaksono, Prihastono, K. Anam, S. Kuswadi, R. Effendie, A. Jazidie, I. A. Sulistijono, and M. Sampei, "Modified Fuzzy Behavior Coordination for Autonomous Mobile Robot Navigation System," Proc. of ICCAS-SICE, 2009.
[8] K. Anam and S. Kuswadi, "Behavior Based Control and Fuzzy Q-Learning for Autonomous Mobile Robot Navigation," Proceedings of the 4th International Conference on Information & Communication Technology and Systems (ICTS), 2008.
[9] L. Khriji, F. Touati, K. Benhmed, and A. A. Yahmedi, "Q-Learning Based Mobile Robot Behaviors Coordination," Proc. of International Renewable Energy Congress (IREC), 2010.
[10] G. Oliveira, R. Silva, T. Lira, and L. P. Reis, "Environment Mapping using the Lego Mindstorms NXT and leJOS NXJ," EPIA, 2009.
[11] D. Benedettelli, N. Ceccarelli, A. Garulli, and A. Giannitrapani, "Experimental validation of collective circular motion for nonholonomic multi-vehicle systems," Robotics and Autonomous Systems, Vol. 58, No. 8, pp. 1028-1036, 2010.
[12] B. R. Leffler, C. R. Mansley, and M. L. Littman, "Efficient Learning of Dynamics Models using Terrain Classification," Proceedings of the International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems, 2008.