Fuzzy Inference Model for Learning from Experiences and Its Application to Robot Navigation

Manabu Gouko, Yoshihiro Sugaya and Hirotomo Aso
Department of Electrical and Communication Engineering, Graduate School of Engineering, Tohoku University
Aoba 6-6-05, Aramaki, Aoba-ku, Sendai, 980-8579, Japan. gouko@aso.ecei.tohoku.ac.jp

Abstract

A fuzzy inference model for learning from experiences (FILE) is proposed. The model can learn from experience data obtained by trial and error on a task, and it can stably learn from both successful and failed trials. The model learns after each trial of the task; hence, the achievement rate is expected to increase as trials are repeated, and the model is expected to adapt to changes in the environment. In this paper, we confirm the performance of the model by applying it to a robot navigation task simulation and investigate the knowledge acquired by learning.

1. Introduction

Over the past few decades, a considerable number of studies have been conducted on intelligent systems, as typified by robots. In recent years, interest in systems that acquire knowledge by learning has been growing. In such research, the system acquires knowledge through interaction with its environment. If a designer can easily interpret the knowledge the system has acquired, he or she can use that information to design the system, which facilitates the construction of a more flexible system. Additionally, if the designer's prior knowledge is fed to the system, the system can learn more effectively by using it as a bias. In short, human-interpretable knowledge brings many advantages. A fuzzy inference model can use knowledge that is easily interpreted by humans. Such a model can conduct advanced, human-like inference and has been applied to many intelligent systems. In the model, knowledge is described in if-then rule form. Recent studies on learning fuzzy inference models can be classified into three main groups according to the learning method.
In the first group, models learn by a supervised learning method [1, 6, 7]. This is efficient when input-output training data (teacher data) are available, but it is difficult to determine teacher data in a changing environment. Moreover, such models cannot learn from an evaluation value indicating the success or failure of a trial of a task; note that the evaluation is not teacher data. In the second group, models learn by a genetic algorithm [2, 4]. It is possible to learn from the evaluation value, but the method requires much computation; therefore, it is difficult for the model to adapt to environmental changes. In the third group, models learn by reinforcement learning (RL) [5, 8]. RL is on-line learning through interaction with a dynamic environment, and it can learn from an evaluation value (reward). Many conventional RL models learn the optimum behavior by searching the environment, but this requires a large number of trials and errors [5]. On the other hand, there are models that learn from experiences [8]. The knowledge learned by these models is not always optimal, but learning requires a relatively small number of trials and errors. Several studies have been made on fuzzy inference models that learn from experiences; however, many conventional models learn only from experiences of either success or failure. Additionally, little attention has been given to learning in a changing environment or to investigating the knowledge acquired by learning. We focus on a fuzzy inference model for learning from experiences (FILE), which was proposed by the authors [3]. FILE can stably learn from both successful and failed experiences. In the model, learning is executed after each trial ends; hence, the achievement rate of a task is expected to increase as trials are repeated. In this paper, we confirm the performance of the model by applying it to a robot navigation task simulation, and we investigate the change of knowledge caused by an environmental change.
Figure 1. FILE. (a) Trial mode. (b) Learning mode.
Figure 2. Membership function.

2. Proposed model: FILE

Fig. 1 shows the structure of FILE. FILE consists of a fuzzy inference unit and a buffer. FILE tries to achieve a task using its own knowledge. One trial is defined as the period from the start of a task until its end. FILE reasons and decides actions several times during a trial (Fig. 1(a)). For example, sensor information is given as input to the fuzzy inference unit, and the unit decides an output. Such input/output (I/O) data of the fuzzy inference unit are stored in the buffer during a trial. The stored data, which represent experiences, are exploited in the learning mode together with the evaluation value (Fig. 1(b)). The evaluation value $E$, an evaluation of the trial, is fed to FILE at the end of the trial. FILE updates its own knowledge by learning. As a result, the model acquires knowledge suitable for the environment.

2.1. Trial mode

In the trial mode, FILE tries to achieve a task using its own knowledge. Inference is executed in the fuzzy inference unit and the result is output. The unit has $n$ fuzzy rules described in if-then form. The $i$-th fuzzy rule, Rule $i$, is written as follows:

Rule $i$: if $x_1$ is $A_{i1}$ and $\cdots$ and $x_j$ is $A_{ij}$ and $\cdots$ and $x_m$ is $A_{im}$ then $y = b_i$ $(i = 1, 2, \ldots, n)$, (1)

where $x_1, x_2, \ldots, x_m$ are input variables and $y$ is an output variable. $A_{i1}, A_{i2}, \ldots, A_{im}$ are linguistic labels which represent fuzzy sets, and $b_i$ is a constant output value. The inference result $y$ is calculated by:

$\mu_i = \prod_{j=1}^{m} A_{ij}(x_j) \quad (i = 1, 2, \ldots, n)$, (2)

$y = \sum_{i=1}^{n} \mu_i b_i \,/\, \sum_{i=1}^{n} \mu_i$, (3)

where $\mu_i$ is the firing strength of the $i$-th rule and $A_{ij}(x_j)$ is the membership function of $x_j$ (Fig. 2), calculated by:

$A_{ij}(x_j) = \exp\left(-(x_j - c_{ij})^2 / \sigma_{ijk}^2\right) \quad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m)$, (4)

where $c_{ij}$ and $\sigma_{ijk}$ are parameters giving the center and width, respectively. The membership function used in the model has asymmetric widths $\sigma_{ijL}$ and $\sigma_{ijR}$ on the left and right sides. The index $k$ indicates the side of the function: if $x_j \le c_{ij}$, we set $k = L$; otherwise, we set $k = R$.
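As an illustrative sketch (not the authors' code), the inference of Eqs. (2)-(4), with product firing strengths, weighted-average output and asymmetric Gaussian membership functions, can be written as:

```python
import numpy as np

def membership(x, c, sig_l, sig_r):
    """Asymmetric Gaussian membership A_ij (Eq. 4): width sig_l is used
    left of the center c, and sig_r right of it."""
    sigma = sig_l if x <= c else sig_r
    return np.exp(-((x - c) ** 2) / sigma ** 2)

def infer(x, centers, sig_l, sig_r, b):
    """Firing strengths (Eq. 2) and weighted-average output (Eq. 3).
    centers, sig_l, sig_r have shape (n_rules, n_inputs); b has shape
    (n_rules,). Returns the inferred output y and the strengths mu."""
    n, m = centers.shape
    mu = np.ones(n)
    for i in range(n):
        for j in range(m):
            mu[i] *= membership(x[j], centers[i, j], sig_l[i, j], sig_r[i, j])
    return float(mu @ b / mu.sum()), mu
```

With two single-input rules whose outputs are +1 and -1, an input at either center fires that rule fully, and an input midway between symmetric rules yields output 0, matching the weighted-average form of Eq. (3).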
I/O data ($x_1, x_2, \ldots, x_m$ and $y$) of the fuzzy inference unit are stored in the buffer during a trial; these are called learning data.

2.2. Learning mode

In the learning mode, FILE learns from the learning data and the evaluation value (Fig. 1(b)). Each parameter of the membership functions is updated by learning. The evaluation value $E$ ($-1 \le E \le 1$) takes a positive value when a trial succeeded and a negative value when it failed. When $E$ is positive, the model learns to reinforce the I/O relationships of the learning data. When $E$ is negative, the model learns the repulsive relationships. The learning process is given below.

(1) A set of learning data $(x^l, y^l) = (x_1^l, x_2^l, \ldots, x_m^l, y^l)$ is picked from the buffer.

(2) $\mu_i^l$, the firing strength of the $i$-th rule for $x^l$, is calculated from Eq. (2), and $i_s$ is calculated by $i_s = \arg\max_i \mu_i^l$.

(3) $c_{i_s j}$ and $b_{i_s}$ are updated as follows:

$c_{i_s j}^{\mathrm{new}} = c_{i_s j}^{\mathrm{old}} + \alpha E A_{i_s j}(x_j^l)(x_j^l - c_{i_s j}^{\mathrm{old}})$ if $E \ge 0$ $(j = 1, 2, \ldots, m)$, (5)

$b_{i_s}^{\mathrm{new}} = b_{i_s}^{\mathrm{old}} + \beta \mu_{i_s}^l E (y^l - b_{i_s}^{\mathrm{old}})$ if $E \ge 0$, (6)

where $\alpha$ and $\beta$ are learning rates.
(4) The widths of the membership functions are updated as follows:

$\sigma_{ijp}^{\mathrm{new}} = \sigma_{ijp}^{\mathrm{old}} - \gamma_1 E A_{i_s j}(x_j^l)\, \mu_i^l$ if $E \ge 0$,
$\sigma_{ijp}^{\mathrm{new}} = \sigma_{ijp}^{\mathrm{old}} + \gamma_2 E (1 - A_{i_s j}(x_j^l))(1 - A_{ij}(x_j^l))$ if $E < 0$,
$(i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m)$, (7)

where $\gamma_1, \gamma_2$ are learning rates. The index $p$ is determined as follows:

$p = L$ if ($c_{i_s j} < x_j^l \le c_{ij}$ and $i \ne i_s$) or ($x_j^l \le c_{ij} \le c_{i_s j}$ and $i \ne i_s$),
$p = R$ if ($c_{ij} \le x_j^l < c_{i_s j}$ and $i \ne i_s$) or ($c_{i_s j} \le c_{ij} \le x_j^l$ and $i \ne i_s$),
$p = \phi$ otherwise. (8)

If $p = \phi$, neither width is updated. When $E$ is positive, the width of each rule except the $i_s$-th rule is decreased in proportion to $\mu_i^l$; on the whole, the firing strength $\mu_{i_s}$ is increased. On the other hand, the firing strength $\mu_{i_s}$ is decreased when $E$ is negative.

(5) After the above operations are completed, the data used for learning are deleted from the buffer, and procedures (1)-(4) are repeated. The learning mode finishes when no data remain in the buffer.

From the viewpoint of human interpretation of the rules, it is desirable that the model have few rules. However, learning often becomes unstable under the dispersion of the learning data when a model has only a few rules. FILE can execute learning stably even with few rules [3]. In Eq. (5), the factor $A_{i_s j}(x_j^l)$ prevents learning from becoming unstable due to the dispersion of the learning data: a learning datum far from the center of the membership function exerts only a small influence on learning, so learning remains stable. In Eq. (7), the role of $A_{i_s j}(x_j^l)$, $1 - A_{i_s j}(x_j^l)$ and $1 - A_{ij}(x_j^l)$ is likewise to stabilize learning [3]. Naturally, with only a few fixed rules and no mechanism for adding new ones, the model's adaptability is limited. The model should therefore be improved so that it can add rules, but in this paper we are not concerned with that.

3. Experimental results and discussion

We apply FILE to a robot navigation task simulation [4] and confirm the performance of the model. Additionally, we investigate the change of knowledge caused by an environmental change.
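A minimal sketch of procedures (1)-(3) above, covering only the center and output updates of Eqs. (5)-(6) (the width update of Eqs. (7)-(8) is omitted here, and all array shapes are assumptions of this sketch, not the authors' implementation):

```python
import numpy as np

def firing(x, c, sl, sr):
    """Firing strength of every rule for input vector x (Eq. 2),
    using the asymmetric widths of Eq. (4)."""
    sig = np.where(x <= c, sl, sr)
    return np.exp(-((x - c) ** 2) / sig ** 2).prod(axis=1)

def learn_step(x_l, y_l, E, c, sl, sr, b, alpha=0.004, beta=0.0001):
    """One pass over a buffered datum: select the maximally firing rule
    i_s and, when E >= 0, pull its center toward the datum (Eq. 5) and
    its output toward y_l (Eq. 6). Arrays are modified in place."""
    mu = firing(x_l, c, sl, sr)
    i_s = int(np.argmax(mu))                            # i_s = argmax_i mu_i^l
    if E >= 0:
        sig = np.where(x_l <= c[i_s], sl[i_s], sr[i_s])
        A = np.exp(-((x_l - c[i_s]) ** 2) / sig ** 2)   # A_{i_s j}(x_j^l)
        c[i_s] += alpha * E * A * (x_l - c[i_s])        # Eq. (5)
        b[i_s] += beta * mu[i_s] * E * (y_l - b[i_s])   # Eq. (6)
    return i_s
```

Note how the factor `A` damps the center update for data far from the rule center, which is exactly the stabilizing role of $A_{i_s j}(x_j^l)$ described above.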
The robot navigation task aims at moving a mobile robot from start to goal without any collision with a wall.

Figure 3. (a) Top view of a mobile robot. (b) Input variables. $x_1$ is the angle between the traveling direction and the wall; $x_2$, $x_3$ and $x_4$ are distances between the robot and the wall.

Fig. 3(a) shows the top view of the robot. The diameter of the robot is 20 cm. The robot has a sensor which can detect obstacles. The sensing area is ±30 degrees about the traveling direction, and the maximum depth of the sensing area is 200 cm. The robot moves based on the information observed from the environment. If there is an obstacle in the area, the model executes the inference and decides the traveling direction. When there is no obstacle in the area, the robot goes straight without inference. One step is defined as one move following an observation; the robot moves forward $L$ [cm] every step. Fig. 3(b) shows the input variables: $x_1$ is the angle between the traveling direction and the nearest detected wall; $x_2$ is the shortest distance between the robot and the detected wall; $x_3$ and $x_4$ are the distances between the robot and the walls detected at the right and left edges of the sensing area. The fuzzy inference unit infers a steering angle from the input variables and outputs it. The maximum steering angle is ±10 degrees. In the fuzzy inference unit, each input variable and the output value are normalized.

The experiment is carried out using the course shown in Fig. 4(a). The robot moves to the goal from the start by repeating the inference. One trial is defined as the period from the start until the robot reaches the goal or collides with the wall. In Fig. 4(a), a broken line represents the range of start positions of the robot; the start position is chosen at random within this range at every trial. $E$ is fed to the model at the end of every trial: when the robot reaches the goal, we set $E = 1$; when the robot collides with the wall, we set $E = -1$. We confirm the learning performance using the same navigation task without learning (test task).
In the experiment, whenever a learning mode finishes, the trial of the test task is executed 100 times. When the trials of the test task finish, the robot starts a new trial for the learning mode.
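The trial/learning cycle described above can be sketched as a toy loop. The corridor dynamics, step sizes, and constants below are illustrative assumptions for a one-dimensional stand-in environment, not the paper's simulator:

```python
def run_trial(policy, goal=10.0, wall=1.0, step_len=1.0, max_steps=50):
    """Toy trial loop (Fig. 1(a)): at each step the sensed input is fed
    to the inference unit (policy), the I/O pair is buffered, and at the
    end the evaluation value E is returned with the buffer.
    E = +1 on reaching the goal, E = -1 on hitting the wall."""
    buffer, pos, lateral = [], 0.0, 0.5       # lateral offset toward the wall
    for _ in range(max_steps):
        x = wall - lateral                    # sensed clearance to the wall
        y = policy(x)                         # inferred steering command
        buffer.append((x, y))                 # store the I/O pair
        lateral += 0.2 - 0.3 * y              # drift toward the wall unless steered
        pos += step_len
        if lateral >= wall:
            return -1.0, buffer               # collision: E = -1
        if pos >= goal:
            return +1.0, buffer               # goal reached: E = +1
    return -1.0, buffer                       # timeout counted as failure
```

The learning mode of Sec. 2.2 would then replay each buffered pair together with the returned $E$.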
Figure 4. (a) Course of robot navigation. (b) Membership functions of $x_1$ of each rule given as prior knowledge.
Figure 5. Relation between the number of trials and the average of achievement rates (2 inputs (change), 4 inputs (change), 2 inputs (no change)).

The achievement rate of the test task is expected to increase as learning proceeds. In the experiment, the parameters are $\alpha = 0.004$, $\beta = 0.0001$, $\gamma_1 = 110$, $\gamma_2 = 0.004$. We give two simple rules as prior knowledge to the model. When the nearest wall is on the left of the robot ($x_1 < 0.5$), the robot turns the steering wheel to the right by Rule 1. When the nearest wall is on the right of the robot ($x_1 > 0.5$), the robot turns the steering wheel to the left by Rule 2. Fig. 4(b) shows the membership functions of $x_1$ of each rule given as prior knowledge. FILE cannot execute learning if a designer cannot give it any prior knowledge; we assume that FILE is used in an environment where a designer can give it some knowledge. In both rules, the same membership functions were given for $x_2$, $x_3$ and $x_4$. The center and width of the membership function of $x_2$ are 0.7 and 0.5. The centers and widths of the membership functions of $x_3$ and $x_4$ are both 0.5.

3.1. Change of the speed

In this experiment, the speed of the robot is changed from $L = 10$ to 25. We set $L = 10$ until the 30th trial, and $L = 25$ afterwards. The experiment is carried out using a 2-input model and a 4-input model: the 2-input model infers only from input variables $x_1, x_2$, and the 4-input model infers from $x_1, x_2, x_3, x_4$. Fig. 5 shows the result of the experiment, indicating the relation between the number of trials and the average of achievement rates. A solid line indicates the 2-input model, and a broken line indicates the 4-input model. A chain line indicates the 2-input model when the speed is fixed at 25 from the beginning of the task.
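A hypothetical encoding of the two prior-knowledge rules for $x_1$ is sketched below. The centers and widths are guesses chosen only so that the two membership functions cross at $x_1 = 0.5$, as described above; the paper does not report the exact values:

```python
import numpy as np

# Rule 1: wall on the left (small x_1)  -> steer right (positive output).
# Rule 2: wall on the right (large x_1) -> steer left (negative output).
c   = np.array([0.25, 0.75])   # membership centers for x_1 (assumed values)
sig = np.array([0.35, 0.35])   # symmetric initial widths (assumed values)
b   = np.array([+1.0, -1.0])   # normalized steering outputs

def steer(x1):
    """Inference over the two prior rules (Eqs. 2-3, single input x_1)."""
    mu = np.exp(-((x1 - c) ** 2) / sig ** 2)
    return float(mu @ b / mu.sum())
```

At the crossing point $x_1 = 0.5$ both rules fire equally and the steering output is zero; learning then shifts this crossing point, which is exactly the change tracked in Fig. 6(a).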
The achievement rates of the 2-input and 4-input models increase to 95.4% and 95.8% until the speed is changed. At the 31st trial each rate drops to 90.6% and 89.7%; however, by the 100th trial each rate has increased to 93.1% and 94.1% through learning after the change of speed. The acquired knowledge is investigated by observing the membership functions (Fig. 6). First, we investigate the change of the membership function of $x_1$ (Fig. 6(a)). Before learning ($t = 0$), the crossing point of the functions is 0.5. At the 30th and 100th trials ($t = 30$, $100$), the crossing point moves to 0.39 and 0.37, respectively. Even if $x_1$ is slightly smaller than 0.5, the firing strength of Rule 2 is higher than that of Rule 1, so the robot steers to the left. This means that knowledge with a tendency to turn left was acquired by repetition of trials and learning, and this tendency becomes stronger as the speed increases. The result shows that the robot adapts to the increase in speed by changing its own knowledge. We also investigate the change of the membership functions of $x_3$ and $x_4$ of the 4-input model. Fig. 6(b) shows the membership functions of $x_3$ and $x_4$ in Rule 2 at the 100th trial. Rule 2 means turning the steering wheel to the left. The model learned the knowledge that the robot turns left if $x_4$ is larger than $x_3$. This knowledge is suitable, because this relation between $x_3$ and $x_4$ is observed when the wall is on the right of the robot. As a result, FILE learned suitable membership functions of $x_3$ and $x_4$ in Rule 2. Furthermore, we confirm that the knowledge which uses $x_3$ and $x_4$ changes into knowledge effective for each environment by learning. Fig. 7 shows the trajectories of the robot of the 4-input model in the test task. These trajectories were obtained by inference using only $x_3$ and $x_4$. The robot starts from the same position and direction in each trajectory. In Fig. 7, the small circle represents the robot. Before learning ($t = 0$), the robot goes straight at the wall and collides.
After 30 trials ($t = 30$), the robot can reach the goal; however, the robot becomes unable to reach the goal again owing to the increase in speed ($t = 31$). Finally, the robot reaches the goal through learning in the changed environment ($t = 100$). The result shows that the knowledge which uses $x_3$ and $x_4$ changes, through learning, into knowledge effective for each environment.

Figure 6. (a) Membership functions of $x_1$ of each rule. (b) Membership functions of $x_3$ and $x_4$ in Rule 2.
Figure 7. Trajectories of the robot.
Figure 8. Relation between the number of trials and the average of achievement rates (2 inputs, 4 inputs).

3.2. Change of the course

In another experiment, the course is changed during the experiment. Until the 30th trial, the left-turn course shown in Fig. 4(a) is used, and a right-turn course is used afterwards. Only the direction of the corner varies; the width and length are the same. The speed of the robot is constant ($L = 10$). Fig. 8 shows the result of the experiment. A solid line indicates the 2-input model, and a broken line indicates the 4-input model. The achievement rates of the 2-input and 4-input models increase to 93.6% and 94.3% until the course is changed. At the 31st trial each rate drops to 34.6% and 31.3%. By the 100th trial each rate has increased to 91.6% and 89.3% through learning after the change of the course. Fig. 9(a) shows the membership functions of $x_1$. At the 30th and 100th trials ($t = 30$, $100$), the crossing point is 0.41 and 0.54. It can be said that the robot acquired knowledge with a tendency to turn in the direction of each course. Fig. 9(b) shows the membership functions of $x_3$ and $x_4$ in Rule 1 at the 100th trial. In this figure, the relation between $x_3$ and $x_4$ is opposite to that in Fig. 6(b). Rule 1 means turning the steering wheel to the right. It can be said that the model learned suitable membership functions of $x_3$ and $x_4$ in Rule 1.

Figure 9. (a) Membership functions of $x_1$ of each rule. (b) Membership functions of $x_3$ and $x_4$ in Rule 1.

The difference in the centers of the membership functions in Fig. 9(b) is small in comparison with the previous experiment (Fig. 6(b)). This difference results from the difference in the speed of the robot. Fig. 10 shows the behavior of the robot avoiding a collision. In Figs. 10(a) and (b), the start positions of the robots are the same.
The case where the robot moves forward at low speed is shown in Fig. 10(a): the robot can avoid the collision without approaching the wall very much. The case at high speed is shown in Fig. 10(b): the robot gets closer to the wall than in Fig. 10(a), because its speed is higher. Fig. 10 shows that although the robots turn left in the same way, the observed $x_3$ is smaller when the speed is higher. It can be said that the difference of the centers of the membership functions at each speed results from this difference in the observed variables.

Figure 10. Behavior of the robot avoiding a collision. (a) Low speed. (b) High speed.

4. Conclusion

In this study, we focused on a fuzzy inference model for learning from experiences (FILE), which was proposed by the authors [3]. FILE can stably learn from both successful and failed experiences. In the model, learning is executed after each trial ends. In this paper, we applied the model to a changing environment and confirmed its performance using a robot navigation task simulation. Additionally, we investigated the change of knowledge caused by an environmental change. In the experiments, we showed that the robot adapts to a changing environment and acquires knowledge suitable for each environment by learning. A further direction of this study is to add a mechanism that generates new rules to the model and to adapt the model to more complicated tasks.

Acknowledgment

This work is supported in part by the 21st Century COE Program (Grant No. 14170017).

References

[1] M. Er, T. P. Tan, and S. Y. Loh. Control of a mobile robot using generalized dynamic fuzzy neural networks. Microprocessors and Microsystems, 28:491-498, 2004.
[2] T. Furuhashi, K. Nakaoka, K. Morikawa, H. Maeda, and Y. Uchikawa. A study on knowledge finding using fuzzy classifier system. Journal of Japan Society for Fuzzy Theory and Systems, 7(4):839-848, 1995. In Japanese.
[3] M. Gouko, Y. Sugaya, and H. Aso. Learning fuzzy inference model and acquisition of knowledge. In Proceedings of the 3rd Student-Organizing International Mini-Conference on Information Electronics Systems (SOIM-COE05), pages 285-288, 2005.
[4] H. Itani and T. Furuhashi. A study on teaching information understanding by autonomous mobile robot. Trans. of SICE, 38(11):966-973, 2002. In Japanese.
[5] L. Jouffe. Fuzzy inference system learning by reinforcement methods. IEEE Trans. Syst., Man, Cybern., Part C, 28(3):338-355, 1998.
[6] G. Leng, T. M. McGinnity, and G. Prasad. An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network. Fuzzy Sets and Systems, 150:211-243, 2005.
[7] T. Nishina and M. Hagiwara. Fuzzy inference neural network. Neurocomputing, 14:223-239, 1997.
[8] T. Takahama, S. Sakai, H. Ogura, and M. Nakamura. Learning fuzzy rules for bang-bang control by reinforcement learning method. Journal of Japan Society for Fuzzy Theory and Systems, 8(1):115-122, 1996. In Japanese.