Simultaneous versus incremental learning of multiple skills by modular robots

Evol. Intel. (2014) 7:119–131
DOI 10.1007/s12065-014-0109-3

SPECIAL ISSUE

C. Rossi · A. E. Eiben

Received: 15 November 2013 / Revised: 6 March 2014 / Accepted: 6 June 2014 / Published online: 28 June 2014
© Springer-Verlag Berlin Heidelberg 2014

Electronic supplementary material: The online version of this article (doi:10.1007/s12065-014-0109-3) contains supplementary material, which is available to authorized users.

C. Rossi (corresponding author), Centre for Automation and Robotics UPM-CSIC, Madrid, Spain. e-mail: claudio.rossi@upm.es
A. E. Eiben, VU University Amsterdam, Amsterdam, The Netherlands. e-mail: a.e.eiben@vu.nl

Abstract

This paper is concerned with the problem of learning multiple skills by modular robots. The main question we address is whether it is better to learn multiple skills simultaneously (all-at-once) or incrementally (one-by-one). We conduct an experimental study with modular robots of various morphologies that need to acquire three different but correlated skills (efficient locomotion, navigation towards a target point, and obstacle avoidance), using real-time, on-board evolution as the learning method. The results indicate that the one-by-one strategy is more efficient and more stable than the all-at-once strategy.

Keywords: Evolutionary robotics · Learning · Artificial life · Embodied artificial evolution

1 Introduction

This paper is concerned with the problem of learning multiple skills by modular robots autonomously, without human intervention. The main question we address is whether it is better to learn multiple skills simultaneously (all-at-once) or incrementally (one-by-one). In principle, this question can arise in various scenarios, whenever robots are required to adapt their controllers, and can concern any collection of skills they need to acquire. The current study, however, is grounded in a particular scenario that also motivates our choices about the skills to learn.
The motivational scenario is based on the vision of an ecosystem of robots that evolve both their physical structure (body) and control software (mind). In this ecosystem robots only have to survive and reproduce; that is, we do not assume any user-defined task, and the sole criterion of fitness is viability and fertility. To this end robots have to be able to move around, find energy sources, and meet and mate with other robots to reproduce. A general vision of physically embodied artificial evolutionary systems has been described in [14], while a conceptual framework for their implementation has been presented in [13]. This framework, dubbed the Triangle of Life, describes a life cycle that does not run from birth to death, but from conception (being conceived) to conception (conceiving one or more offspring), and it is repeated over and over again, thus creating consecutive generations of robot children. In general, the Triangle of Life consists of three stages: Birth, Infancy, and Mature Life.

In this paper we focus on the Infancy stage, making some assumptions about the other parts of the system. In particular, we assume modular robot bodies composed of identical modules and some implementation of Birth, that is, a mechanism for producing new robot organisms. The delivery of a newborn robot organism is the beginning of the Infancy stage, which raises the Control Your Own Body (CYOB) problem. The essence of the problem is that the body (morphological structure) and the mind (controller) of a new organism are unlikely to fit each other well. Even if the parents had well-matching bodies and minds, recombination and mutation can easily result in a child where this is not the case. Hence, the new organism needs to acquire a controller that fits its morphology, and it needs to do so on the fly. This problem can be put in a broader perspective, observing that matter matters. That is, the body, its shape, and its physical interaction with the environment have an impact on the way the mind is shaped [22].

To solve the CYOB problem we need a generic adaptation mechanism that works over a wide range of different body shapes and sizes and is capable of learning a good controller for any given body. The exact definition of a good controller depends on the given scenario, but in general we can assume that appropriate behaviour requires several skills. This raises our main research question:

Is it better to learn multiple skills simultaneously (all-at-once) or incrementally (one-by-one, following a syllabus)?

The experiments reported in this paper are based on three basic skills related to moving efficiently, and use a real-time on-board evolutionary algorithm to implement learning. To compare simultaneous and incremental learning we use three criteria: efficiency, reliability, and resilience. In particular, we address the following subquestions:

1. How do the two strategies compare in terms of efficiency (i.e., learning time)?
2. How do the two strategies compare in terms of reliability (i.e., variance of the outcomes)?
3. How do the two strategies compare in terms of resilience of the controllers generated (i.e., how well can they recover if a robot component is disabled)?

The remainder of the paper is organized as follows. In Sect. 2 we briefly review related work. Then we describe the experiments, divided into two sections: Sect. 3 presents the experimental setup, and Sect. 4 describes the concept of the Robot School, our metaphor for the one-by-one learning strategy. The outcomes of the experiments are given in Sect. 5, arranged around the main research questions. This is followed by a further section (Sect. 6) that provides additional insights into the learning process. Finally, in Sect. 7 we give a brief summary and conclude the paper.
2 Related work

Existing work closely related to our research subject can be divided into papers about the way of learning and papers about the task of gait learning. On the abstract level of learning mechanisms, layered learning and incremental learning are related [21, 26]. However, the layered learning approach consists of breaking a problem down into several behavioural layers, and incremental learning works through learning (motion pattern) primitives and grouping them in hierarchical tree structures. In contrast, our approach is not based on explicit hierarchies; we simply expose the learner robot to a sequence of tasks.

The most closely related works about incremental learning in evolutionary robotics can be grouped in two clusters. The first one concerns work done in the 1990s. A relatively early work is that of Dorigo and Colombetti [12], based on shaping. They distinguish holistic shaping (all skills at once) and modular shaping (skills learned separately) and compare them in different (hierarchical or monolithic) control architectures. Their results indicate the superiority of modular shaping. Another approach is reported by Kodjabachian and Meyer, who evolved neural network (NN) based robot controllers for two skills in a two-stage procedure [15]. In the first stage a NN controller is evolved for locomotion only. In the second stage this NN is frozen and another one is evolved for obstacle avoidance. The second NN has links to nodes of the first one, so that a combined NN is evolved. Such a two-stage incremental approach was also used by Chavas et al. [6], where the two stages try to match different environmental conditions (strong light and medium light), rather than to represent two different tasks. In these papers the two stages are arranged sequentially and the results show that behaviour after stages 1 and 2 is better than after stage 1 only.

The second cluster is formed by recent work of Bongard about behaviour chaining and scaffolding. Behaviour chaining falls in the sequential learning category based on scaffolding after Wood et al. [29].
In [5] behaviour chaining is introduced for gradually expanding the behavioural repertoire of a dynamically behaving robot. Bongard experiments with two tasks and has robots first evolve a controller capable of grasping and lifting and then incorporate locomotion afterwards. As for the importance of this order, he concludes that there is an inherent unidirectionality in at least some behavioural trajectories: for a given set of behaviours it is easier to learn task i and then task j, compared to learning task j and then task i. This observation is confirmed later in [1] for three tasks: behaviour chaining works, but the success depends on the order of tasks. An important extension of this work is presented in [3], where morphological scaffolding is added to the system. The idea is to use different morphologies that change during one evaluation period of the robot. The outcomes show that a significant performance improvement can be achieved if morphological scaffolding precedes environmental scaffolding. These effects can be explained by looking at the robustness of the controllers: working in different body plans increases robustness, even if the changes (against which robustness is measured) originate in the environment (and not in the body). Using different morphologies is also a feature of our present work, but in our real-time evolutionary system the body of a robot never changes during the learning process.

The combination of morphological and environmental variation is further investigated in [2], where morphologies and controllers co-evolve in different environments independently. As for the task, there is only one thing to learn (one fitness function); it is just the arena that changes, and the main research question concerns the link between environmental complexity and the complexity of the evolved morphologies. The results are somewhat inconclusive, as these two forms of complexity do not seem to be correlated as hypothesized. This study also uses off-line evolution with many fitness evaluations (500 generations with a population size of 150 individuals). A prominent difference between this cluster of work and the present study is that we use evolution in real time, while there evolution takes place in an off-line fashion.

The other perspective from which to select existing work related to our study is the issue of gait learning for modular robotics. As explained by Spröwitz: locomotion requires the creation of rhythmic patterns which satisfy multiple constraints: generating forward motion, without falling over, with low energy, possibly coping with different environments, hardware failures, changes in the environment and/or of the organism [25]. In the literature there are several approaches, based on various types of controllers and algorithms for creating these rhythmic patterns. An early approach is based on gait control tables, for instance [4, 30]. A significant cluster of work concerns using neural networks and HyperNEAT [9, 16, 31]. Another successful approach is based on Central Pattern Generators [18], which can produce well-performing and stable gaits on both non-modular robots [7, 25] and modular multi-robot organisms [19, 20].

Fig. 1 Left: body modules and the six possible connection points. Right: head module equipped with two distance sensors
Using artificial hormones in modular robots has been shown to work well too [17, 24]. Finally, reinforcement learning has also been used in modular as well as non-modular robots for online gait learning [8, 11, 23]. Filtering the literature through the problem we address here, it is in [8, 11, 19, 25] that multiple different shapes are investigated.

3 Experimental setup

The robot ecosystem we work with exists in a virtual world, simulated by a realistic 3D simulator¹ that faithfully reproduces both the real-world physics and the robots' kinematics and dynamics, as well as the physical interactions with the world's objects.

¹ Webots system by Cyberbotics, www.cyberbotics.com.

3.1 The robots

The physical realism of the simulations implies high computational costs, making these simulations very slow. For this reason, the modular robots we adopt at this stage are very simple. They are composed of a number of capsule modules connected to each other by actuated joints with two degrees of freedom (DOFs), which allow both horizontal and vertical movement. Each module has six connection points, and each robot has a special head module provided with two distance sensors. Figure 1 depicts the basic modules, and Fig. 4 illustrates the twelve configurations adopted for our experiments. The two DOFs are actuated according to a periodic function of the form

$$\phi_i(t) = a_i \sin(\omega t + b_i). \tag{1}$$

$\phi_i(t)$ is a sinusoidal function that determines the position of joint $i$ between two body segments ($i = 1 \ldots 2(n-1)$), where $n$ is the number of body segments of the robot. Parameters $\omega$, $a_i$ and $b_i$ determine, respectively, the angular speed, amplitude and phase of the oscillation of the joints, $a_i$ being in the range $[-\pi/2, +\pi/2]$. Such a motion law is motivated by the observation that, in general, locomotion involves periodic and synchronised movement of the limbs. Note that all joints share the same parameter $\omega$, since synchronised movement implies the same oscillation frequency. Thus, the basic parameters required by each motor are the three classical parameters of periodic undulatory motion: rotation speed, amplitude and phase.

Fig. 2 Left: example of the effect of $r$ on the trajectory of a serpentine locomotion (X is the longitudinal direction, Y represents the lateral displacement). Right: illustration of the angular component of the motion. The green line is the ideal trajectory (images adapted from [10] and [27]) (color figure online)

Fig. 3 Left: the $\mathrm{diff}(t)$ function takes into account the difference of the sensors' readings, i.e. the position of the obstacle. Right: the term $c_i^{(\mathrm{centre})}$ fixes the relative orientation of the modules at rest

A fourth parameter that biases the oscillation can be added to motion function (1):

$$\phi_i(t) = a_i \sin(\omega t + b_i) + c_i. \tag{2}$$

The term $c_i$ shifts the centre of the oscillation, making it asymmetric, and is used to set the relative orientation of the modules. But it can do much more: it can be used for changing the overall motion of the robot. For instance, in serpentine locomotion the term $c_i$ makes the wave motion of the snake asymmetric, resulting in a curved path, as a function of the desired curvature radius $r$: $c_i = r n$ (see also Fig. 2). In a similar way, biasing the oscillation of the robot's joints will result in different motions. Moreover, $c_i$ can be a function of the sensory input, and can thus be used to modify the robot's behaviour in a controlled way. For an arbitrary shape, however, it is very hard to find an analytical formula for a curved path, and that is where learning plays a key role. Since the bias for a desired curvature radius cannot be explicitly calculated, it must be learnt, taking into account the angle between the robot's current orientation and the target destination point, as well as the sensory input provided by the distance sensors. Therefore, the term $c_i$ of Eq. (2) takes the form

$$c_i(t) = c_i^{(\mathrm{target})}\,\mathrm{angle}(t) + c_i^{(\mathrm{obstacle})}\,\mathrm{diff}(t) + c_i^{(\mathrm{centre})} \tag{3}$$

and the final joint motion equation is

$$\phi_i(t) = a_i \sin(\omega t + b_i) + c_i^{(\mathrm{target})}\,\mathrm{angle}(t) + c_i^{(\mathrm{obstacle})}\,\mathrm{diff}(t) + c_i^{(\mathrm{centre})} \tag{4}$$

where $\mathrm{diff}(t)$ is the difference between the range sensors' readings and $\mathrm{angle}(t)$ is the angle between the robot's orientation $u(t)$ and the direction of the target point $v(t)$ (see Figs. 2 right, 3 left). A robot's controller can thus be described by an array of floating-point values of length $5 \cdot 2(n-1) + 1$, with 5 parameters governing the motion of each joint, plus one for the common $\omega$.

3.2 The evolutionary algorithm

The learning strategy we adopted is a simple (1+1) Evolution Strategy running on every robot. This choice is motivated by the natural encoding of the robots' controllers as arrays of floating-point values. Since we aim at real-time embedded algorithms, having a population of controllers would imply testing multiple candidate solutions blindly before making a selection, which would be very time consuming. Evolution Strategies have the additional advantages that the mutation step size is self-adaptive and that the algorithm is almost parameter-free. By real-time we mean that the controller is evolved during the lifetime of the robot, not just tested on it. Thus, a candidate controller is encoded in a floating-point valued chromosome of size $5 \cdot 2(n-1) + 2$, where the added gene encodes a common mutation rate.² Table 1 summarises the main features of the algorithm.

² Early work on such a learning strategy for the problem at hand has shown that an individual mutation rate for each gene does not provide an advantage w.r.t. having a common mutation rate for all genes, see [28].

Table 1 Parameters of the evolutionary algorithm

Parameter            Value
Algorithm type       Evolution strategy
Selection strategy   $(\mu + \lambda)$ (best of parent and offspring)
Population size      $\mu = 1$
No. of offspring     $\lambda = 1$
Chromosome length    $5 \cdot 2(n-1) + 2$, $n$ = number of robot modules
Gene type            Floating-point, range $[-1, 1]$ (radians), except $\omega \in [0, 1]$ and the mutation rate
Mutation             Gaussian perturbation, range $\mu$, encoded in the chromosome
Mutation limits      $\mu \in [0.1, 0.5]$

4 Robot school

As mentioned earlier, surviving in the ecosystem implies (at least) three basic skills: locomotion, obstacle avoidance and target reaching. Individuals capable of wandering efficiently, and capable of reaching a desired point such as a food source or a possible mate, will have higher chances to survive and reproduce.³ Therefore, we aim at generating controllers that are capable of dealing with the generic motion task (i.e., moving from one place to another, avoiding obstacles in between) according to their sensory input without the need to switch between behaviours, that is, without a higher-level behaviour control. Note that goals can be conflicting, as in the case where the robot is heading towards a point and finds an obstacle in between. Since the robot controllers that we are using encode the three basic skills, the learning process has to find the correct combination of parameters that deals with such situations.

³ Recall that the Infancy stage is part of a cycle, the Triangle of Life, where individuals mate to generate new offspring.
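As a minimal sketch of how the encoding of Sect. 3.1 and the algorithm of Table 1 fit together, the following Python code evaluates the joint motion law of Eq. (4) from a flat chromosome and runs a (1+1) Evolution Strategy on it. Several details are assumptions made only for illustration and are not taken from the experiments: the gene order within the chromosome, the way the common mutation step is itself perturbed, and the toy fitness function `toy_fitness` (in the study, fitness comes from running the robot in the simulator).

```python
import math
import random

GENES_PER_JOINT = 5            # a_i, b_i, c_target_i, c_obstacle_i, c_centre_i
STEP_MIN, STEP_MAX = 0.1, 0.5  # mutation limits from Table 1


def clamp(value, low, high):
    return max(low, min(high, value))


def chromosome_length(n_modules):
    """5 genes for each of the 2(n-1) joints, plus omega, plus the mutation step."""
    return GENES_PER_JOINT * 2 * (n_modules - 1) + 2


def joint_angle(chromosome, i, t, angle, diff):
    """Eq. (4) for joint i (0-based). The gene order is an assumption of this sketch."""
    a, b, c_target, c_obstacle, c_centre = chromosome[5 * i:5 * i + 5]
    omega = chromosome[-2]  # common angular speed shared by all joints
    return a * math.sin(omega * t + b) + c_target * angle + c_obstacle * diff + c_centre


def mutate(parent, rng):
    """Gaussian perturbation of every gene with one common, self-adapted step."""
    # Assumption: the step gene is perturbed first, then clamped to its limits.
    step = clamp(parent[-1] + rng.gauss(0.0, 0.05), STEP_MIN, STEP_MAX)
    genes = [clamp(g + rng.gauss(0.0, step), -1.0, 1.0) for g in parent[:-2]]
    omega = clamp(parent[-2] + rng.gauss(0.0, step), 0.0, 1.0)
    return genes + [omega, step]


def one_plus_one_es(fitness, parent, generations, rng):
    """(1+1)-ES: one parent, one offspring, keep the better of the two."""
    parent_fitness = fitness(parent)
    for _ in range(generations):
        child = mutate(parent, rng)
        child_fitness = fitness(child)
        if child_fitness >= parent_fitness:
            parent, parent_fitness = child, child_fitness
    return parent, parent_fitness


def toy_fitness(chromosome):
    """Hypothetical stand-in for a simulator run: prefers large amplitudes a_i."""
    return sum(abs(chromosome[i]) for i in range(0, len(chromosome) - 2, 5))


rng = random.Random(42)
n_modules = 6
start = [rng.uniform(-1.0, 1.0) for _ in range(chromosome_length(n_modules) - 2)]
start += [rng.uniform(0.0, 1.0), 0.3]  # omega and an assumed initial mutation step
best, best_fitness = one_plus_one_es(toy_fitness, start, 100, rng)
```

With a population of one, every fitness evaluation corresponds to exactly one trial on the robot, which is the property that makes this scheme attractive for on-board, real-time use.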
The problem is then which is the best way to find such a combination, that is, the best learning strategy. One solution would be trying to learn all the skills at the same time. We call this strategy simultaneous learning. Here, we aim at comparing this strategy with an alternative strategy, called Robot School. In the Robot School, infant robots follow a syllabus that consists of three sequential classes. In Class 1, robots have to learn how to control their bodies in order to generate gaits for efficient unconstrained locomotion. In Class 2, robots have to learn how to move towards a target point, and in Class 3, robots have to learn how to move around obstacles. Classes 2 and 3 imply including sensory information in the generation of the appropriate gait, as described by Eq. (4). The target's position can be provided by long-range sensors, by inter-robot communications or, as we have done in this work, by an instructor. Obstacle detection is performed using the sensory input provided by the on-board range sensors located in the robot's head.

Referring to Eq. (4), simultaneous learning implies finding suitable values for all the parameters governing body motion ($a_i$, $b_i$, $c_i^{(\mathrm{target})}$, $c_i^{(\mathrm{obstacle})}$, $c_i^{(\mathrm{centre})}$, $i = 1 \ldots n$, and $\omega$) at the same time. Conversely, Robot School divides the problem into three steps: Class 1 basically means finding good $a_i$, $b_i$, $c_i^{(\mathrm{centre})}$ and $\omega$ for straight forward locomotion; Class 2 teaches individuals that are good at straight locomotion how to reach a target in a given location, that is, suitable values for the $c_i^{(\mathrm{target})}$ parameters; and Class 3 teaches how to avoid obstacles that are between the robot and the target, meaning suitable values for the $c_i^{(\mathrm{obstacle})}$ parameters. Note, however, that each Class can also modify the parameters learnt in the previous classes.

5 Experimental results

As mentioned in Sect. 1, the questions we address in this paper are concerned with the efficiency and the stability of the learning process and the resilience of the controllers produced.
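The structural difference between simultaneous learning and the Robot School syllabus of Sect. 4 can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the experimental code: `hill_climb` is a generic (1+1)-style learner, the three fitness functions are hypothetical stand-ins for Classes 1 to 3, the generation budgets are arbitrary, and the three-gene controller is far smaller than a real chromosome. What the sketch does preserve is the structure: each class starts from the best controller of the previous one, and every class may still change all parameters.

```python
import random


def hill_climb(start, fitness, generations, rng, step=0.3):
    """(1+1)-style learner: keep the offspring whenever it is not worse."""
    best, best_fitness = list(start), fitness(start)
    for _ in range(generations):
        child = [g + rng.gauss(0.0, step) for g in best]
        child_fitness = fitness(child)
        if child_fitness >= best_fitness:
            best, best_fitness = child, child_fitness
    return best, best_fitness


def simultaneous_learning(start, final_task, generations, rng):
    """All-at-once: optimise the complete task from the initial controller."""
    return hill_climb(start, final_task, generations, rng)


def robot_school(start, syllabus, rng):
    """One-by-one: run the classes in order, each seeded with the previous result."""
    controller, fitness_value = list(start), None
    for task, generations in syllabus:
        controller, fitness_value = hill_climb(controller, task, generations, rng)
    return controller, fitness_value


# Hypothetical toy tasks; each later class adds one requirement to the previous one.
def locomotion(c):
    return -(c[0] - 1.0) ** 2


def target_reaching(c):
    return locomotion(c) - (c[1] - 0.5) ** 2


def obstacle_avoidance(c):
    return target_reaching(c) - (c[2] + 0.5) ** 2


rng = random.Random(1)
start = [0.0, 0.0, 0.0]
syllabus = [(locomotion, 50), (target_reaching, 50), (obstacle_avoidance, 50)]
school_controller, school_fitness = robot_school(start, syllabus, rng)
simul_controller, simul_fitness = simultaneous_learning(start, obstacle_avoidance, 150, rng)
```

The toy tasks say nothing about which strategy performs better; that is what the experiments in this paper measure.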
To answer these questions we conducted three series of experiments, one per research question. These experiments used a hand-crafted test suite of 12 different robot morphologies that differ in their shapes and sizes. Figure 4 depicts the whole set, which consists of four different robot topologies (labelled with the letters H, T, X, I, which resemble their shape), each of them in three different sizes in terms of the number of modules composing them (6, 7, 8).

Fig. 4 The twelve configurations used for experimentation. They are grouped by topology (row-wise) and size (column-wise)

5.1 Efficiency of the learning process

In the first series of tests, we aimed at addressing the first question: How do the two strategies compare in terms of efficiency?

To this end, we compared the number of generations needed by the Robot School trained individuals to reach the same performance as the simultaneous learning individuals. Two series of tests were performed, one comparing the two learning strategies on two tasks (locomotion plus target reaching), and one on all three tasks (locomotion plus target reaching plus obstacle avoidance).

The first test compares locomotion and target reaching, and was carried out in the following way. The simultaneous learning process was allotted a fixed learning time, i.e. a fixed number of generations, set to 100. At each trial, the individuals were allowed to run for 40 s. The fitness of the individual was computed as a function of the final distance to the target point. In order to avoid biases in the robots' trajectories, the target point location was changed after 20 s from front-left to front-right, and the robot was relocated to its initial position. The final fitness was calculated as the distance to the target obtained in the first half run plus the distance to the target obtained in the second half run, divided by the difference between the two partial results. In this way, individuals that had a strong direction preference (i.e. obtained good results on one of the partial runs but bad results on the other one) were penalised. For each configuration, ten independent runs were performed.

The Robot School individuals were given 50 % of such time to learn straight locomotion (Class 1). The best individuals of Class 1 were then given an unlimited number of generations to learn how to correctly turn towards the moving target (Class 2), reaching at least the same performance as the average of the simultaneous learning individuals, i.e. the same fitness. Figure 5 compares the final number of generations needed by the Robot School trained individuals versus the simultaneous learning baseline number of 100 generations. The results show that, in general, learning in two steps takes fewer than 100 generations (of which 50 were used to learn locomotion). This means that learning the two tasks incrementally takes less time than learning the two of them simultaneously. The average number of generations over all shapes and sizes is 73.3 for the Robot School (including 50 generations for Class 1), versus the reference value of 100 for the simultaneous learning.

Fig. 5 Efficiency tests with two tasks: locomotion + target reaching. The number of generations needed by the simultaneous learning strategy (blue bars) versus number of generations needed by the Robot School trained robots (red bars). Robot School trained robots take fewer generations to reach the same fitness as the all-at-once trained robots. Values are averages over 10 runs (color figure online)

Fig. 6 Efficiency tests with three tasks: locomotion + target reaching + obstacle avoidance. The total number of generations needed by the simultaneous learning strategy (blue bars) versus number of generations needed by the Robot School trained robots (red bars). Robot School trained robots take fewer generations to reach the same fitness as the all-at-once trained robots. Values are averages over 10 runs (color figure online)
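The two-task averages translate into a relative saving as follows. The short Python snippet below only restates the arithmetic on the figures quoted for the two-task test (100 generations for simultaneous learning, 73.3 on average for the Robot School, 50 of which are spent in Class 1); the helper name `relative_saving` is introduced here purely for illustration.

```python
def relative_saving(baseline_generations, school_generations):
    """Fraction of the baseline budget saved by the incremental strategy."""
    return (baseline_generations - school_generations) / baseline_generations


BASELINE = 100        # generations allotted to simultaneous learning
SCHOOL_TOTAL = 73.3   # average total for the Robot School (Class 1 + Class 2)
CLASS1_BUDGET = 50    # generations spent on straight locomotion

class2_average = SCHOOL_TOTAL - CLASS1_BUDGET     # about 23.3 generations
saving = relative_saving(BASELINE, SCHOOL_TOTAL)  # about 0.267, i.e. 26.7 %

print(f"Class 2 average: {class2_average:.1f} generations")
print(f"Relative saving: {saving:.1%}")
```

In other words, once straight locomotion has been learnt, target reaching is added in roughly 23 generations on average, and the two-class syllabus as a whole uses about a quarter fewer generations than the all-at-once baseline.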

In the second test of this series, we compared the learning of the three tasks (locomotion, target reaching and obstacle avoidance). Here, a wall was put between the robot's initial position and the target (see Fig. 3 left), and the robots had to learn how to avoid it to reach their target. The experimental setup was similar to that of the previous experiments, but the individuals were given a 60 s lifetime for fitness evaluation. The simultaneous learning strategy was allotted a maximum number of 150 generations to learn the three tasks. As in the earlier experiment, for the Robot School strategy the best individuals of Class 2 were given an unlimited number of generations to reach at least the same fitness as their simultaneous learning companions. Figure 6 compares the final number of generations needed by the Robot School trained individuals versus the simultaneous learning baseline number of 150 generations for the three tasks. The results show that learning in three steps takes fewer generations. Again, learning the tasks incrementally takes less time than learning them simultaneously. The average number of generations over all shapes and sizes is 69.2 for the Robot School, to which must be added the 50 generations of Class 1, making a total of 119.2, versus the reference value of 150 for the simultaneous learning.

In order to assess the statistical significance of the differences in the plots of Figs. 5 and 6, we performed an unpaired two-tailed t test, with different variances, to test the null hypothesis that the average of the Robot School experiments is not different from the average of the simultaneous experiments. The results of the t test are reported in Table 2.

Table 2 Results of the significance tests. Left: locomotion + target reaching; right: locomotion + target reaching + obstacle avoidance

5.2 Reliability of the learning process

In this subsection we provide the data needed to address the second question: How do the two strategies compare in terms of reliability?

To assess the stability of the learning process, we compared the variance of the fitness values obtained. In this case too, two series of tests were performed, one comparing the two learning strategies on the locomotion and target reaching tasks, and one on all three tasks. It can be noticed from Figs. 7 and 8 that the variance of the final fitness is, in general, smaller in Robot School learning than in simultaneous learning. Some exceptions can be noticed, e.g. in the T7 and X7 configurations for the tests with three tasks (Fig. 8). However, as a global result it can be stated that the Robot School learning strategy is more reliable, presenting more stable results. Note that there are no significant differences in fitness between the two strategies, i.e. the faster learning of Robot School does not affect quality.

Fig. 7 Reliability tests with two tasks: locomotion + target reaching. Fitness obtained by the simultaneous learning strategy (blue bars) versus fitness of the Robot School trained robots (red bars). The variance is smaller for Robot School, thus the learning process is more stable. Values are averages over 10 runs (color figure online)

Fig. 8 Reliability tests with three tasks: locomotion + target reaching + obstacle avoidance. Fitness obtained by the simultaneous learning strategy (blue bars) versus fitness of the Robot School trained robots (red bars). The variance is smaller for Robot School, thus the learning process is more stable. Values are averages over 10 runs (color figure online)

Fig. 9 Resilience tests. Number of generations needed to re-learn the locomotion + target reaching tasks with a faulty motor. Simultaneous learning (blue bars) versus Robot School (red bars). Values are averages over 50 runs (color figure online)
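The unpaired two-tailed t test with different variances mentioned in Sect. 5.1 is commonly known as Welch's t test. As a hedged sketch of what such a comparison involves, the Python code below computes the Welch statistic and its approximate degrees of freedom, together with the sample variances used as the reliability measure. The two samples are made-up placeholder numbers, not data from the experiments, and the function name is chosen for this example only.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # unbiased sample variance
    var_b = statistics.variance(sample_b)
    se2 = var_a / n_a + var_b / n_b        # squared standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df


# Placeholder samples for illustration only (NOT measurements from the paper).
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_value, dof = welch_t(sample_a, sample_b)
variance_a = statistics.variance(sample_a)
variance_b = statistics.variance(sample_b)
print(f"t = {t_value:.3f}, df = {dof:.2f}, variances = {variance_a}, {variance_b}")
```

Turning the statistic and the degrees of freedom into a two-tailed p-value requires the Student t distribution, which the Python standard library does not provide; a statistics package can be used for that step (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).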

Fig. 10 Target reaching tests: evolution of the fitness over time (four panels, fitness versus generations, Robot School and Simultaneous curves). From top-left to bottom-right: configurations H7, X7, T7, I7. Each point of the simultaneous strategy plots is the average fitness over 10 trials, while each point of the Robot School plots is the average over all the trials that have reached the corresponding generation number (note that each of the 10 runs of the Robot School test stopped at a different time)

5.3 Resilience of the controllers

In the third series of experiments, we looked into the third question: How do the two strategies compare in terms of resilience of the controllers generated?

To this end, we tested the robustness of the robot controllers, meant as the capacity to deal with failures. In these tests, the best individuals obtained with the two learning strategies in the locomotion + target reaching tasks were considered. A failure, consisting in blocking a randomly chosen motor in a random position, was induced in the robots, and the time needed to re-learn the target reaching task was measured. By re-learning, we mean reaching the same fitness as before the failure. For each robot configuration, 5 different failures were simulated, and for each of them, 10 independent runs were performed. Thus, each controller was tested 50 times. The robots were allowed a maximum number of 100 generations to learn the task, that is, the same number of generations as for learning the task ex novo. Figure 9 summarises the results of the resilience simulations.
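The test protocol of Sect. 5.3 (5 failures per configuration, 10 independent runs per failure, at most 100 generations per run) can be sketched as a simple loop. In this Python sketch the re-learning itself is abstracted into a caller-supplied function `relearn`, which is hypothetical, as are the range from which the locked position is drawn and the stub used in the usage example; in the actual experiments each run is a simulated re-learning episode.

```python
import math
import random


def sample_failure(n_joints, rng):
    """Choose which motor is blocked and the position it is stuck in."""
    joint = rng.randrange(n_joints)
    # Assumption: the locked position is drawn from the amplitude range of Sect. 3.1.
    position = rng.uniform(-math.pi / 2, math.pi / 2)
    return joint, position


def resilience_test(relearn, n_joints, rng, n_failures=5, runs_per_failure=10,
                    max_generations=100):
    """Return the generations needed in each of n_failures * runs_per_failure runs."""
    generations_needed = []
    for _ in range(n_failures):
        joint, position = sample_failure(n_joints, rng)
        for _ in range(runs_per_failure):
            used = relearn(joint, position, max_generations, rng)
            generations_needed.append(min(used, max_generations))
    return generations_needed


def stub_relearn(joint, position, max_generations, rng):
    """Hypothetical placeholder: pretends re-learning took a random number of generations."""
    return rng.randint(1, max_generations)


rng = random.Random(7)
results = resilience_test(stub_relearn, n_joints=10, rng=rng)
average = sum(results) / len(results)
```

Each controller therefore contributes 50 values, whose average is the kind of quantity plotted per configuration in Fig. 9.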
Both learning strategies have produced controllers that are capable of re-learning the task in fewer than 100 generations, which can be interpreted as the controllers adapting to the new situation, and not just re-learning the task from scratch. However, in this case no significant differences can be noticed between the results obtained by the individuals generated by the two different learning strategies. Thus, the faster learning of the Robot School does not affect the resilience of the controllers generated.

6 Further insights

In this section we briefly present further findings that provide extra insights into the evolution process and illustrate the differences between the evolved solutions.

Evol. Intel. (2014) 7:119 131 129 Fg. 11 Fnal value of the x parameter (frequency of the oscllaton) obtaned by the smultaneous learnng strategy (blue bars) versus ftness of the Robot School traned robots (red bars). Left locomoton? target reachng; Rght: locomoton? target reachng? obstacle avodance (color fgure onlne) Fg. 12 Fnal mutaton rate obtaned by the smultaneous learnng strategy (blue bars) versus ftness of the Robot School traned robots (red bars). Left locomoton? target reachng; Rght locomoton? target reachng? obstacle avodance (color fgure onlne) Fgure 10 shows the evoluton of the ftness over tme of four representatve confguratons, one for each of the topologes used, ntermedate sze. Note that snce the R- obot School learnng strategy usually stops before the maxmum number of 50 allowed generatons, each pont of the correspondng plot s the average over the number of trals that actually run untl the correspondng pont n the x axs (generaton). As t can be notced, the varablty of the ftness s lower and more constant over tme n the Robot School experments, whch confrms that the Robot School learnng strategy s more relable. Fgure 11 shows the fnal value of the x parameter, whch determnes the frequency of the oscllatons of the body segments. Hgher values mean that the robot s movng ts parts faster. Although a hgher frequency gat does not translate mmedately to a faster locomoton, the better performance of the Robot School traned robots may depend on ther hgher frequency gats. Note also the smaller varance of ths parameter n the Robot School w.r.t the smultaneous learnng strategy. Fnally, Fg. 12 shows the fnal value of the mutaton rate. Such parameter also undergoes evoluton, as customary n Evoluton Strateges, startng from the ntal value of l ¼ 0:3. In general, no sgnfcant dfferences can be notced n the two learnng strateges, wth some exceptons n the locomoton? target reachng? obstacle avodance experments that shows smaller values e.g. 
in the X6, I6 and I7 configurations for the Robot School. Again, a lower variability can be observed in the Robot School learning strategy, suggesting a more reliable learning process.

7 Summary and conclusions

The objective of this paper was to compare two different strategies for learning multiple tasks in (simulated) modular robot organisms. The question addressed here was whether learning one task at a time, building on previously trained individuals, is a better strategy than attempting to learn all tasks at once. To this end we carried out an empirical study, based on extensive simulations using four different robot topologies, each of them in three different sizes.

The results of the experiments indicate that the incremental learning strategy, the Robot School, reduces the learning times, reaching the same quality level with fewer evaluations. Moreover, it is more reliable, showing less variance in the quality of the resulting gaits (controllers). Especially the first result is important, because evaluating candidate solutions is very
expensive: evaluating a controller implies letting the corresponding robot operate for a relatively long time in order to get a good indication of its quality. In our simulations this took 40 and 60 s, but in real physical implementations it can be much longer, e.g. hours.

Concerning resilience, both learning strategies have produced similar results. This means that the faster learning does not come at a price in this respect: Robot School-trained individuals do not behave more poorly than the individuals trained with the simultaneous strategy.

Last, but not least, we observed that the results are consistent throughout the test suite. That is, there were no significant differences between configurations (size/topology) for both learning strategies.

Ongoing and future work concerns further verification of these findings in more complex settings. To this end we are extending the test suite of robot morphologies (more shapes, larger range of sizes) as well as the set of tasks.

References

1. Auerbach JE, Bongard JC (2009) How robot morphology and training order affect the learning of multiple behaviors. In: Proceedings of the IEEE congress on evolutionary computation (CEC 2009). IEEE Press, pp 39–46
2. Auerbach JE, Bongard JC (2012) On the relationship between environmental and mechanical complexity in evolved robots. In: Adami C, Bryson D, Ofria C, Pennock R (eds) Proceedings of artificial life, vol 13. MIT Press, pp 309–316
3. Bongard J (2011) Morphological and environmental scaffolding synergize when evolving robot controllers. In: Krasnogor N (ed) Proceedings of the 13th genetic and evolutionary computation conference (GECCO-2011). ACM, pp 179–186
4. Bongard J, Zykov V, Lipson H (2006) Resilient machines through continuous self-modeling. Science 314(5802):1118–1121
5. Bongard JC (2008) Behavior chaining: incremental behavior integration for evolutionary robotics. In: Bullock S, Noble J, Watson R, Bedau M (eds) Proceedings of artificial life XI. MIT Press, pp 64–71
6. Chavas J, Corne C, Horvai P, Kodjabachian J, Meyer J.-A (1998) Incremental evolution of neural controllers for robust obstacle-avoidance in Khepera. In: Husbands P, Meyer J.-A (eds) Proceedings of the 1st European workshop on Evolutionary Robotics-EvoRobot98. Springer, Berlin, pp 227–247
7. Christensen DJ, Larsen JC, Støy K (2013) Fault-tolerant gait learning and morphology optimization of a polymorphic walking robot. Evol Syst (to appear). doi:10.1007/s12530-013-9088-3
8. Christensen DJ, Schultz UP, Støy K (2013) A distributed and morphology-independent strategy for adaptive locomotion in self-reconfigurable modular robots. Robotics Auton Syst 61(9):1021–1035
9. Clune J, Beckmann BE, Ofria C, Pennock RT (2009) Evolving coordinated quadruped gaits with the HyperNEAT generative encoding. In: IEEE congress on evolutionary computation (CEC) 2009. IEEE Press, pp 2764–2771
10. Colorado J, Barrientos A, Rossi C, Garzón M, Galán M, del Cerro J (2010) Efficient locomotion on non-wheeled snake-like robots. In: Filipe J, Andrade-Cetto J, Ferrier JL (eds) ICINCO (2). INSTICC Press, pp 246–251
11. D'Angelo M, Weel B, Eiben A (2013) Online gait learning for modular robots with arbitrary shapes and sizes. In: Dediu AH, Martín-Vide C, Truthe B, Vega-Rodríguez MA (eds) Second international conference on the theory and practice of natural computing (TPNC 2013), no. 8273 in LNCS. Springer, Berlin, pp 45–56
12. Dorigo M, Colombetti M (1994) Robot shaping: developing autonomous agents through learning. Artif Intell 2:321–370
13. Eiben AE, Bredeche N, Hoogendoorn M, Stradner J, Timmis J, Tyrrell A, Winfield A (2013) The triangle of life: evolving robots in real-time and real-space. In: Liò P, Miglino O, Nicosia G, Nolfi S, Pavone M (eds) Advances in artificial life, (ECAL) 2013. MIT Press, pp 1056–1063
14. Eiben AE, Kernbach S, Haasdijk E (2012) Embodied artificial evolution. Evol Intell 5(4):261–272
15. Filliat D, Kodjabachian J, Meyer JA (1999) Evolution of neural controllers for locomotion and obstacle-avoidance in a 6-legged robot. Connect Sci 11(3–4):223–240
16. Haasdijk E, Rusu AA, Eiben AE (2010) HyperNEAT for locomotion control in modular robots. In: Hornby GS, Sekanina L, Haddow PC (eds) Evolvable systems: from biology to hardware, lecture notes in computer science, vol 5216. Springer, Berlin, pp 169–180
17. Hamann H, Stradner J, Schmickl T, Crailsheim K (2010) A hormone-based controller for evolutionary multi-modular robotics: from single modules to gait learning. In: IEEE congress on evolutionary computation (CEC) 2010. IEEE Press, pp 1–8
18. Ijspeert AJ (2008) Central pattern generators for locomotion control in animals and robots: a review. Neural Netw 21(4):642–653
19. Kamimura A, Kurokawa H, Yoshida E, Murata S, Tomita K, Kokaji S (2005) Automatic locomotion design and experiments for a modular robotic system. IEEE/ASME Trans Mechatron 10(3):314–325
20. Kamimura A, Kurokawa H, Yoshida E, Tomita K, Kokaji S, Murata S (2004) Distributed adaptive locomotion by a modular robotic system, M-TRAN II. In: Proceedings of 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) 2004, vol 3. IEEE Press, pp 2370–2377
21. Lessin D, Fussel D, Miikkulainen R (2013) Open-ended behavioral complexity for evolved virtual creatures. In: Proceedings of the fifteenth annual conference on genetic and evolutionary computation conference (GECCO-2013). ACM, pp 335–342
22. Pfeifer R, Bongard JC (2007) How the body shapes the way we think. A new view of intelligence. MIT Press, Cambridge
23. Shen H, Yosinski J, Kormushev P, Caldwell DG, Lipson H (2012) Learning fast quadruped robot gaits with the RL PoWER spline parameterization. Cybern Inf Technol 12(3):66–75
24. Shen WM, Salemi B, Will P (2000) Hormones for self-reconfigurable robots. In: Pagello E, Groen F, Arai T, Dillman R, Stentz A (eds) Proceedings of the 6th international conference on intelligent autonomous systems (IAS-6). IOS Press, pp 918–925
25. Spröwitz A, Moeckel R, Maye J, Ijspeert AJ (2008) Learning to move in modular robots using central pattern generators and online optimization. Int J Robot Res 27(3–4):423–443
26. Stone P (2000) Layered learning in multiagent systems: a winning approach to robotic soccer. MIT Press, Cambridge
27. Suarez DM, Rossi C (2014) Evolutionary learning of basic functionalities for snake-like robots. In: Armada MA, Sanfeliu A, Ferre M (eds) ROBOT2013: first Iberian robotics conference, advances in intelligent systems and computing, vol 252. Springer, Berlin, pp 391–406
28. Suarez DPM, Rossi C (2013) A comparison between different encoding strategies for snake-like robot controllers. In: Proceedings of the EvoStar 2013, lecture notes in computer science 7835, pp 560–568
29. Wood D, Bruner J, Ross G (1976) The role of tutoring in problem solving. J Child Psychol Psychiatry 17(2):89–100
30. Yim M (1993) A reconfigurable modular robot with many modes of locomotion. In: Proceedings of international conference on advanced mechatronics. Japan Society of Mechanical Engineers, Tokyo, Japan, pp 283–288
31. Yosinski J, Clune J, Hidalgo D, Nguyen S, Zagal J, Lipson H (2011) Evolving robot gaits in hardware: the HyperNEAT generative encoding vs. parameter optimization. In: Lenaerts T, Giacobini M, Bersini H, Bourgine P, Dorigo M, Doursat R (eds) Advances in artificial life (ECAL) 2011. MIT Press, pp 890–897