Learning to Acquire Whole-Body Humanoid Center of Mass Movements to Achieve Dynamic Tasks
Advanced Robotics 22 (2008). Full paper.

Takamitsu Matsubara a,b, Jun Morimoto b,c, Jun Nakanishi b,c, Sang-Ho Hyon b,c, Joshua G. Hale b,c and Gordon Cheng b,c

a Nara Institute of Science and Technology, Takayama-cho, Ikoma, Nara, Japan
b ATR Computational Neuroscience Laboratories, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
c ICORP Computational Brain Project, Japan Science and Technology Agency, Honcho, Kawaguchi, Saitama, Japan

Received 4 March 2008; accepted 19 March 2008

Abstract

This paper presents a novel approach for acquiring dynamic whole-body movements on humanoid robots, focused on learning a control policy for the center of mass (CoM). In our approach, we combine a model-based CoM controller with a model-free reinforcement learning (RL) method to acquire dynamic whole-body movements in humanoid robots. (i) To cope with high dimensionality, we use a model-based CoM controller as a basic controller that derives joint angular velocities from the desired CoM velocity. The balancing issue can also be considered in the controller. (ii) The RL method is used to acquire a controller that generates the desired CoM velocity based on the current state. To demonstrate the effectiveness of our approach, we apply it to a ball-punching task on a simulated humanoid robot model. The acquired whole-body punching movement was also demonstrated on Fujitsu's Hoap-2 humanoid robot.

Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2008

Keywords: Reinforcement learning, humanoid robot, whole-body movement, policy-gradient method

1. Introduction

Since their physical structure resembles humans, humanoid robots can be expected to help us with many tasks in our normal living environment, without specifically needing additional environmental customization.
Therefore, interest continues to grow in the development of humanoid robots and their control methods to achieve whole-body dynamic movements in these systems [1-4]. In particular, over the last

* To whom correspondence should be addressed. takam-m@is.naist.jp
decade, a number of methods for achieving various tasks on a humanoid robot have been explored, mainly to achieve biped walking and balancing [5-8]. Even though a number of real humanoid robots have demonstrated whole-body dynamic movements with these existing methods, it remains impossible to introduce humanoid robots into our living spaces to help us in our daily lives. This is in large part caused by their inability to adapt to new environments as easily as humans and animals, i.e., due to a lack of motor learning ability. One candidate solution for granting motor learning skills to humanoid robots is reinforcement learning (RL), a promising method because it requires no expert teachers or idealized desired behavior to improve skills. RL is a framework for improving the control rules of an agent, i.e., a robot, through iterative interaction with the environment based on a trial-and-error paradigm, without using an explicit model of the environment [9, 10]. However, with increasing dimensionality in state and action spaces, RL often requires not only a large number of iterations, but also large computational cost, especially for learning a complex control policy for motor learning. Although many researchers have attempted to apply RL methods to several robots in simulations and real hardware systems for acquiring desired movements, so far most of the robots to which learning has been successfully applied have only a small number of d.o.f., not as many as the d.o.f. typically offered by humanoid robots [11-15]. To the best of our knowledge, only one attempt has successfully learned the desired movements on a small humanoid robot, and that work focused on learning biped walking [16]. In this paper, we present a novel approach for acquiring dynamic whole-body movements on humanoid robots by focusing on learning a control policy for the center of mass (CoM).
The CoM is one of the most important features of humanoid robots because it approximately represents the whole-body motion of the robot. Moreover, as suggested by such experimental studies as Ref. [17], it can also be considered a control variable that humans use during functional tasks to overcome the curse of dimensionality. Due to its low dimensionality, learning a CoM movement for a given task might be simpler than directly learning all joint movements. Therefore, we propose combining a model-based CoM controller with a model-free RL approach. A drawback of the model-based CoM controller is that the method only considers highly approximated dynamics, comprised of the CoM and the zero moment point (ZMP), to design a CoM controller. This approximation can cause poor tracking performance for a given desired trajectory. Model-based approaches are also always affected by modeling errors. However, with a model-based CoM controller we can derive joint angular velocities from the desired CoM velocity and can also explicitly consider balancing. On the other hand, RL methods are applicable to improve the performance of controllers without using physical models and parameters. However, as described above, their drawback is that RL generally is not applicable to high-dimensional systems due to the curse of dimensionality [9]. Therefore, we cannot expect improvement of controllers for humanoid robots with
many d.o.f. within a realistic amount of time by naive application of RL. In our approach, we combine a model-based CoM controller and an RL method to acquire dynamic whole-body movements on humanoid robots. (i) To cope with high dimensionality, we use a model-based CoM controller as a basic CoM controller that derives joint angular velocities from the desired CoM velocity. The balancing issue can also be considered in the controller [5-8]. (ii) The RL method is used to acquire a controller that generates the desired CoM velocity based on the current state. While RL methods generally do not require the physical model and parameters of the robot, the learning system needs to be a Markov decision process for most standard approaches based on value functions, such as Q-learning. Since we only consider the CoM position and time as state variables, the learning system becomes a partially observable Markov decision process (POMDP). Therefore, we use a policy-gradient method that can be applied to POMDPs. We demonstrate that our proposed approach efficiently acquires appropriate policies for a ball-punching task on a numerically simulated humanoid robot model of Fujitsu's Hoap-2. The acquired whole-body punching movement is demonstrated on a real hardware system as well as in simulations. The paper is organized as follows. In Section 2, we briefly describe our approach for learning desired whole-body movements on a humanoid robot by focusing on its CoM. In Section 3, we briefly introduce the ZMP and its equation, and describe how the CoM can be controlled by manipulating the ZMP based on the ZMP equations. In Section 4, we present the policy-gradient method for learning an appropriate control policy for the desired full-body movements of a humanoid robot. In Section 5, we present a concrete example of the learning system in a ball-punching task with a humanoid robot.
In Section 6, we describe the results achieved by applying the proposed method in numerical simulations. In Section 7, we demonstrate the acquired whole-body punching movement on a real robot.

2. Learning a Desired Whole-Body Movement on a Humanoid Robot: Focused on Learning CoM Movements

In this section, we briefly describe our approach for learning desired whole-body movements on a humanoid robot. The approach focuses on learning a CoM movement suitable to achieve the task. Figure 1 shows a rough sketch of our proposed approach. x is a state variable carrying (partial) information about the robot, and a is the control output for learning. In this paper, we focus on learning the CoM movement, i.e., the control output is the desired velocity of the CoM, ṙ^CoM_ref, so that a = ṙ^CoM_ref. c(x) is a reward function that evaluates each control decision. π(x, a; w) is a control policy whose parameter w is learned to maximize the accumulated reward. q̇ denotes the desired joint angular velocities. As long as both the CoM and the ZMP are inside the support polygon during CoM control, the robot can be prevented from falling over [18]. This characteristic makes the
under-actuated robot system effectively behave like a fully actuated one, which simplifies the motor learning task and increases its tractability. Thus, our approach also contains such a ZMP manipulation method; i.e., policy-gradient-type reinforcement learning is applied to learn the CoM controller on top of such a ZMP manipulation method. The acquired controller is expected to implicitly account for the dynamics of the robot, e.g., friction and inertia, and for information about the task, neither of which is explicitly considered in the model-based CoM controller. A CoM Jacobian-based redundancy resolution technique is utilized to compute angular velocities for all joints to achieve a whole-body movement consistent with a desired CoM movement [7]. We use a manually tuned weighting matrix in the weighted pseudo-inverse computation to achieve desirable joint configurations and avoid joint limits. Thus, our learning system is composed of two components, introduced in the following two sections: (i) CoM control based on the ZMP and distribution of the CoM movement into joint space, and (ii) RL for the CoM movement.

Figure 1. Learning system to acquire desired whole-body movements on a humanoid robot. x is the state variable and a is the control output for learning. We focus on learning the CoM movement, i.e., the control output is the desired velocity of the CoM, ṙ^CoM_ref, so that a = ṙ^CoM_ref. c(x) is a reward function that evaluates each control decision. π(x, a; w) is the control policy whose parameter w is learned to maximize the accumulated reward. q̇ denotes the desired joint angular velocities.

3. CoM Controller Based on a ZMP Equation

This section describes a method for controlling the CoM based on ZMP manipulation [5, 7]. ZMP compensation control is a method that compensates the current ZMP toward an objective point [5].
The PID controller is used to calculate the objective ZMP based on an analogy between the inverted pendulum and the CoM-ZMP dynamics of a mass-concentrated humanoid robot model [7]. By integrating the two components, which are presented in Sections 3.1 and 3.2, the CoM can be controlled by manipulating the ZMP. The CoM Jacobian-based redundancy resolution technique, described in Section 3.3, is utilized to calculate movements in the full joint space consistent with the desired CoM movements.
3.1. ZMP Compensation Control

According to Nagasaka [5], assuming a mass-concentrated model, the relationship between the moment acting on the ZMP and the objective ZMP is given as:

n_ZMP = n_OZMP + (r_OZMP − r_ZMP) × f_CoM,  (1)
n_OZMP = (r_CoM − r_OZMP) × f_CoM,  (2)

where n_ZMP ∈ R^3 and n_OZMP ∈ R^3 are the ZMP and objective ZMP moments, respectively. r_ZMP ∈ R^3 and r_OZMP ∈ R^3 are the position vectors of the ZMP and the objective ZMP from the origin, respectively, and f_CoM ∈ R^3 is the force acting on the CoM. From the definition of the ZMP, which is a point such that the horizontal components of the moment acting at the point are zero, we can derive a control law that compensates the ZMP toward the objective ZMP by kinematically manipulating the CoM as follows:

Δr^CoM_{x,i+1} = K (r^ZMP_x − r^OZMP_x) + (r^CoM_{x,i} − r^CoM_{x,i−1}) + K Δr^CoM_{x,i},  (3)
Δr^CoM_{y,i+1} = K (r^ZMP_y − r^OZMP_y) + (r^CoM_{y,i} − r^CoM_{y,i−1}) + K Δr^CoM_{y,i},  (4)

where K = f^CoM_{z,i} Δt² / (r^CoM_{z,i} − r^OZMP_{z,i}) and Δt is a discrete time step. Δr is the deviation of position during Δt. The desired velocity of the CoM can be straightforwardly approximated as:

ṙ^CoM_x ≈ Δr^CoM_{x,i+1} / Δt,  (5)
ṙ^CoM_y ≈ Δr^CoM_{y,i+1} / Δt.  (6)

Under such control, the robot can be regarded as an inverted pendulum with its supporting point at the objective ZMP.

3.2. Calculating the Reference ZMP Based on the Inverted Pendulum Model

As mentioned above, since the horizontal components of the moment on the ZMP are zero, the mass-concentrated model of the humanoid robot can be regarded as an inverted pendulum. Based on this analogy, we apply a simple PID controller to control the CoM by manipulating the ZMP as described in Ref. [7]. The dynamics of the mass-concentrated model, approximately linearized around an equilibrium point, are given as:

r̈^CoM_x = ω² (r^CoM_x − r^ZMP_x),  (7)
r̈^CoM_y = ω² (r^CoM_y − r^ZMP_y),  (8)

where ω = √((r̈^CoM_z + g) / (r^CoM_z − r^ZMP_z)). The above dynamics equations represent the horizontal movement of the CoM.
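To make the linearized dynamics (7) concrete, the following short sketch integrates the x-component of the CoM under a fixed ZMP offset. This is only an illustration, not the authors' implementation: the CoM height, time step and ZMP offset are assumed values, and ω² is taken as g/z_CoM (flat ground, with r̈^CoM_z = 0).

```python
import math

# Linearized CoM-ZMP ("inverted pendulum") dynamics from (7):
#   r_ddot = omega**2 * (r_com - r_zmp)
# Illustrative constants (assumptions, not from the paper):
G = 9.81
COM_HEIGHT = 0.25          # constant CoM height [m], so omega**2 = g / z_com
omega2 = G / COM_HEIGHT
dt = 0.001                 # integration step [s]

def step(r_com, rdot_com, r_zmp):
    """One Euler step of the x-component of the linearized dynamics (7)."""
    rddot = omega2 * (r_com - r_zmp)
    return r_com + dt * rdot_com, rdot_com + dt * rddot

# Placing the ZMP at the CoM keeps the CoM acceleration zero; placing it
# behind the CoM accelerates the CoM forward.
r, rdot = 0.0, 0.0
for _ in range(1000):                  # 1 s with the ZMP 1 cm behind the CoM
    r, rdot = step(r, rdot, r - 0.01)
print(r > 0.0)  # True: the CoM drifted forward
```

Because a ZMP behind the CoM accelerates the CoM away from it, these dynamics are unstable on their own; the ZMP compensation of (3)-(6) and the PID controller of the next derivation exist precisely to stabilize them.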
Due to the symmetry of the x and y components, we can focus
on the x component in the following derivation without loss of generality. By differentiating (7) and ignoring the change in ω, i.e., assuming that r̈^CoM_z = 0 and that r^CoM_z is constant, the following equation can be derived:

d³r^CoM_x/dt³ = ω² (ṙ^CoM_x − ṙ^ZMP_x).  (9)

To control the CoM r^CoM_x with reference r^CoM_{x,ref} as a target, we can apply the following controller:

ṙ^ZMP_x(t) = K_P (ṙ^CoM_{x,ref} − ṙ^CoM_x) + K_I ∫ (ṙ^CoM_{x,ref} − ṙ^CoM_x) dt + K_D (r̈^CoM_{x,ref} − r̈^CoM_x),  (10)
ṙ^CoM_{x,ref} = K_C (r^CoM_{x,ref} − r^CoM_x).  (11)

K_P, K_I, K_D and K_C are gains. By the final-value theorem, it may be proven that r^CoM_x converges to r^CoM_{x,ref} with appropriate settings of the gains. By integrating the two components presented in this section, the CoM can be controlled by manipulating the ZMP. In the next section, we describe a CoM Jacobian-based redundancy resolution technique to achieve whole-body movement consistent with the desired CoM movement.

3.3. Distributing the CoM Movement into Joint Space

In this section, we present a CoM Jacobian-based redundancy resolution technique to achieve whole-body movement consistent with the desired CoM movement [7]. We also present the CoM controller used in our framework, which is based on the CoM Jacobian.

3.3.1. Distributing CoM Movements Through the CoM Jacobian

Sugihara et al. [7] utilized a calculation method for the CoM Jacobian of legged systems that was originally proposed by Boulic et al. [19]. The CoM Jacobian relates the CoM velocity to the angular velocities of all joints as:

ṙ^CoM = J_C(q) q̇,  (12)

where J_C(q) ∈ R^{3×n} is the CoM Jacobian and n is the number of d.o.f. of the robot. By using the CoM Jacobian and a weighted pseudo-inverse calculation, we can distribute the CoM velocity to the angular velocities of all the joints based on sum-squared minimization applied to all the joint angular velocities as follows:

q̇ = J^+_C ṙ^CoM + (I − J^+_C J_C) k,  (13)

where:

J^+_C = W^{−1} J^T_C (J_C W^{−1} J^T_C)^{−1},  (14)

W = diag{w_i} (i = 1, ..., n), and k ∈ R^n is an arbitrary vector. I ∈ R^{n×n} is an identity matrix. The above redundancy resolution technique with a weighting matrix determines whole-body motion consistent with the desired CoM movements.
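The weighted pseudo-inverse distribution of (13)-(14) can be sketched in a few lines of NumPy. The 8-d.o.f. Jacobian, weights and desired velocity below are toy assumptions, not the robot's actual kinematics:

```python
import numpy as np

def weighted_pinv(J, w):
    """Weighted pseudo-inverse from (14): J^+ = W^-1 J^T (J W^-1 J^T)^-1."""
    W_inv = np.diag(1.0 / np.asarray(w))
    return W_inv @ J.T @ np.linalg.inv(J @ W_inv @ J.T)

def distribute_com_velocity(J_C, rdot_com, w, k):
    """Joint velocities from (13): qdot = J_C^+ rdot + (I - J_C^+ J_C) k."""
    n = J_C.shape[1]
    J_plus = weighted_pinv(J_C, w)
    return J_plus @ rdot_com + (np.eye(n) - J_plus @ J_C) @ k

# Toy example with a random 3 x 8 CoM Jacobian (8 d.o.f. is an assumption).
rng = np.random.default_rng(0)
J_C = rng.standard_normal((3, 8))
rdot_ref = np.array([0.05, 0.0, 0.0])      # desired CoM velocity [m/s]
qdot = distribute_com_velocity(J_C, rdot_ref, w=np.ones(8), k=np.zeros(8))
print(np.allclose(J_C @ qdot, rdot_ref))   # True: the CoM velocity is met
```

With W = I this reduces to the ordinary Moore-Penrose pseudo-inverse; the null-space term (I − J^+_C J_C) k is what later carries the punching motion without disturbing the CoM.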
Figure 2. Definition of the variables.

3.3.2. CoM Jacobian-Based Redundancy Resolution in the Double-Support Case

We used the following mapping to control the CoM with all joints:

q̇ = J^+ ṙ + (I − J^+ J) k,  (15)

where ṙ = [ṙ_C − ṙ_rl, ṙ_C − ṙ_ll]^T ∈ R^6 and J(q) = [J_C(q) − J_rl(q), J_C(q) − J_ll(q)]^T ∈ R^{6×n}. k ∈ R^n is an arbitrary vector. r_C is the position vector of the CoM from the base-link defined on the waist, and r_ll and r_rl are the position vectors of the left and right feet from the base-link, respectively. ṙ and J(q) are the corresponding velocity vector and the Jacobian of each r defined above, respectively. The variables are defined in Fig. 2. The desired ṙ to control the CoM along the desired trajectory is given by (5), (6) and (10).

4. RL for CoM Movement

In this section, we present an RL method for the proposed learning framework. For learning the CoM movement, we use a policy-gradient method, a kind of RL method that maximizes the average reward with respect to the parameters of the action rule, known as the policy [11, 20, 21]. Compared with most standard value-function-based RL methods, such a method has particular features suited to robotic applications. First, the policy-gradient method is applicable to POMDPs [22]. Considering all possible states of the robot is almost impossible, because even with a complete set of sensors there will be a certain degree of noise. It is also possible to consider a partial set of states as input for an RL system. Second, the policy-gradient method is a stochastic gradient-descent scheme. The policy can, therefore, be improved with every update. In this section, we briefly describe a framework for RL with the policy-gradient method.

4.1. RL With a Policy-Gradient Method

Assuming a Markov decision process, the average reward, discounted cumulative reward and value functions are defined as:

η(θ) = lim_{T→∞} E[(1/T) Σ_{t=0}^{T} c(x_t)],  (16)
η_β(θ) = lim_{T→∞} E[(1/T) Σ_{t=0}^{T} β^t c(x_t)],  (17)
V^π_β(x) = E[Σ_{k=0}^{∞} β^k c_{t+k+1} | x_t = x],  (18)
Q^π_β(x, a) = E[Σ_{k=0}^{∞} β^k c_{t+k+1} | x_t = x, a_t = a],  (19)

where x ∈ S is the state and c(x): S → R is the immediate reward. η(θ) is the average reward and η_β(θ) is the discounted cumulative reward. V^π_β(x) and Q^π_β(x, a) are the state-value and action-value functions, respectively [9]. x is the state, a is the action and θ is the parameter of the stochastic policy. β is a discounting factor. The goal of RL is to maximize the average reward. If we calculate the gradient of η(θ) with respect to the policy parameters θ, we can search for a locally optimal policy in the policy parameter space by updating the parameters as θ ← θ + α ∇η(θ), where ∇η(θ) is the gradient of η(θ) with respect to θ. Various derivations and algorithms have been proposed to estimate the gradient based on sampling through interaction with the environment. According to Kimura and Kobayashi [23], the gradient is given by:

∇η = (1 − β) ∇η_β  (20)
    = (1 − β) ∫∫ d(x) π(a, x) [∇log d(x) + (1/(1 − β)) ∇log π(a, x)] Q^π_β(x, a) da dx  (21)
    = ∫∫ d(x) π(a, x) [(1 − β) ∇log d(x) + ∇log π(a, x)] {Q^π_β(x, a) − V^π_β(x)} da dx  (22)
    = lim_{T→∞, β→1} (1/T) Σ_{t=0}^{T} ∇log π(a_t, x_t) Σ_{s=t}^{T} β^{s−t} δ(x_s, a_s)
    = lim_{T→∞, β→1} (1/T) Σ_{t=0}^{T} δ(x_t, a_t) Σ_{s=0}^{t} β^{t−s} ∇log π(a_s, x_s).  (23)

Here, π(x, a; θ) = P(a | x; θ) is a stochastic policy that maps state x to action a stochastically, and ∇π(x, a; θ) denotes the gradient of π(x, a; θ) with respect to θ.
d(x) is the stationary distribution of x. δ(x, a) is the TD error, defined as δ(x_t, a_t) = c(x_t) + β ∫ p(x_{t+1} | x_t, a_t) V^π_β(x_{t+1}) dx_{t+1} − V^π_β(x_t). Equation (20) is presented in Ref. [24] as Theorem 1 and (21) is derived in Ref. [25]. The derivation of (22) is based on ∫ ∇π(x, a) V^π_β(x) da = 0. If we neglect V^π_β(x), the algorithm is identical to the GPOMDP algorithm developed in Ref. [21]. As pointed out in Ref. [21], the discounting factor β controls a bias-variance trade-off in the policy gradient estimated by sampling. In fact, we update the policy parameters based on the following rule: θ_{t+1} = θ_t + α D_t δ(x_t, a_t), where D is updated by D_t = β D_{t−1} + ∇log π(x_t, a_t). However, to derive the TD error δ(x_t, a_t), we need the state-value function V^π_β(x). In this paper, we simultaneously approximate it using the function approximator V̂^π_β(x; w) with parameter w and a simple TD learning method, Δw = α δ_t ∂V̂^π_β(x; w)/∂w. The TD error δ(x_t, a_t) is then approximately calculated as δ(x_t, a_t) = c(x_t) + β V̂^π_β(x_{t+1}) − V̂^π_β(x_t). Note that β should satisfy 0 ≤ β < 1 to prevent the state-value function from diverging.

5. Application to Learning of a Dynamic Task: Ball-Punching

In the previous sections, we presented our learning approach, which focuses on the CoM movement to achieve whole-body movement. We applied the proposed approach to learning a whole-body movement on a humanoid robot and selected a ball-punching task. The goal was to strengthen the ball punch through a learning process focused on the CoM movement. In this section, the details of the learning settings are described. We then present the numerical simulation and experimental results in a real environment in the next two sections.

5.1. Learning a CoM Movement for Whole-Body Dynamic Punching

In this paper, we focus on controlling the x-axis component of the CoM, i.e., the policy output is the target velocity of the x-axis component of the CoM, ṙ^CoM_{x,ref}.
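The actor-critic updates of Section 4 (the eligibility trace D_t, the TD error δ and TD learning of V̂^π_β) can be sketched on a toy one-dimensional problem. The linear dynamics, feature map, reward and step sizes below are assumptions for illustration only, not the ball-punching setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    """Feature vector of the state (a hypothetical 1-D feature map)."""
    return np.array([1.0, x])

theta = np.zeros(2)       # policy parameters: mean of the Gaussian policy
w = np.zeros(2)           # value-function parameters, V_hat(x) = w . phi(x)
sigma = 0.5               # fixed exploration noise of the Gaussian policy
beta, alpha = 0.9, 0.005  # discounting factor (0 <= beta < 1) and step size
D = np.zeros(2)           # eligibility trace: D_t = beta*D_{t-1} + grad log pi

x = 0.0
for t in range(5000):
    mu = theta @ phi(x)
    a = mu + sigma * rng.standard_normal()       # sample from Gaussian policy
    x_next = float(np.clip(0.5 * x + a, -5, 5))  # toy bounded linear dynamics
    c = -x_next ** 2                             # reward: keep the state near 0
    # TD error with the approximator: delta = c + beta*V_hat(x') - V_hat(x)
    delta = c + beta * (w @ phi(x_next)) - (w @ phi(x))
    w += alpha * delta * phi(x)                  # critic: TD(0) update of w
    # actor: grad log pi of a Gaussian policy is (a - mu)/sigma^2 * phi(x)
    D = beta * D + (a - mu) / sigma ** 2 * phi(x)
    theta += alpha * delta * D                   # theta_{t+1} = theta_t + alpha*D_t*delta_t
    x = x_next
print(np.all(np.isfinite(theta)))  # True: parameters remain finite
```

The same update loop, with the RBF features and Gaussian policy of Section 5.2, is what drives the learning of the CoM velocity command in the punching task.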
Thus, the action in the control policy for learning is defined as a = ṙ^CoM_{x,ref}. To simplify the task, we constrained the desired CoM to a one-dimensional movement. The policy output is distributed to the x- and y-axis components of the CoM as ṙ^CoM_x = sin(ψ) ṙ^CoM_{x,ref} and ṙ^CoM_y = cos(ψ) ṙ^CoM_{x,ref}, where ψ is the angle measured clockwise from the y-axis to the x-axis, and ψ = π/3, as depicted in Fig. 3. This setting can be considered to make sufficient use of the area of the support polygon because a diagonal line is longer than lines along the x- and y-axes. The state space was simply defined as x = (r^CoM_x, t). Note that the state of the dynamics of the humanoid robot to which the learning is applied is not such a low-dimensional variable, even though the inverted pendulum-based controller simplifies it, as explained in Section 3. However, the position of the CoM remains one of the most dominant variables, and the time t is also important for coordinating the timing of the pre-designed punching motion. Thus, with the above notion and the applicability
of the policy-gradient method to such partially observable cases [21], we simply designed the state space for the above learning.

Figure 3. One-dimensional CoM movement controlled by a policy is shown by the grey line; the solid lines are the feet and the dashed line is the support polygon.

5.2. Gaussian Policy and Function Approximator for the State-Value Function

We implemented the following Gaussian policy as a stochastic policy for controlling the CoM:

π(x, a; θ) = (1 / (√(2π) σ)) exp(−(a − μ(x; θ))² / (2σ²)),  (24)

where μ(x; θ) = θ^T φ(x). x is the state and a is the action. In this study, a radial basis function network is used as a model of the feedback controller. Since it is almost impossible to manually design all the network parameters, the policy-gradient method is useful for optimizing them. We located Gaussian basis functions φ(x) on a grid with even intervals in each dimension of the observation space, as in Refs [10, 15]. The function approximator for the state-value function is also modeled as V̂^π_β(x) = w^T φ(x). We allocated 100 (= 10 × 10) basis functions φ(x) in the state space (−1.0 < r^CoM_x < 0.0, 0.5 < t < 4.0) to represent the mean of the policy μ(x).

5.3. Reward Function

The purpose of the ball-punching task is to strengthen the punch as much as possible. We designed the reward function based on this objective as:

c = (t − t_b) v^T_b v_b,  (25)

because the velocity v_b of the punched ball is proportional to its momentum. The term associated with time t is incorporated in the reward function to avoid local-minimum motions, in which the robot falls forward and ignores the timing of the punch. t_b is a bias that distributes the reward between positive and negative values, which is set as
in this study. A negative reward of −5 is given when both feet leave the ground, to avoid acquiring a punching motion with jumping.

5.4. Punching Motion Projected onto the Null Space of the CoM Controller

A punching motion was straightforwardly implemented by tracking a target trajectory in task space. In this study, we achieved tracking control in the null space of the CoM controller by introducing the following vector as the arbitrary vector in (15):

k = Ĵ^+_ra (ṙ_ra − J_ra J^+ ṙ),  (26)

where J_ra ∈ R^{3×n} is the Jacobian relating the right-hand velocity in task space ṙ_ra to q̇ as ṙ_ra = J_ra q̇, and Ĵ_ra = J_ra (I − J^+ J). Introducing this vector yields target tracking with the right hand in the null space of the CoM controller [26].

6. Numerical Simulations

6.1. Settings and Results

We applied the proposed approach to the acquisition of a strong punching movement on Fujitsu's Hoap-2 humanoid robot (see Fig. 4) in numerical simulation. The ball was modeled as a simple point mass (0.1 kg), and the contact between the robot and the ball was simulated by a spring-damper model. A spring-damper model was also used to model the floor. The integration time step for the robot was 0.2 ms, and the time interval for learning was 50 ms. For the CoM and right-arm controllers, a weighting matrix suitable for this task must be set in (14) to appropriately achieve whole-body motion. To avoid using the d.o.f. of the right arm (which are used for the punching motion) for the CoM controller, the weights of the right-arm joints were set smaller (0.01) than those of the other joints (1.0) in the CoM controller described by (13). For the right-arm controller described by (26), to achieve a punching motion mainly using the right arm, we set the weights of the body joints larger (3.0) than the other joints (1.0).

Figure 4. Fujitsu humanoid robot Hoap-2 (21 d.o.f.): 6 d.o.f. for each leg, 4 d.o.f. for each arm and 1 d.o.f. for the waist. Total weight is about 7 kg, and height is about 0.4 m.

The target trajectory for the right-arm controller to achieve a punching motion was designed as r^ra_{x,ref} = p sin(2πf(t − t_a)) + q for t ≥ t_a, and we set the parameters by considering Hoap-2's physical model, so that the amplitude p = 0.03 m, the bias q = 0.21 m, the frequency f = 1.5 Hz and the bias t_a = 3.5 s. While 0 < t < 3.5 s, r^ra_{x,ref} is held constant.

Figure 5. Acquired reward at each episode. The learning curve was averaged over five experiments and smoothed by taking a 50-episode moving average.

Figure 6. Acquired control policy for the x-axis component of the CoM.

Figure 5 shows the reward at each episode obtained with the policy-gradient method. The curve shows that a locally optimal punching motion with maximal reward was acquired after around 2000 episodes. Figure 6 shows the acquired policy for controlling the x-axis component of the CoM, and Fig. 7 presents the whole-body punching motion acquired by the control policy. While keeping the CoM at the initial point, the punching motion produced a ball momentum of about kg·m/s. The acquired punching motion without any probabilistic factors produced an average ball momentum of about kg·m/s
(standard deviation was 0.005), which means the ball momentum generated by the learned policy was about 2.3 times larger than the initial performance. Note that the acquired control policy is not a simple trajectory. Figure 8 presents the x-axis CoM trajectories under the acquired control policy from various initial conditions. To achieve a strong punching motion, the x-axis CoM position must be about −0.02 m to guarantee that the right arm can kinematically reach the ball. When the robot hits the ball, the CoM also requires a high velocity for a strong punch. From various initial conditions, the acquired policy tends to move the CoM backward from the ball at the beginning. Then it accelerates and propels the CoM forward, achieving a high velocity when its position is about −0.02 m, coordinated with the pre-designed right-arm movement. Thus, the acquired control policy is a complex feedback controller for achieving a strong punch.

Figure 7. Acquired whole-body punching movement. Snapshots correspond to 0.0, 0.85, 1.40, 2.16 and 2.33 s, respectively. The grey bar on the foot denotes the ground reaction force.

Figure 8. CoM trajectories generated with the learned control policy from various initial CoM positions.

6.2. Robustness of Learning Against Modeling Error

As presented in the previous sections, our approach requires robot information such as the mass, length and CoM position of each link to calculate
the position of the CoM and its Jacobian. Even though having perfectly accurate parameters would be desirable, our approach can be robust to estimation errors in such parameters, because the control policy of the CoM is acquired through iterative interaction with the environment. To investigate this robustness, we applied the learning in simulations with the following settings: (i) the mass of the right arm's tip is over-estimated as double the true parameter, and (ii) the position of the body mass is biased by 0.01 m in the x-axis direction. In both cases, an appropriate control policy for the CoM was acquired, as in the normal settings. The resulting rewards with the policies acquired for (i) and (ii) through 2000 trials were 1.57 and 1.89, respectively, averaged over five experiments and smoothed by taking a 50-episode moving average. These results suggest robustness to modeling errors.

7. Experiments on a Real Hardware System

In this section, we implemented the proposed controller on Hoap-2, a real humanoid robot. We implemented the CoM trajectories generated in simulations with the acquired control policy for the CoM. To show the effectiveness of the learned punching motions, we set a toy car in front of the robot as a punching target. The distance the toy car is punched measures the effectiveness of the initial and learned punches. Figure 9 provides sequential snapshots of the car being hit. The upper and lower sequences are the initial and learned movements, respectively. The results suggest that the punching motion, i.e., the acquired cooperative whole-body movement, is effective even in a real environment.

Figure 9. Sequential snapshots of the punching motion with (a) the initial (car speed was 0.42 m/s) and (b) the learned (car speed was 0.71 m/s) control policies. Each picture corresponds to 0.0, 0.67, 1.67 and 2.0 s from the timing of impact.
From the car's movement after being punched, it is clear that the learned punch had a significantly larger impact on the car.
8. Conclusions

This paper presented an approach for acquiring dynamic whole-body movements on humanoid robots, focused on learning a control policy for the CoM to produce dynamic movements for achieving tasks. We applied the framework to the learning of a dynamic ball-punching motion on a Hoap-2 model in numerical simulations. As a result, we demonstrated that acquiring dynamic punching motions is possible through learning with our approach. We achieved the task with significantly fewer trials, considering the original complexity of the task and robot. The acquired cooperative whole-body punching movement was also demonstrated on a real hardware platform. As future work, we wish to explore on-line learning in a real environment, because the proposed framework is also suitable for such situations.

References

1. K. Hirai, M. Hirose, Y. Haikawa and T. Takenaka, The development of Honda humanoid robot, in: Proc. IEEE Int. Conf. on Robotics and Automation, Leuven, pp. (1998).
2. Y. Kuroki, T. Ishida, J. Yamaguchi, M. Ujita and T. Doi, A small biped entertainment robot, in: Proc. IEEE RAS Int. Conf. on Humanoid Robots, Tokyo, pp. (2001).
3. J. Morimoto, G. Endo, J. Nakanishi, S. Hyon, G. Cheng, D. Bentivegna and C. Atkeson, Modulation of simple sinusoidal patterns by a coupled oscillator model for biped walking, in: Proc. IEEE Int. Conf. on Robotics and Automation, Orlando, FL, pp. (2006).
4. S. Hyon and G. Cheng, Passivity-based whole-body motion control for humanoids: gravity compensation, balancing and walking, in: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, Beijing, pp. (2006).
5. K. Nagasaka, The whole-body motion generation of humanoid robot using dynamics filter (in Japanese), PhD Thesis, University of Tokyo (2000).
6. S. Kagami, F. Kanehiro, Y. Tamiya, M. Inaba and H.
Inoue, Autobalancer: an online dynamic balance compensation scheme for humanoid robots, in: Algorithmic and Computational Robotics: New Directions, B. R. Donald, K. Lynch and D. Rus (Eds), pp A. K. Peters, Wellesley, MA (2001). 7. T. Sugihara and Y. Nakamura, Whole-body cooperative balancing of humanoid robot using COG Jacobian, in: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Lausanne, pp (2002). 8. S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi and H. Hirukawa, Resolved momentum control: humanoid motion planning based on the linear and anguler momentum, in: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Las Vegas, NV, pp (2003). 9. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998). 10. K. Doya, Reinforcement learning in continuous time and space, Neural Comput. 12, (2000). 11. H. Kimura, K. Miyazaki and S. Kobayashi, Reinforcement learning in POMDPs with function approximation, in: Proc. 14th Int. Conf. on Machine Learning, Nashville, TN, pp (1997).
16 1140 T. Matsubara et al. / Advanced Robotics 22 (2008) H. Kimura, T. Yamashita and S. Kobayashi, Reinforcement learning of walking behavior for a four-legged robot, in: Proc. IEEE Conf. on Decision and Control, Orlando, FL, pp (2001). 13. J. Morimoto and K. Doya, Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics Autonomous Systems 36, (2001). 14. R. Tedrake, T. W. Zhang and H. S. Seung, Stochastic policy gradient reinforcement learning on a simple 3D biped, in: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, Sendai, pp (2004). 15. T. Matsubara, J. Morimoto, J. Nakanishi, M. Sato and K. Doya, Learning sensory feedback to CPG for biped locomotion with policy gradient, in: Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, pp (2005). 16. G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi and G. Cheng, Learning CPG sensory feedback with policy gradient for biped locomotion for a full body humanoid, in: Proc. 12th Natl. Conf. on Artificial Intelligence, Pittsburgh, PA, pp (2005). 17. J. Scholz and G. Schoner, The uncontrolled manifold concept: identifying control variables for a functional task, Exp. Brain Res. 126, (1999). 18. M. Vukobratović and B. Borovac, Zero-moment point thirty five years of its life, Int. J. Humanoid Robotics 1, (2004). 19. R. Boulic, R. Mas and D. Thalmann, Inverse kinetics for center of mass position control and posture optimization, in: Proc. Eur. Workshop on Combined Real and Synthetic Image Processing for Broadcast and Video Production, Hamburg (1994). 20. R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learn. 8, (1992). 21. J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, J. Artif. Intell. Res. 15, (2001). 22. D. A. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, PhD Thesis, Australian National University (2003). 23. H. Kimura and S. 
Kobayashi, An analysis of actor/critic algorithms using eligibility traces: reinforcement learning with imperfect value function, in: Proc. Int. Conf. on Machine Learning, Madison, WI, pp (1998). 24. J. Baxter and P. L. Bartlett, Direct gradient-based reinforcement learning: I. Gradient estimation algorithms, Technical Report, Australian National University (1999). 25. R. S. Sutton, D. McAllester, S. Singh and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Adv. Neural Information Proc. Syst. 12, pp (2000). 26. T. Yoshikawa, Foundations of Robotics: Analysis and Control. MIT Press, Cambridge, MA (1990). About the Authors Takamitsu Matsubara received the BE in Electrical and Electronic Systems Engineering from Osaka Prefecture University, Japan, in 2003, the ME in Information Science from Nara Institute of Science and Technology, Nara, in 2005, and the PhD in Information Science from Nara Institute of Science and Technology, Nara, in From 2005 to 2007, he was a Research Fellow (DC1) of the Japan Society for the Promotion of Science. He is currently an Assistant Professor of Nara Institute of Science and Technology and Visiting Researcher at ATR Computational Neuroscience Laboratories, Kyoto. His research interests include
reinforcement learning, machine learning and robotics.

Jun Morimoto is a Senior Researcher at ATR Computational Neuroscience Laboratories and with the Computational Brain Project, ICORP, JST. He received the PhD in Information Science from Nara Institute of Science and Technology, Nara. He was a Research Assistant with the Kawato Dynamic Brain Project, ERATO, JST, beginning in 1999. From 2001 to 2002, he was a Postdoctoral Fellow at the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. He joined ATR in 2002 and subsequently joined JST, ICORP.

Jun Nakanishi received the BE and ME degrees, both in Mechanical Engineering, from Nagoya University, Nagoya, in 1995 and 1997, respectively, and the PhD degree in Engineering from Nagoya University. He also studied in the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, MI. He was a Research Associate at the Department of Micro System Engineering, Nagoya University, from 2000 to 2001, and a Presidential Postdoctoral Fellow at the Computer Science Department, University of Southern California, Los Angeles, CA, beginning in 2001. He then joined ATR Human Information Science Laboratories, Kyoto. He is currently a Researcher at ATR Computational Neuroscience Laboratories and with the Computational Brain Project, ICORP, Japan Science and Technology Agency. His research interests include motor learning and control in robotic systems. He received the IEEE ICRA 2002 Best Paper Award.

Sang-Ho Hyon received the MS degree in Mechanical Engineering from Waseda University, in 1998, and the PhD degree in Control Engineering from the Tokyo Institute of Technology. He was a Research Associate and Assistant Professor at Tohoku University, where he developed various legged robots and their controllers, performing dynamic locomotion experiments such as jumping, running, walking and somersaulting.
He is currently a Researcher at ATR Computational Neuroscience Laboratories, Japan. From 2005 to 2007, he was a researcher at the JST International Cooperative Research Project, Computational Brain Project. He was a 1999 ICRA Best Paper Award Finalist. His primary research interests are legged locomotion, nonlinear oscillation and nonlinear control. He is a member of the RSJ and the IEEE Robotics and Automation Society.

Joshua G. Hale received the BA (Hons 1st) degree in Computation from the University of Oxford, in 1997, the MS (Dist.) degree in Computer Science from the University of Edinburgh, in 1998, the MA degree in Computation from the University of Oxford, in 2002, and the PhD degree, on biomimetic motion synthesis, from the University of Glasgow. He has worked as a Research Engineer in the Hardware Compilation Group at the University of Oxford and as a Research Assistant in the Computer Vision and Graphics Laboratory at the University of Glasgow, and is currently employed as a Researcher at the Humanoid Robotics and Computational Neuroscience Laboratory at ATR in Japan. His research interests include dynamic simulation, humanoid robotics and robot skill acquisition, computer graphics and three-dimensional modelling, and human motion production and perception.
Gordon Cheng received the BS and MS degrees in Computer Science from the University of Wollongong, Wollongong, NSW, and the PhD degree in Systems Engineering from the Department of Systems Engineering, Australian National University, Acton, ACT. His current research interests include humanoid robotics, cognitive systems, biomimetics of human vision, computational neuroscience of vision, action understanding, human-robot interaction, active vision, mobile robot navigation and object-oriented software construction. He is on the Editorial Board of the International Journal of Humanoid Robotics. He is a Senior Member of the IEEE Robotics and Automation Society and the IEEE Computer Society.
More informationIdentification of a Piecewise Controller of Lateral Human Standing Based on Returning Recursive-Least-Square Method
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November -,. Tokyo, Japan Identification of a Piecewise Controller of Lateral Human Standing Based on Returning Recursive-Least-Square
More informationTraffic Control for a Swarm of Robots: Avoiding Group Conflicts
Traffic Control for a Swarm of Robots: Avoiding Group Conflicts Leandro Soriano Marcolino and Luiz Chaimowicz Abstract A very common problem in the navigation of robotic swarms is when groups of robots
More informationMulti-robot Formation Control Based on Leader-follower Method
Journal of Computers Vol. 29 No. 2, 2018, pp. 233-240 doi:10.3966/199115992018042902022 Multi-robot Formation Control Based on Leader-follower Method Xibao Wu 1*, Wenbai Chen 1, Fangfang Ji 1, Jixing Ye
More informationAGENT PLATFORM FOR ROBOT CONTROL IN REAL-TIME DYNAMIC ENVIRONMENTS. Nuno Sousa Eugénio Oliveira
AGENT PLATFORM FOR ROBOT CONTROL IN REAL-TIME DYNAMIC ENVIRONMENTS Nuno Sousa Eugénio Oliveira Faculdade de Egenharia da Universidade do Porto, Portugal Abstract: This paper describes a platform that enables
More informationSpeed Control of a Pneumatic Monopod using a Neural Network
Tech. Rep. IRIS-2-43 Institute for Robotics and Intelligent Systems, USC, 22 Speed Control of a Pneumatic Monopod using a Neural Network Kale Harbick and Gaurav S. Sukhatme! Robotic Embedded Systems Laboratory
More informationRobot Joint Angle Control Based on Self Resonance Cancellation Using Double Encoders
Robot Joint Angle Control Based on Self Resonance Cancellation Using Double Encoders Akiyuki Hasegawa, Hiroshi Fujimoto and Taro Takahashi 2 Abstract Research on the control using a load-side encoder for
More informationNao Devils Dortmund. Team Description for RoboCup Matthias Hofmann, Ingmar Schwarz, and Oliver Urbann
Nao Devils Dortmund Team Description for RoboCup 2014 Matthias Hofmann, Ingmar Schwarz, and Oliver Urbann Robotics Research Institute Section Information Technology TU Dortmund University 44221 Dortmund,
More informationECE 517: Reinforcement Learning in Artificial Intelligence
ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 17: Case Studies and Gradient Policy October 29, 2015 Dr. Itamar Arel College of Engineering Department of Electrical Engineering and
More informationSimple Path Planning Algorithm for Two-Wheeled Differentially Driven (2WDD) Soccer Robots
Simple Path Planning Algorithm for Two-Wheeled Differentially Driven (2WDD) Soccer Robots Gregor Novak 1 and Martin Seyr 2 1 Vienna University of Technology, Vienna, Austria novak@bluetechnix.at 2 Institute
More informationKalman Filtering, Factor Graphs and Electrical Networks
Kalman Filtering, Factor Graphs and Electrical Networks Pascal O. Vontobel, Daniel Lippuner, and Hans-Andrea Loeliger ISI-ITET, ETH urich, CH-8092 urich, Switzerland. Abstract Factor graphs are graphical
More informationSteering a humanoid robot by its head
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part B Faculty of Engineering and Information Sciences 2009 Steering a humanoid robot by its head Manish
More informationPerception. Read: AIMA Chapter 24 & Chapter HW#8 due today. Vision
11-25-2013 Perception Vision Read: AIMA Chapter 24 & Chapter 25.3 HW#8 due today visual aural haptic & tactile vestibular (balance: equilibrium, acceleration, and orientation wrt gravity) olfactory taste
More informationDynamic analysis and control of a Hybrid serial/cable driven robot for lower-limb rehabilitation
Dynamic analysis and control of a Hybrid serial/cable driven robot for lower-limb rehabilitation M. Ismail 1, S. Lahouar 2 and L. Romdhane 1,3 1 Mechanical Laboratory of Sousse (LMS), National Engineering
More informationModel-based Fall Detection and Fall Prevention for Humanoid Robots
Model-based Fall Detection and Fall Prevention for Humanoid Robots Thomas Muender 1, Thomas Röfer 1,2 1 Universität Bremen, Fachbereich 3 Mathematik und Informatik, Postfach 330 440, 28334 Bremen, Germany
More information