Robots Learning from Robots: A Proof of Concept Study for Co-Manipulation Tasks
Luka Peternel and Arash Ajoudani
Presented by Halishia Chugani
Outline
1. Robots learning from humans
2. Co-manipulation tasks
3. The experiment
4. Novel approach to co-manipulation tasks
ROBOTS LEARNING FROM HUMANS
Robots learning from humans
Goal: achieve seamless physical human-robot interaction behaviour.
Accepted approach: learning from demonstration (LfD):
1. Kinaesthetic guidance (KG)
2. Teleoperation (T)
3. Observation (O)
Drawbacks:
- KG: difficult to modulate all required parameters; cannot teach at a distance
- T: expensive hardware; requires training the human operator
- O: the controller for each axis must be determined manually
A Method for Derivation of Robot Task-Frame Control Authority from Repeated Sensory Observations
Idea: what if we let the robot derive the task-frame control strategy for each axis autonomously by observing human demonstrations?
Start by observing and recording human motion data through optical markers and muscle activity (used to estimate stiffness) through EMG electrodes.
A Method for Derivation of Robot Task-Frame Control Authority from Repeated Sensory Observations
Use this data in three steps:
1. Data segmentation: the robot detects the periodic behaviour and treats each period as a repeated demonstration/observation of the same action.
2. Task-frame derivation
3. Impedance behaviour derivation: examine the muscle activity data as a trend for robot stiffness by comparing the amplitude of the controlled variables with the exhibited muscle activity. If the two are strongly correlated (comparison ≈ 1), the stiffness matrix is made dependent on the controlled variable v; otherwise, a constant minimum stiffness K_const is used.
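The impedance-derivation step can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the correlation threshold and the stiffness bounds `k_min`/`k_max` are placeholder assumptions.

```python
import numpy as np

def derive_stiffness(var_amplitude, emg_activity,
                     k_min=100.0, k_max=1000.0, corr_threshold=0.8):
    """Sketch: choose a stiffness profile for one task-frame axis.

    If the controlled-variable amplitude and the muscle activity are
    strongly correlated, let stiffness track the (normalised) muscle
    activity; otherwise fall back to a constant minimum stiffness.
    All thresholds and bounds are illustrative assumptions.
    """
    r = np.corrcoef(var_amplitude, emg_activity)[0, 1]
    if r >= corr_threshold:
        # Normalise muscle activity to [0, 1] and map into [k_min, k_max].
        a = (emg_activity - emg_activity.min()) / (np.ptp(emg_activity) + 1e-9)
        return k_min + a * (k_max - k_min)
    # Weak correlation: use the constant minimum stiffness K_const.
    return np.full_like(np.asarray(emg_activity, dtype=float), k_min)
```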
These methods are good for autonomous execution of interactive tasks, but many tasks require involvement of more than one agent for their successful execution...
HUMAN-ROBOT CO-MANIPULATION
Human-robot co-manipulation
Where the task is shared between human and robot partner(s).
Goal: the robot adapts to human behaviour and solves the manipulation task together with the human.
Accepted approach: use feedback from sensory systems to detect human behaviour and intention. Gradual adaptation or reinforcement learning can also be employed to correct the learnt skill.
Drawbacks: requires human involvement and the time that entails.
ROBOT-ROBOT CO-MANIPULATION
Robot-robot co-manipulation
Where the task* is shared between robots.
Goal: robots work together to solve a task with limited human help.
*Currently limited to physically coupled tasks.
First thoughts: a basic control framework can be transferred to the novice robot through observation; the novice robot can then use RL to refine the skill on its own via trial and error.
Drawbacks: only a limited degree of skill can be obtained through observation, and the reward function can introduce bias into the co-manipulated task.
A novel three-stage process was tested in which a novice robot successfully learned a sawing task by collaborating with an expert robot.
EXPECTATION
Using a Cartesian force/impedance controller means the observations and parameter tuning are kept in Cartesian space, which should allow skill transfer between robots with different morphologies and sensory capabilities.
REALITY
However, this experiment was conducted on two KUKA lightweight robots equipped with Pisa/IIT SoftHands.
Overview of Experimental Approach
1. Human demonstrates to novice robot
2. Human and novice robot collaborate; novice robot becomes expert
3. Expert robot now collaborates with and teaches a new novice robot
4. The two expert robots can then propagate the skill to all other robots
Part 1: Human teaches novice robot
1. ROBOT OBSERVES: the robot derives a task-frame control strategy:
- Force controller: on the axis that maintains contact with the environment
- Impedance controller: on the axes controlling the position of the tool
2. ROBOT COLLABORATES: the robot optimizes under the given conditions:
- It uses its sensory system to learn the reference trajectories
- It records human stiffness for leader/follower role allocation
A Method for Derivation of Robot Task-Frame Control Authority from Repeated Sensory Observations
Once the task-frame control strategy, trajectories, and stiffness have been derived, they are used by the hybrid Cartesian force/impedance controller to perform the task collaboratively with another agent. The goal is to express the interaction force, and from it the robot joint torques, within Cartesian space. The interaction force combines:
- the force acting on the environment (i.e., the force moving the tool);
- the force required to keep the tool in contact with the environment (regulated by a PI controller so that the contact-force error is driven to 0);
- the impedance force counteracting the object's resistance to motion, which uses the stiffness derived from muscle behaviour to also compute the damping.
The interaction force is then mapped through the transpose of the robot arm's Jacobian matrix, together with the mass matrix and the joint-space vectors, to obtain the joint torques.
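The force composition and the torque mapping can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's controller: it assumes a diagonal stiffness matrix, critically-damped behaviour, and placeholder PI gains `kp`, `ki`.

```python
import numpy as np

def hybrid_controller_torque(J, f_task, f_err, f_err_int, x_err, dx, K,
                             kp=0.5, ki=0.1):
    """Sketch of a hybrid Cartesian force/impedance control law.

    J:                robot arm Jacobian (task dims x joints)
    f_task:           feed-forward force that moves the tool
    f_err, f_err_int: contact-force error and its integral (PI -> zero)
    x_err, dx:        position error and velocity for the impedance term
    K:                diagonal stiffness matrix; damping derived from it
    """
    f_contact = kp * f_err + ki * f_err_int   # PI contact-force regulation
    D = 2.0 * np.sqrt(K)                      # damping from stiffness (critically damped, assumed)
    f_imp = K @ x_err - D @ dx                # impedance force
    F = f_task + f_contact + f_imp            # total Cartesian interaction force
    return J.T @ F                            # map to joint torques via Jacobian transpose
```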
With these corrections, the robot is now an expert: it can participate in collaborative sawing with the human. Its understanding of stiffness, force, and reference trajectories allows it to abide by the leader/follower role allocation, where one agent stiffens to pull while the other becomes compliant to follow, and vice versa.
Part 2: Expert robot teaches novice robot Three stage Learning Process: 1. Novice learns reference motions 2. Novice learns impedance behaviour at all task phases 3. Novice becomes expert All steps require physical interaction!
Part 2: Expert robot teaches novice robot Three stage Learning Process: 1. Novice learns reference motions 2. Novice learns impedance behaviour at all task phases 3. Novice becomes expert
Step 1: Novice learns reference motions
- The expert robot guides the novice robot by producing the reference motion of the tool in all phases of the task (moving the saw along the sawing axis). In other words, the expert robot is fully stiff.
- At the same time, the novice robot exerts force to establish contact between the tool and the environment and maintain force in the cutting direction, but is otherwise fully compliant.
Given the rigid coupling through the tool, the desired motion produced by the expert is learned by the novice and encoded into its own reference trajectories.
Step 1: Novice learns reference motions
How is it encoded? The trajectories are stored as Dynamical Movement Primitives (DMPs), which can represent both point-to-point and rhythmic movements. The desired pattern of motion is approximated by a nonlinear shape function f evaluated at each task phase φ:
f(φ) = Σ_i w_i ψ_i(φ) / Σ_i ψ_i(φ)
where the weights w_i determine the shape of the trajectory and ψ_i(φ) are Gaussian kernels distributed over the phase. Locally weighted regression was used to learn the encoded trajectories because it enables fast updates of the existing models; the weight of each kernel was recursively updated with a forgetting rate λ of 0.995.
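A minimal sketch of this incremental learning scheme follows. The kernel count, kernel width `h`, and the per-kernel recursive least-squares form are assumptions for illustration; only the forgetting rate of 0.995 comes from the slides.

```python
import numpy as np

class RhythmicDMPLearner:
    """Sketch of incremental locally weighted regression for a periodic
    DMP shape function f(phi) = sum_i w_i psi_i(phi) / sum_i psi_i(phi).

    Each kernel weight w_i is updated recursively (per-kernel recursive
    least squares) with a forgetting rate lam, enabling fast updates of
    the existing model as new demonstration samples stream in.
    """
    def __init__(self, n_kernels=25, h=10.0, lam=0.995):
        self.c = np.linspace(0, 2 * np.pi, n_kernels, endpoint=False)  # kernel centres
        self.h, self.lam = h, lam
        self.w = np.zeros(n_kernels)   # kernel weights
        self.P = np.ones(n_kernels)    # per-kernel inverse covariance

    def kernels(self, phi):
        # Periodic (von Mises-style) kernels over the phase variable.
        return np.exp(self.h * (np.cos(phi - self.c) - 1.0))

    def update(self, phi, target):
        """One recursive update from a sample (phase, desired f value)."""
        psi = self.kernels(phi)
        err = target - self.w                                     # per-kernel error
        self.P = (self.P - self.P**2 / (self.lam / psi + self.P)) / self.lam
        self.w += psi * self.P * err

    def predict(self, phi):
        psi = self.kernels(phi)
        return np.dot(psi, self.w) / (np.sum(psi) + 1e-10)
```

After a few passes over a demonstrated period, `predict` reproduces the demonstrated shape; the forgetting rate lets the model track a drifting demonstration.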
Step 1: Novice learns reference motions
Part 2: Expert robot teaches novice robot Three stage Learning Process: 1. Novice learns reference motions 2. Novice learns impedance behaviour at all task phases 3. Novice becomes expert
Step 2: Novice robot learns leader/follower behaviour
- The expert robot now produces the desired stiffness behaviour throughout all phases of the task (half compliant, half stiff).
- The novice robot observes the actual motion of the tool with respect to the previously recorded reference motion trajectories.
- If the desired motion trajectory is followed at a given phase, the novice robot stays compliant, since the expert robot is being stiff. When the expert robot's motion strays from the desired trajectory, the novice robot assumes it is its moment to take over the execution and therefore increases its impedance/stiffness.
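This role-allocation rule can be sketched as a simple threshold on the tracking error. This is an illustrative simplification: the stiffness value and the hard on/off switch are assumptions (a real implementation may interpolate stiffness smoothly).

```python
def follower_stiffness(tracking_error_m, k_max=100.0, err_threshold_m=0.02):
    """Leader/follower role allocation for the observing (novice) robot.

    Stay compliant (zero stiffness) while the partner tracks the learnt
    reference motion; stiffen to take over execution once the partner
    strays beyond the error threshold. Values are placeholders.
    """
    if abs(tracking_error_m) <= err_threshold_m:
        return 0.0     # partner leads: remain compliant
    return k_max       # partner strayed: take over (become stiff)
```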
Step 2: Novice robot learns leader/follower behaviour
Part 2: Expert robot teaches novice robot Three stage Learning Process: 1. Novice learns reference motions 2. Novice learns impedance behaviour at all task phases 3. Novice becomes expert
Step 3: Novice robot becomes expert robot!
Step 3: Novice robot becomes expert robot! The novice robot is now an expert and can use the learnt reference motion and stiffness trajectories to collaboratively perform the task with the expert robot on equal terms.
In this experiment:
- X-axis = sawing motion; stiffness varied between 100 N/m and 0 N/m
- Y-axis = aligned along the beam
- Z-axis = cutting direction; contact force maintained
- 10 seconds between learning stages
- 0.5 Hz execution frequency
- 0.02 m error threshold
- W_E / W_N = 0.66
Experiment 2: The same task with a 0.9 Hz execution frequency
The 0.9 Hz experiment shows decreased fluctuation in the cutting motion, indicating improved cutting performance and skill gain. The graph demonstrates the differences in task performance at different stages of the learning process.
Limitations:
- The task must keep both robots physically coupled
- Incapable of teaching subtasks in joint space, such as obstacle avoidance
- Not yet tested with pairs of different robots
- Current metric analysis is based on task-specific parameters
Future work:
- Perform tests on other coupled tasks with various robots as partners
- Combine impedance learning and motion learning into one step
- Find a general quality metric for task performance
References
L. Peternel and A. Ajoudani, "Robots learning from robots: A proof of concept study for co-manipulation tasks," 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), Birmingham, 2017, pp. 484-490. doi: 10.1109/HUMANOIDS.2017.8246916
L. Peternel, L. Rozo, D. Caldwell and A. Ajoudani, "A Method for Derivation of Robot Task-Frame Control Authority from Repeated Sensory Observations," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 719-726, April 2017. doi: 10.1109/LRA.2017.2651368