High-Level Programming for Industrial Robotics: using Gestures, Speech and Force Control

Pedro Neto, J. Norberto Pires, Member, IEEE

Manuscript received September 15, 2008. This work was supported in part by the European Commission's Sixth Framework Program under grant no. 011838, as part of the Integrated Project SMErobot™, and by the Portuguese Foundation for Science and Technology (FCT) (SFRH/BD/39218/2007). Pedro Neto is a PhD student in the Industrial Robotics Laboratory, Mechanical Engineering Department of the University of Coimbra, POLO-II, 3030-788, Coimbra, Portugal (e-mail: pedro.neto@robotics.dem.uc.pt). J. Norberto Pires is with the Industrial Robotics Laboratory, Mechanical Engineering Department of the University of Coimbra, POLO-II, 3030-788, Coimbra, Portugal (e-mail: jnp@robotics.dem.uc.pt).

Abstract - Today, most industrial robots are programmed using the typical teaching process. This paper presents a robotic system where the user can instruct and program a robot simply by showing it what it should do, with a high level of abstraction from the robot language. This is done using the two most natural human interfaces (gestures and speech), a force control system and several code generation techniques. The performance of this system is compared with that of a similar system that, instead of gestures, uses a manual guidance system based on a force control strategy. Two different demonstrations with two different robots (MOTOMAN and ABB) are presented, showing that the developed systems can be customised for different users and robots.

I. INTRODUCTION

Programming an industrial robot by the typical teaching method is a tedious and time-consuming task that requires some technical expertise. In contrast to the highly intelligent robots described in science fiction, current industrial robots are non-intelligent machines that work in a controlled and well-known environment. Generally, robots are designed, equipped and programmed to perform specific tasks, so an unskilled worker is not able to reprogram the robot to perform a different task. The goal is to create a methodology that helps users to control and program a robot with a high level of abstraction from the robot language, i.e., by making a robotic demonstration in terms of high-level behaviors (using gestures, speech, etc.) the user should be able to demonstrate to the robot what it should do. This type of learning is often known as programming by demonstration (PbD). Several approaches to PbD have been investigated, using different input devices, manipulators and learning strategies [1]-[5].

In fact, the demand for new and more natural human-machine interfaces (HMIs) has been increasing in recent years, and the field of robotics has followed this trend [6]. Speech recognition is seen as one of the most promising interfaces between humans and machines, because it is probably the most natural and intuitive way of communication between humans. For this reason, and given the high demand for more natural and intuitive HMIs, automatic speech recognition (ASR) systems have developed considerably in recent years. Today, these systems offer good performance and robustness, allowing, for example, the control of industrial robots in an industrial environment (in the presence of surrounding noise) [7]. Gestures are another natural form of communication between humans. In the robotics field, several works have been done in order to identify and recognize motions and gestures performed by humans.
This can be done using vision-based interfaces that detect human gestures [8], motion capture sensors [3], a combination of both (a vision system and a data glove) [1], or finger gesture recognition systems based on active tracking mechanisms [9]. Artificial neural networks (ANNs) have also been used extensively for gesture recognition [10], with very good results. In this work a Wii Remote controller was used to capture human hand behaviors: manual postures (static hand positions) and gestures (dynamic hand positions). Basically, the Wii Remote is an inexpensive device that uses accelerometers to detect motion and user commands for game control. The motion data extracted from the three-axis accelerometer embedded in the Wii Remote are used as input to a statistical model and then fed to an ANN algorithm for posture and gesture recognition. This information serves as input to the robot control system used in this paper, which also incorporates speech recognition software allowing the user to manage the cell, acting on the robot and on the code generation system. To avoid excessive contact forces between the robot tool and the workpiece, and at the same time detect and avoid collisions during robot operation, a force control system is used.

In summary, the robotic system used in this paper allows the user to control the robot and generate robot code using perhaps the two most natural human interfaces: gestures and speech. These features are also used for user feedback, i.e., the user receives warning sounds and spoken alerts (generated using a TTS interface) and tactile vibrations (the Wii Remote vibrates when the force control system detects excessive forces). In order to analyze the viability of the presented system, two test cases are presented and discussed. The first test case is a common pick-and-place operation and in the second the robot is used to write some letters on paper. Finally, the performance of this system is compared with the performance of a similar system that, instead of gestures, uses a manual guidance system based on a force control strategy.

II. EXPERIMENTAL SETUP

A. System description

The experimental setup consists of a MOTOMAN HP6 industrial robot equipped with the NX100 controller, a Wii Remote controller to capture human hand behaviors, a headset microphone to capture the user's voice, a force/torque (F/T) sensor and a computer running the application that manages the cell (Fig. 1).

Fig. 1. The robotic cell is basically composed of an industrial robot, a F/T sensor, two input devices (Wii Remote and headset), and a computer running the application that manages the cell.

The above-mentioned application receives data from the Wii Remote, interprets the received data and acts on the robot, using for this purpose MotomanLib, a Data Link Library created in our laboratory to control and manage the robot remotely via Ethernet (Fig. 2). The application incorporates speech recognition software that recognizes the voice commands received from the headset and, depending on the commands received, acts on the robot or on the code generation system embedded in the application. Communication with the F/T sensor is done through an ActiveX component named JR3PCI [11], which allows the application to continuously receive feedback from the F/T sensor; if any component of force or torque exceeds a set value, a command is sent that makes the Wii Remote vibrate (tactile feedback). This is a way of providing feedback about the cell state to the user, beyond the sound feedback (alert sounds and a TTS system that reports the cell state and occurrences). Finally, the application also includes a section to train the statistical model and the ANN.

Fig. 2. A schematic representation of the cell, in terms of communication technology. The input devices work without wires (via Bluetooth), giving greater freedom to the user.

B. The Wii Remote

The demand for new interaction devices to improve the game experience has led to the development of devices that allow the user to feel more immersed in the game. In contrast to traditional game pads or joysticks, the Wii Remote from Nintendo allows users to control the game using gestures as well as button presses. It uses a combination of motion sensing and infrared (IR) detection to sense its pose (rotations and translations) in 3D space. The Wii Remote has a 3-axis accelerometer, an IR camera with an object tracking system and 11 buttons used as input features. In order to provide feedback to the user, the Wii Remote contains 4 LEDs, a rumble motor that can be activated to make the controller vibrate, and a speaker. The Wii Remote communicates with the Wii console or with a computer via a Bluetooth wireless link, reporting back data at 100 packets per second. The reported data can contain information about the controller state (acceleration, buttons, IR camera, etc.). Several studies have been done using the Wii Remote as an interaction device, particularly in the construction of interactive whiteboards, finger tracking systems, and robot control. Due to its characteristics and low price, the Wii Remote was selected to be integrated into the presented robotic system as the input device to capture human hand behaviors (postures and gestures). In order to extract relevant information from the Wii Remote, both the motion sensor and the IR capabilities of the controller were explored, but after some experiments it was concluded that the IR capabilities of the Wii Remote were not usable.
The Wii Remote's IR sensor offers the possibility of locating IR light sources in the controller's field of view, but the viewing angle of the Wii Remote is too limited. Other problems arise with the placement of the IR source in the cell, calibration of the IR sensor, the limited distance from the Wii Remote to the IR source that the user would have to maintain during the demonstration process, and detection problems when other infrared sources are around. Thus, only the information provided by the motion sensor is used. This motion sensor is a 3-axis ADXL330 accelerometer from Analog Devices, physically rated to measure accelerations over a range of at least +/- 3 g, with 10% sensitivity tolerance.

C. Speech recognition

ASR systems have been used with relative success in the control of machines [7]. In this system, during the robotic demonstration the user can use voice commands to act remotely on the robot or on the code generation system; for example, if the user wants to stop the robot motors he should say "ROBOT MOTORS OFF". If instead he wants to generate robot code, for example a command to move the robot linearly to the current pose, he should say "COMPUTER MOVE LINE". It is important to note that each voice command must be identified with a confidence higher than 70%, otherwise it is rejected.
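As a minimal sketch of the command dispatch just described, the snippet below maps recognized phrases to actions and applies the 70% confidence threshold. The recognizer output format and the handler functions are illustrative assumptions, not the Microsoft ASR interface used by the authors.

```python
# Sketch of the voice-command dispatch with the 70% confidence threshold
# described in the text. The handler functions are assumed placeholders.
CONFIDENCE_THRESHOLD = 0.70

def on_robot_motors_off():
    print("stopping robot motors")        # would call the robot interface

def on_move_line():
    print("generating a linear move instruction for the current pose")

COMMANDS = {
    "ROBOT MOTORS OFF": on_robot_motors_off,
    "COMPUTER MOVE LINE": on_move_line,
}

def handle_recognition(phrase: str, confidence: float) -> bool:
    """Run the action bound to a recognized phrase; reject low-confidence results."""
    if confidence <= CONFIDENCE_THRESHOLD:
        return False                      # command rejected, as in the paper
    action = COMMANDS.get(phrase.upper())
    if action is None:
        return False
    action()
    return True

# Example: handle_recognition("computer move line", 0.83) -> True
```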

D. Force control

Robotic manipulators are often in direct contact with their surrounding environment. For purely positioning tasks such as robotic painting, where the forces of interaction with the environment are negligible, no force feedback information is required. However, in applications such as polishing, grinding or even the manipulation of objects, knowledge of the contact forces has a great influence on the quality and robustness of the process. In this paper, robotic force control is done using a F/T sensor that measures both force and torque along 3 perpendicular axes, allowing the user to have a better perception of the surrounding environment. The application continuously receives feedback from the F/T sensor and, if any component of force or torque exceeds a set value, a command is sent that makes the Wii Remote vibrate (tactile feedback). If that component exceeds the set value by 10% or more, the robot stops (see section III-C, Security systems).

E. Code generation

In the construction of a code generation algorithm, the keyword is generalize and never particularize; in other words, the algorithm must be prepared to cover a wide range of variations in the process. In the developed system, the code generation algorithm receives instructions from the identified spoken commands, allowing the user to build the robot code step by step during the demonstration (writing any type of variables, robot commands, etc.). Finally, after finishing the robotic demonstration task, the user can generate the entire robot program, upload it to the robot controller and run it.

III. CONTROL STRATEGY

A. Robot control

The robot is controlled remotely via Ethernet using the MOTOMAN IMOV function, which moves the robot linearly according to a specified pose increment Δp = (Δx, Δy, Δz, ΔRx, ΔRy, ΔRz). The first three components represent the robot translation along the X, Y, and Z axes, respectively, while the last three components represent the robot rotation about the X, Y, and Z axes, respectively. These components contain enough information to control the robot, and they must be identified by analyzing the behavior of the user's hand holding the Wii Remote. In this case it is completely unnecessary to extract precise displacements or rotations, because it is only necessary to know which pose increment components must be activated. In a first approach, the robot control strategy was to identify translation movements and rotations of the user's hand and, depending on these inputs, to send small pose increments continuously to the robot. However, it was quickly concluded that this approach was not viable, because the robot was constantly stopping and starting, presenting a high level of vibration. The adopted solution was to send the robot a single pose increment that moves it towards the limit of its field of operation. The robot movement is activated by pressing the Wii Remote B button and making a hand gesture or posture, according to the desired robot movement. The robot then starts to move immediately, and when the user releases the B button the robot stops. If the B button is never released, the robot continues the movement up to the limit of its field of operation.
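The single-increment strategy above can be sketched as a small event loop: while the B button is held and a gesture or posture is recognized, one increment towards the workspace limit is sent; releasing the button stops the motion. All callables here are assumed placeholders for the cell application's real interfaces, not MotomanLib functions.

```python
# Sketch of the control strategy of section III-A: send one pose increment that
# drives the robot towards its workspace limit when the B button is pressed,
# and stop the robot when the button is released. All callables are placeholders.
def control_step(state, b_pressed, recognize, compute_increment, send_increment, stop_robot):
    """state: dict with key 'moving'. Returns the updated state."""
    if b_pressed and not state["moving"]:
        gesture = recognize()                           # e.g. "X+", "Ry-", or None
        if gesture is not None:
            send_increment(compute_increment(gesture))  # one increment only
            state["moving"] = True
    elif not b_pressed and state["moving"]:
        stop_robot()                                    # releasing B stops the motion
        state["moving"] = False
    return state
```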
B. Field of operation of the robot - increment calculation

As mentioned above, according to the user's hand behavior the robot is moved to the limit of its field of operation, or more precisely, to a pose close to that limit. The field of operation of a 6-DOF robot manipulator is approximately a volume bounded by two spherical surfaces. It can therefore be considered that the field of operation of the robot used here is bounded by two spherical surfaces (1), both centered on the zero reference point of the robot, where R_ext and R_int are respectively the radii of the external and internal spherical surfaces:

R_int^2 <= x^2 + y^2 + z^2 <= R_ext^2    (1)

Before starting any robot movement, the current robot position (x_0, y_0, z_0) is acquired. In order to calculate the pose increment, it is first necessary to obtain the increment components which must be activated. This is done using the Wii Remote acceleration values (a_x, a_y, a_z), which define the robot movement direction n = (a_x, a_y, a_z - 1) (see section IV). This direction vector, together with the current robot position (x_0, y_0, z_0), defines a straight line (2) that intersects the external spherical surface at two points (Fig. 3). In a first approach, it is considered that only the external spherical surface limits the robot field of operation:

(x, y, z) = (x_0, y_0, z_0) + k (a_x, a_y, a_z - 1),    k in R    (2)

From (1) and (2):

(x_0 + k a_x)^2 + (y_0 + k a_y)^2 + (z_0 + k (a_z - 1))^2 = R_ext^2    (3)

Extracting k from (3), and considering only the positive value of k (vector direction), the displacement from the current robot position to the point on the external spherical surface is:

(Δx, Δy, Δz) = (x, y, z) - (x_0, y_0, z_0) = k (a_x, a_y, a_z - 1),    k > 0    (4)
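A small numeric sketch of the increment calculation in (1)-(4): given the current position and the hand-defined direction, solve the quadratic for k against the external sphere and keep only the positive root. The function name and the sample radius are illustrative assumptions.

```python
# Sketch of the increment calculation of equations (1)-(4): intersect the line
# through the current robot position along the hand-defined direction with the
# external spherical surface and keep only the positive root k.
import math

def translation_increment(p0, direction, r_ext):
    """Return (dx, dy, dz) moving the robot from p0 towards the external sphere."""
    x0, y0, z0 = p0
    nx, ny, nz = direction                      # e.g. (a_x, a_y, a_z - 1)
    # Quadratic a*k^2 + b*k + c = 0 obtained by substituting (2) into (1).
    a = nx * nx + ny * ny + nz * nz
    b = 2.0 * (x0 * nx + y0 * ny + z0 * nz)
    c = x0 * x0 + y0 * y0 + z0 * z0 - r_ext * r_ext
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return (0.0, 0.0, 0.0)                  # no usable intersection
    k = (-b + math.sqrt(disc)) / (2.0 * a)      # positive root (vector direction)
    if k < 0.0:
        return (0.0, 0.0, 0.0)
    return (k * nx, k * ny, k * nz)             # equation (4)

# Example with illustrative values: robot at (300, 0, 400) mm, gesture X+,
# external radius 900 mm -> increment along +X up to the workspace boundary.
# translation_increment((300.0, 0.0, 400.0), (1.0, 0.0, 0.0), 900.0)
```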

Thus, in terms of robot translation movements, the pose increment is Δp = (Δx, Δy, Δz, 0, 0, 0). Note that, for example, if it is found that the robot should be moved along the X axis in the negative direction, the direction vector becomes n = (-1, 0, 0), and then Δp = (Δx, 0, 0, 0, 0, 0). An analogous approach was employed to obtain k when the robot field of operation is limited by the internal spherical surface. In this case, if k has no value (is impossible to calculate), it means that the straight line does not intersect the internal spherical surface and it is the external spherical surface that limits the robot field of operation. In terms of rotation increments, since the robot rotation limit values and the current robot pose are known, it is easy to obtain the increments.

Fig. 3. The two spherical surfaces that define the robot field of operation. The current robot point and the acceleration vector components that define the robot movement direction are represented in the figure.

C. Security systems

When a human interacts directly with a robot in a co-worker scenario, the security systems present in the robotic cell should have a high level of robustness, in order to avoid accidents. The developed robotic cell contains a system that continually receives data from the Wii Remote (via Bluetooth); if this communication fails, the robot immediately stops. The same happens if the communication with the robot fails. The force control system also works as a security system: if any component of force or torque exceeds a set value the user is alerted (the Wii Remote vibrates), and if that component exceeds the set value by 10% or more, the robot immediately stops. This is done by an independent system that acts directly at a low level of the control hierarchy (stopping the robot), independently of the software that is running in the robot.
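A rough sketch of the supervision logic in III-C, combining the communication watchdog with the two-level force/torque check (rumble warning at the set limit, stop at 10% above it). The timeout, the limit values and all I/O callables are illustrative assumptions, not the cell's real JR3PCI or MotomanLib interfaces.

```python
# Sketch of the security supervision of section III-C: stop the robot if either
# communication link dies, warn (rumble) when a force/torque component exceeds
# its set value, and stop when it exceeds that value by 10% or more.
# All callables and numeric limits are illustrative placeholders.
import time

LINK_TIMEOUT_S = 0.2                                   # ~20 missed packets at 100 Hz
FT_LIMITS = {"Fx": 40.0, "Fy": 40.0, "Fz": 60.0,       # N   (example values)
             "Tx": 4.0, "Ty": 4.0, "Tz": 4.0}          # N*m (example values)

def supervise(last_wiimote_packet, last_robot_reply, read_ft, rumble, stop_robot):
    """One supervision step; returns 'stopped', 'warning' or 'ok'."""
    now = time.monotonic()
    if (now - last_wiimote_packet()) > LINK_TIMEOUT_S or \
       (now - last_robot_reply()) > LINK_TIMEOUT_S:
        stop_robot()                                    # communication failure
        return "stopped"
    sample = read_ft()                                  # e.g. {"Fx": 12.3, ..., "Tz": 0.4}
    for component, limit in FT_LIMITS.items():
        value = abs(sample[component])
        if value >= 1.1 * limit:                        # 10% or more above the set value
            stop_robot()
            return "stopped"
        if value >= limit:                              # set value exceeded: tactile alert
            rumble(True)
            return "warning"
    rumble(False)
    return "ok"
```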
IV. POSTURE AND GESTURE RECOGNITION

A. Modes of operation

The developed system has two distinct modes of operation that the user can select during the robot demonstration phase. In the first mode, the robot moves along the X, Y, and Z axes separately, while in the other mode the robot can move along the three axes at the same time. In both modes, the user can make the translation increment along any of the three axes null. In terms of rotations, in both modes the rotation around each of the three axes is done separately, one axis at a time.

B. Robot translations

The accelerations (a_x, a_y, a_z) extracted from the 3-axis accelerometer are used to detect the user's hand gestures (Fig. 4).

Fig. 4. The developed system can recognize six different gestures (X+, X-, Y+, Y-, Z+, and Z-). Depending on the mode of operation, the user can move the robot along the X, Y, and Z axes separately or at the same time. Note that in both movements the Wii Remote is held horizontally.

Moving the Wii Remote along each of the three axes (in both directions), we can extract (a_x, a_y, a_z) for each of the six different gestures (Fig. 5). When the Wii Remote is moved in the positive X direction (X+), the value of a_x initially increases because the hand begins to move; then, when the hand begins to slow down, the positive value of a_x becomes negative. This inversion point is highlighted in the figure with a white dot and marks the beginning of the shaded area. The acceleration a_y remains near zero and a_z remains near one, because the Wii Remote is held horizontally (see C. Robot rotations). A similar reasoning can be applied to the other gestures (X-, Y+, Y-, Z+, and Z-).

Fig. 5. The measured accelerations along the three axes, for each of the six different gestures.

To interpret the acceleration values and recognize the hand movements, a statistical approach was used.

For each of the six gestures, the arithmetic mean of the accelerations measured in the learning phase, (ā_x, ā_y, ā_z), is calculated (using only the non-shaded area). After this, the standard deviation (σ_x, σ_y, σ_z) is used to measure how widely the acceleration values spread around each mean. In this way, a range of acceleration values is established that defines each gesture. The above is done in the learning phase. During the robotic demonstration phase, a gesture is recognized when ā_x - σ_x <= m_x <= ā_x + σ_x, ā_y - σ_y <= m_y <= ā_y + σ_y and ā_z - σ_z <= m_z <= ā_z + σ_z, where (m_x, m_y, m_z) is the mean of the acceleration values measured during the robotic demonstration phase. The experimental tests showed that this method achieves an average of 93% correctly recognized movements (see D. Learning phase and ANNs). To define the robot increment, these gestures are then transformed into the direction vector n; for example, if the movement (Y+) is detected, n = (0, 1, 0). In the second mode of operation, the robot is moved linearly along the direction that the user's hand demonstrates; in other words, the vector n = (a_x, a_y, a_z - 1) directly defines the robot movement direction. The 1 is subtracted from the third component because the Wii Remote is held horizontally and therefore reports an acceleration of approximately 1 g along the Z axis (see C. Robot rotations).

C. Robot rotations

The robot control system needs as input six different robot rotations (Rx+, Rx-, Ry+, Ry-, Rz+, and Rz-). If the Wii Remote is in free fall, it reports zero acceleration. But if the Wii Remote is held horizontally, it reports an acceleration along the Z axis: the acceleration due to gravity, which near the surface of the Earth is approximately 9.8 m/s^2. Thus, even when the user is not accelerating the Wii Remote, a static measurement can determine the rotation of the Wii Remote (posture recognition). Analyzing Fig. 6-A, when the Wii Remote is held horizontally it reports an acceleration along the Z axis in the positive direction (a_z > 0, with a_x ≈ 0 and a_y ≈ 0). When the Wii Remote is rotated around the Y axis (Fig. 6-B), gravity is reported along the X axis instead (a_x ≠ 0, a_y ≈ 0 and a_z ≈ 0); when it is rotated around the Y axis in the reverse direction (Fig. 6-C), the sign of a_x is inverted. In order to detect rotations around the X axis, a similar approach was used (Fig. 6-D, 6-E). However, in terms of rotation around the Z axis (Fig. 6-F, 6-G), nothing can be concluded, because in both cases gravity acts along the Z axis. To solve this problem, an ANN was used to detect rotation movements around the Z axis (see D. Learning phase and ANNs).

Fig. 6. A - No rotation. B - Rotation around the Y axis in the negative direction (Ry-). C - (Ry+). D - (Rx-). E - (Rx+). F - (Rz+). G - (Rz-).

D. Learning phase and ANNs

Before starting the robotic demonstration, the user should train the system (learning phase). In terms of robot translations, this is done by demonstrating each movement (X+, X-, Y+, Y-, Z+, and Z-) several times. The experimental tests showed that 20 demonstrations are enough to obtain an average of 93% correctly recognized movements. In terms of robot rotations (Rx+, Rx-, Ry+, and Ry-), the same procedure was applied. ANNs have been applied in a wide range of applications, such as the recognition of gestures [10]. In order to detect rotation movements around the Z axis (Rz+ or Rz-), an ANN trained with a back-propagation algorithm was implemented. The input signals (acceleration data) are represented by a vector a = (a_x, a_y, a_z), and the output of neuron j is given by (5), where y_i is the output of neuron i, w_ij is the weight of the link from neuron i to neuron j, b_j is the bias of neuron j, and F is the activation function:

y_j = F( Σ_i w_ij · y_i + b_j )    (5)
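A compact sketch of the recognition steps just described: the per-axis interval test built from the learning-phase mean and standard deviation, and the single-neuron output of equation (5) used by the ANN that resolves Rz+/Rz-. The example values and the choice of activation function are illustrative assumptions.

```python
# Sketch of the recognition logic of section IV: an interval test per axis using
# the learning-phase mean and standard deviation, plus the neuron output of (5).
import math

def gesture_recognized(m, mean, std):
    """m, mean, std are (x, y, z) triples; True if m falls inside mean +/- std."""
    return all(mu - s <= v <= mu + s for v, mu, s in zip(m, mean, std))

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Equation (5): y_j = F(sum_i w_ij * y_i + b_j). The activation is an assumption."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Example with illustrative numbers: acceleration means measured during the
# demonstration, compared against a learned "Y+" template.
demo_mean = (0.05, 0.82, 0.97)
learned_mean, learned_std = (0.0, 0.8, 1.0), (0.1, 0.15, 0.1)
# gesture_recognized(demo_mean, learned_mean, learned_std) -> True
```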

In the learning phase, the user should provide the inputs and the corresponding outputs to the algorithm. The accuracy of the output depends greatly on the samples provided during the training phase and on the number of times the network is trained, but the tests showed that in 2 minutes we can perform 80 demonstrations, obtaining 87% correctly recognized gestures.

V. TEST CASES

To assess the performance of our system, two different experimental tests were performed. The first is a common robotic pick-and-place operation, and in the second test case the robot is used to write some letters on paper (Fig. 7). In both test cases the results obtained were very promising, showing that an unskilled user can generate a robot program for a specific task in an intuitive way (using only gestures and speech) [12].

Fig. 7. A - Robotic pick-and-place operation. B - Robot writing letters.

VI. COMPARISON WITH A MANUAL GUIDANCE SYSTEM

The performance of the developed system was compared with a similar system that, instead of gestures, uses a manual guidance system based on a force control strategy to move the robot (Fig. 8) [13]. Both systems are intuitive and easy to use; however, to perform the same robotic demonstration the manual guidance system takes less time (around 30% less) and presents better robustness than the gesture-based system, which sometimes does not recognize the hand postures and gestures.

VII. CONCLUSION

Due to the growing demand for natural HMIs, a robotic system that allows users to program an industrial robot using gestures and speech was proposed. The ASR system has shown great robustness (even in the presence of surrounding noise), and the recognition of gestures presents promising results, which should be improved in the future.

ACKNOWLEDGMENT

The authors also want to acknowledge the help of the Portuguese Office of the Microsoft Language Development Centre, especially Professor Miguel Salles Dias, for their support with the Microsoft ASR and TTS engines and related APIs.

REFERENCES

[1] R. Dillmann, "Teaching and learning of robot tasks via observation of human performance," Robotics and Autonomous Systems, vol. 47, no. 2-3, pp. 109-116, 2004.
[2] M. Ehrenmann, R. D. Zöllner, O. Rogalla and R. Dillmann, "Programming service tasks in household environments by human demonstration," 11th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN 2002), pp. 25-27, Berlin, Germany, 2002.
[3] J. Aleotti, A. Skoglund and T. Duckett, "Position teaching of a robot arm by demonstration with a wearable input device," International Conference on Intelligent Manipulation and Grasping (IMG04), Genoa, Italy, July 1-2, 2004.
[4] J. N. Pires, G. Veiga, and R. Araújo, "Programming-by-demonstration in the coworker scenario for SMEs," Industrial Robot, Emerald, 2008, submitted for publication.
[5] J. N. Pires, Industrial Robots Programming, Building Applications for the Factories of the Future, Springer, New York, USA, 2006.
[6] R. Cravotta, "Recognizing gestures. Blurring the line between humans and machines," EDN Europe, 2007. Available: http://www.edneurope.com/recognizinggestures+article+1716+europe.html
[7] J. N. Pires, "Robot-by-voice: Experiments on commanding an industrial robot using the human voice," Industrial Robot, An International Journal, vol. 32, no. 6, pp. 505-511, Emerald, 2005.
[8] I. Mihara, Y. Yamauchi, and M. Doi, "A real-time vision-based interface using motion processor and applications to robotics," Systems and Computers in Japan, vol. 34, pp. 10-19, 2003.
[9] S. Perrin, A. Cassinelli, and M. Ishikawa, "Gesture recognition using laser-based tracking system," Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 541-546, 2004.
[10] K. Murakami and H. Taguchi, "Gesture recognition using recurrent neural networks," in Proceedings of the ACM CHI '91 Conference on Human Factors in Computing Systems, pp. 237-242, New Orleans, USA, 1991.
[11] J. N. Pires, J. Ramming, S. Rauch, and R. Araújo, "Force/torque sensing applied to industrial robotic deburring," Sensor Review Journal, An International Journal, vol. 22, no. 3, pp. 232-241, 2002.
[12] Available: http://robotics.dem.uc.pt/pedro.neto/gs1.html
[13] Available: http://robotics.dem.uc.pt/pedro.neto/pbd.html

Fig. 8. Due to its force control system, the ABB robot can be guided manually by the user.