Volume-5, Issue-2, April-2015
International Journal of Engineering and Management Research
Page Number: 648-652

Controlling Humanoid Robot Using Head Movements

S. Mounica 1, A. Naga Bhavani 2, Namani Niharika 3, K. Prabhakar Rao 4
1,2,3 IV B.Tech, Department of ECE, Padmasri Dr. B.V. Raju Institute of Technology, Narsapur, Medak, Telangana, INDIA
4 Professor, Department of ECE, Padmasri Dr. B.V. Raju Institute of Technology, Narsapur, Medak, Telangana, INDIA

ABSTRACT

This paper presents a vision-based interface designed to instruct a humanoid robot through head movements using image processing. Face detection and face tracking techniques are used to obtain the movements. The system analyzes the movements made by the user in front of a web camera and takes an appropriate action (such as stepping forward or backward). The application is developed using the OpenCV (Open Computer Vision) libraries and Microsoft Visual C++. The movements obtained by processing the live images are used to command a humanoid robot with simple capabilities. A commercial humanoid toy robot, Robosapien, was used as the output module of the system. The robot was interfaced to the computer by a USB-UIRT (Universal Infrared Receiver and Transmitter) module.

Keywords: Interface, image processing, Robosapien, USB-UIRT, Microsoft Visual C++, OpenCV.

I. INTRODUCTION

In the present world, interaction with computing devices has advanced to such an extent that it has become a necessity; we can hardly live without it. Technology has become so embedded in our daily lives that we use it to work, shop, communicate and even entertain ourselves. It is widely believed that computing, communication and display technologies will progress further, but the existing interaction techniques may become a bottleneck in the effective use of the available information flow. To use them efficiently, most computer applications require more and more interaction.
For that reason, human-computer interaction (HCI) has been a lively field of research in recent years. Robots are used successfully in many areas today, particularly in industrial production, military operations, deep-sea drilling and space exploration. This success drives interest in the feasibility of using robots in human social environments, particularly in the care of the aged and the handicapped. In social environments, humans communicate easily and naturally by both speech (audio) and gesture (vision), without the use of external devices (such as keyboards) that require special training. Robots have to adapt to human modes of communication to promote a more natural interaction with humans. Given a choice between speech and gesture (movements), some researchers have argued that gesture recognition would be more reliable than speech recognition, because the latter needs larger training datasets to deal with the greater variability in human voice and speech. Gesture recognition, and control through gestures, is widely popular in today's world; in mobile phones it is part of what makes them smart phones. Gestures are thus a means of non-verbal communication, ranging from simple actions (pointing at objects, for example) to more complex ones (such as expressing feelings or communicating with others). In this sense, gestures are not only an ornament of spoken language but essential components of the language generation process itself. A gesture can be defined as a physical movement of the head, face, arms, hands or body with the intent to convey information or meaning. Our natural head movements can be used as an interface to operate machines, to communicate with intelligent environments, and to control home appliances in a smart home. In this paper we use head movements of this type. The keyboard and mouse are currently the main interfaces between man and computer.
In other areas where 3D information is required, such as computer games, robotics and design, mechanical devices such as roller-balls, joysticks and data gloves are used. Humans communicate mainly by vision and sound; therefore, a man-machine interface would be more intuitive if it made greater use of vision and audio recognition. Another advantage is that the user can not only communicate from a distance but also need not have any physical contact with the computer. Unlike audio commands, a visual system is also preferable in noisy environments or in situations where sound would cause a disturbance. The visual system chosen was the recognition of head movements.

Copyright 2011-15. Vandana Publications. All Rights Reserved.

The amount of
computation required to process head movements is much greater than for mechanical devices; however, standard desktop computers are now fast enough to make head-movement recognition using computer vision a viable proposition.

II. THE PROPOSED CONCEPT

We propose a system with which the user can control the robot in any environment using various head movements, making it semi-autonomous. The user operates the robot through a laptop or PC with a good-quality built-in or external webcam. The webcam captures a real-time video stream of head movements, from which commands for the robot are generated. Gesture commands are given using head movements. Using the gesture technique developed, the robot can be moved in all possible directions in its environment. The designed system can capture a limited number of gestures, such as move right, left, front and back.

Block diagram: Controlling a humanoid robot using facial gestures.

Principle
The principle of this project is face detection and tracking in order to obtain head movements; these are the basic steps considered here. Face tracking is achieved by dividing the task into three parts:
1. Detect a face.
2. Track the face.
3. Apply thresholding conditions.
Once face detection and tracking have been achieved, the next step is to assign different robot movements to different head movements. To do so, we hacked the remote control of the robot, and the respective IR codes (which the remote transmits to the robot) are sent to the robot using the USB-UIRT module.

1) Face Detection: Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees and bodies.
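As an aside, detectors of the Viola-Jones family used in this work owe their speed to the integral image, which lets the sum of any rectangular region be computed in four lookups. The following is a minimal illustrative sketch in standard C++, not the actual detector: the toy image and two-rectangle Haar-like feature are our own made-up examples.

```cpp
#include <cassert>
#include <vector>

// Integral image: ii[y][x] holds the sum of all pixels above and to the
// left of (x, y). Built in one pass over the input image.
std::vector<std::vector<int>> integralImage(const std::vector<std::vector<int>>& img) {
    int h = img.size(), w = img[0].size();
    std::vector<std::vector<int>> ii(h + 1, std::vector<int>(w + 1, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
    return ii;
}

// Sum of pixels in the rectangle [x0, x1) x [y0, y1): four lookups only.
int rectSum(const std::vector<std::vector<int>>& ii, int x0, int y0, int x1, int y1) {
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
}

// A two-rectangle Haar-like feature: left-half sum minus right-half sum.
int haarTwoRect(const std::vector<std::vector<int>>& ii, int x, int y, int w, int h) {
    return rectSum(ii, x, y, x + w / 2, y + h) - rectSum(ii, x + w / 2, y, x + w, y + h);
}
```

A cascade classifier evaluates thousands of such features per candidate window, which is only practical because each one costs a handful of lookups.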
In order to locate a human face, the system needs to capture an image using a camera and a frame grabber, process the image, search it for significant features, and then use these features to determine the location of the face. Several algorithms and methods exist for detecting faces, including skin-color-based methods, Haar-like features, AdaBoost and cascade classifiers. Color is an important feature of human faces, and using skin color as a feature for tracking a face has several advantages; in particular, color processing is much faster than processing other facial features.

A Generic Face Detection System
The input of a face detection system is always an image or video stream. The output is an identification or verification of the subject or subjects that appear in the image or video. The first step, feature extraction, involves obtaining relevant facial features from the data. These features could be certain face regions, variations, angles or measures, which may be humanly meaningful (e.g. eye spacing) or not. This phase has other applications, such as facial feature tracking or emotion recognition. Finally, the system detects (recognizes) the face in the image. The Viola-Jones algorithm has been employed to detect the faces in a given image.

2) Face Tracking: There are many algorithms to track the detected face; the Viola-Jones algorithm can also be used for tracking. After detecting the face in a given image, it draws a rectangular box around the face. Whenever the face moves, the bounding box follows it, showing that the face is being tracked.

3) Thresholding Conditions: Thresholding is the simplest method of image segmentation, i.e. the process of partitioning a digital image into multiple segments (sets of pixels). Thresholding can be used to create binary images from their corresponding grayscale images. There are a number of thresholding methods for recognizing the movements of the face; one such method uses central moments and the centroid.
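The centroid-based method just mentioned can be sketched in a few lines of standard C++. This is an illustrative reconstruction, not the authors' code: the binary mask is a toy array, and `decideMovement` uses the p and q threshold ranges from the thresholding-conditions table given in this paper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Centroid of a binary image via image moments:
// M00 = number of white pixels, M10 = sum of their x, M01 = sum of their y.
struct Centroid { double x, y; };

Centroid centroid(const std::vector<std::vector<int>>& mask) {
    double m00 = 0, m10 = 0, m01 = 0;
    for (size_t y = 0; y < mask.size(); ++y)
        for (size_t x = 0; x < mask[y].size(); ++x)
            if (mask[y][x]) { m00 += 1; m10 += x; m01 += y; }
    return { m10 / m00, m01 / m00 };
}

// p and q are the x and y offsets of the nose center from the face
// centroid; the ranges follow the thresholding-conditions table.
std::string decideMovement(int p, int q) {
    if (p != 0 && q >= 100 && q <= 130) return "TOP";
    if (p != 0 && q >= 170)             return "BOTTOM";
    return "NONE";
}
```

In the real system the mask would come from the thresholded face region of each webcam frame, and the returned string would be mapped to an IR command for the robot.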
To calculate the centroid of a binary image f(x, y) (where white pixels have f(x, y) = 1), we need two coordinates. Consider the first moment in x:

M10 = Σx Σy x · f(x, y)

The two summations act like nested for loops: the x coordinate of every white pixel is added up. Similarly, we can calculate the sum of the y coordinates of all white pixels:

M01 = Σx Σy y · f(x, y)
Now we have the sums of the white pixels' x and y coordinates. To get the averages, we divide each by the number of white pixels, which is the area of the region, i.e. the zeroth moment M00 = Σx Σy f(x, y). So we get:

x̄ = M10 / M00 and ȳ = M01 / M00

Here we calculated the centroid of the facial region and also found the coordinates of the center of the nose in every frame. We then calculated the difference between the x coordinates of the center of the nose and the centroid, and likewise the difference between the y coordinates. Let l and t be the coordinates of the center of the nose, and cen.x and cen.y the coordinates of the centroid of the face. Two further variables store the differences between these values in every received frame:

p = l - cen.x and q = t - cen.y

From the peak and minimum values of p and q obtained in different environments, we derived the conditions for two head movements, TOP and BOTTOM.

Thresholding conditions:
S.NO  CONDITION                       MOVEMENT
1     p != 0 && q >= 100 && q <= 130  TOP
2     p != 0 && q >= 170              BOTTOM

Hacking the remote of the humanoid robot using Girder software:
The Robosapien is normally controlled with its remote. In order to control the humanoid robot through head movements, we first need to hack the remote and obtain the IR codes that it transmits to the robot. The Girder 6 software is used to capture the codes from the remote: by placing the remote opposite the USB-UIRT module and pressing a particular button, we can capture the IR codes, as shown in figure 1. The learnt signal can be viewed as a code word on the screen, as shown in figure 2.

Fig. 2 IR code learnt from the remote control

We then test the captured signal on the robot by transmitting it via the USB-UIRT. If the signal is strong enough, it is accepted; otherwise further learning is done. Since we need two gestures plus a reset, we learn three codes:
1) FORWARD
0000 0076 0000 0009 0147 006E 00DA 006E 00DA 006E 006D 006E 00DA 006E 006D 006E 006D 006E 006D 006E 00DA 3C7F
2) BACKWARD
0000 0076 0000 0009 0147 006E 006D 006E 006D 006E 00DA 006E 00DA 006E 006D 006E 006D 006E 006D 006E 006D 2292
3) RESET
0000 0076 0000 0009 0147 006E 00DA 006E 00DA 006E 00DA 006E 00DA 006E 006D 006E 006D 006E 006D 006E 006D 2620

Fig. 1 Capturing IR codes

Transmitting IR codes using the USB-UIRT:
To the two head movements, TOP and BOTTOM, we have assigned two movements of the robot, FORWARD and BACKWARD. When the user moves the head upwards (TOP), the robot has to move forward; when the head moves downwards (BOTTOM), the robot has to move backward. To do so, we transmit the IR code captured with the Girder 6 software through the USB-UIRT module.

III. RESULTS

1) SIMULATION RESULTS: When the user comes in front of the web camera, the program runs and the environment is set up to track the face of the user, as shown in the
figure.

Figure 3. The environment set up to track the face

2) When the user turns the head in any direction, in this case upwards, the face is tracked and the command TOP is printed on the console, as shown in figure 4.

Figure 4. The command TOP on the console

3) When the user moves the head downwards, the command BOTTOM is shown on the console, as in figure 5.

Figure 5. The command BOTTOM on the console

2) HARDWARE RESULTS: The hardware results are shown below.
1. When the user is idle, i.e. simply sitting in front of the webcam, the position of the robot is as shown in figure 6.

Figure 6. Position of the robot when the user is in the idle state

2. When the user is in front of the webcam and raises his/her head, the robot is as shown in figure 7.
Figure 7. Position of the robot when the user raises his/her head

When the user raises his/her head, the robot starts moving forward (the robot can be observed stepping forward with reference to figure 6).
3. The position of the robot when the user bends his/her head downwards is shown in figure 8.

Figure 8. Position of the robot when the user bends his/her head downwards

From figure 8 we can observe that the robot is stepping backward with reference to figure 6.

IV. CONCLUSION AND FUTURE WORK

The designed interface works with high accuracy and takes at most 3 seconds to recognize a head movement. The interface was tested in different complex environments and found to be successful: when the user raises his/her head the robot moves forward, and when the user moves his/her head downwards the robot steps backward. Image processing is currently a hot research topic in the digital world, and the entire idea here is to derive head movements from images in a way that can be used in the field of robotics. Gestures are among the most costly features to derive from humans, so different methodologies can be adopted to capture them, keeping real-time operation, efficiency and accuracy in mind. The method adopted here for capturing head movements is very economical and, we believe, novel in the field of robotics and open software. In future we would extend the project by including two more head movements, left and right; the robot will then move left or right depending on the corresponding movement of the head.

REFERENCES

[1] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Reading, MA: Addison-Wesley, 1995.
[2] Lewis, Michael, and Andrew Edmonds. "Searching for faces in scrambled scenes." Visual Cognition 12.7 (2005): 1309-1336. DOI:10.1080/13506280444000535
[3] Paul Viola and Michael Jones. Rapid Object Detection using a Boosted Cascade of Simple Features, 2001.
[4] Viola, Jones.
Robust Real-Time Object Detection, IJCV, 2001. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.4868
[5] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst, 2010.
[6] D. Bhattacharjee, D. K. Basu, M. Nasipuri, and M. Kundu. Human face recognition using fuzzy multilayer perceptron. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 14(6):559-570, April 2009.
[7] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1063-1074, September 2003.
[8] Smita Tripathi, Varsha Sharma. Face detection using combined skin color detector and template matching method. International Journal of Computer Applications (0975-8887), Volume 26, No. 7, July 2011.
[9] P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, pp. 137-154, 2004.