Vishnu Nath Usage of computer vision and humanoid robotics to create autonomous robots (Ximea Currera RL04C Camera Kit)
Acknowledgements Firstly, I would like to thank Ivan Klimkovic of Ximea Corporation, who supported me throughout the entire duration of my research. It was Ivan who sent me the camera that made this work possible, and he has been my point of contact with Ximea all along. He also gave me pointers from time to time to improve my technical writing. I would also like to thank my advisors, Dr. Stephen Levinson and Dr. Paris Smaragdis, for the support and technical advice that made this project a reality. I would like to thank Luke Wendt, a Ph.D. student and a good friend, who has been supportive of me since the very first days of my research. He recommended great books to aid my research and to improve my understanding of the harder concepts in machine learning and statistics. He was also my partner in the computer vision course, and a lot of the progress made in this research experiment is directly related to that course. This project required a lot of tools, especially for modeling and simulation, and it was Aaron Silver, a Ph.D. student and another good friend, who taught me how to use them. It was Aaron who patiently taught me how to begin programming a humanoid robot and spent hours on end walking me through his own code so that I could learn.
Project Overview The field of robotics has improved in leaps and bounds over the past half century. Robots have gone from mere characters in science fiction novels to indispensable servants in several fields. Assembly lines, previously staffed by humans, are now giving way to industrial robots that perform the same tasks hundreds of times faster, with an even better accuracy rate. As a result, it is more imperative than ever to design robots that can perceive their surroundings and work intelligently within their environment, rather than being programmed for every possible case by a human programmer. The celebrated science fiction author Isaac Asimov wrote quite a few works in the field of artificial intelligence. In his book I, Robot, Asimov talks about how robots would play a part in our daily lives in the near future. He goes on to predict that one day robots might be capable of emotion, and that they may love or hate their masters, in effect mimicking the behavior of any random human being. The goal of the project is to teach a specialized humanoid robot, the icub, to solve a maze puzzle: a ball of a given color is placed at the start position of the maze, and the robot navigates the ball through the obstacles to the finish position. The robot moves the ball through the maze by physically tilting the base of the puzzle with its hand, following the most efficient path possible. If no feasible path exists, the robot does not begin to solve the maze. The importance of this experiment is profound. It marks the beginnings of letting a robot truly understand that its actions have an impact on its immediate surroundings. When a child is born, it is initially unable to comprehend objects, shapes, etc. and cannot focus on anything that is not within its direct gaze. However, over time, the child learns that it can move and play with objects, thus realizing
that its motor hand movements have changed the environment. The child learns this from feedback loops such as the rattling of toys. In this experiment, we hope to determine whether a robot can also develop this awareness that its actions mean something, and hopefully make the leap that, in order to reach state C, it needs to perform action A on state B. We believe that this is the foundation of truly autonomous robotics and that this project is a first step in that direction.
Project Components For the successful completion of the project, the most important components were the Currera Starter Kit (which included the Currera RL04C camera) and the icub robot itself, along with its required hardware. The starter kit also came with demo applications for a variety of computer vision libraries. After testing them all, I decided to proceed with the OpenCV vision library, since I had prior experience with it and it has the largest support community. The project involves the design and execution of complex vision, control, and artificial intelligence algorithms that allow the robot to learn from experience and modify its behavior in future iterations. As a result, a demarcation was made with regard to the storage and processing of data amongst the different components. All the vision-related processing is performed by the Currera camera. This includes the determination of all the internal and external edges of the maze and the initial position of the ball. This information is then transmitted to the icub to store in its RAM, freeing up the Currera camera to deal exclusively with the live camera feed and its processing. The Currera unit is responsible for detecting the edge markers, the centroid of the ball, and the ball's position relative to the markers at all times. The unit is also responsible for filtering out other possible candidates for the ball (possible with objects having similar HSV values) and for making sure that the entire board is visible at all times. This is done by the process of inverse homography, and the entire computation is done within the Currera unit itself. Only the output is sent to the icub for further processing. We believe that the Currera unit was able to handle such loads seamlessly because of its Atom Z530 processor and 1 GB of RAM, making it a powerful computer in its own right.
The icub robot comes with its own PC104 controller and memory, and communicates with the actuators and sensors using the CAN protocol. The initial version of the artificial intelligence algorithms is loaded onto the icub memory. There is a constant interaction between the controller and the memory, and so any changes in the parameters of the algorithms that would help fine-tune the algorithm would result in
an automatic update in the parameters. All changes are logged as well, so that the project can be replicated any number of times. The flow chart for the interaction of the various components of this project is given below:
Figure 1 - Components Overview
Part Replacement The icub comes with two cameras of its own, DragonFly2 cameras from PointGrey. However, they are CCD based, while the Currera is CMOS based. The DragonFly2 is prone to noise and slow speeds. This is what prompted me to start researching alternatives, and Ximea's Currera starter kit was the perfect fit for carrying on my research.
Figure 2 - Currera camera right after mounting it on its stand
Figure 3 - Lens
Figure 4 - Breakout box BOB144
Figure 5 - The camera right before fixing inside icub
Figure 6 - Testing the camera before insertion
Because of these performance reasons, the right camera of the icub was removed and replaced with the Currera camera. The base of the skull of the icub contains the PC104, which covers the entire interior of the skull. As a result, the camera is not visible from the back of the robot, but can be seen as the iris of an eye from the front. All the wires from the BOB144 have been passed through the neck into the body and are connected to the icub's control systems so that the loop is maintained.
Figure 7 - View of icub from the back
Figure 8 - Notice BOB144 towards the far end of the top side of the skull
Algorithm approaches and views Offline Analysis of the Maze First, the maze must be studied to obtain a control policy. This is done by analyzing a top-down view of the maze. Since it is very difficult to provide a perfect orthographic view of the maze, an inverse homography must be applied at this step as well. Therefore, the first thing that must be done is the identification of features with a known geometric relation to each other. At least four must be present to determine the inverse homography. The easiest way to do this is to place high-contrast color markers on each corner of the maze. The color red was selected because it was not present in the maze or the surrounding background. The ball was also chosen to be red. In the initial top-down analysis, no ball is present. Color thresholding provides a binary image indicating where high concentrations of red are present. Thresholding the original RGB values appears to be overly sensitive. This sensitivity can be removed by first converting to HSV coordinates. A segmentation algorithm, like RasterScan, can be used to label contiguous regions and sort them by size. The four largest regions are expected to be the four corner markers. The geometry of the corners is known, and therefore, with a bit of logic, compass labels can be applied to the markers. The markers are assumed to form a square from the top-down perspective. Using this information, an inverse homography can be computed to obtain the correct top-down perspective of the maze. All surrounding border content can be cropped off. The maze walls and the open path are the only things that remain after this step. Choosing colors that are easily distinguished, e.g., yellow and green, allows thresholding of the open path. Again, a RasterScan will provide the open contiguous path of the maze. Once a path is obtained, it can be discretized into a grid. Important features in the image need to be tagged, like the start and the goal.
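The thresholding and region-labeling steps described above can be sketched in a few lines. The following is a minimal NumPy sketch rather than the code actually run on the Currera: the red hue bands, the saturation and value cutoffs, and the stack-based flood fill (standing in for the RasterScan routine) are all illustrative assumptions.

```python
import numpy as np

def red_mask(hsv, s_min=100, v_min=80):
    """Binary mask of saturated red pixels in an HSV image (OpenCV-style
    hue in [0, 180); red wraps around hue 0). Thresholds are illustrative,
    not the values used on the robot."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return ((h < 10) | (h > 170)) & (s >= s_min) & (v >= v_min)

def label_regions(mask):
    """Label 4-connected regions of a binary mask by scanning in raster
    order and flood-filling each unvisited foreground pixel. Returns the
    label image and a dict of region sizes."""
    H, W = mask.shape
    labels = np.zeros((H, W), dtype=int)
    sizes = {}
    next_label = 0
    for r in range(H):
        for c in range(W):
            if mask[r, c] and labels[r, c] == 0:
                next_label += 1
                stack, count = [(r, c)], 0
                labels[r, c] = next_label
                while stack:
                    y, x = stack.pop()
                    count += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] \
                                and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
                sizes[next_label] = count
    return labels, sizes

def largest_centroids(labels, sizes, k=4):
    """Centroids of the k largest regions -- the corner marker candidates."""
    best = sorted(sizes, key=sizes.get, reverse=True)[:k]
    return [tuple(np.argwhere(labels == b).mean(axis=0)) for b in best]
```

The four centroids returned by `largest_centroids` would then be ordered according to the known square geometry of the markers before the homography is computed.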
A high-contrast color or a fiducial marker could be used, or a manual user interface would work as well. Once the start and finish locations are specified, reinforcement learning can begin. The reinforcement learning simulates trial-and-error runs in a simulated maze environment. The control actions involve coarse wrist motions with long pauses. After each action, the ball is at rest in one of the corners of the maze. After a sufficiently long run time, value iteration converges and an optimal policy is obtained. Filtering of the optimal policy provides more general control domains. The final filtered control policy is then saved for online control. As long as the center of the markers and the ball in the projected view are roughly in the same plane as the center of the ball and markers in the top-down view, their transformations should be isomorphic. Online Analysis of the Maze Once the optimal control policy is obtained, any maze can be solved online with Bert. The maze is now viewed from a projected perspective. HSV (hue-saturation-value) color thresholding provides a binary image indicating where high concentrations of red are present, which in turn indicates the locations of the corner markers and the ball. A RasterScan segmentation approach returns the five largest objects in the robot's field of vision. With the known geometry of the board on which the maze is built, the markers and the ball can be labeled. From the markers, an inverse homography returns the unprojected coordinates of the maze. This inverse homography is then applied to the ball's location. After cropping and discretizing the resulting projected image, the ball's location with respect to the grid and maze can be accurately determined. The application of the optimal control policy obtained in the offline analysis of the maze results in the movement of the ball. Due to this design, the robot moves the board along one of the three possible axes and waits for a period of time before leveling the board once again. During this period, the ball will have progressed to another corner of the maze that is closer to the goal.
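The offline value-iteration step described above can be sketched as follows. This is a minimal Python sketch under the simplifying assumptions stated in the text: the maze is a discretized grid with a solid border, each tilt rolls the ball until it rests against a wall, and every tilt costs one unit of reward so that the optimal policy minimizes the number of tilts. The function names and the reward scheme are illustrative, not the project's actual code.

```python
def roll(maze, pos, d):
    """Slide the ball from pos in tilt direction d until it rests against
    a wall; maze is a 0/1 grid (1 = wall) with a solid outer border."""
    r, c = pos
    dr, dc = d
    while maze[r + dr][c + dc] == 0:
        r, c = r + dr, c + dc
    return (r, c)

def value_iteration(maze, goal, gamma=0.9, tol=1e-6):
    """Compute an optimal tilt policy: each tilt costs -1 and the goal is
    absorbing with value 0."""
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # tilt N, S, W, E
    cells = [(r, c) for r in range(len(maze))
             for c in range(len(maze[0])) if maze[r][c] == 0]
    V = {s: 0.0 for s in cells}
    while True:
        delta = 0.0
        for s in cells:
            if s == goal:
                continue
            best = max(-1.0 + gamma * V[roll(maze, s, a)] for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: V[roll(maze, s, a)])
              for s in cells if s != goal}
    return V, policy
```

Because the simulated transitions are deterministic, the iteration converges quickly, and the resulting policy table is what gets filtered and saved for online control.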
Currera camera identifying the ball The first thing needed for the entire project to run is to determine which object(s) can be classified as the ball in question. We pre-determined that the ball was going to be red, so it seemed natural to program the Currera camera to look for red objects, i.e., objects with a high R component among their RGB values. The following figure gives an indication of the range of values seen by the Ximea Currera camera.
Figure 9 - RGB values in the maze
As a result of the small variation in the RGB values, I programmed the Currera camera to look at the HSV values instead. The results are much more promising. The Currera camera accurately detects the
four corners of the maze, and can detect the ball easily. The corner detection and the resulting inverse homography are shown in the next two figures.
Figure 10 - Corner detection
Figure 11 - After inverse homography
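The inverse homography behind the unprojected view is computed from the four marker centroids. In practice a library routine such as OpenCV's cv2.findHomography would be used; the sketch below shows the underlying direct linear transform in pure NumPy, with illustrative function names.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Direct linear transform: the 3x3 homography H mapping the four src
    points (marker centroids in the camera image) onto the four dst points
    (corners of an ideal top-down square). Four point correspondences
    determine H exactly, up to scale."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of A (last right-singular vector) holds H's entries.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def warp_point(H, p):
    """Apply H to a single 2D point, e.g. the ball's centroid."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return (x / w, y / w)
```

The same `warp_point` call is what carries the ball's image-plane centroid into the unprojected maze coordinates during online tracking.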
Figure 12 - Detecting the ball correctly
This represents almost all of the information the Currera camera outputs to the icub for further processing. Up to this point, the Currera camera has already performed a significant amount of computer vision processing. The camera has detected the four edges of the board, the internal paths of the board, and the relative position of the ball, and has performed an inverse homography in real time. But there is something even more impressive that the Currera can do, which made it perfect for my project. The Currera is also able to segment the image and its path (based on the HSV values), so that the icub can generate the optimum path right away. Furthermore, the Currera can also detect the start and end points of the maze, based on the tags attached to the maze. The following figures show the true computing capability of the Currera camera.
Figure 13 - HSV image of the maze
Figure 14 - Segmenting the image into little squares for rule generation
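The segmentation into little squares shown above amounts to reducing the thresholded open-path mask to a coarse occupancy grid, one cell per square. A minimal sketch, where the open-fraction threshold of 0.5 is an assumed, illustrative choice:

```python
import numpy as np

def discretize_path(path_mask, rows, cols, open_frac=0.5):
    """Reduce a binary open-path mask (True = path pixel) to a rows x cols
    occupancy grid: 0 = open cell, 1 = wall. A cell counts as open when at
    least open_frac of its pixels belong to the path."""
    H, W = path_mask.shape
    grid = np.ones((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = path_mask[r * H // rows:(r + 1) * H // rows,
                              c * W // cols:(c + 1) * W // cols]
            if block.mean() >= open_frac:
                grid[r, c] = 0
    return grid
```

The resulting grid, together with the tagged start and goal cells, is exactly the state space over which the rules are generated.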
Figure 15 - Currera labeling the start and end points on its own
The information conveyed in Figures 10, 12, 14, and 15 is sent to the icub processor to conduct the next part of the project, which is to apply the artificial intelligence algorithms (Q-learning, SARSA) and generate the rules for the maze. Here too, constant feedback from the Currera camera is what guides the icub to successfully solve the maze.
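The Q-learning variant mentioned above can be sketched as a generic tabular learning loop on the simulated tilt-maze. This is not the icub's actual implementation: the `step` transition function stands in for the tilt-and-observe cycle, and the learning-rate, discount, and exploration parameters are illustrative.

```python
import random
from collections import defaultdict

def q_learning(step, start, goal, actions, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on the simulated tilt-maze. step(s, a) returns
    the ball's resting cell after tilt a -- the transition the robot would
    observe through the camera. Each tilt earns a reward of -1, so the
    greedy policy learned here minimizes the number of tilts."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if rng.random() < eps:                       # explore
                a = rng.choice(actions)
            else:                                        # exploit
                a = max(actions, key=lambda b: Q[(s, b)])
            s2 = step(s, a)
            # Goal is absorbing with value 0; otherwise bootstrap.
            target = -1.0 + (0.0 if s2 == goal
                             else gamma * max(Q[(s2, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if s2 == goal:
                break
            s = s2
    return Q
```

Acting greedily with respect to the learned Q table then reproduces the same kind of control policy that value iteration produces offline, but from sampled experience rather than a known model.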
Figure 16 - Rule generation by icub Figure 17 - Control policy generated based on Currera input
Figure 18 - Smoothed-out control policy
Thus, with the rules generated (Figure 18) and all the information stored in memory, the icub relies on the live feed streaming from the Currera camera to ensure that the ball stays on the path, and it is able to make decisions to guide the ball accurately from the start to the finish point.
Figure 19 - icub in action solving the maze