Design and Implementation of an Intuitive Gesture Recognition System Using a Hand-held Device

Hung-Chi Chu 1, Yuan-Chin Cheng 1
1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taiwan
{hcchu, s9830620}@cyut.edu.tw

Abstract. As sensing technologies have developed, the operating interfaces of information equipment have changed: traditional command-based operation has evolved into intuitive operation, providing users with a convenient operating experience without the need to learn commands in advance. At present, several kinds of intuitive user interface features have been implemented in smartphones, such as automatic portrait/landscape screen switching and face-up/face-down sensing. However, these features cannot handle complicated motion sensing. Therefore, this study developed a new intuitive gesture recognition system, in which the G-sensor inside a phone records gestures and the gestures are identified by our gesture recognition algorithm.

Keywords: gesture recognition, G-sensor, accelerometer.

1 Introduction

In recent years, there have been many studies on gesture recognition technologies in various domains, such as computer vision [1], intelligent gloves [2], inertial motion tracking systems [3], and mobile devices [4, 5]. The equipment used by these technologies differs. Studies based on computer vision use infrared sensing to measure body movements and gestures, e.g. the Wii Remote [6, 7]. Studies of intelligent gloves use multiple sensors (accelerometers) to measure fine gestures [2]. The accelerometer is commonly used in inertial systems, and gyroscopes and magnetometers are also used for motion detection. In mobile devices, the accelerometer is used as the sensing element for identifying gestures [3, 4]. However, computer vision, intelligent gloves, and inertial motion tracking systems require large amounts of data and special data acquisition equipment, and are therefore unsuitable for mobile devices.

Among mobile devices, smartphones have developed rapidly: they have become smaller, offer more functions, and are equipped with diverse sensors, such as G-sensors (accelerometers), gyroscopes, magnetometers, and light sensors. Smartphone operating systems include Android, Windows Mobile, and iOS; among these, the Android system released by Google has become a popular application development platform in recent years. This system has the following advantages.

(1) Open operating system: users can add functions or modify the operating system according to their individual needs; the operating system is provided with open source code, which can be modified without authorization.

(2) High freedom of software development: all users can design application programs for this operating system, and the application programs can be shared or sold through the Android Market service.

(3) Integration with Google services: the operating system can be integrated with cloud services such as Gmail and Google Maps.

Based on the above advantages, users can develop application programs for this system. However, there are many technical challenges in the basic interactions of gesture recognition on mobile phones. First, gesture recognition lacks a standardized and widely used gesture vocabulary. Second, spontaneous interactions must be handled immediately; for example, when a user inputs a gesture on a smartphone, it must be identified against the predefined gestures in the vocabulary, and the action corresponding to the gesture must be executed immediately. Third, the smartphone platform is strongly constrained by cost and system resources, including computing power and battery capacity.

In addition, most gesture tracking recognition methods use an accelerometer to capture three-axis acceleration variation values, and the total displacement of the accelerometer is obtained through Euclidean distance calculations. The three-axis acceleration variations and the total displacement are then used as the conditions for gesture tracking. However, if gesture tracking is identified using only these two conditions, the failure rate of gesture tracking recognition increases as a result of error accumulation.

Although intuitive user interfaces have been applied to smartphones in recent years, complicated automatic motion sensing and control cannot yet be performed. Therefore, this study developed a new intuitive user interface system, in which the G-sensor inside a phone records gestures and the gestures are identified by the proposed gesture recognition algorithm. The gesture recognition algorithm addresses the abovementioned challenges, and the precision and success rate of gesture recognition are improved by using the proposed algorithm.
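To make the conventional tracking conditions just described concrete, the following minimal Java sketch computes the per-interval acceleration variation and a Euclidean-distance total displacement from a recorded track. The class, the sample type, and the simple double-integration step are illustrative assumptions rather than the exact formulation of any cited system; the comments note why pure integration accumulates error.

// Minimal sketch of the conventional accelerometer-only gesture-tracking
// conditions described above: per-interval three-axis acceleration variation
// plus a Euclidean-distance "total displacement". All names and the simple
// double-integration step are illustrative assumptions, not the exact
// formulation used by the cited systems.
public class ConventionalTrackingSketch {

    /** One three-axis acceleration sample (m/s^2). */
    public record Sample(double ax, double ay, double az) {}

    /** Per-interval acceleration variation between consecutive samples. */
    public static double[] intervalVariation(Sample prev, Sample cur) {
        return new double[] { cur.ax() - prev.ax(), cur.ay() - prev.ay(), cur.az() - prev.az() };
    }

    /**
     * Total displacement accumulated over the whole track, obtained by double
     * integration of acceleration and summing the Euclidean norm of each
     * interval displacement. Small sensor errors are integrated twice, which
     * is why recognition based only on this value suffers from error accumulation.
     */
    public static double totalDisplacement(java.util.List<Sample> track, double dt) {
        double vx = 0, vy = 0, vz = 0;      // integrated velocity
        double total = 0;
        for (Sample s : track) {
            vx += s.ax() * dt;
            vy += s.ay() * dt;
            vz += s.az() * dt;
            double dx = vx * dt, dy = vy * dt, dz = vz * dt;   // interval displacement
            total += Math.sqrt(dx * dx + dy * dy + dz * dz);   // Euclidean distance
        }
        return total;
    }
}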

2 Related works

In general inertial motion tracking systems, the motion direction is measured by sensors [3]. The workable sensors include accelerometers, gyroscopes, and magnetometers: the accelerometer measures acceleration, the gyroscope measures rotation rate, and the magnetometer measures the direction of motion. Inertial Measurement Units (IMUs) are therefore composed of these three components, and changes in the direction of motion are measured by the gyroscope or the magnetometer. However, inertial measurement systems have two disadvantages: the gyroscope is expensive, and the magnetometer is easily disturbed by other electronic equipment, which results in errors. Therefore, [3] used three three-axis accelerometers to measure movements. The algorithm proposed in that system is combined with an extrapolation method, the Least Squares Method (LSM), and the Least Squares Problem (LSP), and the calculated error value is minimized using Lagrange multipliers.

A highly efficient recognition algorithm called uWave, which uses only a single three-axis accelerometer to identify gestures, is proposed in [4, 5]. The algorithm requires only one training sample as the identification condition for each gesture pattern, which allows users to employ personalized gestures in actual operation. The core of the uWave algorithm is dynamic time warping (DTW) [8], which has been extensively studied and used in voice recognition systems. The uWave algorithm consists of acceleration quantization, dynamic time warping, and template adaptation; DTW aligns two time series to determine the optimal correspondence between their points, and the magnitude of the resulting alignment error is used as the condition for identifying gestures.

In [2], five two-axis accelerometers are used in an intelligent glove for gesture recognition. The accelerometers are mounted at the five fingertips of the glove, allowing finger movements to be precisely identified. However, considering the cost and size of mobile devices, their hardware cannot be equipped with several accelerometers. Furthermore, most previous studies of gesture recognition emphasized detecting the outline of hand movements rather than finger movements.

In addition, many studies of gesture recognition have used computer vision technology [1], such as the Wii Remote [6, 7] and Kinect [9]. The body sensing principle of the Wii Remote includes two functions: direction positioning and motion sensing. Direction positioning refers to tracking and fixing coordinates using infrared sensing. The LEDs inside the optical sensor bar emit infrared rays, and the infrared CMOS sensor at the front end of the Wii Remote receives the infrared spots from the sensor bar to determine the position of, and distance between, the sensor bar and the remote controller. In addition, the relative position of the user is fixed using the received infrared spots, and direction positioning is obtained based on infrared indoor location technology. Motion sensing can detect movements and rotations in three-dimensional space; the tilt and travel direction are judged according to the voltage variations of the x-axis, y-axis, and z-axis collected by the ADXL330 accelerometer built into the Wii Remote. Body sensing operation is therefore attained by combining direction positioning with motion sensing.

The Kinect, the interface device of the Microsoft Xbox 360, has three kinds of lenses: an RGB color camera, an infrared transmitter, and an infrared CMOS camera. The infrared transmitter and the infrared CMOS camera form a 3D depth sensor, which is the major component Kinect uses for detecting the user's movements. The Kinect can therefore capture three elements at one time: a color image, a 3D depth image, and a sound signal. The Kinect sensing technology produces depth images based on PrimeSensor technology [10]. The Kinect operation comprises movement tracking, voice recognition, and a built-in motor. For movement tracking, the infrared projector built into Kinect emits pulsed infrared light, the infrared CMOS camera receives the reflected infrared rays within the scanning range, and the PS1080 sensing chip analyzes them to determine the user's position. The sensing chip marks the depth field of all scanned objects, using different colors for different distances between the object and the Kinect, i.e. the user is marked in different colors according to distance. The user's body is then separated from background objects, the user's pose is correctly judged by the image identification system, and a 3D depth image is formed. The 3D depth image data are converted into a skeleton drawing, and the user's movement is identified by the skeleton tracking system.
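To make the DTW comparison step used by uWave more concrete, the following minimal Java sketch aligns two one-dimensional time series and returns the accumulated alignment cost. It illustrates the general DTW idea referenced in [4, 5, 8]; it omits uWave's acceleration quantization and template adaptation, and the example values in main are placeholders.

import java.util.Arrays;

// Minimal dynamic time warping (DTW) sketch: aligns two 1-D time series and
// returns the accumulated alignment cost. This illustrates the general DTW
// idea used by uWave [4, 5, 8]; quantization and template adaptation are omitted.
public class DtwSketch {

    public static double dtwDistance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);          // local distance
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],      // match
                                 Math.min(cost[i - 1][j],          // insertion
                                          cost[i][j - 1]));        // deletion
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] template = {0.1, 0.3, 0.4, 0.2, -0.2};   // placeholder series
        double[] input    = {0.1, 0.2, 0.4, 0.3, -0.1};
        System.out.println("DTW cost: " + dtwDistance(template, input));
    }
}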

3 Gesture recognition system

3.1 System structure

Figure 1 shows the system operation environment. Users are provided with a mobile phone that contains various sensing elements, such as a light sensor, a G-sensor, and a magnetometer; in this study, the G-sensor is used. The server is equipped with a database and the gesture recognition algorithm. The database stores several sets of gestures and their corresponding execution actions, while the gesture recognition algorithm processes the tracking data sent by the user and identifies the gestures. When the user inputs gesture tracking data using the G-sensor inside the phone, the data are sent to the server through a Wi-Fi or 3G/3.5G network. The server computes the identification condition data using the gesture recognition algorithm, reads the preset sets of gestures from the database, and identifies the gesture. Finally, the recognition result is sent back to the user and displayed on the phone, and the action corresponding to the gesture is executed.

Figure 1. System operation environment structure diagram

3.2 Gesture recognition algorithm

In this algorithm, the tracking data are collected at intervals of 200 ms. After collection, the recorded tracking data are divided into x-axis, y-axis, and z-axis parts. The gesture recognition calculation is performed on the three axes' data and contains five steps: (1) data initialization, (2) re-recording of single-axis acceleration, (3) calculation of interval acceleration variation, (4) calculation of interval displacement, and (5) gesture recognition. These steps are detailed below; first, the following sketch illustrates the phone-side capture and upload described in Section 3.1.
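The sketch below shows how the phone might package a recorded gesture track and send it to the server for recognition. The endpoint URL and the simple CSV payload (one "ax,ay,az" line per 200 ms sample) are hypothetical, since the paper does not specify the wire protocol; only the overall request/response flow follows the description in Section 3.1.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical phone-side upload of a recorded gesture track to the
// recognition server (Section 3.1). The endpoint URL and the CSV payload
// format are illustrative assumptions.
public class TrackUploadSketch {

    public static String uploadTrack(String serverUrl, List<double[]> samples) throws Exception {
        StringBuilder csv = new StringBuilder();
        for (double[] s : samples) {
            csv.append(s[0]).append(',').append(s[1]).append(',').append(s[2]).append('\n');
        }
        HttpURLConnection conn = (HttpURLConnection) new URL(serverUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/csv");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(csv.toString().getBytes(StandardCharsets.UTF_8));
        }
        // The server replies with the name of the matched gesture (or a failure marker).
        try (java.util.Scanner sc = new java.util.Scanner(conn.getInputStream(), StandardCharsets.UTF_8)) {
            return sc.hasNextLine() ? sc.nextLine() : "";
        }
    }
}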

(1) Data initialization: the purpose is to arrange the captured three-axis data. For gesture data captured by the Android system of the smartphone, the acceleration value decreases as the phone shifts rightwards along the x-axis and increases as it shifts leftwards, and it decreases as the phone moves up along the y-axis and increases as it moves down. Through data initialization, the data are converted so that they increase when shifting rightwards and decrease when shifting leftwards.

(2) Record single-axis acceleration again: the purpose is to re-record the data after the initialization of step (1).

(3) Calculate interval acceleration variation: Eq. (1) gives the interval acceleration variation, i.e. the change between consecutive acceleration samples. For example, if the x-axis acceleration samples are {0.1, 0.3, 0.4, 0.2, -0.2}, as shown in Figure 2, the x-axis acceleration variation at 200 ms is the difference between the first two samples (0.3 - 0.1 = 0.2), and the rest may be deduced by analogy.

Figure 2. Acceleration sample values of the x-axis

(4) Calculate interval displacement: from the interval acceleration variation, the interval displacement and thus the displacement at each time point can be obtained.

(5) Gesture recognition: after new tracking data have been processed by the above steps, the interval displacements are summed and used as the judgment condition. The gesture comparison is carried out by reading the preset sets of gesture tracking conditions from the database and is based on Eqs. (2) and (3): the total displacement of the new track is compared with the total displacement of each preset track, and Eq. (3) defines the error rate between them in terms of these two total displacements. The error values between the new track and all preset tracks in the database are calculated using Eq. (3); finally, the preset gesture with the minimum error value with respect to the new track is taken as the recognized gesture.
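Because Eqs. (1)-(3) are referenced but not reproduced in this transcript, the following Java sketch implements one plausible reading of the five steps for a single axis: consecutive-sample differences for the interval acceleration variation, a simple integration over the 200 ms interval for the interval displacement, and a relative-displacement error rate for matching against preset tracks. The concrete formulas (in particular the displacement step and the error-rate expression) are assumptions for illustration, not the authors' exact equations.

import java.util.Map;

// Sketch of the five-step gesture recognition algorithm of Section 3.2 for a
// single axis, under assumed forms of Eqs. (1)-(3). These concrete formulas
// are illustrative assumptions, not reproduced from the paper.
public class GestureRecognitionSketch {

    static final double DT = 0.2; // 200 ms sampling interval

    // Step (1): data initialization - flip the sign so that values increase
    // when shifting rightwards, as described for the Android x-axis data.
    static double[] initialize(double[] raw) {
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = -raw[i];
        return out;
    }

    // Step (3): interval acceleration variation, Eq. (1) (assumed form:
    // difference of consecutive samples).
    static double[] intervalVariation(double[] acc) {
        double[] delta = new double[acc.length - 1];
        for (int i = 1; i < acc.length; i++) delta[i - 1] = acc[i] - acc[i - 1];
        return delta;
    }

    // Step (4): interval displacement, Eq. (2) (assumed form: |da| * dt^2),
    // summed into the total displacement used as the judgment condition.
    static double totalDisplacement(double[] variation) {
        double total = 0;
        for (double dv : variation) total += Math.abs(dv) * DT * DT;
        return total;
    }

    // Step (5): error rate against a preset track, Eq. (3) (assumed form),
    // then selection of the preset gesture with the minimum error.
    static double errorRate(double newTotal, double presetTotal) {
        return Math.abs(newTotal - presetTotal) / presetTotal;
    }

    static String recognize(double[] rawTrack, Map<String, Double> presetTotals) {
        double[] acc = initialize(rawTrack);                          // steps (1)-(2)
        double total = totalDisplacement(intervalVariation(acc));     // steps (3)-(4)
        String best = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> preset : presetTotals.entrySet()) {
            double e = errorRate(total, preset.getValue());
            if (e < bestError) { bestError = e; best = preset.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] track = {0.1, 0.3, 0.4, 0.2, -0.2};   // x-axis example from Section 3.2
        System.out.println(recognize(track, Map.of("gesture 1", 0.05, "gesture 2", 0.12)));
    }
}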

3.3 System operation flow chart

Figure 3 shows the system operation flow. When a user sends gesture tracking data to the server, Step 1 of the gesture recognition algorithm is executed first and the gesture data are initialized. Step 2 is then executed, and the initialized data are re-recorded. The interval acceleration variation is calculated in Step 3, and the interval displacement is calculated in Step 4. Finally, gesture recognition is executed: all preset gesture data in the database are read and compared with the new gesture data. If the error value between the new gesture data and a gesture in the database is the minimum, the execution action corresponding to that gesture is sent to the user and executed; if the comparison fails, the result is sent directly to the user.

Figure 3. System operation flow

4 Experimental results and analysis

4.1 Experimental results

In the experiment, the equipment for capturing gesture tracking data is an HTC Desire running Android 2.3, the operating system version released by Google on December 7, 2010. This version revised the UI and added hardware support [11], and it provides a new API (TYPE_LINEAR_ACCELERATION) for capturing the values of the G-sensor. The main difference from version 2.2 is that the acceleration values captured through this API exclude the gravity component. The G-sensor chip built into the phone is a BMA150, and the captured acceleration values are within ±9.8g. The server host is a Dell OptiPlex 745 MT with an Intel Core 2 Quad Q8400 CPU and 4 GB of memory.
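On the phone side, the capture of these gravity-free acceleration values can be sketched with the Android sensor API named above. The Activity structure and the track buffer below are illustrative, but Sensor.TYPE_LINEAR_ACCELERATION and the listener registration are the actual Android 2.3 (API level 9) calls; the requested sampling period of 200,000 microseconds matches the 200 ms interval used by the recognition algorithm.

import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;
import java.util.ArrayList;
import java.util.List;

// Sketch of capturing gravity-free acceleration with TYPE_LINEAR_ACCELERATION,
// introduced in Android 2.3 (API level 9). The surrounding Activity and the
// simple track buffer are illustrative; only the sensor registration reflects
// the actual Android API.
public class GestureCaptureActivity extends Activity implements SensorEventListener {

    private SensorManager sensorManager;
    private final List<float[]> track = new ArrayList<>();   // one {ax, ay, az} entry per sample

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
        Sensor linearAcc = sensorManager.getDefaultSensor(Sensor.TYPE_LINEAR_ACCELERATION);
        // Request a sample roughly every 200 ms (200,000 microseconds),
        // matching the sampling interval used by the recognition algorithm.
        sensorManager.registerListener(this, linearAcc, 200_000);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        if (event.sensor.getType() == Sensor.TYPE_LINEAR_ACCELERATION) {
            track.add(event.values.clone());    // ax, ay, az without gravity
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) {
        // Not needed for this sketch.
    }

    @Override
    protected void onPause() {
        super.onPause();
        sensorManager.unregisterListener(this);   // stop sampling when leaving the screen
    }
}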

Figure 4 shows the gestures captured in the experiment; the gesture data were captured and the gesture tracking diagrams were redrawn according to the four gestures. In this figure, 1 represents moving down and 2 represents a circle drawn clockwise.

Figure 4. Gestures (dot marks the start, arrowhead marks the end)

Figure 5 (a) shows the redrawn two-axis tracking diagram after calculation and comparison of gesture 1. As seen in Figure 5, the error value of the two-axis track is 6.92%; the track lines on the x-axis and z-axis planes therefore differ slightly, owing to a little shaking or slight movement errors when capturing the tracking data. In addition, the error value of the three-axis tracking diagram is 11.06%, so there is a significant error between the y-axis values of the movement track and those of the initial track, as shown in Figure 5 (b). This is because, when a mobile phone is used to capture gesture data, the acceleration value is slightly affected by gravitation, which influences the track data. Therefore, the tracking data of gesture 1 contain a slight error, as the y-axis is influenced by gravitation.

Figure 5. (a) (x, y) track of gesture 1; (b) (x, y, z) track of gesture 1

Figure 6 (a) shows the two-axis track after calculation and comparison of gesture 2; the error value of the redrawn two-axis track is 5.90%, so there is a slight error resulting from shaking or movement errors when capturing the gesture data. In addition, the error value of the three-axis track diagram is 17.01%. The movement track therefore also has slight errors, as the initial track does, as shown in Figure 6 (b). The track error increases because the track is slightly affected by gravitation and by slight movement errors.

Figure 6. (a) (x, y) track of gesture 2; (b) (x, y, z) track of gesture 2

4.2 Experimental analysis

According to the experimental results, the acceleration values captured through the new API of the Android 2.3 operating system are still slightly influenced by gravity; this API therefore cannot completely exclude the gravity component. The track data error increases because the acceleration values of the x, y, and z axes are slightly influenced by gravitation. The gesture tracking data of gestures 3 and 4 used in the experiment were also slightly influenced by gravitation, so their errors increased.

Table 1 shows the error rates of the gestures analyzed with Eq. (4) after ten tests, where Eq. (4) defines the mean error rate as the sum of the error rates obtained from Eq. (3) over all tests divided by the number of tests; the maximum and minimum error rates over the ten tests are also reported. It is observed that the mean error rates of the gesture tracking data captured for gestures 1 and 2 are 13.76% and 13.42%, respectively. Since the new Android 2.3 API cannot completely eliminate the effect of gravity, the acceleration values of the x, y, and z axes are slightly affected, and the failure rate of gesture recognition therefore increases.

Table 1. Error rates of gestures 1 and 2 over ten tests

Gesture              1         2
Mean error rate      13.76%    13.42%
Maximum error rate   26.3%     17.01%
Minimum error rate   8.25%     8.73%
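For completeness, a small sketch of how the statistics in Table 1 can be derived from repeated tests, under the reading of Eq. (4) given above (mean error rate = sum of the per-test error rates from Eq. (3) divided by the number of tests); the per-test values below are placeholders, not the authors' measurements.

import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Mean, maximum, and minimum error rate over repeated tests, following the
// reading of Eq. (4) given above. The sample values are placeholders only.
public class ErrorRateStatsSketch {
    public static void main(String[] args) {
        double[] perTestErrorRates = {0.083, 0.12, 0.26, 0.10, 0.09, 0.15, 0.11, 0.14, 0.13, 0.18};
        DoubleSummaryStatistics stats = DoubleStream.of(perTestErrorRates).summaryStatistics();
        System.out.printf("mean = %.2f%%, max = %.2f%%, min = %.2f%%%n",
                100 * stats.getAverage(), 100 * stats.getMax(), 100 * stats.getMin());
    }
}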

5 Conclusions

This study developed a new intuitive gesture recognition system based on the Android operating system. The system realizes intuitive gesture recognition through the gesture recognition algorithm designed in this study, and users can assign functions to gestures, thereby realizing intuitive gesture operation. However, one challenge observed in the experiment is that the acceleration values captured by the hand-held device were slightly affected by gravity, causing the error values to increase. A follow-up study will therefore attempt to solve this problem and reduce the effect of gravity in order to increase the recognition success rate.

Acknowledgement

This work was supported in part by the National Science Council, Taiwan, under grant NSC 99-2632-E-324-001-MY3.

References

[1] Y. Wu and T. S. Huang, Vision-Based Gesture Recognition: A Review, in: Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction, Springer-Verlag, 1999.
[2] J. K. Perng, B. Fisher, S. Hollar, and K. S. J. Pister, Acceleration Sensing Glove (ASG), The Third International Symposium on Wearable Computers, 1999.
[3] C.-W. Yi, C.-M. Su, W.-T. Chai, J.-L. Huang, and T.-C. Chiang, G-Constellations: G-Sensor Motion Tracking Systems, IEEE 71st Vehicular Technology Conference (VTC 2010-Spring), 2010.
[4] J. Liu, Z. Wang, L. Zhong, J. Wickramasuriya, and V. Vasudevan, uWave: Accelerometer-based Personalized Gesture Recognition and Its Applications, IEEE International Conference on Pervasive Computing and Communications, 2009.
[5] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, uWave: Accelerometer-based personalized gesture recognition and its applications, Pervasive and Mobile Computing, vol. 5, no. 6, pp. 657-675, December 2009.
[6] T. Petric, A. Gams, A. Ude, and L. Zlajpah, Real-time 3D marker tracking with a WIIMOTE stereo vision system: Application to robotic throwing, IEEE 19th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD), 2010.
[7] P.-W. Chen, K.-S. Ou, and K.-S. Chen, IR Indoor Localization and Wireless Transmission for Motion Control in Smart Building Applications based on Wiimote Technology, Proceedings of the SICE Annual Conference, Taiwan, 2010.
[8] C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected word recognition, The Bell System Technical Journal, vol. 60, pp. 1389-1409, 1981.
[9] Kinect for Xbox 360, http://www.microsoft.com/presspass/presskits/
[10] PrimeSense Supplies 3-D-Sensing Technology to Project Natal for Xbox 360, http://www.microsoft.com/presspass/press/2010/mar10/03-31primesensepr.mspx
[11] Wikipedia, Android, http://zh.wikipedia.org/zh-tw/android#.e4.bd.9c.e6.a5.ad.e7.b3.bb.e7.b5.b1