Acquiring hand-action models by attention point analysis


Koichi Ogawara, Soshi Iba†, Tomikazu Tanuki††, Hiroshi Kimura†††, Katsushi Ikeuchi

Institute of Industrial Science, Univ. of Tokyo, Tokyo, 106-8558, JAPAN
†) The Robotics Institute, Carnegie Mellon University, Pittsburgh PA, USA
††) Research Division, Komatsu Ltd., Kanagawa, 254-8567, JAPAN
†††) Univ. of Electro-Communications, Tokyo, 182-8585, JAPAN
{ogawara, ki}@iis.u-tokyo.ac.jp, iba+@cmu.edu, tomikazu_tanuki@komatsu.co.jp, hiroshi@kimura.is.uec.ac.jp

Abstract

This paper describes our current research on learning task level representations by a robot through observation of human demonstrations. We focus on human hand actions and represent such hand actions in symbolic task models. We propose a framework of such models by efficiently integrating multiple observations based on attention points; we then evaluate the produced model by using a human-form robot. We propose a two-step observation mechanism. At the first step, the system roughly observes the entire sequence of the human demonstration, builds a rough task model and also extracts attention points (APs). The attention points indicate the time and the position in the observation sequence that require further detailed analysis. At the second step, the system closely examines the sequence around the APs, and obtains attribute values for the task model, such as what to grasp, which hand to use, or what the precise trajectory of the manipulated object is. We have implemented this system on a human-form robot and demonstrated its effectiveness.

1 Introduction

One of the most important issues in robotics is how to program robot behaviors. Several methodologies for programming robots have been proposed. We can classify them into the following three categories: static textual programming, manipulation by a human through a control device, and automatic programming. The former two methods require human intervention throughout the entire task. In contrast, automatic programming is intended to reduce human aid and to generate an entire robot program automatically.
Given the necessary initial knowledge, robots try to acquire their behavior automatically from observation, simulation or learning.

Our research goal is automatic acquisition of robot behavior, in particular hand-actions, from observation based on the automatic programming approach. We divide the acquisition process of human tasks into two levels: task level (what-to-do) and behavior level (how-to-do-it). This paper covers the former, task level acquisition, while the latter is presented in [4].

In Chapter 2, we discuss the necessity of integrating multiple observations. In Chapter 3, we introduce the concept of attention points and present a method for constructing a task model by two kinds of attention point (AP) analyses. In Chapters 4 and 5, we describe implementation details for each attention point analysis. In Chapter 6, we present experimental results. Chapter 7 contains our conclusions and remarks on future work.

2 Acquisition of human task

Ikeuchi, Suehiro and Kuniyoshi et al. studied vision based task acquisition [1, 2]. In their research, the acquisition system observed a human performing an assembly task and constructed high-level task models. Then, using those constructed models, a robot performed the same task. Kimura et al. proposed task models which could be used to realize cooperation between a human and a robot [3]. In this scheme, the robot first observes sequential human operations, referred to as events, by vision and analyzes mutual event dependencies (pairs of pre-conditions and results) in the tasks. The robot is able to change its assistant behavior according to the current event observed and the knowledge of what is to be done next, derived from the task model, and to generate a large number of cooperative patterns from a single task model. However, these models depend on one sensor (typically a single camera) or a few sensors and are constructed through one-time observation; therefore they are not suitable for close analysis.

Our approach utilizes multiple observations which vary in sensor variety and granularity for efficient analysis. By analyzing each observation sequentially or repeatedly, we can determine the necessary part in the human demonstration where the level of detail in the subsequent analysis should be changed and can then accumulate each result to build the task model efficiently. Integration of observations enables us to build heterogeneous task models in which accuracy is enhanced locally. We introduce the concept of attention point (AP) as a key of integration and propose a two-step analysis based on APs as a method of constructing a human task model.

3 Attention point

3.1 Two-step analysis

Integration of multiple observations is accomplished by two-step analysis. At the first step, the system roughly analyzes the input modalities and recognizes the outline of the entire human demonstration (rough task model). At the same time, the system also extracts APs. APs, which require close observation to learn a particular behavior, are defined around a specific time and position along a sequence of a human demonstration. At the second step, the system closely examines the demonstration around each AP to enhance the task model. This sequence can be the same observation data or another one. In the latter case, the system synchronizes two sets of observation data which are derived from different demonstrations of the same task.

We employ a type of task model similar to Ikeuchi's. We decomposed a hand-action task into a sequence of discrete hand-actions, during which a human performs some action by manipulating objects, and we symbolized possible hand-actions as Action Symbols, which indicate what-to-do information. The task model also includes several attributes for each Action Symbol, i.e., detailed information needed to achieve that action, such as which hand to use or which object to grasp (Table 1). In the proposed two-step approach, the Action Symbol is obtained from the rough analysis at the first step. Then, from the detailed analysis around the APs previously determined, those attributes are obtained at the second step.
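As an illustration only, one entry of such a task model could be held in a small record whose Action Symbol is set by the rough analysis and whose attributes are filled in later. The class and field names below (`TaskStep`, `missing_attributes`, and so on) are hypothetical and not taken from the implementation described in this paper; the attribute set follows Table 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ActionSymbol(Enum):
    POWER_GRASP = "Power Grasp"
    PRECISION_GRASP = "Precision Grasp"
    RELEASE = "Release"
    POUR = "Pour"
    HAND_OVER = "Hand Over"


class Hand(Enum):
    RIGHT = "Right"
    LEFT = "Left"
    BOTH = "Both"


@dataclass
class TaskStep:
    """One discrete hand-action in the task model (hypothetical layout)."""

    # Obtained by the rough analysis at the first step.
    action: ActionSymbol
    # Attributes; None until some analysis has filled them in.
    object_model: Optional[dict] = None  # e.g. shape and color histograms
    hand: Optional[Hand] = None
    position: Optional[Tuple[float, float, float]] = None  # 3D position
    time_stamp: Optional[Tuple[float, float]] = None  # (start, stop)

    def missing_attributes(self):
        """Names of the attributes that still need detailed analysis."""
        names = ("object_model", "hand", "position", "time_stamp")
        return [n for n in names if getattr(self, n) is None]
```

A step created as `TaskStep(ActionSymbol.POUR)` reports every attribute as missing, which is one simple way to tell the second step what remains to be examined around the corresponding AP.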
Table 1: Task Model

Attributes      Priority   Value
Action Symbol   3 (high)   Power Grasp, Precision Grasp, Release, Pour, Hand Over
Object Model    3          Shape and Color histogram
Hand            2          Right, Left, Both
Position        1          Absolute Position in 3D space
Time Stamp      1 (low)    Absolute Time (start and stop time)

We propose two different kinds of AP analyses in the following sections.

3.2 Integration of sensors separated in space

Fig. 1: Two Steps Analysis using Attention Point (1. detection of APs; 2. integration on APs)

When several input sensors are available simultaneously, it is generally inefficient to precisely analyze all the data along the entire human demonstration. So the system temporarily records all the raw data available and employs a two-step analysis of a human task (Fig.1). To realize the two-step analysis, we utilized the short-term memorization method. During one observation sequence, the system first analyzes the input data given by the set of modalities that require the cheapest computation. It extracts action symbols, and APs as the boundaries of each segmented action, while recording all the data around each AP on storage devices. After an observation sequence is completed, i.e., after one demonstration is finished, the system acquires the recorded data corresponding to each AP from the storage devices and applies a detailed analysis to them off-line. This process obtains the remaining attributes in the task model.

3.3 Integration of sensors separated in time

The method described above requires temporary sets of recorded input data; as the number of sensors and the work time increase, the amount of unused data expands. Also, for some sensors, it is not advisable to adopt a specific sensor configuration at all times because of the trade-off among range, speed and precision. So we propose another two-step analysis in which the system requires quantitative evaluations of a number of demonstrations of the same task. The system roughly analyzes the demonstration and extracts APs at the first observation.
Then the system changes the sensor configuration if necessary and examines the second demonstration around the APs to enhance the task model. For the synchronization issues, the system can predict the hand motion from the first observation and, by watching for the appearance of the predicted motion at each AP in the second observation, the multiple observations can be synchronized.

4 AP analysis for sensors separated in space

Our system employs a pair of data gloves and a 9-eye real-time stereo vision system. We can acquire depth and color images from the stereo vision system and can acquire hand motion (finger shape, absolute position and orientation) from the data gloves. The image processing is much more time-consuming than the processing of the data gloves; thus we adopted the AP analysis described in Section 3.2. We utilized the data gloves to extract APs and action symbols; then, to determine attributes of the task model, the system analyzes depth and color images around those APs. Fig.2 and Fig.3 show the flow of the AP based two-step analysis. The subsequent sections describe the outline of the analysis. Please refer to [5] for details.

4.1 Rough analysis by gesture spotting

We set up a task domain for a specific hand-action task and built a finite set of action symbols, which represents all the possible hand motions that appear in that task domain. Action symbols are combinations of finger actions and local hand motions. For now, we classify possible finger actions into three actions: Power Grasp, Precision Grasp [6] and Release, and describe human hand actions as a finite set of Action Symbols which are combinations of the above finger actions and local hand motions. By excluding hand actions composed of independent finger motion, we can segment the entire hand-action task into meaningful Action Symbols.
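The two-step flow of Section 3.2, as applied to this sensor setup, can be pictured with a minimal sketch. It assumes that the gesture spotter emits one label per glove sample and that non-gesture samples carry a filler label; the function names `extract_attention_points` and `frames_near` and the list-based frame store are hypothetical and only illustrate the idea of taking APs at action boundaries and fetching the recorded images around them afterwards.

```python
FILLER = "Garbage"  # assumed label for samples that belong to no gesture


def extract_attention_points(labels, timestamps):
    """First step: return (action_symbol, start_time) per spotted action.

    An AP is placed where a new non-filler label begins, i.e. at the
    boundary of each segmented action.
    """
    aps = []
    previous = FILLER
    for label, t in zip(labels, timestamps):
        if label != previous and label != FILLER:
            aps.append((label, t))
        previous = label
    return aps


def frames_near(recorded, t_ap, window):
    """Second step: pick recorded frames within `window` of an AP.

    `recorded` is a list of (timestamp, frame) pairs stored during the
    demonstration; only these frames go on to the costly image analysis.
    """
    return [frame for t, frame in recorded if abs(t - t_ap) <= window]
```

For a label stream such as Garbage, Grip, Grip, Garbage, Pour, Pour, Release, the first function yields one AP each for Grip, Pour and Release, and the second then selects only the images close to those three instants.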
To obtain Action Symbols in the task model, we aim to spot human gestures from hand-actions performed by a human demonstrator as shown in Fig.2. In this experiment, we chose transferring the contents of a container as a task domain and selected five gestures as possible hand actions (Table 2). APs are defined as the starting point of each gesture. To extract these Action Symbols, we employ data gloves and a gesture spotting technique based on Hidden Markov Models (HMMs).

Fig. 2: Rough Analysis by Gesture Spotting

Table 2: Gesture definitions

Gesture      Primitives     Action
Grip         cls+sp         Power-grasp from open position
Pick         prc+sp         Precision-grasp from open position
Pour         cls+roll+sp    Power-grasp, and roll the wrist
Hand-over    prc+forw+sp    Precision-grasp, move forward, and back
Release      opn+sp         Open a grasping hand
Garbage      gb             A filler model for spotting
Start, End   sil            Silence at the start and end

We utilize a pair of data gloves (CyberGlove, 18-DOF each) and 6-DOF position sensors (Polhemus) as input devices for the HMM-based gesture spotting module. So, 24 dimensions and their differentials are the input to the HMM module for each hand. The second column of Table 2 indicates the defined HMM primitives for each gesture. Each primitive is defined as a 5-state left-right HMM. sil is a silent state used at the time of training, sp is a short pause which tends to occur at the end of the gesture corresponding to an action symbol, and gb is a garbage collector trained on arbitrary non-gesture movement. By sharing primitives, each action symbol requires only a small amount of training data. Left and right single-hand gestures are spotted separately in a parallel manner, while two-handed gestures are spotted by combining results from the analysis of both hands. Our system can sample the data from a pair of data gloves at 30 Hz and can spot gestures corresponding to action symbols in parallel without delay.

4.2 Attention point analysis by vision
Fig. 3: AP analysis by vision (images at APs; shape histogram by 3D Template Matching; color histogram)

Image processing is rather time-consuming. So we first record all the raw data from the vision system around APs. These recorded data are synchronized with the data-glove analysis, so the correspondence between them is easily made. After the first analysis is finished and the APs are extracted, the system fetches the corresponding images and extracts the information about the manipulated objects (Fig. 3). By analyzing the images just before each AP, we can obtain images in which the target object is not occluded by the hand.

The object is modeled by calculating shape and color histograms. We assume that a human task is demonstrated on a table whose geometric information is known. By extracting depth regions corresponding to each object on the table, we calculate the shape histogram as a list of the goodness of matching between an object extracted in the depth image and each object model in the database. This goodness of matching is obtained by using the 3D Template Matching (3DTM) [7] technique. 3D Template Matching, a technique for localization, finds the precise position and orientation of the target object in depth data. This process projects the corresponding 3D model into the 3D space generated from a depth image and calculates the goodness of matching between the 3D model and the 3D data by summing up the weighted distance between each center point of the meshes in the template model and the closest 3D point. 3DTM adopts a robust M-estimator to eliminate the effect of outliers.

The color histogram is calculated as a normalized hue histogram which counts pixels with a large saturation value within the area of the object in the color image. These depth and color images are produced at 5 fps (up to 30 fps) synchronously by the 9-eye multi-baseline stereo vision system. The system registers this histogram information in the attribute slot of the task model.

5 AP analysis for sensors separated in time

In the previous chapter, we described the hand-action model in terms of classified gestures. This model gives a good notion of hand motion and the type of the manipulated objects, but tells nothing about the manipulated object's motion.
In order to model the delicate motion of the manipulated object or to judge the success/failure of the task performed by the robot automatically, a task model must contain some information about the precise position and orientation of the manipulated object at particular parts in the entire task. We developed an efficient method to acquire the precise trajectory based on repeated observations and APs. In this chapter, we present the method, which uses the zoom stereo vision system.

5.1 Repeated observation

To acquire the precise trajectory of the object, we combined two kinds of two-step analyses as shown in Table 3.

Table 3: Process of repeated observation

Process   Zoom   Model    Time   Description
1         x1     Coarse   0.6s   extraction of APs
2         x2     Coarse   1.4s   tracking the object in real-time
3         x3     Fine     2.0s   tracking the object off-line

Fig. 4: 3D Model (real object; coarse model, 168 polygons; fine model, 640 polygons)

Process 1 → 2 adopts the method described in Section 3.3 (repeated observation), while Process 2 → 3 adopts the method described in Section 3.2. Fig.4 shows the object and its CAD model used in this experiment.

5.1.1 Extraction of APs

At first, the zoom configuration is set to x1 (default) and the system roughly tracks the object for each hand action using the 3DTM method. To estimate the initial position and time of the object to be tracked, we utilized the data gloves used in the previous chapter; the gloves were enhanced by tactile sensors to classify the grasping. The system can detect the grasping motion directly from the tactile sensors, so it roughly estimates the initial object position (from the Polhemus sensor) at the time of grasping. We used the coarse object model during tracking, because precise position and orientation are not important. The system gets the rough trajectory and also gets the APs as the initial position and time of the tracking. Fig.5 shows the tracking result (intensity images overlaid with the wire-frame model).

5.1.2 Tracking in repeated observation

At this stage, the system demands the repeated demonstration of the same task.
In this stage, the doubly zoomed cameras cannot keep the entire action in sight in a fixed orientation; therefore, the pan/tilt mechanism must be driven in real time, synchronized with image processing, so that the tracked target object is kept in the center of the view. All the depth and intensity images are recorded during tracking. These images are used in the third process below.
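To make the centering step concrete, the following sketch converts the image position of the tracked object into pan and tilt corrections. It is not the controller used in this system: it assumes an ideal pinhole camera whose optical axis passes through the image center, a pan axis aligned with the image's vertical axis, and a focal length `focal_px` given in pixels; the function name and the sign conventions are likewise assumptions.

```python
import math


def pan_tilt_correction(u, v, width, height, focal_px):
    """Angles in radians that would bring pixel (u, v) to the image center.

    Assumed pinhole model: positive pan turns toward increasing u,
    positive tilt toward increasing v.
    """
    dx = u - width / 2.0
    dy = v - height / 2.0
    # Pan first removes the horizontal offset of the viewing ray.
    pan = math.atan2(dx, focal_px)
    # After panning, the ray lies at distance hypot(dx, f) along the axis.
    tilt = math.atan2(dy, math.hypot(dx, focal_px))
    return pan, tilt
```

An object already at the image center gives zero correction, and the correction grows with the pixel offset, so applying it once per processed frame would keep re-centering the target as it moves.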

Fig. 5: Tracking at the first stage

Fig. 6: Tracking at the second stage

Fig. 7: Tracking at the third stage

This tracking is also processed by 3DTM with the coarse object model, because precise localization is not important. Fig.6 shows the tracking result (intensity images overlaid with the wire-frame model).

5.1.3 Estimation of the precise trajectory

At the third stage, the system fetches the recorded images and localizes the object in each scene to estimate the precise trajectory with the fine model. This image fetching is the same technique as that described in Chapter 4. This process is executed off-line.

To localize the object precisely, we developed a method to combine 3DTM and 2DTM. 3DTM is a method for localizing the 3D model in the 3D points obtained from the depth data [7]. 2DTM is the edge-based localization method between the 3D model and the estimated 3D edges of the contour of the object, which are derived from the intensity image [7]. 2DTM is sensitive to the edges in the image background and does not offer a good guess about the z position (parallel to the viewing direction) of the model because of the approximation of the z position of the 3D edge. But, at the final stage of the localization, 2DTM offers a good guess about the position and orientation perpendicular to the viewing direction. So, we first adopt 3DTM only to localize the object to the approximate position and then we adopt the 2DTM & 3DTM combined method to localize the object to the exact position as shown in Table 4. 2DTM and 3DTM are calculated in the same 3D space by an M-estimator (Lorentzian) with different weights. Sigma is the parameter to reduce the effect of the outliers.

Table 4: 2DTM & 3DTM combined localization

Method        Sigma [mm]
3DTM          10.0
3DTM          4.0
3DTM          2.0
3DTM & 2DTM   2.0
2DTM          1.0

Fig.7 shows the tracking result. The upper row shows the intensity images and the lower row shows the disparity images. The contour of the object's model is overlaid in each image.

5.2 Experimental result

Our stereo vision consists of zoom lens cameras.
This is actually digital zooming but, when capturing images, the stereo system re-samples each pixel at the ratio of one quarter, so we can expect that doubling the zoom value will not reduce the quality of the image captured by our stereo system. Stereo processing is done on a hardware chip and the system can acquire a depth image and the corresponding intensity image (280 × 200) at 15 fps at most.

The average tracking rate of the first stage is 0.6 [sec/frame] on our Pentium III 500 MHz PC. Similarly, the tracking rates of the second stage and third stage are 1.4 [sec/frame] and 2.0 [sec/frame], respectively. The difference in the rate between the first and the second stages is mainly due to the construction time of the k-d tree used in localization. We restricted the search area of 3DTM to be very close to the object, i.e., the inside of the rectangle shown in Fig.5 and Fig.6, so the search area of the first stage is relatively smaller and the construction time is short. The difference in frame rate between the second and the third stages is due to the difference in the number of iterations in the localization process and the level of detail of the model.

6 Performance by robot

We have developed a human-form robot as an experimental platform for learning and performing human hand-action tasks [9]. The robot has similar capabilities and body parts to those of humans, including vision, dual arms and an upper torso. When the robot is to perform the same task after constructing a task model, it searches for objects on the table and, for each object, it calculates the mean square distance between the shape and color histograms of the object on the table and those in the model database. The smallest value determines the best matching object. In this way, the robot recognizes the object. Once the recognition of the current environment is done, the robot sequentially executes the action corresponding to each Action Symbol in the task model, adapting to the current environment conditions. Fig.8 shows the experimental result in which the robot performed the same task successfully.

Fig. 8: Experiment

7 Conclusion

We proposed a novel method of constructing a human task model by attention point (AP) analysis. Attention points relate and integrate multiple observations and construct a locally enhanced task model of human demonstration. AP analysis consists of two steps. In the first step, action segments and APs are extracted. Then, at the second step, by closely examining the human demonstration only around APs, the system extracts the attribute values and improves the model. By reducing unnecessary analysis, the system can construct the task model efficiently. Efficiency is important when we consider human-robot cooperation tasks in which the robot must respond to the actions taken by both a human and the robot itself in a relatively short time.

We presented two kinds of AP analyses, one for integration of sensors available simultaneously and the other for integration of sensors derived from different observations of the same task by repeated demonstration. To realize the first AP analysis, we proposed a short-term memorization method, which records all the raw input data around each AP to be processed at the second step. We also proposed a localization method which combines 2DTM and 3DTM to track and localize a moving object robustly.
The future work is to solve the problem of integrating the trajectory information into the current task model for training the robot itself automatically. We are also planning to combine this task level acquisition with the behavior level acquisition method [4]. First, the task level acquisition constructs task models to perform the entire task to be adapted to the environment. It also extracts special APs that require behavior level acquisition. Second, the behavior level acquisition analyzes those APs closely and obtains a suitable motion sequence (sub-skill). This two-layer approach should extend the capabilities of the learning robot that can acquire a human task through observation.

Acknowledgment

This work is supported, in part, by Japan Society for the Promotion of Science (JSPS) under the grant RFTF 96P00501, and, in part, by Japan Science and Technology Corporation (JST) under Ikeuchi CREST project.

References

[1] K. Ikeuchi and T. Suehiro: Toward an Assembly Plan from Observation Part I: Task Recognition With Polyhedral Objects, IEEE Trans. Robotics and Automation, 10(3):368-384, 1994.
[2] Y. Kuniyoshi, M. Inaba, and H. Inoue: Learning by watching, IEEE Trans. Robotics and Automation, 10(6):799-822, 1994.
[3] H. Kimura, T. Horiuchi and K. Ikeuchi: Task-Model Based Human Robot Cooperation Using Vision, IROS 99, 2:701-706, 1999.
[4] J. Takamatsu, H. Tominaga, K. Ogawara, H. Kimura and K. Ikeuchi: Symbolic Representation of Trajectories for Skill Generation, IEEE ICRA, 4:4077-4082, 2000.
[5] K. Ogawara, S. Iba, T. Tanuki, H. Kimura and K. Ikeuchi: Recognition of Human Task by Attention Point Analysis, IEEE/RSJ IROS, 3:2121-2126, 2000.
[6] M. R. Cutkosky: On Grasp Choice, Grasp Models, and the Design of Hands for Manufacturing Tasks, IEEE Trans. on Robotics and Automation, 5(3):269-279, 1989.
[7] M. D. Wheeler: Automatic Modeling and Localization for Object Recognition, Ph.D Thesis, CMU, 1996.
[8] K. M. Knill and S. J. Young: Speaker Dependent Keyword Spotting for Accessing Stored Speech, Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.
[9] K. Ogawara, J. Takamatsu, S. Iba, T. Tanuki, H. Kimura and K. Ikeuchi: Acquiring hand-action models in task and behavior levels by a learning robot through observing human demonstrations, IEEE Conf. on Humanoid Robots, 2000.