Person Tracking in Urban Scenarios by Robots Cooperating with Ubiquitous Sensors

Luis Merino, Jesús Capitán, Aníbal Ollero

Abstract

The introduction of robots in urban environments opens a wide range of new potential applications for service robotics. One of these applications is people guidance. To accomplish this task, the robot needs information about the position of the person. Sensors embedded in the urban environment can complement the perception of the robot in this case. The paper shows how the combination of the robot sensorial information with that from a camera network and with a wireless sensor network is very useful to cope with tracking failures by being more robust under occlusion, clutter and lighting changes. The paper summarizes the main characteristics of the algorithms for tracking with the fixed surveillance cameras and cameras on board robotic systems. It also presents results on position tracking by using the strength of the radio signal from the nodes of a Wireless Sensor Network (WSN). The estimates from all these sources are then combined using a decentralized data fusion algorithm to provide an increase in performance. This scheme is scalable, can cope with communication latencies and degrades smoothly with communication failures. We present results of the system, operating in real time, in a large outdoor environment, including 22 non-overlapping cameras, 30 wireless sensor nodes and one mobile robot.

I. INTRODUCTION

Many major cities in Europe are looking for means of reducing the traffic in certain areas, in order to mitigate air and noise pollution, traffic jams and, in general, to improve the quality of life. It is intended to develop automatic systems to perform, in car-free areas, services such as person guiding, people and objects transportation, surveillance, etc. The EU Project called URUS (Ubiquitous Networking Robotics in Urban Settings) [1] considered a team of mobile robots, a set of static cameras and a Wireless Sensor Network (WSN) for these tasks.
In particular, the application of person guidance requires the ability to determine and track the position of the person to be guided. This application requires the collaboration of different systems, as, in many cases, a single autonomous entity (i.e. a robot or a static surveillance camera) is not able to acquire all the information required because of the characteristics of the task or the harmful conditions (i.e. loss of visibility). The set of fixed cameras can obtain global views of the scene. However, as they are static, they cannot deal with non-covered zones, shadows can affect the system, and so forth. Robots carry local cameras and can move to adequate places, reacting to the changing conditions. However, their field of view is limited and they can lose the person they are tracking. Wireless devices can also help to localize people, by estimating their positions measuring the signal strength from different static receivers. However, the resolution obtained is usually low, and depends on the density of anchored receivers.

In this paper we show how the information from the above different systems can be fused to improve the performance. In order to cope with scalability, a decentralized data fusion algorithm is employed. In this algorithm only local estimation and local communication are used.

(Footnote) This work is partially supported by the FROG Project (ICT-288235) and the URUS project (IST-2006-045062) funded by the European Commission, and the project RURBAN, funded by the Andalusian Government (P09-TIC-5121). Jesús Capitán is also funded by Fundação para a Ciência e a Tecnologia (ISR/IST pluriannual funding) through the PIDDAC Program funds and by project CMU-PT/SIA/0023/2009 (Carnegie Mellon-Portugal Program). Luis Merino is with Pablo de Olavide University, Seville, Spain, lmercab@upo.es. Jesús Capitán is with Instituto Superior Tecnico, Lisbon, Portugal, jescap@isr.ist.utl.pt. Aníbal Ollero is with University of Seville, Seville, Spain, aollero@cartuja.us.es.
Then, this paper focuses on the perception system of URUS and the experimental results obtained (see [1] for a more general description of the project). The next subsection reviews related work. After an overview of the full system in Section II, the paper presents the individual input sensor algorithms. Thus, Sections III and IV summarize the process to extract information from a set of fixed cameras and from cameras on board robots. Section V explains the use of the signal strength from wireless sensors for tracking. Finally, the results of the tracking from all sensors are used to infer the position of the person in a global coordinate system through a data fusion process. This system is described in Section VI. The paper ends showing results obtained during the experiments of the URUS project, in an urban scenario involving 22 fixed cameras, a WSN of 30 wireless Mica2 nodes, and a mobile robot.

A. Related Work

There has been an increasing amount of research on person tracking in the literature. Most works describe a single system or algorithm for person tracking using vision, laser range-finders or other sensors. Thus, there have been many attempts to track people and other moving objects using networks of fixed cameras. The early tracking algorithms [2], [3] require both camera calibration and overlapping fields of view to compute the handover of objects of interest between cameras. Others [4] can work with non-overlapping cameras but still require calibration. More recent works [5], [6] do not require a priori calibration to be explicitly stated; instead they use the observed motion over time to establish reappearance periods between cameras.

Fig. 1: A block description of the URUS perception system. The different subsystems are integrated in a decentralized manner through a set of decentralized data fusion nodes. Locally, each system can process and integrate its data in a central way (like the WSN) or in a distributed way (like the camera network). Some systems could obtain information from the rest of the network even in the case they do not have local sensors.

Tracking from mobile platforms like robots in outdoor scenarios is a hard problem affected by clutter, illumination changes in the case of vision approaches, occlusions, etc. Most of the techniques combine people detection and people tracking modules for the task. The people detection module tries to obtain person hypotheses analyzing the sensor data, and is usually computationally demanding. Many classification techniques are used for this task, like boosting [7], SVM [8], etc. The tracking module is usually a feature tracking algorithm applied to the initial hypothesis given by the detection module, which can run at a higher pace than the detection algorithm, like CamShift [9]. In most cases, both modules support each other, so when the tracker is lost new hypotheses from the detector can be used. More complex combinations, including what is called cognitive feedback, are also considered [10].

There is also work devoted to the tracking of mobile nodes by using radio signals, which is the problem of estimating the position of a mobile node from the signal received by a set of static devices whose positions are known. A tutorial on the main issues and approaches for the problem is presented in [11]. Many algorithms use, besides signal strength, additional information to obtain range estimates or even direction of arrival estimates. For instance, [12] considers the use of particle filters for tracking a mobile node using Time of Arrival, Difference of Time of Arrival and power measurements, presenting results in simulation. The work [13] uses the Doppler shift of interference signals to estimate the velocity and position of mobile nodes.
These approaches require the precise synchronization of the emission of signals. In our approach, only signal strength is used, through a calibrated model for radio propagation. There are approaches in which this model is learnt; [14] presents an approach in which Gaussian Processes are used as non-parametric models for the errors in indoor signal propagation.

The key issue in the paper is to show how the combination of the local information obtained by the robot with the information received from ubiquitous sensors in the environment can greatly improve the results. Moreover, a decentralized data fusion approach is employed, producing a scalable solution with respect to the number of subsystems.

II. URUS SYSTEM OVERVIEW

The URUS system consists of a team of mobile robots, equipped with cameras, laser range-finders and other sensors for localization, navigation and perception; a fixed camera network of more than 20 cameras for environment perception; and a WSN of 30 Mica2 nodes that uses the signal strength of the received messages from a mobile device to determine the position of a person carrying it.

Figure 1 shows a simplified version of the perception system used in URUS. The system consists of a set of fusion nodes which implement a decentralized data fusion algorithm. Each node only employs local information (data from local sensors; for instance, a subset of cameras, or the sensors on board the robot) to obtain a local estimation of the variables of interest (in this case, the position of the person being tracked). Then, these nodes share their local estimations among themselves if they are within communication range. The nodes only use local communications and data, and thus the system is scalable. Also, each node can accumulate information from its local sensors, so temporal communication failures can be tolerated without losing information. Notice that the way in which a particular fusion node processes its local data can have a distributed or even centralized implementation itself.
For the camera network, each fusion node considers information from a small subset of cameras, which are processed in a distributed way (as will be described in Section III), with a separate tracker obtaining estimations from each camera. For the case of the WSN, messages from all the network are processed in a gateway to localize the mobile node using the signal strength (Section V). Moreover, each robot locally processes its data (on-board cameras). Then, the local estimations of the different elements are fused in a decentralized way using the algorithm presented in Section VI. Thus, the system is easily scalable: for instance, a new set of cameras could be included by adding a new fusion node in charge of these cameras (and maybe a new server to process the information of these cameras); even robots without local sensors (receiving information from the rest of the nodes) could be added to the system.

Fig. 2: (a) Tracks on the image plane of 4 different cameras. The identity is correctly handed over the cameras using the weak cues described in Section III. (b) Estimated position of the person on the experimental site.

III. FIXED CAMERA TRACKING

The network of fixed cameras covers a wide area of the experiment site and therefore, in most cases, they initiate the person guidance; they are able to track objects of interest both on and across different cameras without explicit calibration periods. Within this paper, the fixed camera tracking algorithm of Gilbert and Bowden [15], [16] is used. A local tracker processes the data from each camera. Background modelling and subtraction is used to identify foreground objects, and Kalman filtering is used to provide temporal correspondence between detected objects.

Very interestingly, the trackers are able to learn inter-camera relationships for inter-camera object handling, even without camera calibration or overlapping. By using weak cues, the system is able to incrementally build probability distributions on the possibility that a person leaving one camera enters a different camera some time interval after. This information, combined with color histograms, is used for inter-camera tracking. More details can be found in [15].

Although not used by the system to estimate the inter-camera relationships, the cameras are homography-calibrated, so it is possible to obtain a 2D estimation of the position of the people tracked in the map of the URUS scenario. Figure 2 shows an example, in which a person is tracked using 4 different cameras with little or no overlap at all (and its identity maintained) using the techniques described in [15].
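The homography step mentioned above, which turns a tracked image point into a 2D map position, can be illustrated with a minimal sketch. The matrix `H_EXAMPLE`, the pixel `foot_point` and the function name `image_to_map` are hypothetical and chosen only for illustration; they are not the URUS calibration or the authors' code.

```python
def image_to_map(H, pixel):
    """Map an image point (u, v) to ground-plane coordinates using a 3x3 homography H."""
    u, v = pixel
    # Multiply H by the homogeneous image point (u, v, 1).
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in H)
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the scale component to return to Cartesian coordinates.
    return (x / w, y / w)


# Hypothetical homography, for illustration only (not a URUS calibration).
H_EXAMPLE = [
    [0.02, 0.0, -5.0],
    [0.0, 0.05, -10.0],
    [0.0, 0.001, 1.0],
]

foot_point = (320.0, 400.0)  # assumed pixel where the tracked person touches the ground
map_xy = image_to_map(H_EXAMPLE, foot_point)
```

The sketch assumes the tracked point lies on the ground plane, which is the usual condition for a plane-to-plane homography to give a meaningful map position.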
The figure shows the estimated position of the person using only information from the camera network.

IV. ROBOT CAMERA TRACKING

The robots carry on-board cameras that are used for person guiding. These cameras can be used to obtain local estimations of the position of the person to be guided. The algorithms employed for this are based on a combination of state-of-the-art algorithms for person detection and tracking. The person detection algorithm applied to the image is the one in [17]. This detection module is launched when the robot is requested to guide a person and it is close to the location where the person is waiting. Once the person is detected, it is tracked by using a tracking algorithm which is based on the CamShift technique [9].

While the algorithm is able to handle temporal occlusions, the tracking system is not enough to maintain the track on the person continuously due to changes in illumination, the changing field of view of the camera due to the robot motion, or even the person going out of the field of view. Therefore, the results from the tracking and the detection applications are combined, so that the robot employs the person detector whenever the tracker is lost to recover the track. The algorithm determines that the person is lost employing some heuristics, like the track going out to the limits of the image or size restrictions on the blob. As a result, the robots can obtain estimations of the pose of the person on the image plane.

Some improvements can be applied to the features in order to cope with illumination changes [18]. However, in general, these algorithms are not robust enough to be able to guide one person through the whole scenario. Furthermore, they can track the wrong people sometimes. Moreover, from information from one camera alone it is not possible to estimate the full 3D position of the person. The next sections will show how the combination of the local camera information and the information from the other subsystems (camera network and WSN) can overcome these problems.

Fig. 3: Particles (red) are used to represent person hypotheses. The signal received by a set of static nodes can be used to infer the position of the node. The filter is initiated when the first message is received by sampling uniformly from a spherical annulus around the receiver. Map information is also taken into account (only free spaces within the annulus are considered).

V. WIRELESS SENSOR NETWORK TRACKING

A network of wireless Mica2 sensor nodes is also considered. The signal strength received by the set of static nodes (Received Signal Strength Indicator, RSSI) can be used to infer the position of a person carrying one of the nodes (the emitter). The algorithm to estimate and track the node position is based on particle filtering.

In the particle filter, the current belief about the position of the mobile node is defined by a set of particles $\{x_t^{(i)}\}$, which represent hypotheses about the current position of the person that carries the node (see Figure 3). In each iteration of the filter, kinematic models of the motion of the person and map information are used to predict the future position of the particles. The likelihood of these particles is updated any time new messages are received from the static network. The technique is summarized in Algorithm 1, where $z_j$ is the measurement provided by each static node $j$, consisting of its position $x_j$ and the strength $RSSI_j$ of the received signal from the mobile node. The next subsections further describe the main steps in this algorithm.

A. Prior, prediction and importance functions

The filter is initialized with the first message received from the mobile node, considering a uniform distribution on a spherical annulus around the receiver. The map of the scenario is taken into account when sampling from this prior (see Figure 3), considering that the person is not inside any building.
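The prior described above can be sketched as rejection sampling: draw points uniformly from an annulus around the receiver and keep only those in free space. This is a simplified planar (2D) illustration rather than the spherical annulus of the paper. The radii, the free-space test `outside_building` and the function name `sample_annulus_prior` are all assumptions introduced for the example.

```python
import math
import random


def sample_annulus_prior(receiver, r_min, r_max, n, is_free, rng, max_tries=100_000):
    """Draw n particles uniformly from the free part of a planar annulus."""
    particles = []
    for _ in range(max_tries):
        if len(particles) == n:
            break
        # Sampling r^2 uniformly makes the density uniform over the annulus area.
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        p = (receiver[0] + r * math.cos(theta), receiver[1] + r * math.sin(theta))
        # Rejection step: keep only hypotheses that fall in free space on the map.
        if is_free(p):
            particles.append(p)
    if len(particles) < n:
        raise RuntimeError("could not place all particles in free space")
    return particles


# Hypothetical map for illustration: everything with x < 0 is a building.
def outside_building(p):
    return p[0] >= 0.0


rng = random.Random(0)
prior = sample_annulus_prior((0.0, 0.0), 5.0, 10.0, 500, outside_building, rng)
```

Sampling the squared radius, rather than the radius itself, avoids concentrating particles near the inner edge of the annulus.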
Algorithm 1: $\{x_t^{(i)}, \omega_t^{(i)}; i = 1, \ldots, L\} \leftarrow$ Particle_filter($\{x_{t-1}^{(i)}, \omega_{t-1}^{(i)}; i = 1, \ldots, L\}$, $z_j = \{x_j, RSSI_j\}$)
 1: for $i = 1$ to $L$ do
 2:   $x_t^{(i)} \leftarrow$ sample_kinematic_model($x_{t-1}^{(i)}$)
 3: end for
 4: if message from network $z_j$ then
 5:   for $i = 1$ to $L$ do
 6:     Compute $d_t^{(i)} = \|x_t^{(i)} - x_j\|$
 7:     Determine $\mu(d_t^{(i)})$ and $\sigma(d_t^{(i)})$
 8:     Update weight $\omega_t^{(i)} = p(RSSI_j \mid x_t^{(i)})\,\omega_{t-1}^{(i)}$, with $p(RSSI_j \mid x_t^{(i)}) = \mathcal{N}(\mu(d_t^{(i)}), \sigma(d_t^{(i)}))$
 9:   end for
10: end if
11: Normalize weights $\{\omega_t^{(i)}\}$, $i = 1, \ldots, L$
12: Compute $N_{eff} = 1 / \sum_{i=1}^{L} (\omega_t^{(i)})^2$
13: if $N_{eff} < N_{th}$ then
14:   Resample with replacement $L$ particles from $\{x_t^{(i)}, \omega_t^{(i)}; i = 1, \ldots, L\}$, according to the weights $\omega_t^{(i)}$
15: end if

At each time step, the positions of the particles are predicted from their previous positions (Line 2 of Algorithm 1). The prediction function uses a Brownian motion model [19]. This model is combined with map information to discard unfeasible motions (like going through walls); particles arriving at occupied places are rejected and substituted by newly sampled particles. Other prediction models could be used as well.

B. The likelihood function

The likelihood function $p(RSSI \mid x_t)$ plays a very important role in the estimation process, since each time a message is received this likelihood is used to update the particle weights (Lines 5 to 9). The likelihood models the correlation that exists between the distance that separates two nodes and the RSSI value, although this correlation decreases with the distance between the two nodes, transmitter and receiver [20]. This is mainly caused by radio-frequency effects such as radio reflection, multi-path or antenna polarization. The model used here considers that the conditional density $p(RSSI_j \mid x_t)$ can be approximated as a Gaussian distribution for a given distance $d_j = \|x_t - x_j\|$ between the mobile node and static node $j$, as follows:

$$RSSI_j = \mu(d_j) + \mathcal{N}(0, \sigma(d_j)) \qquad (1)$$

where the functions $\mu(d_j)$ and $\sigma(d_j)$ are non-linear functions of the distance (which itself is a non-linear function of the state).
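One iteration of Algorithm 1 can be sketched in plain Python as below. The curves `mu_rssi` and `sigma_rssi` are hypothetical placeholders, since the calibrated forms are given only in [20]. The map-based rejection of unfeasible motions is omitted for brevity, and all function names are assumptions made for this illustration.

```python
import math
import random


def mu_rssi(d):
    # Hypothetical log-distance path-loss curve; the calibrated form is in [20].
    return -40.0 - 20.0 * math.log10(max(d, 0.1))


def sigma_rssi(d):
    # Hypothetical: the RSSI spread grows with distance.
    return 2.0 + 0.2 * d


def predict(particles, step_std, rng):
    """Line 2: Brownian-motion prediction, adding zero-mean Gaussian noise."""
    return [(x + rng.gauss(0.0, step_std), y + rng.gauss(0.0, step_std))
            for x, y in particles]


def update_weights(particles, weights, node_xy, rssi):
    """Lines 5-11: reweight by the Gaussian RSSI likelihood, then normalize."""
    new_w = []
    for (x, y), w in zip(particles, weights):
        d = math.hypot(x - node_xy[0], y - node_xy[1])
        mu, sigma = mu_rssi(d), sigma_rssi(d)
        lik = math.exp(-0.5 * ((rssi - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        new_w.append(w * lik)
    total = sum(new_w)
    if total == 0.0:
        return list(weights)  # all likelihoods underflowed; keep the previous weights
    return [w / total for w in new_w]


def effective_sample_size(weights):
    """Line 12: N_eff = 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def resample_if_needed(particles, weights, n_threshold, rng):
    """Lines 13-15: resample with replacement when N_eff drops below the threshold."""
    if effective_sample_size(weights) >= n_threshold:
        return particles, weights
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n
```

A full filter step under these assumptions would call `predict`, then `update_weights` for each received message, then `resample_if_needed`.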
The functions $\mu(d_j)$ and $\sigma(d_j)$ are estimated during a calibration procedure (the form of the functions and the calibration procedure are described in [20]).

C. Filter evolution

Although Section VII will show additional results, Figure 4 presents the evolution of the particles for a particular tracking experiment performed at the experimental site. 500 particles are employed, and the algorithm runs at more than 1 Hz. When the filter converges to a Gaussian distribution, the estimated mean and covariance can be fed to the decentralized fusion system that will be explained in the next section.

VI. DECENTRALIZED DATA FUSION FOR PEOPLE GUIDANCE

Using the trackers described above, the camera network, the robots and the WSN will be able to obtain local estimations of the position of the people on the image plane or in a 3D coordinate system. The information provided by each tracker, characterized as Gaussian distributions (mean and covariance matrix), can be fused in order to obtain a more accurate estimation of the 3D position of the person.

As commented in Section II, the idea is to implement a decentralized fusion approach, in which each node only employs local information (data only from local sensors, for instance, a camera subnet, or the sensors on board the robot), and then shares its estimation with other nodes (see Figure 1). Thus, scalability and robustness are improved and bandwidth requirements alleviated. This fusion algorithm is based on an Information Filter and is described in [21], [22]. Here, the main concepts are summarized.

A. Delayed-State Information Filter

The Information Filter (IF), which corresponds to the dual implementation of the Kalman Filter (KF), is a suitable approach for decentralized state estimation. Whereas the KF represents a Gaussian distribution on the state $x$ using its first-order moment $\mu$ and second-order moment $\Sigma$, the IF employs the so-called canonical representation. The fundamental elements are the information vector $\xi = \Sigma^{-1}\mu$ and the information matrix $\Omega = \Sigma^{-1}$. Prediction and updating equations for the (standard) IF can also be derived from the standard KF [21]. In the case of non-linear prediction or measurement models, first-order linearisation leads to the Extended Information Filter (EIF). A Delayed-State Information Filter maintains not just the last state, but a belief over the full trajectory of the state up to the current time step, denoted by $\Omega^t$ and $\xi^t$.

B. Decentralized Information Filter

The main interest of the IF is that it can be easily decentralized. In a decentralized approach, each urban robot or ubiquitous entity represents a node $i$ within the network, which employs only its local data $z_i$ to obtain a local estimation of the person trajectory (given by $\xi^{i,t}$ and $\Omega^{i,t}$) and then shares its belief with its neighbours. Therefore, each node $i$ will run a Delayed-State EIF using only its local information, and will fuse locally the received information $\xi^{j,t}$ and $\Omega^{j,t}$ from another node $j$ in order to improve the local perception of the world. Ideally, the decentralized fusion rule should produce the same result locally as that obtained by a central node employing a centralized filter. In [21] the authors propose the following fusion rule:

$$\Omega^{i,t} \leftarrow \Omega^{i,t} + \Omega^{j,t} - \Omega^{ij,t} \qquad (2)$$

$$\xi^{i,t} \leftarrow \xi^{i,t} + \xi^{j,t} - \xi^{ij,t} \qquad (3)$$

The above equations mean that each node should sum up the information received from other nodes. The additional terms $\Omega^{ij,t}$ and $\xi^{ij,t}$ represent the common information between the nodes. This common information is due to previous communications between nodes, and should be removed to avoid double-counting of information (known as rumour propagation [23]). As long as a tree-shaped logical topology in the perception system (no cycles or duplicated paths of information) is assumed, this common information can be maintained by a separate EIF, the so-called channel filter [24].

It is important to remark that, using these fusion equations and considering trajectories (delayed states), the local filter can obtain an estimation that is equal to that obtained by a centralized system [21] (provided that enough time has passed to allow the information to flow through the different network nodes). Another advantage of using delayed states is that the belief states can be fused asynchronously without missing information. Each sensor can accumulate evidence, and send it whenever it is possible. Also, asequent and delayed measurements can be incorporated in the filter. However, as the state grows over time, the size of the message needed to communicate its belief also does.
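The canonical representation and the fusion rule of equations (2) and (3) can be illustrated with a small sketch for a single 2D state. The paper fuses whole trajectories (delayed states), so this is a simplification. The helper names (`inv2`, `matvec`, `to_information`, `to_moments`, `fuse`) are assumptions for this example, not the authors' implementation.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def to_information(mu, sigma):
    """(mu, Sigma) -> (xi, Omega), with Omega = Sigma^-1 and xi = Omega mu."""
    omega = inv2(sigma)
    return matvec(omega, mu), omega


def to_moments(xi, omega):
    """(xi, Omega) -> (mu, Sigma), the inverse conversion."""
    sigma = inv2(omega)
    return matvec(sigma, xi), sigma


def fuse(xi_i, omega_i, xi_j, omega_j, xi_ij, omega_ij):
    """Equations (2)-(3): add the neighbour's information, subtract the common part."""
    xi = [a + b - c for a, b, c in zip(xi_i, xi_j, xi_ij)]
    omega = [[a + b - c for a, b, c in zip(ri, rj, rij)]
             for ri, rj, rij in zip(omega_i, omega_j, omega_ij)]
    return xi, omega
```

With no common information (zero `xi_ij` and `omega_ij`), fusing two estimates of equal covariance yields their average with half the covariance. If the common information equals everything node j sends, node i's belief is left unchanged, which is the double-counting protection described above.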
For the normal operation of the system, only the state trajectory over a time interval is needed, so these belief trajectories can be bounded by marginalizing out old states. Note that the trajectories should be longer than the maximum expected delay in the network in order not to miss any measurement information.

Finally, when no assumptions about the network topology can be made (e.g. due to the existence of mobile objects, possible losses of communication links, etc.), another option to remove the common information is to employ a conservative fusion rule, which ensures that the system does not become overconfident even in the presence of duplicated information, at the cost of losing optimality in the fusion. For the case of the IF, there is an analytic solution for this, given by the Covariance Intersection algorithm of [25].

C. Data association

Each fusion node of the system should be able to associate its local observations with the current tracks. In the case of the camera network, this is done by combining the inter-camera information and geometric information. As commented in Section III, the system is able to handle inter-camera tracking without calibration, using as weak cues reappearance probabilities and color information. Therefore, the system uses this information for data association. As this scheme may fail, the non-associated observations are also passed through a data association procedure based on the Mahalanobis distance, using the estimated global person position obtained using the homographies.
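A Mahalanobis-distance association step of the kind described above can be sketched as nearest-neighbour gating in 2D. The gate value 9.21 (the 99% chi-square quantile for two degrees of freedom) and the names `mahalanobis_sq` and `associate` are assumptions for this example. The paper does not state which threshold or assignment strategy was used.

```python
def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance for 2D vectors and a 2x2 covariance."""
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # diff^T cov^-1 diff, using the closed-form inverse of a 2x2 matrix.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def associate(z, z_cov, tracks, gate=9.21):
    """Return the index of the closest track inside the gate, or None."""
    best, best_d2 = None, gate
    for k, (mean, cov) in enumerate(tracks):
        # Innovation covariance: track uncertainty plus measurement uncertainty.
        s = [[cov[r][c] + z_cov[r][c] for c in range(2)] for r in range(2)]
        d2 = mahalanobis_sq(z, mean, s)
        if d2 < best_d2:
            best, best_d2 = k, d2
    return best
```

An observation that falls outside the gate of every track returns `None`, which a fusion node could treat as a candidate new track.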

Fig. 4: A sequence of the 500 particles employed in the filter for this experiment. Red points represent the particles. Yellow points represent the static nodes, the green one being the emitter at each frame. A person carrying the mobile node travels from right to left in the corridor at the bottom.

The data association in the case of the WSN node is straightforward, as the messages from the WSN are tagged with an ID. The image tracker in the case of the robots maintains the identity of the tracked people while they are on the image plane. The Mahalanobis distance is also used to associate new measurements with previous tracks. Moreover, the decentralized nodes should be able to associate the received tracks with the local tracks. For this track-to-track fusion, the Mahalanobis distance is used again.

VII. EXPERIMENTAL RESULTS

The techniques described above were tested during the experimental sessions of the URUS EU Project. These experiments were carried out at the Barcelona Robot Lab, which is an outdoor urban experimental robotics site located at the UPC (Universidad Politécnica de Cataluña) Campus Nord. In order to build the system of Section II, 22 fixed color video cameras were installed and connected through a Gigabit Ethernet connection to a computer rack, as well as wireless sensor nodes for localization purposes and 9 WLAN antennas with complete area coverage.

URUS proposed people guidance as one of the possible applications for the above urban scenario. First, by means of a mobile phone, a person calls for a robot in order to receive the service. Then, the closest available robot with this functionality approaches and identifies the person, and guides him/her to the requested final destination. In all this process, the decentralized data fusion between the ubiquitous sensors is essential in order to help the robot with the guidance task.

A. Robot and WSN

In order to illustrate the benefits from the data fusion process, a first setup is presented here. This setup considers information from one camera on board the robot Romeo (4-wheel vehicle, see Figure 7) and the WSN (30 nodes).
The objective was to track the position of a person cooperatively while the robot was guiding. In this case, just two nodes of the decentralized fusion scheme were used: one on board the robot and one for the WSN. These nodes locally integrated information from a monocular camera (see Figure 5) and from the signal strength-based estimations (Section V, see Figure 5a), respectively.

Fig. 5: (a) The person was carrying a Mica2 node during the experiment. (b,c,d) The robot was able to obtain local observations on the image plane of the face of the person.

Figure 6 shows the X and Y estimations obtained by the robot alone and when the robot combines its information with the one provided by the WSN. In this case, as ground truth we have the trajectory of the robot measured by its navigation software. The person is following behind the robot (see Figure 7), which in this trajectory means that the X coordinates of the person are larger than those of the robot, and some meters beside the robot (a lower Y coordinate). It can be seen how the introduction of the WSN reduces the uncertainty; as we have a monocular camera, the uncertainty on the person position is quite big in both axes when the robot is alone. In this case, the initial position of the person is computed assuming a known height of the face. In the second case, the 3D estimation of the WSN is used to initiate the filter.

Fig. 6: Tracking using one on-board camera and the WSN. Black: robot alone. Green: robot and WSN. Dashed lines are the sigma intervals and the blue solid line represents the robot trajectory.

Fig. 7: Tracks obtained by the camera network.

B. Robot, WSN and camera network

In this setup an experiment on a larger area is shown. This time one robot, the WSN and 7 fixed cameras were used. Again, there was a person following the robot whose position had to be estimated. The setup of the perception system was one decentralized fusion node on the robot, one for the data from the WSN and 2 fusion nodes for the fixed cameras, one integrating measurements from 3 camera trackers and the other from 4 cameras. Figure 7 shows some examples of the tracks obtained by the camera network. Along the trajectory there were gaps in the camera coverage. Moreover, the robot at times lost the object it was following due to changes in illumination, etc.

Fig. 8: Estimated position of the person (blue) compared to the position of the robot (green). Dashed lines represent the standard deviation of the estimation. (a) Complete trajectory. (b) An interval of the trajectory. The person was following the robot with the same X coordinate up to time 80 seconds. Then the robot changed orientation. The person was separated from the robot around 3-4 meters.

Figure 8a shows the estimated position of the person with the full system running. The total length of the experiment was around 350 meters and 5 minutes. The person was usually beside the robot (which means that the X or Y coordinates are the same). The system was able to maintain the estimation of the person position for the full trajectory. There was WSN coverage between 0 and 150 seconds, approximately.

Figure 8b shows an interval of the trajectory. In this part, only WSN and robot information were available. Although the WSN measurements have low accuracy, they allow the system to bound the error from the robot monocular camera.
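The bounding effect described above can be illustrated with a small sketch of fusion in information form, where the information matrices and vectors of two estimates are simply added. This is a simplified, single-time-step 2D example and not the delayed-state filter used in the experiments; it assumes the two estimates are independent (no common information), and the means and covariances below are invented for illustration only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def fuse_information(mean_a, cov_a, mean_b, cov_b):
    """Fuse two independent Gaussian estimates by adding their information.

    Information matrix Y = inverse covariance, information vector y = Y * mean.
    Valid only if the two estimates share no common information.
    """
    Ya, Yb = inv2(cov_a), inv2(cov_b)
    ya, yb = matvec(Ya, mean_a), matvec(Yb, mean_b)
    Y = [[Ya[i][j] + Yb[i][j] for j in range(2)] for i in range(2)]
    y = [ya[i] + yb[i] for i in range(2)]
    cov = inv2(Y)
    return matvec(cov, y), cov


if __name__ == "__main__":
    # Hypothetical numbers: a monocular camera estimate with large
    # uncertainty and a coarser-than-ideal but bounded WSN estimate.
    camera_mean, camera_cov = [10.0, 4.0], [[9.0, 0.0], [0.0, 9.0]]
    wsn_mean, wsn_cov = [12.0, 5.0], [[4.0, 0.0], [0.0, 4.0]]
    mean, cov = fuse_information(camera_mean, camera_cov, wsn_mean, wsn_cov)
    print(mean, cov)
```

With these assumed values the fused variance in each axis is 36/13 (about 2.77), smaller than either input variance, which is the qualitative behavior reported for the combination of robot and WSN. If the estimates were correlated, adding information this way would be overconfident, which is why the system removes common information or falls back to a conservative rule.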
At time 75 approximately, the person entered the coverage of the camera network, which led to a big reduction in uncertainty.

During all the above experiments, the communication between the fusion nodes on board the robot and the fusion nodes related to the camera network and the WSN was done using WiFi and 3G. Software running on the robot was able to measure the quality of the WiFi links, and to switch to 3G whenever this quality dropped below a certain threshold. The switching between communication networks created, from time to time, communication breakdowns of several seconds. Moreover, although 3G had a more stable coverage in the scenario, it also had lower bandwidth and higher latencies than WiFi. In order to tackle these problems, the use of a decentralized system with delayed states was crucial, as in the meantime the local nodes were accumulating information. When the communication links were recovered, the nodes exchanged their estimations. Moreover, as delayed states were considered, this delayed information (and also information delayed due to the latencies) could be fused in a correct way, and no information was lost.

VIII. CONCLUSIONS

In urban scenarios, the cooperation between mobile robots and ubiquitous sensors can provide solutions to problems in which single systems, even if powerful, can fail. Very complex algorithms employing just one source of information are usually unable to cope with all the potential situations in these scenarios, which are affected by changes in illumination and clutter, and in which a wide area must be covered. The combination of complementary systems can be useful for this problem.

This paper has presented a system that aims to use multiple sensors to accurately track people within a guidance application. The system makes extensive use of data fusion procedures to incorporate all the information available. Scalability is an issue in these systems, and thus decentralized algorithms are required. The system presented is a mixture of distributed or centralized subsystems that are linked through a decentralized data fusion scheme. The addition of new robots or sub-nets of cameras does not affect the rest of the perception system in terms of storage, as only local communication and local processing are used. The algorithms are real-time and have been tested in the urban scenario proposed by the URUS Project, consisting of a camera network with 22 cameras, a WSN with 30 nodes and mobile robots.

Future developments include the integration of active sensing behaviors in the system. The WSN can be actively controlled to save energy, activating those nodes more useful for tracking. Next steps also include closing the loop, and developing more complex robot navigation algorithms for social people guiding by robots. This will be the focus of the FROG European project: besides positioning information, information like human commitment will be extracted and used to develop robot motions that are socially acceptable.

IX. ACKNOWLEDGMENTS

The authors would like to thank all the partners in the URUS project for their support during the different experiments.
REFERENCES

[1] A. Sanfeliu, J. Andrade-Cetto, M. Barbosa, R. Bowden, J. Capitán, A. Corominas, A. Gilbert, J. Illingworth, L. Merino, J. Mirats, P. Moreno, A. Ollero, J. Sequeira, and M. Spaan, "Decentralized Sensor Fusion for Ubiquitous Networking Robotics in Urban Areas," Sensors, vol. 10, pp. 2274–2314, 2010.
[2] T. Chang, S. Gong, and E. Ong, "Tracking Multiple People under Occlusion using Multiple Cameras," in Proc. of BMVA British Machine Vision Conference (BMVC 00), pp. 566–575, 2000.
[3] V. Morariu and O. Camps, "Modeling Correspondences for Multi-Camera Tracking using Nonlinear Manifold Learning and Target Dynamics," in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 06), vol. I, pp. 545–552, 2006.
[4] T. Huang and S. Russell, "Object Identification in a Bayesian Context," in Proc. of International Joint Conference on Artificial Intelligence (IJCAI-97), pp. 1276–1283, 1997.
[5] P. KaewTrakulPong and R. Bowden, "A Real-time Adaptive Visual Surveillance System for Tracking Low Resolution Colour Targets in Dynamically Changing Scenes," Journal of Image and Vision Computing, vol. 21, no. 10, pp. 913–929, 2003.
[6] T. Ellis, D. Makris, and J. Black, "Learning a Multi-Camera Topology," in Proc. of Joint IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pp. 165–171, 2003.
[7] O. M. Mozos, R. Kurazume, and T. Hasegawa, "Multi-part people detection using 2D range data," International Journal of Social Robotics, 2010.
[8] L. E. Navarro-Serment, C. Mertz, and M. Hebert, "Pedestrian detection and tracking using three-dimensional ladar data," in Proc. of The 7th Int. Conf. on Field and Service Robotics, July 2009.
[9] G. Bradski, "Computer Vision Face Tracking as a Component of Perceptual User Interface," in Proc. of Workshop on Applications of Computer Vision, pp. 214–219, 1998.
[10] A. Ess, B. Leibe, K. Schindler, and L. V. Gool, "A Mobile Vision System for Robust Multi-Person Tracking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[11] F. Gustafsson and F. Gunnarsson, "Mobile Positioning using Wireless Networks," IEEE Signal Processing Magazine, pp. 41–53, 2005.
[12] P. Norlund, F. Gustafsson, and F. Gunnarsson, "Particle Filters for Positioning in Wireless Networks," in Proceedings of EUSIPCO, 2002.
[13] B. Kusý, A. Ledeczi, and X. Koutsoukos, "Tracking mobile nodes using RF Doppler shifts," in Proceedings of SenSys, 2007, pp. 29–42.
[14] G. Hollinger, J. Djugash, and S. Singh, "Tracking a moving target in cluttered environments with ranging radios," in IEEE International Conference on Robotics and Automation, May 2008.
[15] A. Gilbert and R. Bowden, "Incremental, Scalable Tracking of Objects Inter Camera," Computer Vision and Image Understanding (CVIU), vol. 3, pp. 43–58, 2008.
[16] A. Gilbert, J. Capitán, R. Bowden, and L. Merino, "Accurate Fusion of Robot, Camera and Wireless Sensors for Surveillance Applications," in Proc. Ninth IEEE International Workshop on Visual Surveillance (ICCV09), Kyoto, Japan, 2009.
[17] P. Viola and M. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, vol. 57, pp. 137–154, 2004.
[18] M. Villamizar, J. Scandaliaris, A. Sanfeliu, and J. Andrade-Cetto, "Combining color invariant gradient detector with HOG descriptors for robust image detection in scenes under cast shadows," in Proceedings of the International Conference on Robotics and Automation, ICRA, 2009.
[19] L. D. Stone, T. L. Corwin, and C. A. Barlow, Bayesian Multiple Target Tracking. Norwood, MA, USA: Artech House, Inc., 1999.
[20] F. Caballero, L. Merino, P. Gil, I. Maza, and A. Ollero, "A probabilistic framework for entire WSN localization using a mobile robot," Journal of Robotics and Autonomous Systems, vol. 56, no. 10, pp. 798–806, 2008.
[21] J. Capitán, L. Merino, F. Caballero, and A. Ollero, "Delayed-State Information Filter for Cooperative Decentralized Tracking," in Proceedings of the International Conference on Robotics and Automation, ICRA, 2009.
[22] J. Capitán, L. Merino, F. Caballero, and A. Ollero, "Decentralized Delayed-State Information Filter (DDSIF): A new approach for cooperative decentralized tracking," Robotics and Autonomous Systems, vol. 59, no. 6, pp. 376–388, 2011.
[23] E. Nettleton, H. Durrant-Whyte, and S. Sukkarieh, "A robust architecture for decentralised data fusion," in Proc. of the International Conference on Advanced Robotics (ICAR), 2003.
[24] F. Bourgault and H. Durrant-Whyte, "Communication in general decentralized filters and the coordinated search strategy," in Proc. of The 7th Int. Conf. on Information Fusion, 2004, pp. 723–730.
[25] S. Julier and J. Uhlmann, "A non-divergent estimation algorithm in the presence of unknown correlations," in Proceedings of the American Control Conference, vol. 4, Jun. 1997, pp. 2369–2373.