Fourth IEEE Workshop on Applications of Computer Vision, October 1998, Princeton, New Jersey, USA

Video Occupant Detection for Airbag Deployment

John Krumm and Greg Kirk
Intelligent Systems & Robotics Center
Sandia National Laboratories
Albuquerque, NM 87185

Figure 1: Empty, infant, and occupied seats as seen by camera inside vehicle. Airbag should not deploy on empty nor infant seat.

Abstract

When an airbag deploys on a rear-facing infant seat, it can injure or kill the infant. When an airbag deploys on an empty seat, the airbag and the money to replace it are wasted. We have shown that video images can be used to determine whether or not to deploy the passenger-side airbag in a crash. Images of the passenger seat, taken from a video camera mounted inside the vehicle, can be used to classify the seat as either empty, containing a rear-facing infant seat, or occupied. Our first experiment used a single, monochrome video camera. The system was automatically trained on a series of test images. Using a principal components (eigenimages) nearest neighbor classifier, it achieved a correct classification rate of 99.5% on a test of 910 images. Our second experiment used a pair of monochrome video cameras to compute stereo disparity (a function of 3D range) instead of intensity images. Using a similar algorithm, the second approach achieved a correct classification rate of 95.1% on a test of 890 images. The stereo technique has the advantage of being less sensitive to illumination, and would likely work best in a real system.

For correspondence, contact first author at: Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, jckrumm@microsoft.com

This work was performed at Sandia National Laboratories and supported by the U.S. Department of Energy under contract DE-AC04-94AL85000.

1. Introduction

The increased safety afforded by automobile airbags (55 lives saved to date) has produced government regulations and consumer demand that will have half of all vehicles on the road in equipped with driver- and passenger-side airbags[3]. All cars made after August 31, 1997 must have dual airbags[4].
The increased number of airbags will also magnify their problems: airbags wastefully deploying on empty passenger seats and dangerously deploying on rear-facing infant seats (RFIS). The average cost of replacing an airbag is $7, which has in part fueled more airbag thefts[4]. A more serious unwanted airbag deployment occurs on RFISs. Airbags deploy at speeds up to mph. This force is blamed for 1 infant deaths since 199[3].

This paper describes a research effort at Sandia National Laboratories to develop a video sensor mounted inside a vehicle to solve these problems. This occupant detection system reliably classifies a vehicle's passenger seat as either empty, occupied by a RFIS, or occupied by a regular person. The camera's view of these three situations is shown in Figure 1.

Section 2 describes a system based on principal components of images from a single black-and-white video camera along with conventional nearest-neighbor image classification. In order to make the system less sensitive to illumination and color, we implemented a simple dense stereo algorithm that is described in Section 3. Section 4 describes how these range images can be classified using an algorithm similar to the one used for monocular intensity images. On tests in a real vehicle, the intensity-based algorithm
correctly classified about 99% of test images, while the range-based algorithm correctly classified about 95% of a more challenging test set.

Alternative technologies for occupant sensing[5] include a system from Mercedes Benz that senses the presence of a RFIS with a special resonating device built in. They also use a weight sensor to prevent the airbag from deploying if the passenger seat is empty. Tests have shown, however, that any non-zero reading from a weight sensor in the seat can indicate a wide variety of ambiguous situations, from a bag of groceries to a tightened-down RFIS. Technologies for measuring the presence of an occupant and his/her position include infrared spots, ultrasound, capacitance, and a piezoelectric sheet embedded in the seat to measure pressure distribution.

Although the ultimate occupant sensing system will likely use multiple sensors, vision is attractive because it is passive and can provide a multitude of cues for determining how to deploy the airbag, e.g. RFIS/empty/occupied, occupant size, and position. This paper shows how we used video to classify the state of the passenger seat.

2. Intensity Image Classification

Our first approach to the problem of occupant detection was to use black & white intensity images taken from a single video camera mounted inside a vehicle. We gathered hundreds of images over several days from a stationary vehicle parked outside our laboratory building. Some of the images were used to train our program to explicitly recognize the empty and RFIS classes based on a nearest neighbor algorithm. In order to reduce the amount of computation required, we used eigenvector principal components to compress the image data. This section describes the experimental setup, theory, algorithm, and results of image classification using intensity images.

Figure 2: Cameras used for monocular and binocular images of vehicle interiors.

2.1. Experimental Setup

We parked our test vehicle near our laboratory building such that it would be shaded for part of the day.
We mounted a single video camera near the top of the driver's side A pillar using the driver's-side sun visor mount points for attachment. The camera itself appears in Figure 2 along with a companion camera used for stereo described in Section 3. Typical black & white images from the camera are shown in Figure 1. The images were digitized, stored, and processed on a general-purpose workstation computer inside the laboratory.

Images were taken every five minutes during daylight hours for six days. Three of the days were devoted to images of the empty seat, with the three separate days having the passenger seat adjusted to its most rearward, middle, and most forward positions respectively. The seat was similarly adjusted for the next three days of images of a doll baby in a RFIS. Full days of imaging gave a good variety of illumination as the sun moved overhead on the typically cloudless days in Albuquerque, NM. We also took ten images each of ten adult volunteers as they sat in the passenger seat. In all, we took 638 images of the seat empty, 576 of the RFIS, and 101 images of the seat occupied.

In order to simulate an inexpensive video camera such as might be used in real production, we reduced the resolution of the images by averaging square regions of pixels in the original images into single pixels in their lower resolution counterparts. We varied the amount of resolution reduction for testing. After reducing the resolution, each image was histogram equalized to help reduce the effects of illumination variations. Histogram equalization was particularly good at recovering acceptable images taken in the darker conditions near sunrise and sunset. Finally, each image was normalized by dividing each pixel by the square root of the sum of the squares of all its pixels. Mathematically, this means that the sum of the squares of the pixels in each normalized image is one. Practically, this helps factor out overall illumination differences in images that are otherwise similar.

2.2. Theory of Image Matching

We classified the test images into three categories: empty, RFIS, or other.
We chose not to create an explicit class for occupied seats, since the appearance of an occupied seat is so variable. Any image that was not explicitly classified as either empty or RFIS was considered a case of an occupied seat.

In order to do the classification, we extracted every sixth image from the empty and RFIS image sets to make a set of prototype images taken 30 minutes apart. Spacing the prototype images evenly over the day helped the system work in spite of changing illumination and shadows. The remaining 5/6 of the images were used as tests, and they were classified by comparing each of them to all the prototype images. If a test image was deemed
similar enough to a prototype image, it was given the same class as the prototype.

To make the image comparison faster, we compressed all the images using principal components computed from the preprocessed (resolution reduction, histogram equalization, normalization) prototype images. For a given set of prototype images (either empty or RFIS), we raster scan each image into a column vector $\bar{p}_i$. (In our notation, a bar over a variable indicates a vector.) We form a matrix $P$ containing all the column vectors side-by-side in no particular order:

$$P = [\, \bar{p}_0 - a \;\;\; \bar{p}_1 - a \;\;\; \bar{p}_2 - a \;\;\; \cdots \;\;\; \bar{p}_{n-1} - a \,], \qquad (1)$$

where $n$ is the number of prototype images in the prototype set, and $a$ is the overall mean of all the elements of all the $\bar{p}_i$ of the prototype set, subtracted from every element. For the empty and RFIS classes, $n$ had the value 106 and 96, respectively. The sample covariance matrix is $Q = PP^T$. The eigenvectors of $Q$ are $\bar{e}_i$. Any of the $\bar{p}$ can be reconstructed from the $\bar{e}_i$ using

$$\bar{p} = \sum_{i=0}^{n-1} c_i \bar{e}_i + a, \qquad (2)$$

where the $c_i$ are computed from

$$c_i = (\bar{p} - a) \cdot \bar{e}_i. \qquad (3)$$

These $c_i$ coefficients serve as a representation of the images. The values of the first two coefficients $c_0$ and $c_1$ for the empty and RFIS classes are shown in Figure 3. Each dot in the plots represents one image in the prototype set.

Figure 3: Coefficients $c_0$ and $c_1$ (two panels: Empty Seat Projections and Infant Seat Projections). Ellipses on top plot show projections from seat in forward, middle, and back positions. Points near the origin occurred in darker conditions, while points farthest away occurred around noon. Similar structure holds for infant seat projections.

A preprocessed image $\bar{v}$ with an unknown class can be decomposed with the same eigenvectors into coefficients $d_i$ using

$$d_i = (\bar{v} - a) \cdot \bar{e}_i. \qquad (4)$$

It can be shown that the sum of squared differences (SSD) between a prototype image $\bar{p}$ and the unknown image $\bar{v}$ is

$$\|\bar{v} - \bar{p}\|^2 = \sum_{i=0}^{n-1} (d_i - c_i)^2 \qquad (5)$$

in terms of the two images' coefficients. To the extent that the images can be approximately reconstructed from the first $n'$ coefficients (with $n' \le n$) and corresponding eigenvectors, the SSD can be approximated as

$$\|\bar{v} - \bar{p}\|^2 \approx \sum_{i=0}^{n'-1} (d_i - c_i)^2. \qquad (6)$$

Based on our experiments, we used $n' = 18$. We justify this choice in Section 2.4. Once we have the $c_i$ (which are precomputed) and the $d_i$ (which are easy to compute from Equation (4)), Equation (6) gives a fast way of approximating the SSD between an unknown image and each of the prototypes. These ideas of principal components and nearest neighbor classification can be found in standard textbooks such as Fukunaga's[1].
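
As an illustration of Equations (1) through (6), the following is a minimal NumPy sketch, not the authors' implementation. It assumes the preprocessed prototype images are stacked as rows of an array, treats a as the scalar overall mean, and obtains the eigenvectors of Q = PP^T from a singular value decomposition of P (a swapped-in but equivalent route). The function names fit_eigenimages and approx_ssd are hypothetical.

```python
import numpy as np

def fit_eigenimages(prototypes, n_keep):
    """prototypes: (n, pixels) array, one preprocessed image per row (assumed layout)."""
    a = prototypes.mean()                  # overall mean of all elements of all prototypes
    P = (prototypes - a).T                 # columns are p_i - a, as in Eq. (1)
    # The left singular vectors of P are the eigenvectors of Q = P P^T,
    # returned in order of decreasing eigenvalue.
    U, _, _ = np.linalg.svd(P, full_matrices=False)
    E = U[:, :n_keep]                      # keep the first n' eigenvectors
    C = (prototypes - a) @ E               # Eq. (3): coefficients c_i of every prototype
    return a, E, C

def approx_ssd(v, a, E, C):
    """Approximate SSD between unknown image v and every prototype."""
    d = (v - a) @ E                        # Eq. (4): coefficients d_i of the unknown image
    return ((C - d) ** 2).sum(axis=1)      # Eq. (6): one value per prototype
```

When all eigenvectors are kept, approx_ssd reproduces the exact SSD between prototypes, as Equation (5) states; with fewer eigenvectors the truncated sum can only underestimate it, which is why the choice of n' is a trade-off between speed and fidelity.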
2.3. Image Matching Algorithm

We compare each new unknown image to all the prototypes as described above using Equation (6). For a given unknown image, we find the SSDs between the image and its nearest neighbor in the empty and RFIS prototype sets, respectively. We classify the image as empty if the image is close enough to an empty prototype, and likewise for RFIS. Specifically, we decide what to do with the airbag according to the following decision table and experimentally determined thresholds $t_e$ and $t_r$:

                       empty SSD ≤ t_e                empty SSD > t_e
  RFIS SSD ≤ t_r       tiebreaker (retain airbag)     RFIS (retain airbag)
  RFIS SSD > t_r       empty (retain airbag)          occupied (deploy airbag)

Figure 4: Classification accuracy varies as a function of the thresholds $t_e$ and $t_r$ used on nearest neighbor distances. We used this data to maximize the algorithm's performance. (Plot: Accuracy vs. Distance Threshold; percent correct against threshold, with one curve for empty seat and one for infant seat.)

Unless the seat is explicitly recognized as either empty or RFIS, the airbag is deployed. In the tiebreaker case, the unclassified image looks similar to an image in both prototype classes. By luck of the problem, however, the action of the airbag should be the same in both cases (retain airbag), so it makes no difference which of the two classes the unknown image actually belongs in. We discuss our choice of the thresholds $t_e$ and $t_r$ in the next section.

2.4. Experimental Results

In assessing the accuracy of our algorithm, we were free to vary several parameters. We adjusted the thresholds $t_e$ and $t_r$, the resolution of the images, and the number of eigenvectors used ($n'$). Our best results were achieved with $t_e$ = . 9 and $t_r$ = . 19, an image resolution of 96x102, and $n'$ = 18 eigenvectors. The classifier was actually tested as two separate classifiers, one for empty seats and one for RFISs. The empty seat classifier failed to recognize three of 413 empty seat images as empty seats, and it misclassified one of 101 occupied seats as empty seats.
The RFIS classifier failed to recognize one of 396 RFIS images as a RFIS, and it misclassified none of the 101 occupied seats as a RFIS. From the point of view of airbag actions, the results are:

  Airbag action                          Computed action / correct action    Percent
  Correct overall                        905/910                             99.5%
  Fatal retention (on occupied seat)     1/101                               1.0%
  Fatal deployment (on RFIS)             1/396                               0.3%
  Unneeded deployment (on empty seat)    3/413                               0.7%

We refer to the 99.5% figure as the accuracy of the system. This is the percentage of images on which the system directs the airbag to take the correct action. We note that there were no cases where the system achieved extra accuracy by merely confusing an empty seat with a RFIS or vice versa.

Figure 5: Percent of correct airbag actions as a function of the number of eigenimages. The accuracy increases until $n'$ = 18, which is the number we used. (Plot: percent correct against number of eigenvectors.)

We chose the thresholds by computing the percentage of correct airbag actions as a function of the thresholds. This data is plotted in Figure 4. By picking the thresholds to correspond to the maxima of these plots, we maximized the accuracy. Ideally, the plots would show broad peaks near their respective maxima, which would indicate relative insensitivity to the value of the thresholds. As the thresholds increase, the accuracy reaches a constant value.
At this point, the net action of the system is to retain the airbag in every case, and the accuracy percentage simply reflects the relative number of the three classes of images in the test set.

We found that accuracy was not a strong function of resolution in the range of resolutions that we tested. From a minimum image size of 68x73 to a maximum size of 120x128, the accuracy varied by only .%.

The final adjustable parameter was the number of eigenvectors used, $n'$. We optimized performance by computing the accuracy as a function of $n'$. The results of this computation are shown in Figure 5, and the best value was $n'$ = 18.

3. Stereo Vision

One potential problem with classifying intensity images, as described in the previous section, is that a class with large intensity variations may be difficult to characterize with a limited number of prototype images. For instance, we would not expect our classifier to work well if the seats of the test vehicle were temporarily covered with a towel. Even more important, this limitation prevented us from establishing a separate occupied class, because the appearance of an occupied seat varies significantly with what clothes the occupant is wearing. We could not hope to capture enough variations in the appearance of an occupied seat with a reasonable number of prototype images.

This problem prompted us to consider using images whose pixels represent range (distance to surface) rather than intensity. Our justification is that range images are ideally insensitive to the lightness of objects in the scene, and that prototype range images of a given class will be more similar to each other than prototype intensity images. This is especially true for occupied seats, where the range image is ideally independent of the color of the occupant's clothes.

Our technique for getting range images is binocular stereo, which we describe in the next subsection. We used essentially the same techniques for classifying range images as we did for intensity images.
Classification of the range images is described in Section 4.

3.1. Experimental Setup

We used two cameras, mounted side-by-side, as shown in Figure 2. The camera mount held the cameras nearly parallel. We aligned the rows of the two cameras by pointing the cameras at a horizontal edge. We rotated them each around their respective roll axes until the edge was as close as possible to being horizontal near the center row of both images. This made the epipolar lines correspond approximately to rows in the image, meaning that a match for a point in the left image would fall on a known row in the right image. Given that we reduced the image resolution by four times before stereo matching (480x512 down to 120x128), approximate alignment was sufficient. A typical stereo pair from inside the vehicle is shown in Figure 6.

3.2. Binocular Stereo

Measuring range from a stereo pair such as ours reduces to measuring the disparity (shift) between corresponding points in the left and right images. The range is inversely proportional to disparity. In fact, we did not compute range at all, using just the raw disparities for classification.

For each point in the left image, we find a match in the right image using a stereo correlation method described by Matthies in [2]. This method extracts a small window around each point in the left image and finds the best match in the right image using correlation (SSD) search along the epipolar line. For our reduced resolution stereo images of size 120x128, we used windows of size 5x7. Based on the geometry of the cameras and scene and the resolution of the images, we limited the disparities to the range [,5] pixels. Following Matthies' algorithm, we computed subpixel disparities by fitting a parabola to the SSD values at the minimum SSD and its neighbors on either side. The subpixel disparity was taken as the location of the minimum of this parabola. A typical disparity image is shown in Figure 6.

Figure 6: Stereo (left/right) images taken inside test vehicle. Rightmost image shows computed disparity, with lighter points having more disparity.

Note that we have masked out the pixels on the window of the vehicle, as
they give no indication of the state of the passenger seat.

4. Disparity Image Classification

Our procedure for classifying disparity images is nearly the same as that for classifying intensity images, as described in Section 2. Besides the obvious difference of using disparity instead of intensity, the other difference was that we classified into three classes (empty, RFIS, occupied) rather than just two (empty, RFIS) as we did with intensity images. We felt that the intensity invariance of the disparity images justified a separate class for occupied seats that would be compact enough to give accurate classifications.

We collected stereo images over a period of seven days. For the empty and RFIS cases, we collected data in the same way as for the first experiment using monocular images (every five minutes, seat set in rearward, middle, and forward positions for one day each for both empty and RFIS). We also took ten images of a different RFIS ("minority" RFIS) for testing. For the occupied class, we took 439 stereo pairs of regular occupants (20 images of each person, with one bad image thrown out). The occupants were asked to change positions during the imaging.

Of all the images, we used 76 empty seat pairs and 68 RFIS pairs for training, taken at 30-minute intervals. None of the 10 "minority" RFIS were used for training. We used 220 of the occupied seat pairs for training. All the stereo pairs that were not used for training were used for testing. For the occupied seat, 11 pictured individuals were used for training, and 11 others were used for testing.

All the stereo pairs were subjected to our stereo algorithm, and all subsequent processing was done on the real-valued disparity images. We processed the disparity images in the same way as the intensity images, eliminating histogram equalization and normalization. We approximated the SSD using the top eigenvectors and classified an unknown disparity image with its nearest neighbor out of all the images in the prototype sets. This method achieved a classification accuracy of 93%.
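
The nearest-neighbor rule over several prototype sets can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' program: images are represented by their eigenvector coefficients, prototypes are given as (label, coefficients) pairs, and the function name classify_nearest and the class_scale argument are hypothetical. class_scale is an optional hook that multiplies each SSD by a per-class factor; with the default of 1.0 for every class it reduces to the plain nearest-neighbor rule.

```python
def classify_nearest(d, prototypes, class_scale=None):
    """Return (label, ssd) of the prototype closest to coefficient vector d.

    d           -- list of coefficients of the unknown image
    prototypes  -- list of (label, coefficient list) pairs (assumed layout)
    class_scale -- optional dict mapping label -> factor applied to the SSD
    """
    class_scale = class_scale or {}
    best_label, best_ssd = None, float("inf")
    for label, c in prototypes:
        # SSD in coefficient space, as in Equation (6)
        ssd = sum((di - ci) ** 2 for di, ci in zip(d, c))
        ssd *= class_scale.get(label, 1.0)   # hypothetical per-class scaling
        if ssd < best_ssd:
            best_label, best_ssd = label, ssd
    return best_label, best_ssd
```

A factor above 1.0 makes a class harder to select and a factor below 1.0 makes it easier, which is one simple way to trade one kind of error against another.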
We modified the classification program to give weights to the three SSDs for each of the three classes. After computing the SSD between the projection of an unclassified image and the projection of a training image, it was scaled by a weighting factor for that prototype's class. The optimal weights for the empty, RFIS, and occupied classes were 1., 1.16, and .79, respectively. Using these weights brought the classification accuracy up to 95%.

The following table shows the specific types of classification errors. We note that two of the fatal deployments (on RFIS) were due to misclassifying the minority RFIS as an occupied seat. The minority RFIS was correctly classified in the remaining eight images. The weights could be adjusted to decrease the number of fatal errors (fatal retention and fatal deployment) at the expense of unneeded deployments.

  Airbag action                          Computed action / correct action    Percent
  Correct overall                        846/890                             95.1%
  Fatal retention (on occupied seat)     1/219                               0.5%
  Fatal deployment (on RFIS)             16/318                              5.0%
  Unneeded deployment (on empty seat)    27/353                              7.6%

5. Conclusions

We have shown that video images can be successfully used to determine whether or not to deploy the passenger-side airbag. Images of the passenger seat taken from a video camera mounted inside the vehicle can be used to classify the seat as either empty, containing a RFIS, or occupied. Our first experiment used a single video camera. The system was automatically trained on a series of test images. Using a principal components (eigenimages) nearest neighbor classifier, it achieved a correct classification rate of 99.5% on a test of 910 images. Our second experiment used a pair of video cameras to compute stereo disparity (a function of 3D range) instead of intensity images. Using a similar algorithm, the second approach achieved a correct classification rate of 95.1% on a test of 890 images. The stereo technique has the advantage of being insensitive to illumination, and would likely work best in a real system.
In addition, range data from stereo images could be used to estimate the position of the occupant, giving important information on how to deploy airbags in an advanced system with multiple airbags and variable inflation rates.

References

1. Fukunaga, Keinosuke. Introduction to Statistical Pattern Recognition, Second Edition. Academic Press, 1990.
2. Matthies, Larry. Dynamic Stereo Vision, (Ph.D. Thesis), Carnegie Mellon University School of Computer Science, Technical Report CMU-CS-89-195, October 1989.
3. McGinn, Daniel and Daniel Pedersen. "A Life-or-Death Choice?" Newsweek, October, 1997, 41-45.
4. O'Donnell, Jayne. "Insurers find some accident costs actually increase," USA Today, September 9, 1996, Sec. B, p. 4.
5. Paula, Greg. "Sensors Help Make Air Bags Safer," Mechanical Engineering Magazine, 119(8), August 1997.