Moving Object Detection for Intelligent Visual Surveillance


Moving Object Detection for Intelligent Visual Surveillance. Ph.D. Candidate: Jae Kyu Suhr. Advisor: Prof. Jaihie Kim. April 29, 2011

Contents 1 Motivation & Contributions 2 Background Compensation for PTZ Cameras 3 Background Subtraction for Static Cameras 4 Experimental Results 5 Conclusions 2

Motivation & Contributions 3

Intelligent Visual Surveillance An intelligent visual surveillance system automatically extracts and analyzes useful information from surveillance videos. It mainly consists of four core technologies: object detection, object classification, object tracking, and behavior and identity recognition. 4

Object Detection Among these, object detection performs the lowest-level task; its output is the basis of the other, higher-level tasks. Object detection can be categorized into two approaches: moving object detection, which utilizes the changes induced by objects' movements, and appearance-based object detection, which utilizes the physical shapes of objects. Of the two, moving object detection is more widely used for real-time surveillance systems because it requires relatively few computational resources. 5

Considerations for Moving Object Detection Moving object detection methods should consider two aspects: accuracy (performance) and computational resources (time and memory). The output of this stage affects the performance of the following high-level tasks, and the resources remaining after this stage are what is left for those tasks. It is therefore important to improve moving object detection methods in terms of both accuracy and computational resources. 6

Contributions Under this motivation, this dissertation proposes two novel moving object detection methods: one for pan-tilt-zoom (PTZ) cameras and the other for static cameras. For PTZ cameras, background compensation using 1-D feature matching and outlier rejection is proposed. It is robust against blurring effects and moving object proportion, and it dramatically decreases computational costs. For static cameras, background subtraction using Bayer-pattern images is proposed. It shows higher performance than the case of using RGB color images, while its resource requirements are as low as those of the case of using grayscale images. 7

Background Compensation for PTZ cameras 8

Background Compensation A frame differencing technique with background alignment. For a static camera: $F_t(\mathbf{x}) = 1$ if $\mathrm{Dist}(I_t(\mathbf{x}), I_{t-n}(\mathbf{x})) > TH$, and $0$ otherwise. For a PTZ camera: $F_t(\mathbf{x}) = 1$ if $\mathrm{Dist}(I_t(\mathbf{x}), T(I_{t-n}(\mathbf{x}))) > TH$, and $0$ otherwise, where $T$ is the background compensation transformation. [Figure: difference images without and with background compensation] 9
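
Below is a minimal sketch of this frame differencing rule in Python with NumPy. The `transform` argument stands in for the background compensation T and is a hypothetical placeholder, not the dissertation's implementation.

    import numpy as np

    def frame_difference(curr, prev, threshold, transform=None):
        """Binary foreground mask: 1 where Dist(I_t, T(I_{t-n})) exceeds TH."""
        if transform is not None:  # PTZ camera: align the background first
            prev = transform(prev)
        dist = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
        return (dist > threshold).astype(np.uint8)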

Image Transformation The relationship between consecutive PTZ camera images can be approximated by a 3-parameter similarity transformation. If the camera center is assumed to be fixed, the images $\mathbf{x}$ and $\mathbf{x}'$ of a 3-D point $X$ before and after panning, tilting, and zooming can be described as $\mathbf{x} = K[I\,|\,0]X$ and $\mathbf{x}' = K'[R\,|\,0]X$, where $K$ and $K'$ are the camera's intrinsic parameter matrices and $I$ and $R$ are 3x3 identity and rotation matrices:
$K = \begin{bmatrix} f & 0 & o_x \\ 0 & \lambda f & o_y \\ 0 & 0 & 1 \end{bmatrix}$, $K' = \begin{bmatrix} sf & 0 & o_x \\ 0 & s\lambda f & o_y \\ 0 & 0 & 1 \end{bmatrix}$, $R = R_x R_y$, $R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}$, $R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix}$,
where $\theta_x$ is the panning angle, $\theta_y$ the tilting angle, $f$ and $\lambda$ the focal length and aspect ratio, $o_x$ and $o_y$ the coordinates of the principal point, and $s$ the zoom factor. 10

Image Transformation From $\mathbf{x} = K[I\,|\,0]X$ we get $K^{-1}\mathbf{x} = [I\,|\,0]X$, and substituting this into $\mathbf{x}' = K'[R\,|\,0]X = K'R[I\,|\,0]X$ gives $\mathbf{x}' = K'RK^{-1}\mathbf{x} = H\mathbf{x}$, where $H = K'RK^{-1}$ is a 3x3 homography. Expanding $H$ with $K$, $K'$, and $R = R_x R_y$ as defined above (taking the principal point as the origin for brevity) yields
$H = K'RK^{-1} = \begin{bmatrix} s\cos\theta_y & 0 & sf\sin\theta_y \\ s\lambda\sin\theta_x\sin\theta_y & s\cos\theta_x & -s\lambda f\sin\theta_x\cos\theta_y \\ -\cos\theta_x\sin\theta_y/f & \sin\theta_x/(\lambda f) & \cos\theta_x\cos\theta_y \end{bmatrix}$. 11

Image Transformation If the pan and tilt angles ($\theta_x$ and $\theta_y$) between consecutive images are small, the entries of $H$ can be approximated: $\sin\theta_x$ and $\sin\theta_y$ are close to zero; $\cos\theta_x$ and $\cos\theta_y$ are close to one; the zoom factor $s$ is near one; the aspect ratio $\lambda$ is close to one; and the focal length $f$ is a very large value (mostly larger than 100). The terms $f\sin\theta_y$ and $\lambda f\sin\theta_x$ are not approximated to zero, since $f$ is very large. This reduces $H$ to
$H \approx \begin{bmatrix} s & 0 & sf\sin\theta_y \\ 0 & s & -s\lambda f\sin\theta_x \\ 0 & 0 & 1 \end{bmatrix}$ (signs following the rotation convention above). 12

Image Transformation With the approximated $H$, $\mathbf{x}' = H\mathbf{x}$ becomes $x' = sx + t_x$ and $y' = sy + t_y$, where $t_x$ and $t_y$ absorb the $f\sin\theta$ terms. Two things should be noticed. First, estimation of the transformation parameters can be geometrically interpreted as estimating two lines with the same slope. Second, 1-D feature correspondences are enough to estimate the transformation parameters, because the transformations along the x- and y-axes are separable; 2-D feature correspondences are not required. A sketch of the resulting 1-D estimation problem follows. 13
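
As a toy illustration of this 1-D formulation, the sketch below fits $y' = sy + t_y$ to matched 1-D coordinates by least squares. The dissertation itself uses Hough voting plus RANSAC (shown on the following slides); the function name and the least-squares choice here are illustrative assumptions.

    import numpy as np

    def fit_1d_similarity(y, y_prime):
        """Fit y' = s*y + t by least squares over matched 1-D coordinates."""
        A = np.stack([y, np.ones_like(y)], axis=1)  # design matrix [y, 1]
        (s, t), _, _, _ = np.linalg.lstsq(A, y_prime, rcond=None)
        return s, t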

1-D Feature Correspondence Extraction Local maxima and minima of intensity projection profiles along the horizontal and vertical axes are used as 1-D features. The projection profiles are extracted from overlapping sub-images: projection profiles of sub-images are less distorted than those of the whole image when the proportion of a moving object is large, so this approach produces more corresponding 1-D features. [Figure: rules for making horizontal and vertical sub-images] 14

1-D Feature Correspondence Extraction Intensity values in each sub-image are projected onto each axis, and local maxima and minima are extracted from the intensity projection profiles as 1-D features; a sketch follows. [Figure: projection profiles along y and y' with local maxima and minima marked] 15
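
A minimal sketch of this extraction step, assuming grayscale sub-images and a simple three-point local-extremum rule (the slide does not specify the exact rule, so this is an assumption):

    import numpy as np

    def extract_1d_features(sub_image):
        """Project a sub-image onto the y-axis and return profile extrema."""
        profile = sub_image.mean(axis=1)  # intensity projection profile
        maxima, minima = [], []
        for i in range(1, len(profile) - 1):
            if profile[i - 1] < profile[i] > profile[i + 1]:
                maxima.append((i, profile[i]))  # (coordinate, intensity)
            elif profile[i - 1] > profile[i] < profile[i + 1]:
                minima.append((i, profile[i]))
        return maxima, minima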

1-D Feature Correspondence Extraction 1-D features are matched based on their projected intensity values and their identities (local maximum or local minimum). The 1-D feature correspondences extracted from all the vertical sub-images should then be fitted by a line $y' = sy + t_y$ (and likewise $x' = sx + t_x$ for the horizontal sub-images). [Figure: matched 1-D features plotted in the (y, y') plane, forming a line to be estimated] 16

Transformation Parameters Estimation An outlier rejection approach was adopted, since the initial matches include a large number of outliers. First, a Hough transformation is applied to the initial matches; feature correspondences that do not contribute to the peak are identified as outliers and rejected. Finally, a RANSAC line estimator is applied to the retained 1-D matches to precisely estimate the line parameters $(s, t_y)$. All these procedures are also applied to the horizontal sub-images to estimate the parameters of the other line, $(s, t_x)$. A sketch of this two-stage estimation follows. [Figure: initial matches, Hough transform, retained matches, RANSAC line estimation] 17
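
The sketch below illustrates the two-stage estimation under stated assumptions: a coarse Hough vote over (s, t) bins followed by RANSAC on the retained matches. Bin ranges, iteration count, and the inlier tolerance are illustrative choices, not values from the dissertation.

    import numpy as np

    def estimate_line(y, y2, s_bins, t_bins, iters=200, tol=2.0, seed=0):
        """Estimate (s, t) in y2 = s*y + t from noisy 1-D correspondences."""
        # Stage 1: Hough voting. Each match votes for the t implied by each s.
        votes = np.zeros((len(s_bins), len(t_bins)), dtype=int)
        for i, s in enumerate(s_bins):
            idx = np.digitize(y2 - s * y, t_bins) - 1
            ok = (idx >= 0) & (idx < len(t_bins))
            np.add.at(votes[i], idx[ok], 1)
        si, ti = np.unravel_index(votes.argmax(), votes.shape)
        keep = np.abs(y2 - s_bins[si] * y - t_bins[ti]) < 3 * tol
        y, y2 = y[keep], y2[keep]  # drop matches far from the Hough peak
        # Stage 2: RANSAC refinement on the retained matches.
        rng = np.random.default_rng(seed)
        best, best_count = (float(s_bins[si]), float(t_bins[ti])), -1
        for _ in range(iters):
            i, j = rng.choice(len(y), size=2, replace=False)
            if y[i] == y[j]:
                continue
            s = (y2[i] - y2[j]) / (y[i] - y[j])
            t = y2[i] - s * y[i]
            count = int(np.sum(np.abs(y2 - s * y - t) < tol))
            if count > best_count:
                best, best_count = (s, t), count
        return best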

Example Result [Figure: original image pair (A, B); difference image D = A - B without background compensation; transformed image A' of A and difference image D' = A' - B with background compensation; binarized image of D'] 18

Background Subtraction for Static Cameras 19

Background Subtraction Background subtraction is a method that detects moving object regions by comparing the current image with a background model: $F_t(\mathbf{x}) = 1$ if $\mathrm{Dist}(I_t(\mathbf{x}), B_t(\mathbf{x})) > TH$, and $0$ otherwise, where $B_t(\mathbf{x})$ is the background model at $\mathbf{x}$. [Figure: current image, background model, and background subtraction result] 20

Background Subtraction It can be divided into two steps, background modeling and foreground classification, and it mostly utilizes two types of images. With RGB color images, background modeling and foreground classification are conducted in the 3-D RGB color domain; this achieves high segmentation accuracy thanks to the color information, but requires a large amount of memory and high computational cost. With grayscale images, background modeling and foreground classification are conducted in the 1-D grayscale domain; this achieves lower segmentation accuracy due to the loss of color information, but requires a small amount of memory and low computational cost. 21

Bayer-Pattern Image Different from the previous approaches, the proposed method uses Bayer-pattern images. Bayer-pattern images are acquired by a Bayer color filter array placed in front of the CCD sensor; this is the most popular method for acquiring RGB color images. The Bayer color filter array consists of repetitive 2x2 patterns, and each pixel measures only one color according to its spatial location, so a Bayer-pattern image includes RGB color information in a grayscale-like image. The interpolation process that produces a full color image is called demosaicing. If bilinear demosaicing is applied at pixel location (2,2), the missing $G_{2,2}$ and $B_{2,2}$ can be estimated as
$G_{2,2} = \dfrac{G_{1,2} + G_{2,1} + G_{2,3} + G_{3,2}}{4}$, $B_{2,2} = \dfrac{B_{1,1} + B_{1,3} + B_{3,1} + B_{3,3}}{4}$.
[Figure: Bayer-pattern image and demosaiced image] 22
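
A minimal sketch of this bilinear estimate at a red pixel, using 0-based indexing (the slide's location (2,2) is (1,1) here); border handling is omitted:

    import numpy as np

    def bilinear_at_red(bayer, r, c):
        """Estimate the missing G and B values at a red Bayer pixel (r, c)."""
        g = (bayer[r - 1, c] + bayer[r + 1, c] +
             bayer[r, c - 1] + bayer[r, c + 1]) / 4.0          # 4 adjacent greens
        b = (bayer[r - 1, c - 1] + bayer[r - 1, c + 1] +
             bayer[r + 1, c - 1] + bayer[r + 1, c + 1]) / 4.0  # 4 diagonal blues
        return g, b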

Proposed Strategy The proposed strategy performs background modeling in the Bayer-pattern domain and foreground classification in the interpolated RGB domain. Advantages: it achieves almost the same performance as the method using RGB color images, while requiring computational resources as low as the method using grayscale images.

Method | Background modeling | Foreground classification
Using RGB color images | RGB domain | RGB domain
Using grayscale images | Grayscale domain | Grayscale domain
Proposed strategy (using Bayer-pattern images) | Bayer-pattern domain | Interpolated RGB domain
23

MoG-based Background Subtraction We apply the proposed strategy to Mixture of Gaussians (MoG) based background subtraction, one of the most popular background subtraction methods. The MoG method models the background at each pixel with K Gaussian distributions:
$P(I_t^{\mathbf{x}}) = \sum_{i=1}^{K} \omega_{i,t}^{\mathbf{x}} \, N(I_t^{\mathbf{x}}; \mu_{i,t}^{\mathbf{x}}, \Sigma_{i,t}^{\mathbf{x}})$, where $\Sigma_{i,t}^{\mathbf{x}} = (\sigma_{i,t}^{\mathbf{x}})^2 I$.
Here $I_t^{\mathbf{x}}$ is the RGB pixel value vector at $\mathbf{x}$ in the t-th image; $\omega_{i,t}^{\mathbf{x}}$, $\mu_{i,t}^{\mathbf{x}}$, and $\sigma_{i,t}^{\mathbf{x}}$ are the weight, mean vector, and standard deviation of the i-th Gaussian distribution at $\mathbf{x}$ in the t-th image; and $I$ is the 3x3 identity matrix. It assumes, for computational reasons, that the different color channels are independent and have the same variance. 24
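
For reference, the mixture density above for a single 1-D pixel value can be evaluated as follows; this is a generic MoG evaluation, not code from the dissertation:

    import numpy as np

    def mog_pixel_probability(value, weights, means, sigmas):
        """P(I) = sum_i w_i * N(I; mu_i, sigma_i) for one pixel, 1-D case."""
        gauss = np.exp(-0.5 * ((value - means) / sigmas) ** 2) \
                / (sigmas * np.sqrt(2.0 * np.pi))
        return float(np.sum(weights * gauss))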

Proposed Method 1. The proposed method models the background at each pixel using K Gaussians in the 1-D Bayer-pattern domain (not the 3-D RGB color domain): $P(I_t^{\mathbf{x}}) = \sum_{i=1}^{K} \omega_{i,t}^{\mathbf{x}} \, N(I_t^{\mathbf{x}}; \mu_{i,t}^{\mathbf{x}}, \sigma_{i,t}^{\mathbf{x}})$. 2. The K Gaussian distributions are ordered according to $\omega_{k,t}^{\mathbf{x}} / \sigma_{k,t}^{\mathbf{x}}$. 3. The first B distributions are chosen as background distributions according to $B = \arg\min_b \left( \sum_{i=1}^{b} \omega_{i,t}^{\mathbf{x}} > T \right)$. 4. At each pixel, calculate the smallest Mahalanobis distance $D_t^{\mathbf{x},R}$ between the input pixel value $I_t^{\mathbf{x}}$ and the B background distributions: $D_t^{\mathbf{x},R} = \min_b \left( \left| I_t^{\mathbf{x}} - \mu_{b,t}^{\mathbf{x}} \right| / \sigma_{b,t}^{\mathbf{x}} \right)$. 25

Proposed Method 5. Estimate the Mahalanobis distances of the other two color channels ($D_t^{\mathbf{x},G}$ and $D_t^{\mathbf{x},B}$) by interpolating the distances of spatially neighboring pixels via bilinear demosaicing, exactly as pixel values are interpolated in demosaicing. 6. Classify the current pixel by thresholding the distances of the three color channels: $F_t(\mathbf{x}) =$ background if $D_t^{\mathbf{x},R} \le TH$, $D_t^{\mathbf{x},G} \le TH$, and $D_t^{\mathbf{x},B} \le TH$; foreground otherwise. A sketch of steps 4-6 follows. 26
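
The sketch below combines steps 4-6 for a red Bayer pixel, assuming `dist_map` already holds each pixel's own-channel Mahalanobis distance (step 4) and ignoring image borders; the neighbour pattern mirrors bilinear demosaicing:

    import numpy as np

    def classify_red_pixel(dist_map, r, c, th):
        """Return True if the pixel at (r, c) is classified as foreground."""
        d_r = dist_map[r, c]                                        # measured channel
        d_g = (dist_map[r - 1, c] + dist_map[r + 1, c] +
               dist_map[r, c - 1] + dist_map[r, c + 1]) / 4.0       # interpolated G
        d_b = (dist_map[r - 1, c - 1] + dist_map[r - 1, c + 1] +
               dist_map[r + 1, c - 1] + dist_map[r + 1, c + 1]) / 4.0  # interpolated B
        return bool(d_r > th or d_g > th or d_b > th)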

Original method vs. Proposed method For better understanding, the blue channel is omitted and the number of Gaussians representing the background distribution is set to two. Original MoG using color images: it models the background with two 2-D Gaussians, since the method knows both R and G color values at each pixel location. The decision boundary is defined by two circles (not ellipses), since it is assumed that the different color channels are independent and have the same variance. [Figure: background model and decision boundary of the original MoG method] 27

Original method vs. Proposed method Proposed method using Bayer-pattern images: it models the background with two 1-D Gaussians in each color channel, since each pixel carries only one color measurement in a Bayer-pattern image. The decision boundary is defined by four rectangles (not squares), since the method separately estimates the variance of each color channel. It produces false background regions (two dashed rectangles), since the method does not know the correct combination of the R and G channels. [Figure: background model and decision boundary of the proposed method] 28

Two Properties of Proposed Method [Negative] It produces false background regions, so it has more chances to classify foreground as background. However, the probability that a foreground pixel falls into the false background regions is quite low when considering the whole 3-D RGB space, since the variances of the Gaussians chosen as background are very small. [Positive] It separately estimates the variances of the RGB channels without increasing computational costs, whereas the original MoG assumes that the variances of the three channels are the same. Therefore, the proposed method can estimate the decision boundary more accurately. [Figure: background models and decision boundaries of the proposed and original MoG methods] 29

Comparison of Computational Resources Computational cost (per pixel): the proposed method requires less than 50% of the computing power of the original method (proposed: 5 multiplications, 3 additions; original: 11 multiplications, 9 additions). Memory requirement (per pixel): the proposed method needs approximately 60% of the memory space of the original method (proposed: 3K+2 buffers; MoG using RGB images: 5K+2 buffers). 30

Experimental Results for Background Compensation 31

Experimental Environment The database was acquired while a PTZ camera tracked moving objects: 80480 images (about 45 minutes) were taken in 10 different places with a development version of a SAMSUNG PTZ camera (360x240 pixels).

DB | Background complexity | Indoor/outdoor | Distance from camera to object | Moving object proportion | Number of images
DB1 | low | outdoor | 16-47 m | 5-56 % | 7496
DB2 | low | indoor | 9-35 m | 0-36 % | 8069
DB3 | low | indoor | 8-22 m | 4-45 % | 7751
DB4 | medium | outdoor | 36-130 m | 0-38 % | 7392
DB5 | medium | outdoor | 70-200 m | 2-52 % | 6132
DB6 | medium | outdoor | 15-32 m | 0-83 % | 7741
DB7 | high | indoor | 6-34 m | 4-76 % | 7105
DB8 | high | indoor | 15 m | 0 % | 6249
DB9 | high | indoor | 5-17 m | 0-67 % | 7574
DB10 | high | outdoor | 8-30 m | 0-67 % | 7612
DB11 | high | outdoor | 23-38 m | 0-51 % | 7359
32

Example Images of Database [Figure: example images of DB1 through DB11] 33

Evaluation Criteria Intensity difference mean: the mean of the absolute difference image after background compensation, calculated only over the background regions; it indicates the performance of the algorithm (the smaller, the better):
$\text{Intensity difference mean}(I_t, I_{t-n}) = \dfrac{1}{W'H'} \sum_{i=1}^{W'} \sum_{j=1}^{H'} \left| I_t(i,j) - T(I_{t-n}(i,j)) \right|$.
Extraction time (sec): duration for extracting feature correspondences. Estimation time (sec): duration for estimating transformation parameters. A direct implementation of the first measure follows. 34
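
A small sketch, assuming the transformed previous image and a boolean background mask are supplied by the caller:

    import numpy as np

    def intensity_difference_mean(curr, prev_transformed, background_mask):
        """Mean absolute difference computed over background pixels only."""
        diff = np.abs(curr.astype(np.float32)
                      - prev_transformed.astype(np.float32))
        return float(diff[background_mask].mean())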

Two Previous Methods for Comparison Araki's method [1]: transformation: 6-parameter affine; features: 2-D correspondences obtained by the Harris corner detector and correlation-based block matching; estimator: Least Median of Squares (LMedS). Pham's method [2]: transformation: 4-parameter affine; features: 1-D correspondences from 32 pairs of binary images; estimator: multi-resolution Hough transformation. [1] S. Araki, T. Matsuoka, N. Yokoya, and H. Takemura, "Real-time tracking of multiple moving object contours in a moving camera image sequence," IEICE Trans. Inf. Syst., vol. E83-D, no. 7, 2000. [2] X. D. Pham, J. U. Cho, and J. W. Jeon, "Background compensation using Hough transformation," in Proc. Int. Conf. Robot. Autom., 2008. 35

Experimental Results

Intensity difference mean:
DB | Araki's method | Pham's method | Proposed method
DB1 | 4.102 | 2.551 | 2.141
DB2 | 3.572 | 2.482 | 1.955
DB3 | 4.425 | 3.213 | 2.313
DB4 | 3.634 | 2.474 | 2.156
DB5 | 5.176 | 3.204 | 2.943
DB6 | 7.162 | 4.055 | 3.141
DB7 | 4.945 | 3.388 | 2.630
DB8 | 4.261 | 3.772 | 2.517
DB9 | 5.793 | 3.981 | 3.237
DB10 | 6.511 | 4.665 | 3.642
DB11 | 6.489 | 4.546 | 3.465
Avg. | 5.097 | 3.485 | 2.740

Extraction time (sec):
DB | Araki's method | Pham's method | Proposed method
DB1 | 1.085 | 3.354 | 0.019
DB2 | 1.075 | 3.038 | 0.019
DB3 | 1.091 | 2.894 | 0.020
DB4 | 1.081 | 3.415 | 0.019
DB5 | 1.097 | 3.821 | 0.021
DB6 | 1.090 | 2.509 | 0.020
DB7 | 1.083 | 2.062 | 0.019
DB8 | 1.074 | 2.156 | 0.018
DB9 | 1.066 | 1.999 | 0.018
DB10 | 1.098 | 2.691 | 0.022
DB11 | 1.085 | 2.952 | 0.021
Avg. | 1.084 | 2.808 | 0.020

Estimation time (sec):
DB | Araki's method | Pham's method | Proposed method
DB1 | 0.044 | 15.498 | 0.017
DB2 | 0.044 | 13.785 | 0.018
DB3 | 0.045 | 13.928 | 0.020
DB4 | 0.045 | 13.492 | 0.018
DB5 | 0.045 | 13.056 | 0.024
DB6 | 0.044 | 16.001 | 0.019
DB7 | 0.045 | 11.793 | 0.019
DB8 | 0.044 | 12.543 | 0.019
DB9 | 0.044 | 10.883 | 0.016
DB10 | 0.045 | 14.507 | 0.026
DB11 | 0.044 | 16.907 | 0.024
Avg. | 0.045 | 13.854 | 0.020

The proposed method has the smallest intensity difference mean and is the fastest algorithm in terms of both extraction and estimation times. 36

Two Reasons for the Superiority (1) The first reason is that the proposed method is more robust with respect to moving object proportion (how much of the area is occupied by moving objects): measured over moving object proportions from 0 to 70%, its intensity difference mean is the least sensitive of the three methods. The proposed method utilizes the projection profiles of sub-images, which are not easily affected by a large moving object proportion. Araki's method is sensitive to the large number of outliers produced in moving object regions. Pham's method uses the projection profiles of whole images, which can be easily distorted by a large proportion of moving objects. [Figure: intensity difference mean vs. moving object proportion (%) for the three methods] 37

Two Reasons for the Superiority (2) The second reason is that the 1-D features used in the proposed method are more robust against blurring effects. The table below shows the intensity difference mean calculated with and without blurring effects (10175 blurred images out of 80480 were manually selected). The proposed and Pham's methods are robust against blurring effects because they utilize 1-D features rather than 2-D features; 2-D features are sensitive to blurring due to localization error.

 | Araki's method | Pham's method | Proposed method
Without blurring effect | 4.868 | 3.398 | 2.660
With blurring effect | 6.509 | 4.048 | 3.201
Error increase | 1.640 | 0.651 | 0.541
Error increasing rate | 33.7 % | 19.1 % | 20.3 %
38

Example of Resulting Images (1) The moving object proportion is very large. [Figure: original image pair and results of Araki's method, Pham's method, and the proposed method] 39

Example of Resulting Images (2) The images are severely blurred. [Figure: original image pair and results of Araki's method, Pham's method, and the proposed method] 40

Experimental Results for Background Subtraction 41

Experimental Environment Twelve image sequences were used, including 10 public databases and 2 of our own. The proposed method was compared with MoG using three types of images: grayscale, RGB color, and Bayer-pattern. We refer to the MoG with Bayer-pattern images as pseudo-grayscale, since it uses Bayer-pattern images as if they were grayscale images (this is different from the proposed method).

DB | Resolution (pixels) | # of images | Environment | Source of database
DB1 | 360×240 | 500 | outdoor | http://www.cs.cmu.edu/~yaser
DB2 | 320×240 | 1501 | outdoor | http://web.eee.sztaki.hu/~bcsaba
DB3 | 320×240 | 440 | outdoor | http://web.eee.sztaki.hu/~bcsaba
DB4 | 320×240 | 300 | indoor | http://cvrr.ucsd.edu/aton/shadow
DB5 | 320×240 | 887 | indoor | http://web.eee.sztaki.hu/~bcsaba
DB6 | 320×256 | 1286 | indoor | http://perception.i2r.a-star.edu.sg
DB7 | 320×240 | 2227 | outdoor | http://vision.gel.ulaval.ca/~castshadows
DB8 | 320×240 | 1800 | indoor | http://vision.gel.ulaval.ca/~castshadows
DB9 | 320×240 | 300 | indoor | our own database (flickering illumination)
DB10 | 320×240 | 300 | indoor | our own database (swinging illumination)
DB11 | 320×240 | 440 | outdoor | noise-contaminated version of DB3
DB12 | 320×240 | 300 | indoor | noise-contaminated version of DB4
42

Example Images of Database [Figure: example images of DB1 through DB12] 43

Evaluation Criteria False negative rate (FNR): the percentage of misclassified foreground pixels, $\mathrm{FNR} = \dfrac{\text{\# of foreground pixels misclassified as background}}{\text{\# of total foreground pixels}}$. False positive rate (FPR): the percentage of misclassified background pixels, $\mathrm{FPR} = \dfrac{\text{\# of background pixels misclassified as foreground}}{\text{\# of total background pixels}}$. Processing time (sec): the duration of background modeling and foreground classification per frame. A direct implementation of the two rates follows. 44
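
Both rates follow directly from binary masks (1 = foreground, 0 = background); a small sketch:

    import numpy as np

    def fnr_fpr(pred, truth):
        """False negative and false positive rates from binary masks."""
        fg, bg = (truth == 1), (truth == 0)
        fnr = np.sum((pred == 0) & fg) / max(int(np.sum(fg)), 1)
        fpr = np.sum((pred == 1) & bg) / max(int(np.sum(bg)), 1)
        return float(fnr), float(fpr)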

Experimental Results (1)

False negative rate (FNR) (%):
DB | Proposed method | RGB color | Grayscale | Pseudo-grayscale
DB1 | 1.20 | 2.97 | 9.09 | 9.73
DB2 | 12.97 | 16.12 | 23.51 | 24.89
DB3 | 9.82 | 10.12 | 22.87 | 23.07
DB4 | 5.66 | 9.35 | 18.09 | 18.56
DB5 | 4.06 | 5.48 | 15.03 | 15.02
DB6 | 6.87 | 11.04 | 18.07 | 20.02
DB7 | 1.16 | 3.21 | 7.83 | 9.26
DB8 | 15.81 | 15.07 | 29.03 | 29.47
DB9 | 6.39 | 8.02 | 24.23 | 21.85
DB10 | 7.50 | 12.66 | 24.50 | 22.75
DB11 | 12.10 | 12.28 | 23.37 | 23.97
DB12 | 9.09 | 15.34 | 25.01 | 24.91
Avg. | 7.72 | 10.14 | 20.05 | 20.29

False positive rate (FPR) (%):
DB | Proposed method | RGB color | Grayscale | Pseudo-grayscale
DB1 | 3.97 | 2.47 | 2.46 | 2.50
DB2 | 8.62 | 5.69 | 6.65 | 6.05
DB3 | 11.19 | 10.89 | 8.86 | 8.82
DB4 | 2.51 | 2.79 | 1.83 | 1.78
DB5 | 8.71 | 7.75 | 6.73 | 6.68
DB6 | 8.23 | 5.95 | 6.13 | 5.91
DB7 | 3.14 | 1.67 | 2.44 | 2.42
DB8 | 5.64 | 6.37 | 4.16 | 3.98
DB9 | 5.93 | 8.68 | 3.21 | 3.81
DB10 | 6.52 | 6.27 | 3.49 | 4.06
DB11 | 11.65 | 12.68 | 9.82 | 9.78
DB12 | 1.96 | 0.64 | 1.66 | 1.68
Avg. | 6.51 | 5.99 | 4.79 | 4.79

The proposed method has the smallest false negative rate, while the false positive rates of the four approaches are similar to each other. 45

Experimental Results (2) The results can also be depicted with ROC curves, where the proposed method shows the best performance. The reason the proposed method slightly outperforms the MoG with RGB color images is that it can estimate the decision boundary more accurately thanks to the separate variance estimation for each color channel. This result also reveals that the negative property of the proposed method (false background regions) seldom affects the performance. [Figure: ROC curves (TPR vs. FPR) for the proposed method, RGB color, grayscale, and pseudo-grayscale] 46

Processing time (sec):
DB | Proposed method | RGB color | Grayscale | Pseudo-grayscale
DB1 | 0.54 | 1.27 | 0.47 | 0.47
DB2 | 0.49 | 1.24 | 0.43 | 0.43
DB3 | 0.50 | 1.14 | 0.45 | 0.44
DB4 | 0.48 | 0.93 | 0.42 | 0.42
DB5 | 0.49 | 1.31 | 0.43 | 0.43
DB6 | 0.59 | 1.60 | 0.51 | 0.51
DB7 | 0.51 | 1.39 | 0.44 | 0.45
DB8 | 0.49 | 1.40 | 0.44 | 0.44
DB9 | 0.48 | 1.01 | 0.41 | 0.42
DB10 | 0.47 | 1.04 | 0.42 | 0.42
DB11 | 0.50 | 1.27 | 0.43 | 0.44
DB12 | 0.48 | 1.40 | 0.42 | 0.42
Avg. | 0.50 | 1.25 | 0.44 | 0.44

The proposed method is much faster than the MoG with RGB color images, and its processing time is comparable to the MoG with grayscale and pseudo-grayscale images. 47

Example of Resulting Images (1) [Figure: original image, ground truth, and results of the proposed method, RGB color images, grayscale images, and pseudo-gray images] 48

Example of Resulting Images (2) [Figure: original image, ground truth, and results of the proposed method, RGB color images, grayscale images, and pseudo-gray images] 49

Conclusions 50

Conclusions This dissertation proposed two novel moving object detection methods, one for PTZ cameras and one for static cameras. The proposed background compensation method for PTZ cameras is robust against blurring effects and moving object proportion, and dramatically decreases computational costs. The proposed background subtraction method for static cameras achieves slightly higher performance than the method using RGB color images, with resource requirements comparable to the method using grayscale images. The proposed moving object detection methods thus achieve both high performance and low resource requirements, which is significant for low-level tasks such as moving object detection in real-time surveillance systems. 51

Thank you. - Questions & Answers - 52

Comment & Response 1 Comment: In the case of a noisy image sequence, the result of the proposed method includes much salt-and-pepper-like noise, which makes it look worse than that of the MoG using RGB color images. The background region produced by the proposed method has more noise, but the foreground region it produces has fewer holes and a better silhouette. [Figure: original image, result of the proposed method, result of the MoG using color images] 53

Comment & Response 1 Response: This phenomenon occurs because of the difference in how the decision boundaries are calculated. The MoG using RGB color images estimates a decision boundary larger than the actual one, because it assumes that the variances of the Gaussians for the three color channels are the same. The proposed method estimates the decision boundary more accurately, because it calculates the variances of the Gaussians for the three color channels separately. Therefore, the proposed method tends to classify noise-contaminated background pixels as foreground. [Figure: background distribution in the Red-Green plane, with a noise-contaminated background pixel inside the decision boundary of the MoG using RGB color images but outside that of the proposed method] This response has been included in Chapter 4 of the dissertation. 54

Comment & Response 1 Response: This salt-and-pepper-like noise can easily be removed with image filters. Resulting images after applying a 5x5 median filter are shown below; one way to apply such a filter is sketched after this paragraph. The noise in the background regions of the two results is almost the same, but the foreground region of the proposed method has a better silhouette. [Figure: median-filtered results of the proposed method and of the MoG using RGB color images] 55
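
One standard way to apply such a filter, here via scipy.ndimage (an implementation choice, not necessarily the one used in the dissertation):

    from scipy.ndimage import median_filter

    def clean_mask(mask):
        """Remove salt-and-pepper noise from a binary mask with a 5x5 window."""
        return median_filter(mask, size=5)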

Comment & Response 2 Comment: Why are Bayer-pattern images used? Have you considered other color spaces? Response: There are mainly two reasons for using Bayer-pattern images. First, computational costs can be reduced while maintaining accuracy. Second, Bayer-pattern images are the raw data of most conventional surveillance cameras, so no additional computation is required to produce them. If another color space (e.g. HSV) were used, it would be necessary to transform Bayer-pattern images to RGB color images, followed by another transformation to HSV images; since these transformations must be done for every frame, they impose additional computational costs that would burden a real-time surveillance system. If only one or two channels were taken in a certain color space (e.g. H and S), performance would degrade due to the loss of some information. 56