Near Infrared Face Image Quality Assessment System of Video Sequences

2011 Sixth International Conference on Image and Graphics
978-0-7695-4541-7/11 $26.00 © 2011 IEEE. DOI 10.1109/ICIG.2011.45

Jianfeng Long
College of Electrical and Information Engineering, Hunan University, Changsha, China
jianfenglong@hnu.edu.cn

Shutao Li
College of Electrical and Information Engineering, Hunan University, Changsha, China
shutao_li@yahoo.com.cn

Abstract--In near infrared face recognition systems, head rotation, motion blur, darkness, closed eyes, an open mouth and a small face region all degrade recognition accuracy. It is therefore necessary to design a quality assessment system that selects the best frame from the input video sequence before the face is recognized or saved to a database. In this paper we present a scoring evaluation system based on five features: sharpness, brightness, resolution, head pose and expression. The score of each feature is first computed independently, and the final quality score is then obtained as a weighted combination of the five feature scores. The Center for Biometrics and Security Research (CBSR) Near Infrared Face Dataset is used to test the system. The experimental results demonstrate the effectiveness of the proposed quality assessment.

Keywords- near infrared; face quality assessment; scoring evaluation system; video sequences

I. INTRODUCTION

In the computer vision community, increasing attention has been paid to imaging and vision processing beyond the visible spectrum, such as near infrared, far infrared and thermal infrared [1], [2], [3]. Thermal and far infrared imagery has been used for face detection and recognition because it offers a promising alternative to visible imagery for handling variations in face appearance caused by illumination changes. However, thermal and far infrared imagery is not only sensitive to changes in environmental temperature and in the subject's health condition, but is also affected by eyeglasses.

Unlike thermal and far infrared, near infrared imaging offers a new approach to face detection and recognition, thanks to its illumination invariance, its anti-spoofing capability (rejecting printed photos, video replays and 3D mimics) and its reduced interference from eyeglasses. Like visible light face recognition, a near infrared face recognition system works well with high-resolution frontal face images but is very sensitive to the quality and resolution of its input face images. If the face in the input image is rotated, motion blurred, dark or too small, or if the eyes are closed or the mouth is open, the recognition rate declines significantly.

Furthermore, with the development of near infrared face recognition, more and more near infrared cameras will be installed in institutions around the world. Most of these places need to save the face images of clients to a database as basic information. The detected face is checked against the client's original information to confirm or deny the identity claimed by the client. If the face is verified, it is used to update the client's original information. In some cases, the face images can even be matched against criminal mug shots on file nationwide. Therefore, to ensure that the face images saved in the database are reliable, a quality estimation system is needed to select the best frame from the input video sequence before it is saved to the database or passed to recognition.

Many works in the literature [4-8] deal with face image quality assessment under visible light. Fourney and Laganiere [4] extracted six features, scored each of them, and combined the scores into the overall quality score of an image. Among the six features, rotation and sharpness were time-consuming to compute. Nasrollahi and Moeslund [7], [8] used four features, namely out-of-plane rotation, sharpness, brightness and resolution, and achieved high performance in assessing visual face quality in video sequences. As far as we know, there is little work on near infrared face image quality assessment. Near infrared imaging produces face images of good condition regardless of the visible light in the environment, so the influence of ambient illumination need not be considered. In addition, the background of a near infrared image is usually dark, and some features (brightness, sharpness) are computed differently than for visible light images.

This paper proposes a scoring evaluation system for near infrared faces from video sequences. To limit the computational cost of the quality assessment system, we use five features: brightness, sharpness, resolution, head pose and expression. Brightness, sharpness and resolution are inspired by face quality assessment for visible light face images, with some modifications to fit the characteristics of near infrared images. The head pose feature in [7], [8] only takes left-to-right head rotation into account. In this paper, new head pose and expression features, e.g. for closed eyes and an open mouth, are proposed. The five chosen features are not only sufficient for quality assessment, but also

yield more accurate recognition. The experimental results indicate that our assessment system is practical and reliable, and that it can be used to improve the performance of face recognition and to select the best face image for updating the database.

The rest of this paper is organized as follows. Section II gives a short introduction to face detection. In Section III, the five features and the scoring process for quality assessment are explained in detail. Experimental results are given in Section IV. The conclusions are presented in Section V.

II. FACE DETECTION

Because some of the features are extracted from the face block, the face region must be located first, so that the search area is reduced from the whole frame to the face region. In this section, we briefly describe the face detection. Real-time face detection is implemented using the AdaBoost algorithm based on Haar-like features, as proposed by Viola and Jones [9]. AdaBoost is a simple learning algorithm that selects weak classifiers from the Haar-like features and integrates them into a strong classifier. The strong classifiers are then cascaded to perform face detection. In our system, only one face is accepted per image (see Fig. 1). If the face detector finds more than one face, the system discards the image.

III. QUALITY ASSESSMENT

In this section, we present our scoring evaluation system for assessing the quality of near infrared images from video sequences with respect to sharpness, brightness, resolution, head pose and expression. The details of the five features and the scoring process are described as follows.

A. Sharpness

In practical applications, blurred images are usually produced when people move in front of the camera.
A well-focused image gets a high sharpness value, while a blurred image gets a low one [4]. We define the sharpness of the face $V_1$ as

$$V_1 = \frac{1}{N M} \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} I(x, y), \tag{1}$$

$$I(x, y) = \left| P(x, y) - \bar{P}(x, y) \right|, \tag{2}$$

where $P(x, y)$ is the face block of the image, $\bar{P}(x, y)$ is the result of applying a $3 \times 3$ mean filter to the face block, $I(x, y)$ is the resulting value at pixel $(x, y)$ of the face block, and $N$ and $M$ are the height and width of the face block. Computing the sharpness of every face image in every sequence just to obtain a global maximum as the normalization reference would waste time. To reduce computation time, we instead take the image with the local maximum sharpness $V_{1\max}$ in each sequence as the reference, and compare the remaining images of that sequence with it to obtain the sharpness score (see Fig. 2):

$$S_1 = \frac{V_1}{V_{1\max}}. \tag{3}$$

B. Brightness

Although the near infrared camera produces face images of good condition regardless of the visible light in the environment, the face image is still affected by the distance between the face and the near infrared lights and camera lens. If the face is too close to the camera, the face area of the image is very bright. Conversely, if the person moves far away from the camera, the face area is very dark. This paper takes both cases into consideration. We define the brightness of the face $V_2$ as the average of the face region pixel values $I(x, y)$:

$$V_2 = \frac{1}{N M} \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} I(x, y). \tag{4}$$

Figure 1. Face detection. From left to right: input near infrared image, detected face.

Figure 2. A normalized face image after face detection with different sharpness conditions and the associated scores.
Sharpness: 7874, 4481, 3446, 2847, 2463
S_1: 1, 0.57, 0.44, 0.36, 0.31

Figure 3. A normalized face image after face detection with different brightness conditions and the associated scores.
Brightness: 187, 131, 104, 102, 79
S_2: 0, 1, 0.79, 0.78, 0.60

Figure 4. Extracted faces arranged from highest to lowest resolution, read from left to right, and the associated scores.
w x h: 64 x 64, 45 x 45, 32 x 32, 20 x 20
S_3: 1, 0.56, 0.28, 0.11
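As an illustration of Eqs. (1)-(3), the following is a minimal pure-Python sketch rather than the authors' implementation. The function names are hypothetical, the face block is assumed to be a list of rows of gray values, and the border handling of the 3 x 3 mean filter (averaging only over neighbours that exist) is an assumption, since the text does not specify it.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # face block as rows of gray values


def mean_filter_3x3(block: Image) -> List[List[float]]:
    """3x3 mean filter. Border pixels average over the neighbours that
    exist (an assumption; the paper does not describe border handling)."""
    n, m = len(block), len(block[0])
    out = [[0.0] * m for _ in range(n)]
    for x in range(n):
        for y in range(m):
            total, count = 0.0, 0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    i, j = x + dx, y + dy
                    if 0 <= i < n and 0 <= j < m:
                        total += block[i][j]
                        count += 1
            out[x][y] = total / count
    return out


def sharpness(block: Image) -> float:
    """V1 of Eqs. (1)-(2): mean absolute difference between the face
    block and its mean-filtered copy."""
    n, m = len(block), len(block[0])
    blurred = mean_filter_3x3(block)
    total = sum(abs(block[x][y] - blurred[x][y])
                for x in range(n) for y in range(m))
    return total / (n * m)


def sharpness_scores(values: Sequence[float]) -> List[float]:
    """S1 of Eq. (3): each V1 divided by the largest V1 of the sequence."""
    v_max = max(values)
    if v_max == 0:
        # every frame is perfectly flat; no frame can serve as reference
        return [0.0 for _ in values]
    return [v / v_max for v in values]
```

Applied to the sharpness values listed for Fig. 2, `sharpness_scores([7874, 4481, 3446, 2847, 2463])` gives, after rounding to two decimals, the scores 1, 0.57, 0.44, 0.36 and 0.31 reported there.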

Figure 5. (a) Mouth detection. (b) Eye detection. (c) The red lines represent the horizontal and vertical lines; the green lines mark the centers of the eyes and the center of the mouth.

Based on a large number of experiments, we set a threshold above which a face image is considered too bright; such images are discarded and given a brightness score of zero. Then $V_{2\max}$, the maximum brightness value among the remaining face images, is selected. Finally, as for sharpness, the brightness score $S_2$ of each remaining face image is calculated as

$$S_2 = \frac{V_2}{V_{2\max}}. \tag{5}$$

The threshold value obtained experimentally is 150. Some of the results are shown in Fig. 3.

C. Resolution

Face images with high resolution are generally preferred over those with low resolution, so we always want to feed high-resolution face images to the database or to face recognition. In practice, the smaller the detected face is, the lower its resolution after normalization. The first step is to set a lower limit for the size of detected faces; a limit of $60 \times 60$ pixels is usually suitable and works for faces from different original image sizes [4], [7]. Let $w$ and $h$ be the width and height of the face image. The resolution score is then (see Fig. 4)

$$S_3 = \min\left(1, \frac{w \times h}{60 \times 60}\right). \tag{6}$$

Since the faces extracted from the CBSR NIR Face Dataset are all larger than the lower limit, we choose one of the extracted faces, resize it to several sizes, and assess the resized images with the resolution feature.

D. Head pose

Head pose variation is one of the most difficult obstacles for face recognition. Moreover, a face with an incorrect head pose should not be saved to the database as the client's information. We assume that a high quality face image is a frontal face image without large changes in head pose orientation. Specifically, the individual's head rotation angle should be within +/- 5 degrees, and the face should lie at the center of the image.
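Returning briefly to the brightness and resolution scores of Sections B and C, Eqs. (4)-(6) can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function and constant names are hypothetical, and treating a brightness value strictly greater than the threshold as "too bright" is an assumption, because the text does not say how a value of exactly 150 is handled.

```python
from typing import List, Sequence

BRIGHTNESS_THRESHOLD = 150  # experimental threshold reported in the text
MIN_FACE_SIDE = 60          # lower size limit of 60 x 60 pixels


def brightness(block: Sequence[Sequence[float]]) -> float:
    """V2 of Eq. (4): mean pixel value of the face block."""
    n, m = len(block), len(block[0])
    return sum(sum(row) for row in block) / (n * m)


def brightness_scores(values: Sequence[float],
                      threshold: float = BRIGHTNESS_THRESHOLD) -> List[float]:
    """S2 of Eq. (5). Frames brighter than the threshold score zero
    (assumption: strictly greater than); the rest are divided by the
    maximum brightness among the remaining frames."""
    kept = [v for v in values if v <= threshold]
    if not kept or max(kept) == 0:
        return [0.0 for _ in values]
    v_max = max(kept)
    return [v / v_max if v <= threshold else 0.0 for v in values]


def resolution_score(w: int, h: int, min_side: int = MIN_FACE_SIDE) -> float:
    """S3 of Eq. (6): face area relative to the 60 x 60 limit, capped at 1."""
    return min(1.0, (w * h) / (min_side * min_side))
```

With the brightness values listed for Fig. 3, `brightness_scores([187, 131, 104, 102, 79])` rounds to 0, 1, 0.79, 0.78 and 0.60, and `resolution_score` applied to the four face sizes listed for Fig. 4 rounds to 1, 0.56, 0.28 and 0.11.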
To evaluate the head pose feature, the centers of the two eyes and the mouth are located first. Fig. 5 shows the results of mouth detection (a) and eye detection (b).

Figure 6. A normalized face image after face detection with different head poses and the associated scores.
S_4: 1, 0.81, 0.35

Figure 7. From left to right: mouth block processing, eye block processing. From top to bottom: detection, normalization and the binary image.

Let the coordinates of the right eye $A_1$ be $(x_r, y_r)$, the coordinates of the left eye $A_2$ be $(x_l, y_l)$, the coordinates of the midpoint of the two eyes $A_3$ be $(x_c, y_c)$, and the coordinates of the mouth $B$ be $(x_m, y_m)$. From Fig. 5(c), the rotation angle $\theta$ of the line through $A_1$ and $A_2$ is

$$\theta = \arctan\frac{y_r - y_l}{x_r - x_l}. \tag{7}$$

$C_d$ is the tilt score of the face image:

$$C_d = \begin{cases} 1, & |\theta| < \dfrac{\pi}{36}, \\[6pt] \text{(expression not recoverable from the source)}, & \dfrac{\pi}{36} \le |\theta| \le \dfrac{\pi}{2}. \end{cases} \tag{8}$$

If the tilt angle of the face is within +/- 5 degrees, we judge the face to be in the normal state. Otherwise, for tilt angles above 5 degrees, a lower score $C_d$ is given. $l_1$ is the distance between $A_1$ and $A_2$, and $l_2$ is the distance between $A_3$ and $B$:

$$l_1 = \sqrt{(x_r - x_l)^2 + (y_r - y_l)^2}, \qquad l_2 = \sqrt{(x_c - x_m)^2 + (y_c - y_m)^2}. \tag{9}$$

The distance ratio $r$ is computed as

$$r = \frac{l_2}{l_1}. \tag{10}$$

Finally, the head pose score $S_4$ is obtained from $C_d$ and $r$ as given in Eq. (11).
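The geometric quantities of Eqs. (7), (9) and (10) can be sketched as below. This is a hedged illustration with hypothetical function names, assuming eye and mouth centers are given as `(x, y)` pixel coordinates. It covers only the angle, the two distances, the ratio and the 5 degree (pi/36) test; the tilt score for larger angles is left out because its formula is not legible in the text. Returning +/- pi/2 for a vertical eye line is an assumption made to avoid a division by zero.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def eye_line_angle(right_eye: Point, left_eye: Point) -> float:
    """theta of Eq. (7), in radians, within [-pi/2, pi/2]."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    if dx == 0:
        # vertical eye line: use the limit of arctan (assumption)
        return math.copysign(math.pi / 2, dy) if dy != 0 else 0.0
    return math.atan(dy / dx)


def distance(a: Point, b: Point) -> float:
    """Euclidean distance, as used for l1 and l2 in Eq. (9)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_ratio(right_eye: Point, left_eye: Point, mouth: Point) -> float:
    """r = l2 / l1 of Eq. (10), with A3 the midpoint of the two eyes."""
    mid = ((right_eye[0] + left_eye[0]) / 2, (right_eye[1] + left_eye[1]) / 2)
    l1 = distance(right_eye, left_eye)
    l2 = distance(mid, mouth)
    if l1 == 0:
        raise ValueError("eye centers coincide; ratio is undefined")
    return l2 / l1


def is_level(theta: float) -> bool:
    """True when the tilt is below 5 degrees (pi/36), the 'normal state'."""
    return abs(theta) < math.pi / 36
```

For example, with eye centers at (40, 50) and (80, 50) and the mouth at (60, 95), the angle is 0, the face counts as level, and the ratio is 45 / 40 = 1.125.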

$$S_4 = \begin{cases} C_d\, r^2, & r < 1, \\ C_d, & r \ge 1. \end{cases} \tag{11}$$

If the participant's head rotates in front of the camera from up (+90 degrees) to down (-90 degrees), the ratio $r$ changes gradually. In general, for a face in the normal state, $r$ is greater than or equal to 1. We therefore combine the tilt score $C_d$ and the ratio $r$ to obtain the final head pose score $S_4$. The results show that head pitch and roll from 0 up to 90 degrees can be estimated quickly.

E. Expression

Different researchers adopt different algorithms to handle the expression problem [7]. Most expressions involve the facial organs, especially the eyes and the mouth. The neutral facial expression is defined as: no smile, both eyes normally open, mouth closed. The eyes and mouth are detected using the AdaBoost algorithm (see Fig. 7). Once the eyes and mouth are located, we can analyze whether the eyes are closed or the mouth is open. The expression score $S_5$ is defined as

$$S_5 = V_m \cdot V_e, \tag{12}$$

$$V_m = \min\left(1, \frac{R_m}{7}\right), \qquad V_e = \min\left(1, \frac{3}{R_l}, \frac{3}{R_r}\right), \tag{13}$$

where $R_m$ is the length-to-width ratio of the normalized mouth, and $R_l$ and $R_r$ are the length-to-width ratios of the normalized left eye and right eye areas, respectively. If $R_m$ is above 7.0, the mouth is closed. If $R_l$ and $R_r$ are below 3.0, the eyes are open.

F. Combining the features into a general score

Each of the five features discussed in the previous sections yields a score in the range [0, 1]. We now combine the features into a general score to choose the best face in a given sequence:

$$Q_i = \frac{\sum_{j=1}^{5} w_j S_j}{\sum_{j=1}^{5} w_j}, \tag{14}$$

where $Q_i$ is the quality score of the $i$th face image in the sequence, $S_j$ is the value of the $j$th normalized facial feature for this face image, and $w_j$ is the weight associated with this feature. The images are sorted by their combined scores and, depending on the application, the one or more images with the greatest combined scores are taken as the highest quality images of the given sequence.
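The expression score and the weighted combination of Eqs. (12)-(14) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, Eq. (12) is read as the product of the mouth and eye terms, and the weights 2, 2, 2, 3 and 3 reported in the paper are assumed to follow the order sharpness, brightness, resolution, head pose, expression.

```python
from typing import List, Sequence

# Weights reported in the paper; the feature order (sharpness, brightness,
# resolution, head pose, expression) is an assumption.
PAPER_WEIGHTS = (2, 2, 2, 3, 3)


def expression_score(r_mouth: float, r_left_eye: float,
                     r_right_eye: float) -> float:
    """S5 of Eqs. (12)-(13), read as the product V_m * V_e (assumption)."""
    if r_left_eye <= 0 or r_right_eye <= 0:
        raise ValueError("eye length-to-width ratios must be positive")
    v_mouth = min(1.0, r_mouth / 7.0)                        # 1 when closed
    v_eyes = min(1.0, 3.0 / r_left_eye, 3.0 / r_right_eye)   # 1 when open
    return v_mouth * v_eyes


def quality_score(scores: Sequence[float],
                  weights: Sequence[float] = PAPER_WEIGHTS) -> float:
    """Q of Eq. (14): weighted mean of the normalized feature scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per feature score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def rank_frames(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the frames sorted from highest to lowest Q."""
    q = [quality_score(s) for s in frame_scores]
    return sorted(range(len(q)), key=lambda i: q[i], reverse=True)
```

For instance, `rank_frames([[1, 0, 1, 1, 0.5], [1, 1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5, 0.5]])` returns `[1, 0, 2]`: the frame with all scores equal to 1 comes first, and the over-bright frame with a half-open expression still outranks the uniformly mediocre one.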
From experiments, we set the five feature weights to 2, 2, 2, 3 and 3, respectively, based on the impact each feature has on the final image quality score.

Figure 8. A normalized face image after face detection with different expressions and the associated scores.
S_5: 1, 0.70, 1, 0.71

Figure 9. Some sampled image sequences: quality ranks assigned by humans (first row of each sequence) vs. ranks output by our proposed system (second row).

Figure 10. The overall agreement between the quality assessment results by humans and by our proposed system in finding the first, the second, and the third best images of the near infrared video sequences of the database (bar chart; vertical axis from 0 to 1).

IV. EXPERIMENTAL RESULTS

To investigate the performance of the proposed assessment system, we selected the CBSR NIR Face Dataset [1], [11]. The images were taken by an NIR camera with active NIR lighting. The database contains 197 sequences, captured at 30 frames per second at a resolution of 640 x 480 pixels. In a representative video sequence, one participant stands facing a stationary camera while performing all kinds of

actions, and a total of 20 images are then captured from each video sequence. The face images are taken under different conditions, including variations in face size, head pose and brightness, and sometimes in sharpness and expression. This provides good examples for evaluating all the features together.

In the experiment, a group of people evaluates all images of the database based on their perception (comfort, naturalness of expression and sharpness). According to human perception, an image of higher quality is given a better rank. For the proposed system, an image with a higher score receives a better rank. Finally, the results assessed by humans are compared with those of the system. Fig. 9 shows example sequences from the database together with the quality ranks given by the proposed system and by humans. The results show a general agreement between the rankings of the humans and of the system. Fig. 10 illustrates the overall agreement between the quality assessment results by humans and by our proposed system in finding the first, second, and third best images of all the near infrared video sequences of the database. The higher the rate, the closer the agreement between human perception and the system results. In the experiment, as long as the quality of the images is not too poor, the ranks obtained by our proposed system are similar to the ranks decided by humans. In particular, for all sequences the first-ranked image chosen by humans is exactly the one chosen by our proposed system. However, slight differences in the second and third ranks occur, mainly when participants roll or bulge their eyes.

V. CONCLUSIONS

In this paper, we present a system based on five features to assess the quality of near infrared images affected by factors such as sharpness, brightness, resolution, head pose and expression. The five features are integrated, each with its own scoring algorithm and weight.
Our experiments on a near infrared face image database show that our system is capable of assessing the quality of near infrared face images from video sequences, since there is a general agreement between the quality scores of our proposed system and human perception of quality. Applied to select the best frame from an input near infrared video sequence, our system can not only support accurate automated and manual face recognition but also feed the best image to the database.

ACKNOWLEDGEMENT

This paper is supported by the National Natural Science Foundation of China (No.60871096), the Ph.D. Programs Foundation of Ministry of Education of China (No.200805320006), the Key Project of Chinese Ministry of Education (2009-120), and the Open Projects Program of National Laboratory of Pattern Recognition, China.

REFERENCES

[1] S.Z. Li, R. Chu, S. Liao and L. Zhang, "Illumination invariant face recognition using near-infrared images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, 2007, pp. 627-639.
[2] Z. Pan, G. Healey, M. Prasad and B. Tromberg, "Face Recognition in Hyperspectral Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, 2003, pp. 1552-1560.
[3] X. Chen, P.J. Flynn and K.W. Bowyer, "IR and Visible Light Face Recognition," Computer Vision and Image Understanding, vol. 99, no. 3, 2005, pp. 332-358.
[4] A. Fourney and R. Laganiere, "Constructing Face Image Logs that are Both Complete and Concise," In: 4th Canadian Conference on Computer Vision and Robot Vision, Canada, 2007, pp. 488-494.
[5] M. Subasic, S. Loncaric, T. Petkovic and H. Bogunvoic, "Face Image Validation System," In: 4th International Symposium on Image and Signal Processing and Analysis, 2005, pp. 30-33.
[6] H. Fronthaler, K. Kollreider and J. Bigun, "Automatic Image Quality Assessment with Application in Biometrics," In: International Conference on Computer Vision and Pattern Recognition, 2006, pp. 30-36.
[7] K. Nasrollahi and T.B. Moeslund, "Face Quality Assessment System in Video Sequences," Proc. BIOID, Denmark, 2008, pp. 11-20.
[8] K. Nasrollahi and T.B. Moeslund, "Finding and Improving the Key-frames of Long Video Sequences for Face Recognition," In: 2010 Fourth IEEE International Conference on Theory Applications and Systems, 2010, pp. 1-6.
[9] P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 2001, pp. 511-518.
[10] W. Hizem, L. Allano, A. Mellakh and B. Dorizzi, "Face Recognition from Synchronised Visible and Near-infrared Images," IET Signal Processing, vol. 3, no. 4, 2009, pp. 282-288.
[11] S.Z. Li, R. Chu, M. Ao, L. Zhang and R. He, "Highly Accurate and Fast Face Recognition Using Near Infrared Images," Lecture Notes in Computer Science, vol. 3832, 2005, pp. 151-158.