This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.*, No.*, 1-10

CMOS Image Sensor for High Speed and Low Latency Eye Tracking

Junichi Akita 1a)
1 Kanazawa University, Kakuma, Kanazawa, Ishikawa, 920-1192, Japan
a) akita@is.t.kanazawa-u.ac.jp

Abstract: Eye tracking, or detecting where the user is looking, is expected to become a new type of user interface, including the use of the rapid eye movement phenomenon known as saccade. However, real-time tracking of saccade is difficult with conventional image processing systems, because their processing time and latency are too long for the speed of saccade. In this paper, we describe the design of a CMOS image sensor with high speed and low latency capability for eye tracking, as well as its evaluation results.

Keywords: CMOS image sensor, Eye tracking, Saccade, Column parallel architecture

Classification: Integrated circuits

IEICE 2018
DOI: 10.1587/elex.15.20180785
Received August 13, 2018
Accepted October 9, 2018
Publicized October 24, 2018

References
[1] E. A. Hoffman and J. V. Haxby: "Distinct representations of eye gaze and identity in the distributed human neural system for face perception," Nature Neuroscience, 3 (2000) 80-84.
[2] J. Triesch et al.: "What you see is what you need," Journal of Vision, 3, 1 (2003) 86-94.
[3] J. Watanabe et al.: "Study of remote saccade detection technique based on retroreflective features of the retina," Journal of the Virtual Reality Society of Japan, 9, 1 (2004) 105-114.
[4] Tobii AB, Tobii Pro X2-60, https://www.tobiipro.com/en/productlisting/tobii-pro-x2-60/
[5] nac Image Technology, Inc., EMR-9, http://www.eyemark.jp/product/emr-9/
[6] Tobii AB, Tobii Pro Spectrum, https://www.tobiipro.com/en/productlisting/tobii-pro-spectrum/
[7] T. Takegami et al.: "An algorithm for model-based stable pupil detection for eye tracking system," Systems and Computers in Japan, 35, 13 (2004) 21-31.
[8] J. Akita et al.: "Column-parallel vision chip architecture for high-resolution line-of-sight detection including saccade," IEICE Trans. on Electronics, E90-C, 10 (2007) 1869-1875.
[9] H. Kawakami et al.: "Column-parallel architecture for line-of-sight detection image sensor based on centroid calculation," ITE Trans. on Media Technology and Applications, 2, 2 (2014) 161-166.
[10] S. Yamamura and J. Akita: "Design and preliminary evaluation of high-speed line-of-sight calculation CMOS image sensor," Electronics and Communications in Japan, 101, 5 (2018) 47-57 (DOI: 10.1002/ecj.12062).

1 Introduction

Eye tracking, or detecting where the user is looking, is expected to become a new type of user interface [1]. The eyeball often moves very rapidly; this rapid motion is called saccade, and it is also expected to be applied to another type of user interface [2]. However, real-time tracking of saccade is difficult with conventional image processing systems because of their long processing time and latency relative to the speed of saccade, which reaches up to 700 [deg/s] and requires a frame rate of over 200 [fps] to capture [3]. Most commercial eye tracking systems capture the eye image at the video rate of 60 [fps] [4], which is too slow to track saccade. There are several eye tracking systems that operate at a high frame rate to capture saccade; however, their output latency, or the processing time to obtain the result, is several frames [5, 6], so they cannot track saccade in real time.

In this paper, we describe the design of a CMOS image sensor for eye tracking that realizes the high speed and low latency capability required for real-time saccade tracking. It employs a column-parallel processing architecture to achieve high-speed processing, low latency, and high resolution. We also describe the evaluation results of eye tracking using the designed CMOS image sensor.

2 Design of CMOS Image Sensor for Eye Tracking

2.1 LoS calculation algorithm

Fig. 1. Example of an infrared eye image

Eye tracking is performed by calculating the line of sight (LoS), or the eye direction. The LoS can be calculated from the position of the pupil in an infrared image of the user's eye, where the pupil is observed as a black area [7], as shown in Fig. 1.
The position of the pupil, whose shape is a circle, can be defined as the centroid of the pupil (digitized black) area at sub-pixel accuracy. The centroid of the area, $(\bar{x}, \bar{y})$, with the binary flag $p_{xy}$ for the pixels composing the pupil area, can be calculated as follows.

$$(\bar{x}, \bar{y}) = \left( \frac{\sum_x \sum_y x\, p_{xy}}{\sum_x \sum_y p_{xy}},\ \frac{\sum_x \sum_y y\, p_{xy}}{\sum_x \sum_y p_{xy}} \right) \qquad (1)$$

As shown in eq. (1), the position of the pupil can be calculated from the values of $S$, $SX$, and $SY$ defined as follows.

$$S = \sum_x \sum_y p_{xy}, \quad SX = \sum_x \sum_y x\, p_{xy}, \quad SY = \sum_x \sum_y y\, p_{xy} \qquad (2)$$

Thus, we can obtain the LoS by using the values of S, SX, and SY. The division of SX and SY by S requires little computing power, since it is performed just once per frame. Therefore, we focus on the high-speed calculation of S, SX, and SY, which requires high computing power, to realize high speed, low latency eye tracking. Note that the actual LoS, i.e. where the user is looking, is calculated from the center of the pupil using calibration parameters measured in advance of the operation.

2.2 Image Sensor Architecture for LoS Calculation

Fig. 2. Column-parallel image sensor architecture for eye tracking

Figure 2 shows the CMOS image sensor architecture for high speed, low latency, and high resolution LoS calculation [10]. The architecture is composed of an image sensor (pixel array) part and the column-parallel LoS calculation circuits, PEs (processing elements). The pixel array is a standard pixel array, similar to that of a conventional CMOS image sensor, with a pixel size of 6.0 [µm], which is suitable for realizing practical high resolution (VGA (640x480 [pixels]) or above). One PE, the column-parallel LoS calculation circuit, is placed at each column and processes the signals from all the pixels in its column in order.

Figure 3 shows the PE architecture of one column. A PE is composed of blocks for S, SX, and SY, which perform the y-directional accumulation, $\sum_y$, of S, SX,
Fig. 3. PE architecture of one column
and SY, respectively, in eq. (2). The PE performs the calculation of $\sum_y$ by using the flag $p_{xy}$ generated by the comparator, the x coordinate value of the column, and the y coordinate value of the accessed pixel. The PE's components for S, SX, and SY are each composed of single-bit accumulators with read-out buffers, as shown in Fig. 4, which form a ripple-carry accumulator [10].

Fig. 4. Circuit of the single-bit PE for (a) S, and (b) SX and SY

Although the processing time for one column is proportional to the number of rows, each PE processes its column's signals in parallel at high speed using a conventional digital circuit implementation, for example at 10 [MHz]. This realizes a high total processing frame rate, such as 500 [fps] or above, so that saccade can be tracked. After all the pixels are accessed, the x-directional accumulation, $\sum_x$, of S, SX, and SY in eq. (2) over the PEs of all columns is performed by the accumulator connected to the PEs in Fig. 2. Here, we obtain the calculation results of S, SX, and SY in eq. (2) in the accumulator.

Fig. 5. Operation timing

Figure 5 shows the operation timing of the designed image sensor. The access to each pixel is composed of the integration operation, which generates electrons from the received photons, followed by the read-out and digitize operations by the comparator. The y-directional accumulation finishes right after the pixel read-out operations, followed by the x-directional accumulation. All these operations finish within a single frame. We obtain the calculation result at the end of each frame; thus, the lowest possible output latency of a single frame is achieved, which is smaller than that of conventional image processing-based eye tracking systems.
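As a software sanity check of the processing described above, the column-wise y-directional accumulation of eq. (2) followed by the once-per-frame division of eq. (1) can be modeled as a short sketch. This is our own model of the algorithm, not the sensor circuit; the function name and the NumPy-based frame representation are assumptions.

```python
import numpy as np

def pupil_centroid(frame, threshold):
    """Software model of the sensor's centroid calculation, eq. (1)-(2).

    frame:     2-D array of pixel intensities (rows = y, columns = x)
    threshold: pixels darker than this are flagged as pupil (p_xy = 1)
    Returns (S, SX, SY, x_bar, y_bar), or None if no pupil pixels.
    """
    p = (frame < threshold).astype(np.int64)  # comparator output p_xy
    ny, nx = p.shape

    # y-directional accumulation: one PE per column scans its rows
    S_col = p.sum(axis=0)                               # sum_y p_xy
    SX_col = S_col * np.arange(nx)                      # sum_y x*p_xy
    SY_col = (p * np.arange(ny)[:, None]).sum(axis=0)   # sum_y y*p_xy

    # x-directional accumulation across the PEs at the end of the frame
    S, SX, SY = S_col.sum(), SX_col.sum(), SY_col.sum()
    if S == 0:
        return None
    # the single division per frame yields the sub-pixel centroid
    return S, SX, SY, SX / S, SY / S
```

In the sensor, the per-column sums are produced by the single-bit ripple-carry PEs while the rows are scanned; here the same arithmetic is expressed with array operations.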
2.3 Design of CMOS image sensor for eye tracking

The author previously designed a CMOS image sensor for eye tracking based on the proposed column-parallel architecture; however, the fabricated CMOS image sensor had some operational problems [10]. Based on this first version, we designed a second version with improvements to solve those problems. The most important point in the design revision is the introduction of a clock tree for the PEs. In the evaluation of the first version, the PEs output random values, and we attributed this to meta-stable operation of the flip-flops due to the lack of proper clock buffering. We inserted the clock tree shown in Fig. 6 to supply a proper clock signal to each flip-flop. The number of clock buffers and the drive capability of each clock buffer were determined based on the load capacitance estimated from the signal wire length and the number of driven flip-flops 1.

Fig. 6. Inserted clock tree

Fig. 7. Fabricated CMOS image sensor for eye tracking

Figure 7 shows the fabricated CMOS image sensor for eye tracking, using a 0.18 µm CMOS CIS process. The number of pixels is 640x480, and the chip

1 This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Cadence Design Systems, Inc. and Mentor Graphics, Inc.
size is 4x5 [mm].

3 Experiment on Eye Tracking

3.1 Experimental setup

Fig. 8. Evaluation board

We carried out the eye tracking experiment using the fabricated CMOS image sensor. Figure 8 shows the evaluation board designed for testing the fabricated CMOS image sensor. The control signals are generated by the FPGA, and the calculated results of S, SX, and SY are transferred to a PC. The raw image output is also captured by an A/D converter for debugging.

Fig. 9. Experimental setup

Figure 9 shows the experimental setup for eye tracking. The subject is asked to rest his chin on the chin-rest. The camera is set to capture his eye area, and three IR-LEDs are used for illumination. Figure 10 shows the captured raw images, where the center of the pupil is indicated by the cross point of the red lines. The size of the pupil is also indicated by the green box, a bounding box whose size is calculated from the pupil area, S. As shown in Fig. 10, the position and the size of the pupil are obtained by using the fabricated CMOS image sensor. Here, we manually adjusted
Fig. 10. Captured eye image and the calculated pupil position and size

the threshold voltage of the comparator that digitizes the pupil, to obtain an adequate binarized image of the pupil. Note that there are some black regions other than the pupil, which result in errors in the calculated pupil center. The region of interest (ROI) should therefore be configured adequately so as to access only the eye region; in this experiment, the ROI was configured manually. It is also notable that, even without configuring the ROI, we can calibrate the combination of the (error-containing) pupil center and the actual LoS to obtain the correct LoS [9].

3.2 Experimental results on eye tracking

Fig. 11. Recorded eye motions: (a) the x-coordinate of the pupil, and (b) the angular velocity of the eye.

We carried out the eye tracking experiment at a frame rate of 315 [fps],
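The text above states that the green bounding box size is derived from the pupil area S, but does not give the exact formula. One plausible reconstruction, under our own assumption of a circular pupil (the function name and formula are hypothetical, not taken from the paper), is:

```python
import math

def pupil_bbox(x_bar, y_bar, S):
    """Hypothetical bounding box around the pupil centroid.

    Assumes a circular pupil of area S [pixels^2], so the diameter is
    d = 2 * sqrt(S / pi). The paper only states that the box size is
    derived from S; this particular formula is our assumption.
    Returns (x_min, y_min, x_max, y_max).
    """
    half = math.sqrt(S / math.pi)  # radius of the equivalent circle
    return (x_bar - half, y_bar - half, x_bar + half, y_bar + half)
```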
with an exposure time of 980 [µs]. The subject was asked to read three lines of sentences, and the position of the pupil was recorded during this task. Figure 11(a) shows the obtained x-coordinate of the pupil, calculated as SX/S. Note that a smaller x-coordinate value corresponds to the eye pointing further to the right. We can observe three pairs of left-to-right eye motions, corresponding to reading the three lines at 2000-4000 [ms], 6000-8000 [ms], and 10000-12000 [ms], where some saccade motions are included. We also observed two sudden left-to-right then right-to-left eye motions at 8500 [ms] and 13000 [ms], which are expected to correspond to blinks.

Fig. 12. Model of eye motion angle calculation

We also calculated the angular velocity of the eye motion from the recorded eye motion. The radius of the human eyeball is almost constant, approximately 12 [mm] for adults, regardless of age and race. The relation between the pixel size and the actual size around the eye can be calculated from the number of pixels in the captured image and the measured physical size of the eye area. In the experimental setup in Fig. 9, one pixel corresponds to 0.1 [mm] around the eye area. We can calculate the eye motion angle from the motion of the pupil center, as shown in Fig. 12, as follows.

$$\theta\,[\mathrm{rad}] = \tan^{-1}\frac{\Delta x}{R_e} \simeq \frac{\Delta x}{R_e} \qquad (3)$$

Here, $\theta$, $\Delta x$, and $R_e$ are the eye motion angle, the difference in the x-coordinate of the pupil center, and the radius of the eyeball, respectively. Note that $\Delta x$ and $R_e$ are represented in units of pixels, and $R_e$ equals 120 [pix] for 12 [mm]. Figure 11(b) shows the calculated angular velocity of the eye motion, averaged over five successive samples. Figure 13(a) and (b) show magnified samples of the pupil positions in Fig. 11, and Fig. 13(c) and (d) show the corresponding angular velocities of the eye motion for Fig. 13(a) and (b), respectively.
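The conversion of eq. (3) from recorded pupil x-coordinates to angular velocity can be sketched as follows. The 315 [fps] frame rate and $R_e$ = 120 [pix] are taken from the text above; the function names and the five-sample averaging implementation are our own.

```python
import math

FRAME_RATE_HZ = 315   # frame rate used in the experiment
R_EYE_PIX = 120       # eyeball radius: 12 mm at 0.1 mm per pixel

def angular_velocity(x_centers):
    """Angular velocity [deg/s] from successive pupil x-coordinates.

    Per eq. (3), theta = atan(dx / R_e) is the angle moved in one
    frame; multiplying by the frame rate gives degrees per second.
    """
    out = []
    for x0, x1 in zip(x_centers, x_centers[1:]):
        theta = math.atan((x1 - x0) / R_EYE_PIX)  # [rad] per frame
        out.append(math.degrees(theta) * FRAME_RATE_HZ)
    return out

def moving_average(v, n=5):
    """Average of n successive samples, as used for Fig. 11(b)."""
    return [sum(v[i:i + n]) / n for i in range(len(v) - n + 1)]
```

A steady drift of 1 pixel per frame, for example, corresponds to roughly 150 [deg/s], which is in the range the text identifies as saccade.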
We can observe angular velocities of the eye motion larger than 100 [deg/s], up to 250 [deg/s], which we can judge to be saccade motions.

4 Conclusion

In this paper, we described the design of a CMOS image sensor for eye tracking that realizes the high speed and low latency capability required for saccade tracking. We also described the evaluation results of eye tracking using the
Fig. 13. Magnified eye motions in Fig. 11: (a)(b) the x-coordinate of the pupil, and (c)(d) the angular velocity of the eye.

fabricated CMOS image sensor, and showed saccade tracking results with a single frame latency.