A camera controlling method for lecture archive

NISHIGUCHI Satoshi
Graduate School of Law, Kyoto University
nishigu@mm.media.kyoto-u.ac.jp

MINOH Michihiko
Center for Information and Multimedia Studies, Kyoto University
minoh@media.kyoto-u.ac.jp

Abstract

Archiving lectures is important not only to students but also to lecturers. A lecture archive can become intellectual property of a university, material for multimedia courseware and for teaching evaluation, and a source of knowledge. We present a method to control multiple cameras for a lecture archive. A lecture archive consists of media information, such as video, audio, and text, and event information, such as the position of the lecturer and the activities of the students. We consider that the video in a lecture archive should include various video clips, because users' demands on the archive differ from person to person. In this study, we propose a method for shooting various video clips with multiple video cameras, in which we introduce a probabilistic method for camera control.

1. Introduction

In this study, we discuss a camera controlling method for obtaining various kinds of video clips for a lecture archive. The purpose of recording a lecture is to provide information about the lecture to users without spatial and temporal restrictions. To achieve this purpose, we record the information sources in a lecture room into a lecture archive. A lecture archive is defined as a set of media information, such as video, audio, and slides, and event information, generated as a result of the interaction between a lecturer and students in the lecture room. Hence the facial expressions and gestures of the lecturer and of the students attending the lecture should be captured in the video for the lecture archive.

A camera controlling method for a distance learning system has been proposed. That system controls the cameras according to the situation in the lecture room and selects the most suitable video at each moment.
It then transmits the selected video to the remote lecture room in real time, while the cameras that are not selected are controlled to shoot objects for the next transmission. In other words, the video obtained by a distance learning system is a sequence of shots, each suited to the situation in the lecture room at that time. In contrast, a lecture archive should contain a wider variety of video clips than a distance learning system, because users use the archive for various purposes. In this study, the various video clips should have the following two characteristics. First, important objects in the lecture room are shot with as many different camera works as possible at any given time. Second, different camera works can be assigned to a camera under the same situation in the lecture room.

To characterize each shooting camera for the lecture archive, we introduce a probabilistic model into the design of our camera controlling method: each camera is assigned a probabilistic density function for selecting a camera work, designed to reflect our shooting policy.

Since the slides on the screen and the drawings on the white board are recorded electronically, we focus on shooting the students based on their activity. Students fidget during a lecture for many reasons: in one case, they may be bored with the lecture; in another, they may lean forward to see details on the screen. Conversely, students who remain still may be sleepy or may be thinking deeply. Hence the students' activity is important information both for the lecture archive and for the camera controlling method.

The rest of the paper is organized as follows. In Section 2, we describe our lecture archive. In Section 3, we explain our camera controlling method using probabilistic density functions. Implementation and experimental results are presented in Section 4 and Section 5, respectively.

2. Lecture archive

2.1. Lecture

In a face-to-face lecture, a lecturer stands at the front of the lecture room and teaches students a subject. He explains the subject with his voice and gestures. He can use slides about the subject, and he also writes drawings on the white board. To point at something on the screen or to engage the students' interest, he may walk around at the front of the lecture room during the lecture. The students sit in their seats, listen to the lecturer, and watch him, the slides, and the white board. They remain seated, but move their heads, hands, and upper bodies in order to hear and see in detail. We call the degree of such behavior their activity. A lecturer can obtain information from the students' facial expressions and fidgeting, and may adjust his explanation accordingly. Likewise, the students in the lecture room can obtain information from the lecturer's facial expressions and behavior, the slides projected on the screen, and the drawings on the white board.

2.2. Lecture Archive

The purpose of recording a lecture is to provide information about the lecture to users without spatial and temporal restrictions. However, users' demands differ from person to person. Hence we record the information sources in the lecture room as our lecture archive, and each user can obtain information from the archived information sources. The information sources in a lecture room are the following:
- Lecturer
- Students
- Slides
- Drawings on the white board

These information sources have two aspects: information expressed by media and information expressed by events. We call these media information and event information, respectively. Media information is represented by media data, such as video, audio, and strokes of drawings, together with their capture times. Event information expresses the status of the information sources.
Position, activity, and the presence or absence of status changes of information sources are examples of event information. Event information is represented by event data together with its occurrence time.

The lecturer and the students as information sources are characterized by their facial and physical behavior and their voices. Hence we record their behavior as video data and their voices as audio data. In addition, their positions and activities are recorded as event information. The slides as information sources are characterized by their images and by slide-switching events. Hence we record each slide image as media information and the switching events as event information. The white board is characterized by the drawings on it. The lecturer writes drawings on the white board stroke by stroke and also erases them, so it is difficult to segment its content by status changes of the white board. The size of a drawing area is defined by its minimum bounding rectangle (MBR). We record the timing of erasures as event information of the white board. The drawings themselves can be reconstructed completely from the stroke information.

2.3. Variety of video clips for lecture archive

Over the past few years, several studies have focused on camera control for shooting a lecturer in the lecture room. In these studies, the students' seats are divided into several fixed regions, and the students are modeled by these regions. Therefore, the students are recorded with only a few kinds of shots. However, users' demands for video clips of the students (a video clip being a sequence of consecutive frames showing an object) differ from person to person. Hence in this paper we treat one or more students as a shooting object, and we define various video clips of the students as follows:
- The video clips shot by one camera should include various objects.
- When multiple cameras are available, different objects should be shot at the same time.

2.4. Approach for shooting various video clips

To shoot various video clips, there must be many candidate objects to shoot. Hence we define a seat region, which is a set of seats occupied by students; the positions of the occupied seats are used to detect the seat regions. We propose the following steps as the camera controlling rule for shooting students.

1. Calculating seat regions
The seat regions are determined by the positions of the seats occupied by the students. Our method detects the occupied seats and calculates the seat regions from them.

2. Selecting a seat region with a probabilistic density function
To assign a camera work to each shooting camera, we introduce a probabilistic density function for selecting the seat region to shoot.

3. Assigning the camera work based on the selected seat region
Based on the selected seat region, camera controlling commands are sent to each shooting camera.

More details of these steps are described in the next section.

3. Probabilistic method for camera control

3.1. Shooting cameras for shooting students

The structure of a lecture room affects where multiple cameras can be installed. Students usually sit in the seats from the middle to the back of the lecture room. Therefore, the cameras for shooting the students should be installed at the front, facing the back of the lecture room, so that they capture the facial expressions and behavior of the students. Each shooting camera supports remote control of pan, tilt, and zoom. In general, the number of cameras installed in a lecture room is restricted by the space of the room, cost, and so on. It is therefore necessary to select seat regions according to the number of shooting cameras: the number of selected seat regions equals the number of cameras shooting the students.

3.2. Definition of seat region

A seat region is a subset of the seats occupied by students. In the following definition, a primary seat is a seat occupied by a student, the next seats of a seat are the seats that lie within a predefined unit length of it, and the n-th seats are the seats that lie at n times the unit length from a primary seat. Seat regions are defined as follows:
- Each primary seat forms a seat region by itself.
- Each primary seat and one of its next seats form a seat region of two seats.
- For 1 ≤ n ≤ n_max, a primary seat together with its 1st to n-th seats that can be traced through next seats from the primary seat forms a seat region.
- Each primary seat and one of its second seats that cannot be traced through the next seats of the primary seat form a seat region of two seats.
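The definition above can be made concrete with a small sketch. It assumes, as Section 4.3 does for our room, that seats lie on a square grid and that the next seats of a seat are its 8-neighbors; depth is measured by breadth-first search over occupied seats, and duplicate regions are dropped by storing them in a set. The names `neighbors` and `seat_regions` and the optional `n_max` cap are hypothetical, introduced only for illustration.

```python
from __future__ import annotations

from collections import deque
from itertools import product

Seat = tuple[int, int]  # (row, column) of a seat on the grid


def neighbors(seat: Seat, radius: int = 1) -> list[Seat]:
    """Grid positions at Chebyshev distance exactly `radius` (radius 1 = the 8-neighbors)."""
    r, c = seat
    return [
        (r + dr, c + dc)
        for dr, dc in product(range(-radius, radius + 1), repeat=2)
        if max(abs(dr), abs(dc)) == radius
    ]


def seat_regions(occupied: set[Seat], n_max: int | None = None) -> set[frozenset[Seat]]:
    """Enumerate the seat regions of every primary (occupied) seat, without duplicates."""
    regions: set[frozenset[Seat]] = set()
    for primary in occupied:
        # Breadth-first search through next seats: depth[s] == n means s is an n-th seat.
        depth = {primary: 0}
        queue = deque([primary])
        while queue:
            seat = queue.popleft()
            for nb in neighbors(seat):
                if nb in occupied and nb not in depth:
                    depth[nb] = depth[seat] + 1
                    queue.append(nb)

        regions.add(frozenset([primary]))  # one seat: the primary seat itself
        for seat, d in depth.items():  # two seats: primary seat + one next seat
            if d == 1:
                regions.add(frozenset([primary, seat]))
        for seat in neighbors(primary, radius=2):  # two seats: primary + untraceable second seat
            if seat in occupied and seat not in depth:
                regions.add(frozenset([primary, seat]))

        # Primary seat plus all traceable 1st..n-th seats, for n = 1 .. n_max.
        deepest = max(depth.values())
        if n_max is not None:
            deepest = min(deepest, n_max)
        for n in range(1, deepest + 1):
            regions.add(frozenset(s for s, d in depth.items() if d <= n))
    return regions
```

For three students sitting side by side, `seat_regions({(0, 0), (0, 1), (0, 2)})` yields six regions: the three single seats, the two adjacent pairs, and the whole row.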
Figure 1 shows an example in which 6 students (a to f) are detected in the lecture room. The seats are aligned on a grid and drawn as circles. The filled circles are the seats occupied by students, and the enclosed groups of circles are the seat regions whose primary seats are A to F, respectively. In this example, 17 seat regions remain after duplicate seat regions are removed.

Figure 1. Example of seat regions: circles are seats for students, filled circles are seats occupied by students, and enclosed circles are the seat regions.

3.3. Selecting a seat region by probabilistic density function

In this section, we consider the activity of a seat region. By definition, a seat region consists of one or more seats occupied by students. Hence we define the activity of a seat region as the average activity of the students sitting on the seats included in the seat region. To apply a probabilistic density function for selecting a seat region from the set of seat regions, we rank all the seat regions by their activities.

We now define the probabilistic density function (PDF) of a camera work. When the number of cameras is m and the number of seat regions is n, the seat regions are divided into m parts in the order of their ranking, so that each part has n/m seat regions. The following PDF p_i(x), where x is the rank of a seat region, is assigned to each shooting camera i (0 ≤ i < m).

When m = 1,
$$p_0(x) = \frac{1}{n}.$$
When m ≥ 2,
$$p_i(x) = \begin{cases} \dfrac{2m}{n(m-1)}, & \dfrac{n}{m}\, i \le x < \dfrac{n}{m}\,(i+1),\\[6pt] \dfrac{m}{n(m-1)}, & \text{for other } x. \end{cases}$$
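The selection rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `camera_pdf` and `select_regions` are hypothetical, and ranking seat regions in decreasing order of activity (rank 0 = most active) is an assumption, since the text does not state the direction of the ranking. The part boundary (n/m)·i ≤ x < (n/m)·(i+1) is multiplied through by m so the comparison stays in integers.

```python
from __future__ import annotations

import random
from fractions import Fraction


def camera_pdf(i: int, x: int, n: int, m: int) -> Fraction:
    """p_i(x) of Section 3.3 for camera i, rank x, n seat regions and m cameras."""
    if m == 1:
        return Fraction(1, n)
    # (n/m)*i <= x < (n/m)*(i+1), multiplied through by m to stay in integers.
    if n * i <= x * m < n * (i + 1):
        return Fraction(2 * m, n * (m - 1))
    return Fraction(m, n * (m - 1))


def select_regions(activities: list[float], m: int, rng: random.Random) -> list[int]:
    """Pick one seat-region index per camera.

    activities[k] is the activity of seat region k (mean activity of its students).
    Assumption: rank 0 is the most active region.
    """
    n = len(activities)
    by_rank = sorted(range(n), key=lambda k: activities[k], reverse=True)
    picks = []
    for i in range(m):
        weights = [float(camera_pdf(i, x, n, m)) for x in range(n)]
        # random.choices uses relative weights, so only the ratios of p_i(x) matter.
        rank = rng.choices(range(n), weights=weights, k=1)[0]
        picks.append(by_rank[rank])
    return picks
```

For m ≥ 2, a region in camera i's own part is twice as likely to be drawn as a region in any other part, which is what gives each camera its own selection tendency. In this sketch the cameras sample independently, so two cameras may occasionally pick the same region.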

Examples of the PDF are shown in Figure 2. Each shooting camera selects one seat region according to its assigned PDF.

Figure 2. Examples of probabilistic density function: m = 1 (camera No. 0, constant value 1/n), m = 2 (cameras No. 0 and No. 1), and m = 3 (cameras No. 0, No. 1, and No. 2).

4. Implementation

4.1. Estimation of students' activities

We estimate the students' activities by inter-frame subtraction: when students move a lot, the difference in pixel values between frames becomes large. To avoid occlusion caused by the students themselves, an observation camera with a fish-eye lens is installed on the ceiling of the lecture room. The camera captures images (I_f) of the students from the ceiling; the captured images are 640×480 pixels (Figure 3). The color image captured at time t_n is converted to a gray image, and each pixel value is subtracted from the pixel value at the same position in the gray image captured at time t_{n-1}. Finally, each pixel is binarized with a threshold T_a into the image I_a.

Here we use a mask image like Figure 4, which expresses the seats for students in the lecture room, in order to estimate the activity of each occupied seat. Each region in the mask image is a rectangle defined by hand based on the seats for students. The binarized image I_a is masked by the mask image I_m into the image I_a'. The binarized pixels in I_a' reflect the activities of the students. Hence the number of pixels in each rectangle region of I_a' is divided by the number of pixels included in the corresponding rectangle region of the mask image I_m. As a result, we obtain an estimate of the activity of each student, normalized between 0.0 and 1.0.

Figure 3. Fish-eye image of students (I_f)

Figure 4. Mask image for the fish-eye image (I_m)

4.2. Detection of position of seat sit by student

Our method needs the positions of the occupied seats in order to calculate the seat regions. Background subtraction can detect such seats, but it also extracts other objects such as bags. We therefore combine the inter-frame subtraction used for estimating the activities of students with background subtraction, as follows:
- The background image is subtracted from a captured fish-eye image I_f, and the result is binarized with a threshold T_p into the image I_p.
- I_p is masked by the mask image I_m into the image I_p'.
- For each rectangle region of I_m, the mean of the number of pixels in I_a' and the number of pixels in I_p' is calculated.
- When the mean value exceeds a threshold T_e, the seat represented by the rectangle region is judged to be occupied by a student.

4.3. Calculation of seat regions

In our environment, the seats are aligned on a grid. The following procedure is applied to each seat in order to calculate a set of seat regions. First, we construct a tree of the occupied seats as follows:
1. One occupied seat (the primary seat) is selected as the root node of the tree.
2. If an occupied seat is among the 8-neighbors of the primary seat, it is added to the tree as a child of the root node.
3. If an occupied seat is among the 8-neighbors of a seat already in the tree and has not yet been added, it is added to the tree as a child of that node.
4. Step 3 is repeated until no more seats are added.
5. If an occupied seat that has not been added to the tree lies among the 8-neighbors of the 8-neighbors of the primary seat, it is added to the tree as a child of the root node. We call such a seat an extra seat.
6. The depth from the root node is assigned to every node of the tree.
7. Steps 1 to 6 are repeated for all of the primary seats.

Second, we calculate the seat regions of each primary seat from its tree:
- Seat regions of one seat: the primary seat at the root node forms a seat region.
- Seat regions of two seats: the root node and a child node at depth 1 form a seat region of two seats. The root node and a child node that represents an extra seat at depth 2 also form a seat region of two seats.
- Seat regions of three or more seats: the nodes from depth 0 to d (1 ≤ d) form a seat region.
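The per-seat image computations of Sections 4.1 and 4.2, which supply the occupied seats used by the tree construction, can be sketched as follows. This is a hedged illustration under several assumptions: frames are already converted to grayscale and given as nested lists of pixel values (the color-to-gray step is omitted), "subtraction" is taken as the absolute difference, each hand-defined mask rectangle is written as (top, left, bottom, right) with exclusive bottom/right, and the current gray frame stands in for I_f in the background subtraction. All function names are hypothetical.

```python
from __future__ import annotations

Image = list[list[int]]           # grayscale image, image[y][x]
Rect = tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def binarize_diff(a: Image, b: Image, threshold: int) -> Image:
    """1 where |a - b| exceeds the threshold, else 0 (used for both I_a and I_p)."""
    return [
        [1 if abs(pa - pb) > threshold else 0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def count_in_rect(binary: Image, rect: Rect) -> int:
    """Number of set pixels inside one mask rectangle (masking, then counting)."""
    top, left, bottom, right = rect
    return sum(sum(row[left:right]) for row in binary[top:bottom])


def seat_activity_and_occupancy(
    prev_gray: Image, cur_gray: Image, background: Image,
    seats: list[Rect], t_a: int, t_p: int, t_e: float,
) -> list[tuple[float, bool]]:
    """Return (activity, occupied) for every seat rectangle of the mask image."""
    i_a = binarize_diff(cur_gray, prev_gray, t_a)   # inter-frame subtraction
    i_p = binarize_diff(cur_gray, background, t_p)  # background subtraction
    results = []
    for rect in seats:
        top, left, bottom, right = rect
        area = (bottom - top) * (right - left)
        n_a = count_in_rect(i_a, rect)
        n_p = count_in_rect(i_p, rect)
        activity = n_a / area              # fraction of changed pixels, 0.0 to 1.0
        occupied = (n_a + n_p) / 2 > t_e   # mean of the two pixel counts vs. T_e
        results.append((activity, occupied))
    return results
```

Averaging the two counts means a seat whose pixels differ from the background but never change between frames (a bag, for example) needs twice as many foreground pixels to pass T_e as a moving student does, which is the motivation the text gives for combining the two subtractions.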
All of the seat regions obtained from the trees of all primary seats form the set of seat regions. An example tree for the seat B in Figure 1 is shown in Figure 5.

Figure 5. Example tree whose root node is the seat B in Figure 1 (nodes at depths 0 to 3).

5. Experimental results

The students sat in the center block of the lecture room, which has 6 × 7 = 42 seats. In our experiment, the lecturer talked about a subject and 20 students listened to him. We recorded the video of the fish-eye camera on video tape for about 5 minutes, and applied our method to check how many kinds of seat regions are selected. We used 2 shooting cameras.

Table 1 shows the accuracy of detecting the seats occupied by students. In total, the status of the seats is judged correctly with about 74.9% accuracy.

Table 1. Accuracy of detecting seats occupied by students

Student        exist     not exist   exist       not exist
Judgment       exist     not exist   not exist   exist
Ave. (seats)   14.1382   17.3162     5.8618      4.6838
%              33.6623   41.2290     13.9568     11.1519
Total (%): correct 74.8913, incorrect 25.1087

On average, about 93 seat regions are obtained at each time. If a seat region were defined to consist of a single seat, a camera could select from only 20 seat regions; with our method, it can select from about 93 seat regions at each time. Table 2 shows the results of selecting seat regions with our method, and Table 3 shows the results of selecting seat regions at random. Figure 6 compares our method with the random selection. In Figure 6, the lines with box and star icons show the results of the random selection; they show that similar seat regions are selected by the two shooting cameras. In contrast, the lines with plus and cross icons show the results of our method. The line with plus icons shows that seat regions including many students are selected more often, and seat regions including few students less often, than with the random method, while the line with cross icons shows the opposite tendency. These results show that, compared with the random method, our method using PDFs based on the activities of the seat regions enables the two cameras to shoot more various objects.

Table 2. Selected regions with PDF (Seats: number of seats in the selected seat region; Freq: selection frequency; Ave. act.: average activity)

Seats   Camera No. 0          Camera No. 1
        Freq (%)  Ave. act.   Freq (%)  Ave. act.
1       16.2941   0.0071      24.6701   0.0051
2       19.2297   0.0080      25.9898   0.0056
3       6.9486    0.0021      8.9146    0.0013
4       7.5680    0.0027      7.4872    0.0011
5       5.7635    0.0023      4.4438    0.0014
6       3.1511    0.0014      2.6932    0.0008
7       3.5820    0.0012      2.9356    0.0007
8       4.4977    0.0017      2.5316    0.0005
9       4.3092    0.0012      3.1780    0.0008
10      3.8513    0.0013      2.8279    0.0007
11      4.3361    0.0012      2.7471    0.0007
12      4.6054    0.0016      2.9087    0.0008
13      6.3560    0.0018      3.1242    0.0009
14      4.8209    0.0015      2.5855    0.0008
15      2.7202    0.0010      1.6967    0.0006
16      1.2928    0.0005      0.7541    0.0003
17      0.5656    0.0002      0.4040    0.0001
18      0.1077    0.0000      0.1077    0.0000
19      0.0000    0.0000      0.0000    0.0000

Table 3. Selected regions with the random method (columns as in Table 2)

Seats   Camera No. 0          Camera No. 1
        Freq (%)  Ave. act.   Freq (%)  Ave. act.
1       21.4382   0.0062      20.0108   0.0055
2       21.5459   0.0065      22.8117   0.0069
3       7.8104    0.0015      7.5411    0.0016
4       8.2682    0.0018      8.2413    0.0022
5       4.8748    0.0020      5.0633    0.0014
6       2.6663    0.0010      3.3935    0.0011
7       3.1242    0.0009      3.1780    0.0011
8       3.6897    0.0010      3.4204    0.0010
9       3.6359    0.0010      4.0668    0.0012
10      3.6628    0.0012      3.1511    0.0010
11      3.6359    0.0008      3.7705    0.0010
12      3.1780    0.0009      2.9356    0.0009
13      4.6593    0.0012      5.3865    0.0014
14      3.7705    0.0012      3.6359    0.0012
15      2.2893    0.0008      1.8853    0.0006
16      1.1850    0.0005      0.9696    0.0004
17      0.4040    0.0001      0.2963    0.0001
18      0.1616    0.0001      0.1885    0.0001
19      0.0000    0.0000      0.0539    0.0000

Figure 6. Comparison between the method with PDF and the random method: frequency in selecting (%) plotted against the number of seats included in the selected seat region, for Camera No. 0 and No. 1 with PDF and Camera No. 0 and No. 1 with the random method.

6. Conclusion

In this paper, we proposed a camera controlling method for lecture archives. We defined seat regions to obtain a variety of video clips, and introduced the activity of a seat region and a probabilistic density function for selecting seat regions. We showed that a different tendency in selecting seat regions can be assigned to each shooting camera by a PDF based on the activities of the students, and that various video clips can be obtained as a result of selecting various seat regions.