A camera controlling method for lecture archive


NISHIGUCHI Satoshi
Graduate School of Law, Kyoto University
nishigu@mm.media.kyoto-u.ac.jp

MINOH Michihiko
Center for Information and Multimedia Studies, Kyoto University
minoh@media.kyoto-u.ac.jp

Abstract

Archiving lectures is important not only to students but also to lecturers. A lecture archive becomes intellectual property of the university, provides material for multimedia courseware and for teaching evaluation, and serves as a knowledge source. We present a method for controlling multiple cameras for lecture archiving. A lecture archive consists of media information, such as video, audio, and text, and event information, such as the position of the lecturer and the activities of the students. Because users' demands on a lecture archive differ from person to person, the archived video should include a variety of video clips. In this study, we propose a method for shooting various video clips with multiple video cameras, introducing a probabilistic method for camera control.

1. Introduction

In this study, we discuss a camera control method for obtaining various kinds of video clips for a lecture archive. The purpose of recording a lecture is to provide information about the lecture to users without spatial or temporal restriction. To achieve this, we record the information sources in the lecture room into a lecture archive. A lecture archive is defined as a set of media information, such as video, audio, and slides, and of event information generated by the interaction between the lecturer and the students in the lecture room. Hence the facial expressions and gestures of the lecturer and of the students attending the lecture should appear in the archived video.

A camera control method for a distance learning system has been proposed [2, 3]. Such a system controls the cameras according to the situation in the lecture room and selects the most suitable video at each moment.
It then transmits the selected video to the remote lecture room in real time, while the shooting cameras that are not selected are controlled so as to shoot objects for the next transmission. In other words, the video obtained by a distance learning system is a sequence of shots suited to the situation in the lecture room.

Video clips for a lecture archive, on the other hand, should be more varied than those of a distance learning system, because the archive is used by its users for various purposes. The various video clips in this study should have the following two characteristics: first, important objects in the lecture room are shot by the cameras with as many different camera works as possible at any given time; second, different camera works can be assigned to a camera under the same situation in the lecture room. In order to characterize each shooting camera for the lecture archive, we introduce a probabilistic model into our camera control method: a probability density function for selecting a camera work, designed to reflect this policy, is assigned to each camera.

Since the slides on the screen and the drawings on the white board are recorded electronically, we focus on shooting the students based on their activity. There are many reasons why students move restlessly during a lecture: they may be bored with the lecture, or they may be bending forward to see details on the screen. Conversely, when students remain still, they may be sleepy or may be thinking deeply. Hence the students' activity is very important information both for the lecture archive and for camera control.

The rest of the paper is organized as follows. Section 2 describes our lecture archive. Section 3 explains our camera control method using probability density functions. Implementation and experimental results are

presented in Section 4 and Section 5, respectively.

2. Lecture archive

2.1. Lecture

In a face-to-face lecture, the lecturer stands at the front of the lecture room and teaches the students a subject. He explains the subject with his voice and gestures, may use slides, and writes drawings on the white board. In order to point at the screen or to hold the students' interest, he may walk around the front of the lecture room during the lecture.

The students sit at their seats, listen to the lecturer, and watch him, the slides, and the white board. They remain seated but move their heads, hands, and upper bodies in order to listen and see in detail. We express the degree of such behavior as their activity. The lecturer can read the students' facial expressions and restless behavior and may adapt his explanation accordingly. The students, in turn, get information from the lecturer's facial expressions and behavior, from the slides projected on the screen, and from the drawings on the white board.

2.2. Lecture archive

The purpose of recording a lecture is to provide information about the lecture to users without spatial or temporal restriction. However, users' demands differ from person to person. Hence we record the information sources in the lecture room as our lecture archive, and each user can extract information from the archived sources. The information sources in a lecture room are the following:

- Lecturer
- Students
- Slides
- Drawings on the white board

These information sources have two aspects: information expressed by media and information expressed by events. We call these media information and event information, respectively. Media information is represented by media data, such as video, audio, and strokes of drawings, together with its capture time. Event information expresses the status of the information sources.
Position, activity, and the presence or absence of status changes of the information sources are examples of event information. Event information is represented by event data together with its occurrence time.

The lecturer and the students as information sources are characterized by their facial and physical behavior and their voices. Hence we record their behavior as video data and their voices as audio data, and we record their positions and activities as event information. The slides are characterized by their images and by slide-switching events, so we record each slide image as media information and the switching events as event information. The white board is characterized by the drawings on its surface. The lecturer writes drawings on the white board stroke by stroke, and also erases them, so it is difficult to segment the drawings by the status changes of the white board alone. The area of a drawing is defined by its minimum bounding rectangle (MBR). We therefore record the erasing times as event information of the white board; the drawings themselves can be reconstructed completely from the stroke information recorded as event information.

2.3. Variety of video clips for the lecture archive

Over the past few years, several studies have focused on camera control for shooting the lecturer in the lecture room. In these studies, the students' seats are divided into several fixed regions, and the students are modeled by those regions. As a result, the students are recorded with only a few kinds of shots. On the other hand, users' demands for lecture video clips of the students, where a video clip is a set of serial frames projecting an object, differ from person to person. Hence we treat one or more students as a shooting object in this paper, and we define various video clips of the students as follows:

- The video clips shot by one camera should include various objects.
- Different objects should be shot at any given time when multiple cameras are available.
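As an illustration, the media/event split described in Section 2.2 could be represented by record types like the following. This is a minimal sketch; the type and field names are ours, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class MediaInfo:
    source: str        # "lecturer", "students", "slides", "whiteboard"
    kind: str          # e.g. "video", "audio", "slide_image", "stroke"
    captured_at: float # capture time (seconds from lecture start)
    data: bytes        # raw media payload

@dataclass
class EventInfo:
    source: str        # which information source the event concerns
    kind: str          # e.g. "position", "activity", "slide_switch", "erase"
    occurred_at: float # occurrence time of the event
    value: object      # event payload, e.g. a position or an activity level

# A lecture archive is then simply a collection of both kinds of records.
archive = [
    MediaInfo("slides", "slide_image", 60.0, b"..."),
    EventInfo("slides", "slide_switch", 60.0, {"to": 2}),
    EventInfo("students", "activity", 61.5, 0.42),
]
```

Note that both record types carry a timestamp, which is what allows media data and event data to be replayed together.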
2.4. Approach for shooting various video clips

In order to shoot various video clips, many candidate objects to shoot are needed. Hence we define a seat region, which is a set of seats occupied by students; the positions of the occupied seats are used to detect the seat regions. We propose the following camera control procedure for shooting the students:

1. Calculating the seat regions
The seat regions are determined by the positions of the seats occupied by students. Our method detects the occupied seats and calculates the seat regions from them.

2. Selecting a seat region with a probability density function
In order to assign a camera work to each shooting camera, we introduce a probability density function for selecting a seat region to shoot.

3. Assigning a camera work based on the selected seat region
Based on the selected seat region, camera control commands are sent to each shooting camera.

These steps are described in detail in the next section.

3. Probabilistic method for camera control

3.1. Shooting cameras for shooting the students

The structure of the lecture room affects where the multiple cameras are installed. Students usually sit from the middle to the back of the lecture room. Therefore, the cameras for shooting the students should be installed at the front, facing the back of the lecture room, in order to capture the students' facial expressions and behavior. Each shooting camera can be remotely controlled in pan, tilt, and zoom. In general, the number of cameras installed in a lecture room is restricted by the space of the room, cost, and so on. It is therefore necessary to select as many seat regions as there are cameras for shooting the students.

3.2. Definition of seat region

A seat region is a subset of the seats occupied by students. In the following definition, a next seat of a given seat is a seat that exists within a predefined unit length of it, and an n-th seat is a seat that exists at n times the unit length from a primary seat. A seat region is defined as follows:

- Each primary seat by itself makes a seat region.
- Each primary seat and one of its next seats make a seat region containing the two seats.
- For 1 <= n <= n_max, the seats that are the 1st to n-th seats of a primary seat and that can be traced through next seats from the primary seat make a seat region.
- Each primary seat and a second seat of it that cannot be traced through the next seats of the primary seat make a seat region containing the two seats.
Figure 1 shows an example in which six students (A to F) are detected in the lecture room. The seats are aligned on a grid and drawn as circles; the filled circles are the seats occupied by students, and the enclosed groups are the seat regions whose primary seats are A to F, respectively. After removing duplicate regions, there are 17 seat regions in this example.

Figure 1. Example of seat regions: circles are seats for students, filled circles are seats occupied by students, and the enclosed groups are the seat regions.

3.3. Selecting a seat region by probability density function

We now consider the activity of a seat region. By definition, a seat region consists of one or more occupied seats, so we define the activity of a seat region as the average of the activities of the students sitting on the seats included in the region. In order to apply a probability density function for selecting a seat region from the set of seat regions, we rank all the seat regions by their activities.

We define the probability density function (PDF) of a camera work as follows. When the number of cameras is m and the number of seat regions is n, the seat regions are divided into m parts in ranking order, so that each part has n/m seat regions. The following PDF p_i(x), where x is the rank of a seat region, is assigned to each shooting camera i (0 <= i < m).

When m = 1:

  p_0(x) = 1/n

When m >= 2:

  p_i(x) = 2m / (n(m+1))   for n/m * i <= x < n/m * (i+1)
  p_i(x) = m / (n(m+1))    for other x
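As a concrete sketch, the piecewise PDF above can be implemented and sampled as follows. The function names and the use of `random.choices` are our illustration, not the paper's implementation, and we assume n is divisible by m so the rank bands line up.

```python
import random

def camera_pdf(i, m, n):
    """Probability, for camera i of m cameras, of selecting the seat region
    of activity rank x (0 <= x < n). Camera i's own rank band gets twice
    the weight of the other ranks; the weights sum to 1 when m divides n."""
    if m == 1:
        return [1.0 / n] * n
    lo, hi = n / m * i, n / m * (i + 1)
    return [2 * m / (n * (m + 1)) if lo <= x < hi else m / (n * (m + 1))
            for x in range(n)]

def select_region(ranked_regions, i, m):
    """Draw one seat region for camera i according to its PDF.
    ranked_regions is sorted by decreasing activity."""
    n = len(ranked_regions)
    return random.choices(ranked_regions, weights=camera_pdf(i, m, n), k=1)[0]
```

With m = 2 cameras, camera 0's PDF favors the high-activity half of the ranking and camera 1's favors the low-activity half, so the two cameras tend to shoot different seat regions even under the same room situation.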

Examples of the probability density functions are shown in Figure 2. Each shooting camera selects one seat region according to its assigned PDF.

Figure 2. Examples of the probability density functions for m = 1, 2, and 3 cameras (camera No. 0 to No. m-1).

4. Implementation

4.1. Estimation of the students' activities

We estimate the students' activities by inter-frame subtraction: when the students move a lot, the difference in pixel values between frames becomes large. To avoid occlusion caused by the students themselves, an observation camera with a fish-eye lens is installed on the ceiling of the lecture room. This camera captures an image I_f of the students from above; the captured images are 640x480 pixels (Figure 3). The color image captured at time t_n is converted to a gray image, and each pixel value is subtracted from the pixel value at the same position in the gray image captured at time t_{n-1}. Finally, each pixel is binarized with a threshold T_a into the image I_a.

Figure 3. Fish-eye image of the students (I_f).

We use a mask image (Figure 4) that represents the seats for students in the lecture room in order to estimate the activity at each occupied seat. Each region in the mask image I_m is a rectangle and was defined by hand based on the seats for students. The binarized image I_a is masked by I_m into the image I_a'. The binarized pixels in I_a' reflect the activities of the students, so the number of pixels in each rectangle region of I_a' is divided by the number of pixels included in the corresponding rectangle region of the mask image I_m. As a result, we obtain an estimate of the activity of each student, normalized between 0.0 and 1.0.

Figure 4. Mask image for the fish-eye image (I_m).
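The per-seat activity estimation above can be sketched as follows. This is a minimal version assuming grayscale frames as NumPy arrays and hand-defined seat rectangles; the threshold value and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def seat_activities(gray_prev, gray_cur, seat_rects, t_a=15):
    """Estimate per-seat activity from two consecutive grayscale fish-eye
    frames by inter-frame subtraction. seat_rects maps a seat id to its
    (x0, y0, x1, y1) mask rectangle (hand-defined, as in the paper)."""
    # absolute inter-frame difference, computed in int16 to avoid wraparound
    diff = np.abs(gray_cur.astype(np.int16) - gray_prev.astype(np.int16))
    binary = diff > t_a                      # I_a: binarized difference image
    activities = {}
    for seat, (x0, y0, x1, y1) in seat_rects.items():
        region = binary[y0:y1, x0:x1]        # masking by the seat rectangle
        # changed pixels / pixels in the rectangle -> activity in [0, 1]
        activities[seat] = float(region.mean())
    return activities
```

Dividing the changed-pixel count by the rectangle's pixel count is what normalizes each activity into [0.0, 1.0], matching the description above.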
4.2. Detection of the positions of seats occupied by students

Our method needs the positions of the seats occupied by students in order to calculate the seat regions. To detect such seats we use background subtraction; however, background subtraction also extracts other objects, such as bags. We therefore combine it with the inter-frame subtraction already used for estimating the students' activities, as follows:

- A captured fish-eye image I_f is subtracted from the background image and binarized with a threshold T_p into the image I_p.
- I_p is masked by the mask image I_m into the image I_p'.

- For each rectangle region of I_m, the mean of the number of pixels in I_a' and the number of pixels in I_p' is calculated.
- When the mean exceeds a threshold T_e, we judge that the seat represented by the rectangle region is occupied by a student.

4.3. Calculation of seat regions

In our environment, the students' seats are aligned on a grid. The following procedure is applied to each seat in order to calculate the set of seat regions. First, we construct a tree whose nodes are seats occupied by students:

1. One occupied seat (the primary seat) is selected as the root node of the tree.
2. If there is an occupied seat among the 8-neighbors of the primary seat, that seat is added to the tree as a child of the root node.
3. If there is an occupied seat among the 8-neighbors of a seat that has already been added, and it has not yet been added to the tree, it is added as a child of the already-added node.
4. Step 3 is repeated until no more seats can be added.
5. If there is an occupied seat among the 8-neighbors of the 8-neighbors of the primary seat, it is added to the tree as a child of the root node. We call such a seat an extra seat.
6. The depth from the root node is assigned to every node of the tree.
7. Steps 1 to 6 are repeated for every primary seat.

Second, we calculate the seat regions of each primary seat using its tree:

- Seat regions consisting of one seat: a primary seat, as a root node, is itself a seat region.
- Seat regions consisting of two seats: the combination of the root node and a child node at depth 1 makes a seat region containing the two seats, and the combination of the root node and a child node that is an extra seat at depth 2 makes a seat region containing the two seats.
- Seat regions consisting of three or more seats: the combination of the nodes from depth 0 to d (1 <= d) makes a seat region.
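A simplified sketch of the tree-based region calculation follows. It covers steps 1 to 4 and 6 (depth labeling by breadth-first search over 8-neighbors) and forms regions from all nodes up to each depth; the extra-seat rule of step 5 and the two-seat special cases are omitted, so this is an assumption-laden illustration rather than the full procedure.

```python
from collections import deque

# 8-neighborhood offsets on the seat grid
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]

def seat_tree_depths(primary, occupied):
    """Label every occupied seat reachable from `primary` through
    8-neighbor steps with its depth from the primary seat (BFS)."""
    depths = {primary: 0}
    queue = deque([primary])
    while queue:
        x, y = queue.popleft()
        for dx, dy in NEIGHBORS:
            nxt = (x + dx, y + dy)
            if nxt in occupied and nxt not in depths:
                depths[nxt] = depths[(x, y)] + 1
                queue.append(nxt)
    return depths

def seat_regions(occupied, max_depth=3):
    """For every primary seat, collect the regions formed by all nodes
    of depth 0..d (d = 0..max_depth); duplicate regions are removed."""
    regions = set()
    for primary in occupied:
        depths = seat_tree_depths(primary, occupied)
        for d in range(max_depth + 1):
            regions.add(frozenset(s for s, k in depths.items() if k <= d))
    return regions
```

Because every occupied seat serves as a primary seat in turn, the same region can be produced more than once; using a set of frozensets performs the duplicate removal mentioned in the Figure 1 example.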
All of the seat regions calculated in this way form the set of seat regions. An example tree for the seat B in Figure 1 is shown in Figure 5.

Figure 5. Example tree whose root node is the seat B in Figure 1 (nodes at depths 0 to 3).

5. Experimental results

The students sat in the center block of the lecture room, which has 6 x 7 = 42 seats. In our experiment, the lecturer talked about a subject and 20 students listened to him. We recorded about 5 minutes of video from the fish-eye camera onto a video tape and applied our method in order to check how many kinds of seat regions were selected. We used two shooting cameras.

Table 1 shows the accuracy of detecting the seats occupied by students. Overall, the status of the seats was judged correctly about 74.9% of the time.

Table 1. Accuracy of detecting seats occupied by students

  Student     exist     not exist   exist      not exist
  Judgment    exist     not exist   not exist  exist
  Ave.        14.1382   17.3162     5.8618     4.6838
  %           33.6623   41.2290     13.9568    11.1519
  Correct total (%): 74.8913    Error total (%): 25.1087

On average, about 93 seat regions were available at each time. If a seat region were defined as a single seat, only 20 seat regions could be selected from; with our method, about 93 seat regions are available at each time.

Table 2 shows the result of selecting seat regions with our method, and Table 3 shows the result of selecting seat regions at random; Figure 6 compares the two. In Figure 6, the lines with the box and star icons show the results of random selection of seat regions: similar seat regions are selected for the two shooting cameras. The lines with the plus and cross icons show the results of our method. The line with the plus icon shows that seat regions including many students are selected more often, and seat regions including few students less often, than with the random method, and the line with the cross

icon has the opposite tendency. These results show that, with our method, the two cameras shoot more varied objects using PDFs based on the activities of the seat regions than with the random method.

Table 2. Selected regions with PDF

  Seats in           Camera No. 0             Camera No. 1
  selected region    freq(%)   Ave. activity  freq(%)   Ave. activity
  1                  16.2941   0.0071         24.6701   0.0051
  2                  19.2297   0.0080         25.9898   0.0056
  3                   6.9486   0.0021          8.9146   0.0013
  4                   7.5680   0.0027          7.4872   0.0011
  5                   5.7635   0.0023          4.4438   0.0014
  6                   3.1511   0.0014          2.6932   0.0008
  7                   3.5820   0.0012          2.9356   0.0007
  8                   4.4977   0.0017          2.5316   0.0005
  9                   4.3092   0.0012          3.1780   0.0008
  10                  3.8513   0.0013          2.8279   0.0007
  11                  4.3361   0.0012          2.7471   0.0007
  12                  4.6054   0.0016          2.9087   0.0008
  13                  6.3560   0.0018          3.1242   0.0009
  14                  4.8209   0.0015          2.5855   0.0008
  15                  2.7202   0.0010          1.6967   0.0006
  16                  1.2928   0.0005          0.7541   0.0003
  17                  0.5656   0.0002          0.4040   0.0001
  18                  0.1077   0.0000          0.1077   0.0000
  19                  0.0000   0.0000          0.0000   0.0000

Table 3. Selected regions with the random method

  Seats in           Camera No. 0             Camera No. 1
  selected region    freq(%)   Ave. activity  freq(%)   Ave. activity
  1                  21.4382   0.0062         20.0108   0.0055
  2                  21.5459   0.0065         22.8117   0.0069
  3                   7.8104   0.0015          7.5411   0.0016
  4                   8.2682   0.0018          8.2413   0.0022
  5                   4.8748   0.0020          5.0633   0.0014
  6                   2.6663   0.0010          3.3935   0.0011
  7                   3.1242   0.0009          3.1780   0.0011
  8                   3.6897   0.0010          3.4204   0.0010
  9                   3.6359   0.0010          4.0668   0.0012
  10                  3.6628   0.0012          3.1511   0.0010
  11                  3.6359   0.0008          3.7705   0.0010
  12                  3.1780   0.0009          2.9356   0.0009
  13                  4.6593   0.0012          5.3865   0.0014
  14                  3.7705   0.0012          3.6359   0.0012
  15                  2.2893   0.0008          1.8853   0.0006
  16                  1.1850   0.0005          0.9696   0.0004
  17                  0.4040   0.0001          0.2963   0.0001
  18                  0.1616   0.0001          0.1885   0.0001
  19                  0.0000   0.0000          0.0539   0.0000

Figure 6. Comparison between the method with PDF and the random method (x-axis: number of seats included in the selected seat region; y-axis: frequency of selection (%); lines: Camera No. 0 and No. 1 with PDF, and Camera No. 0 and No. 1 with the random method).
6. Conclusion

In this paper, we proposed a camera control method for lecture archiving. We defined the seat region for obtaining a variety of video clips, and introduced the activities of the seat regions and probability density functions over them. We showed that a different tendency in selecting seat regions can be assigned to each shooting camera with a PDF based on the students' activities, and that various video clips can be obtained as a result of selecting various seat regions.

References

[1] S. Goodridge. Multimedia Sensor Fusion for Intelligent Camera Control and Human-Computer Interaction. PhD thesis, North Carolina State University, 1997.
[2] Y. Kameda, K. Ishizuka, and M. Minoh. A live video imaging method for capturing presentation information in distance learning. In IEEE International Conference on Multimedia and Expo, volume 3, pages 1237-1240, 2000.
[3] K. Yagi, Y. Kameda, M. Nakamura, M. Minoh, and M. Ashour-Abdalla. A novel distance learning system for the TIDE project. In Proceedings of ICCE/ICCAI 2000, volume 2, pages 1166-1169, 2000.