RECOGNITION OF PANEL STRUCTURE IN COMIC IMAGES USING FASTER R-CNN

Hideaki Yanagisawa, Hiroshi Watanabe
Graduate School of Fundamental Science and Engineering, Waseda University

ABSTRACT

For efficient e-comic creation, techniques that automatically extract comic components such as panel layout, speech balloons, and characters are necessary. Conventional methods extract comic components from geometrical characteristics such as line drawings or connected pixels. However, it is difficult to extract all comic components by focusing on a particular geometric feature, because the components are drawn with a wide variety of expressions. In this paper, we extract comic components with Faster R-CNN, independently of such variations in expression, and recognize the panel structure. Experimental results show that the proposed method recognizes 67.5% of panel structures on average.

1. INTRODUCTION

The publishing industry has been shifting from traditional paper books to e-books. In the Japanese e-book market, e-comics account for 80% of sales [1]. To improve the convenience of e-comics, services based on e-comic metadata have been proposed, for example comic search systems that find a particular scene or dialogue, or automatic digest generation systems. However, most e-comics are converted from scanned paper comics, so comic structure components such as panel layout, speech balloons, and characters (in this paper, "character" means an actor in a comic) must be extracted manually. To reduce the cost of metadata extraction, a technique that extracts comic components automatically is important. In this paper, we evaluate a system that automatically obtains the number of speech balloons and characters in each panel of a comic using Faster R-CNN.

2. RELATED WORK

For panel layout detection, Ishii et al. [2] proposed identifying panels by detecting dividing lines using gradient concentration. Nonaka et al. [3] introduced a panel layout recognition method that detects lines and rectangles, exploiting the fact that panels are often drawn as rectangles. For speech balloon extraction, Tanaka et al. [4] proposed a method that identifies text areas using AdaBoost and detects the white areas of speech balloons. In a study on structure recognition of comics, Arai et al. [5] proposed a detection method for panels, speech balloons, and text areas based on image blob detection and a modified connected component labeling (CCL) method. For character detection, Ishii et al. [6] proposed an approach that detects character face areas by machine learning with HOG features. We previously applied Fast R-CNN to character face detection [7]; it showed a higher detection rate than HOG features.

Conventional methods thus extract comic components from geometric characteristics, e.g. line detection or connected-pixel extraction. However, in some comic images panels and speech balloons are drawn with unusual expressions, so it is difficult to detect components that have irregular shapes or that overlap other objects.

3. FASTER R-CNN

Girshick et al. [8] proposed Regions with Convolutional Neural Network features (R-CNN) as a general object detection method based on a convolutional neural network (CNN). R-CNN detects objects as follows. First, object region proposals are extracted from the input image by selective search [9]. Second, the region proposals are fed to the CNN and image feature values are calculated. Then, the output feature values are classified by a support vector machine (SVM). Finally, the deviation of each region proposal is corrected by bounding box regression. However, R-CNN is slow because it computes convolutional features for every object proposal. To solve this problem, Fast R-CNN was introduced. Fast R-CNN enables end-to-end detector training on shared convolutional features and therefore shows compelling accuracy and speed [10]. Ren et al. [11] proposed Faster R-CNN as a further improved object detection technique. Faster R-CNN is a single network that connects Fast R-CNN with a Region Proposal Network (RPN) sharing full-image convolutional features with the detection network. The RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position, and it is trained end-to-end to generate high-quality region proposals, which are then used by Fast R-CNN for detection. As a result, Faster R-CNN detects objects more quickly and with higher accuracy than previous state-of-the-art methods.
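The detectors in this work are trained with the published py-faster-rcnn code (Section 5). Purely as an illustration of the two-stage inference flow described above, the sketch below runs an off-the-shelf Faster R-CNN from torchvision on a single page image; the pretrained model, weights, file name, and threshold are stand-ins for illustration, not the detectors trained in this paper.

# Minimal sketch of two-stage detection (RPN + detection head) with an
# off-the-shelf Faster R-CNN from torchvision. This is NOT the paper's setup
# (py-faster-rcnn with vgg_cnn_m_1024 trained on Manga109); it only shows the
# inference flow: image in -> boxes, class labels, confidences out.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Assumes torchvision >= 0.13 for the weights="DEFAULT" argument.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # inference mode; RPN and detection head share backbone features

image = Image.open("page.jpg").convert("RGB")  # hypothetical comic page image
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]

# Keep detections above a confidence threshold (the paper uses 0.6 for panels
# and 0.8 for speech balloons and character faces with its own detectors).
keep = outputs["scores"] > 0.6
boxes = outputs["boxes"][keep]    # (N, 4) tensor of [x1, y1, x2, y2]
labels = outputs["labels"][keep]  # COCO class ids here, not comic classes
print(boxes, labels)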
4. PROPOSED METHOD

We propose a method that recognizes the panel structure of comic images by detecting panels, speech balloons, and character faces. We create annotations of comic images by specifying the peripheral region of each component with a rectangle, and three types of detectors are generated by training Faster R-CNN. The flow of panel structure recognition is shown in Fig. 1. First, panels are detected in the input image and sorted. The sorting order is based on the height (vertical position) of the detected areas; if the heights are the same, the panels are ordered from the right side. Figure 2 shows examples of panel locations and sorting orders. Because there is a slight shift in the position of each panel detected by Faster R-CNN, the y-coordinates are normalized in steps of 50 pixels. Next, speech balloons and character faces are detected. Each is assigned to the panel that overlaps more than 50% of its detected area. If a component overlaps two or more panels by 50% or more, as in Fig. 3, the component is assigned to the panel that comes later in the sorting order. Finally, the numbers of speech balloons and character faces belonging to each panel are obtained.

Fig. 1  Flow diagram of panel structure recognition
Fig. 2  Examples of panel sorting
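The sorting and assignment rules above can be summarized in a few lines. The following is a minimal illustrative sketch (not the authors' code), assuming each detection is an axis-aligned box (x1, y1, x2, y2) in image coordinates.

# Illustrative implementation of the panel ordering and component assignment
# rules in Section 4. Boxes are (x1, y1, x2, y2) tuples.

def sort_panels(panels, step=50):
    """Sort panels top-to-bottom; within the same 50-px band, right-to-left."""
    # Quantize the top y-coordinate in 50-pixel steps to absorb small shifts
    # in the detected positions, then order bands top-down and, within a band,
    # from the right side (Japanese reading order).
    return sorted(panels, key=lambda b: (b[1] // step, -b[2]))

def overlap_ratio(component, panel):
    """Fraction of the component's area that lies inside the panel."""
    x1 = max(component[0], panel[0]); y1 = max(component[1], panel[1])
    x2 = min(component[2], panel[2]); y2 = min(component[3], panel[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (component[2] - component[0]) * (component[3] - component[1])
    return inter / area if area > 0 else 0.0

def assign_to_panel(component, sorted_panels, thresh=0.5):
    """Assign a balloon/face to a panel it overlaps by more than 50%.
    If several panels qualify, take the later one in the sorting order."""
    candidates = [i for i, p in enumerate(sorted_panels)
                  if overlap_ratio(component, p) > thresh]
    return max(candidates) if candidates else None

Counting the balloons and faces assigned to each sorted panel then yields the per-panel structure shown in Fig. 3.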

5. EXPERIMENT

In this section, we evaluate the detection accuracy for comic components using Faster R-CNN, as well as the recognition accuracy of panel structures. We use the implementation published at https://github.com/rbgirshick/py-faster-rcnn [11] for training and evaluation of Faster R-CNN, with vgg_cnn_m_1024 [12] as the CNN architecture for training. The training and evaluation datasets are made of comic images provided in the Manga109 database (http://www.manga109.org/) [13]. The training dataset consists of 100 images from each of 20 comic titles drawn by different authors. The test dataset consists of 30 images from each of 5 comic titles, referred to as Comic A to E, drawn by authors different from those of the training images. In this experiment, we define a true positive as a detected area that overlaps the correct area by more than 50%.

Fig. 3  Example of panel structure recognition (panel 1 contains 2 characters and 3 balloons; panel 2 contains 1 character and 2 balloons)

5.1. Iteration number

We verified the relationship between the number of iterations in the training process of Faster R-CNN and the average precision (AP) for each comic component. AP is the average of the precision values at each recall level. In this experiment, AP is calculated for 2000 images of the training dataset and 150 images of the test dataset. The results are shown in Fig. 4, where the x-axis indicates the number of iterations and the y-axis indicates AP. The detection rates increase with the number of iterations, and once the number of iterations exceeds 70000, the AP for the training images converges.

Fig. 4  Relationship between average precision and the number of iterations: (a) panel detection, (b) speech balloon detection, (c) character face detection

5.2. Threshold of confidence

We evaluate the detailed detection results for the 150 test images using the detectors trained with 70000 iterations. Faster R-CNN calculates a confidence score for the object in each region proposal and reports a region when its confidence exceeds a threshold. In this experiment, the confidence threshold is set to 0.6 for panel detection and to 0.8 for speech balloon and character face detection; these thresholds are heuristic values. The results are shown in Table 1, where "Total" is the total number of comic components in the images, TP the number of true positives, FN false negatives, and FP false positives. We also report recall (R) and precision (P). Table 2 shows the detection results of panels and speech balloons by the method of [5] on the same test set. The precision of Faster R-CNN exceeds 90% for every component, and the method outperforms the conventional method in both panel and speech balloon detection. Examples of detection results are shown in Fig. 5. Blob extraction has difficulty separating panels when one panel overlaps another, whereas Faster R-CNN detects panels independently of the layout.
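The recall and precision values in Tables 1 and 2 follow from the true-positive criterion above. A minimal sketch of one possible reading of that criterion is given below; the greedy one-to-one matching and the choice to measure overlap against the ground-truth area are assumptions, since the paper does not specify these details.

# One possible reading of the evaluation in Section 5.2 (not the authors' code):
# a detection is a true positive if it overlaps a ground-truth box of the same
# class by more than 50%; recall = TP / (TP + FN), precision = TP / (TP + FP).

def overlaps_enough(det, gt, thresh=0.5):
    """Overlap measured as intersection over the ground-truth area (assumption)."""
    x1 = max(det[0], gt[0]); y1 = max(det[1], gt[1])
    x2 = min(det[2], gt[2]); y2 = min(det[3], gt[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return gt_area > 0 and inter / gt_area > thresh

def evaluate(detections, ground_truths, thresh=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    matched = set()
    tp = 0
    for det in detections:
        for i, gt in enumerate(ground_truths):
            if i not in matched and overlaps_enough(det, gt, thresh):
                matched.add(i)
                tp += 1
                break
    fp = len(detections) - tp
    fn = len(ground_truths) - tp
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tp, fp, fn, recall, precision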

Fig. 5  Examples of panel detection for flat and connected panels: (a) detection by [5], (b) detection by Faster R-CNN

Table 1  Results of comic component extraction for 5 comic sources by Faster R-CNN

             Total     TP     FN     FP    R (%)   P (%)
  Panel        859    770     90     40     89.5    95.1
  Balloon     1190   1161     29     42     97.6    96.5
  Character    937    803    134     50     85.7    94.1

Table 2  Results of comic component extraction for 5 comic sources by [5]

             Total     TP     FN     FP    R (%)   P (%)
  Panel        859    481    378    183     56.0    72.4
  Balloon     1190    790    400    650     66.4    54.9

5.3. Recognition rate of panel structure

We evaluate the recognition accuracy of panel structures on 30 pages from each of the 5 comics. The recognition accuracy is defined as follows: B is the percentage of panels for which the number of speech balloons is correctly extracted, C is the percentage of panels for which the number of character faces is correctly extracted, and B + C is the percentage of panels for which both numbers are correct. The results are shown in Table 3. The highest B + C value is 84.9% for Comic B, and the lowest is 52.8% for Comic E. An example of failure in panel structure recognition is detection failure caused by deformed faces, as shown in Fig. 6. The low recognition rate for Comic E is due to its fuzzy panel layout, as shown in Fig. 7. In Figs. 6 and 7, red rectangles indicate the areas detected as comic components.

Table 3  Results of panel structure recognition for 5 comic sources

            B (%)   C (%)   B + C (%)
  Comic A    83.0    74.5      68.1
  Comic B    91.4    89.8      84.9
  Comic C    81.7    72.8      66.3
  Comic D    94.6    69.0      65.2
  Comic E    62.3    62.9      52.8

Fig. 6  Example of failure to detect character faces
Fig. 7  Example of failure to detect panels in Comic E
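The accuracies B, C, and B + C in Table 3 amount to per-panel count comparisons. A minimal illustrative sketch follows; the per-panel data layout is an assumption, not the authors' format.

# Illustrative computation of the panel-structure accuracies in Table 3.
# Each entry pairs predicted and ground-truth (balloon_count, face_count)
# for one panel.

def panel_structure_accuracy(panels):
    """panels: list of ((pred_balloons, pred_faces), (gt_balloons, gt_faces))."""
    n = len(panels)
    b_ok = sum(1 for (pb, pf), (gb, gf) in panels if pb == gb)
    c_ok = sum(1 for (pb, pf), (gb, gf) in panels if pf == gf)
    bc_ok = sum(1 for (pb, pf), (gb, gf) in panels if pb == gb and pf == gf)
    return b_ok / n, c_ok / n, bc_ok / n  # B, C, B + C as fractions

# Example: two panels, one with both counts right, one with the face count wrong.
print(panel_structure_accuracy([((3, 2), (3, 2)), ((2, 1), (2, 2))]))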

6. CONCLUSION & FUTURE WORK

In this paper, we evaluated panel structure recognition using Faster R-CNN. Experimental results show that the proposed method recognizes 67.5% of panel structures on average. As future work, detection of the panels and character faces that are hard to detect with the current method could be improved; one specific possibility is to combine image processing, such as emphasizing panel dividing lines, with Faster R-CNN detection. In addition, to obtain metadata for the automatic generation of comic summaries, a technique for classifying main characters among the detected character faces needs to be considered.

7. REFERENCES

[1] Internet Media Research Institute: eComic Marketing Report 2012, Impress R&D, p.4 (2012).
[2] D. Ishii, K. Kawamura, H. Watanabe: "A Study on Frame Decomposition of Comic Images", IEICE Transactions, Vol. J90-D, No.7, pp.667-670 (2007).
[3] S. Nonaka, T. Sawano, N. Haneda: "Development of GT-Scan, the Technology for Automatic Detection of Frames in Scanned Comic", FUJIFILM RESEARCH & DEVELOPMENT, No.57, pp.46-49 (2012).
[4] T. Tanaka, F. Toyama, J. Miyamichi, K. Shoji: "Detection and Classification of Speech Balloons in Comic Images", Journal of the Institute of Image Information and Television Engineers, Vol.64, No.2, pp.933-939 (2010).
[5] K. Arai, H. Tolle: "Method for Real Time Text Extraction from Digital Manga Comic", International Journal of Image Processing, Vol.4, No.6, pp.669-676 (2011).
[6] D. Ishii, H. Watanabe: "A Study on Automatic Character Detection and Recognition from Comics", The Journal of the Institute of Image Electronics Engineers of Japan, Vol.42, No.4 (2013).
[7] H. Yanagisawa, H. Watanabe: "A Study of Multi-view Face Detection for Characters in Comic Images", Proceedings of the 2016 IEICE General Conference, D-2-2 (2016).
[8] R. Girshick, J. Donahue, T. Darrell, J. Malik: "Rich feature hierarchies for accurate object detection and semantic segmentation", IEEE Conference on Computer Vision and Pattern Recognition (2014).
[9] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders: "Selective Search for Object Recognition", International Journal of Computer Vision, Vol.102, No.2, pp.154-171 (2013).
[10] R. Girshick: "Fast R-CNN", arXiv:1504.08083 (2015).
[11] S. Ren, K. He, R. Girshick, J. Sun: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Advances in Neural Information Processing Systems (NIPS) (2015).
[12] S. Farfade, M. Saberian: "Multi-view Face Detection Using Deep Convolutional Neural Networks", arXiv:1502.02766 (2015).
[13] Y. Matsui, K. Ito, Y. Aramaki, T. Yamasaki, K. Aizawa: "Sketch-based Manga Retrieval using Manga109 Dataset", arXiv:1510.04389 (2015).