Colorful Image Colorizations Supplementary Material


Richard Zhang, Phillip Isola, Alexei A. Efros
{rich.zhang, isola, efros}@eecs.berkeley.edu
University of California, Berkeley

1 Overview

This document is divided into three sections. Section 2 adds some clarifications regarding the filtering of grayscale images from the dataset, along with additional details about the network architecture. Section 3 contains a discussion of our algorithm in comparison to Cheng et al. [1]. Section 4 contains a more detailed explanation of the VGG category analysis presented in Section 4.1 and Figure 8 of the paper, along with an additional analysis of common category confusions after recolorization.

2 Clarifications

2.1 Network Architecture

Figure 3 in the paper showed a diagram of our network architecture. Table 1 in this document lists all of the layers used in our architecture during training. During testing, the temperature adjustment, softmax, and bilinear upsampling are all implemented as subsequent layers in a feed-forward network. Note the column showing the effective dilation. The effective dilation is the spacing at which consecutive elements of the convolutional kernel are evaluated, relative to the input pixels, and is computed as the product of the accumulated stride and the layer dilation (a short sketch at the end of Section 2 reproduces this bookkeeping). Through the convolutional blocks conv1 to conv5, the effective dilation of the convolutional kernel increases; from conv6 to conv8, it decreases.

2.2 Filtering Grayscale Images

Some of the images in the ImageNet [2] dataset are grayscale and were filtered out of the training, validation, and testing sets. An image was considered grayscale, and was withheld from training and testing, if no pixel had an ab value above 5. The threshold was set conservatively: a more aggressive threshold would remove more grayscale images, but at the expense of color images that happen to contain a very desaturated palette.
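As a concrete illustration of this rule, here is a minimal sketch in Python. The use of skimage and the function name are our own choices for the sketch; only the threshold of 5 on the ab channels comes from the text above.

```python
import numpy as np
from skimage.color import rgb2lab

def is_grayscale(rgb_image, ab_thresh=5.0):
    """Filtering rule from Sec. 2.2: treat an image as grayscale if no
    pixel has an ab value above the threshold (interpreted here as the
    maximum of |a| and |b| over all pixels).

    rgb_image: HxWx3 RGB array, uint8 or float in [0, 1].
    """
    lab = rgb2lab(rgb_image)                  # L in [0, 100]; a, b roughly in [-110, 110]
    max_chroma = np.abs(lab[..., 1:]).max()   # largest |a| or |b| over all pixels
    return max_chroma <= ab_thresh
```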

Our Network Architecture

Layer     X    C    S   D   Sa   De   BN   L
data     224    3   -   -    -    -    -   -
conv1_1  224   64   1   1    1    1    -   -
conv1_2  112   64   2   1    1    1    ✓   -
conv2_1  112  128   1   1    2    2    -   -
conv2_2   56  128   2   1    2    2    ✓   -
conv3_1   56  256   1   1    4    4    -   -
conv3_2   56  256   1   1    4    4    -   -
conv3_3   28  256   2   1    4    4    ✓   -
conv4_1   28  512   1   1    8    8    -   -
conv4_2   28  512   1   1    8    8    -   -
conv4_3   28  512   1   1    8    8    ✓   -
conv5_1   28  512   1   2    8   16    -   -
conv5_2   28  512   1   2    8   16    -   -
conv5_3   28  512   1   2    8   16    ✓   -
conv6_1   28  512   1   2    8   16    -   -
conv6_2   28  512   1   2    8   16    -   -
conv6_3   28  512   1   2    8   16    ✓   -
conv7_1   28  256   1   1    8    8    -   -
conv7_2   28  256   1   1    8    8    -   -
conv7_3   28  256   1   1    8    8    ✓   -
conv8_1   56  128  .5   1    4    4    -   -
conv8_2   56  128   1   1    4    4    -   -
conv8_3   56  128   1   1    4    4    -   ✓

Table 1. Our network architecture. X: spatial resolution of output; C: number of output channels; S: computation stride (values greater than 1 indicate downsampling following convolution, values less than 1 indicate upsampling preceding convolution); D: kernel dilation; Sa: accumulated stride across all preceding layers (product of the strides of all previous layers); De: effective dilation of the layer with respect to the input (layer dilation times accumulated stride); BN: whether a BatchNorm layer was used after the layer; L: whether a 1x1 conv and cross-entropy loss layer was imposed.

2.3 Training, Validation, and Testing Splits

The full set of 1.3M ImageNet training images, minus the grayscale images, was used for training. The first 2,000 images in the ImageNet [2] validation set were used for validation. For the aggregated quantitative testing results shown in Table 1 of the paper, we used the last 10,000 images in the validation set, 9,803 of which were color. For the qualitative VGG classification results shown in Figure 8, we used the last 48,000 images in the validation set, 47,023 of which contained color. Since we sorted on classification performance across 1000 categories for the VGG analysis, we used the full validation set for testing to maximize the number of samples per category.
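The Sa and De columns of Table 1 are pure bookkeeping and can be recomputed from each layer's stride S and dilation D. The sketch below is our own illustration, not code from the paper; it covers every layer whose stride differs from 1, plus a few representative others.

```python
# Recompute accumulated stride Sa and effective dilation De = D * Sa
# for selected rows of Table 1, given (name, stride S, dilation D).
layers = [
    ("conv1_1", 1, 1), ("conv1_2", 2, 1),
    ("conv2_1", 1, 1), ("conv2_2", 2, 1),
    ("conv3_3", 2, 1), ("conv4_1", 1, 1),
    ("conv5_1", 1, 2),       # dilated kernel: De doubles to 16
    ("conv7_1", 1, 1),
    ("conv8_1", 0.5, 1),     # upsampling precedes the convolution
]

sa = 1.0
for name, stride, dilation in layers:
    if stride < 1:
        # Upsampling happens before the conv, so it already affects
        # this layer's own accumulated stride (conv8_1: Sa = 8 * 0.5 = 4).
        sa *= stride
    print(f"{name}: Sa={sa:g}, De={sa * dilation:g}")
    if stride >= 1:
        # Downsampling follows the conv, so it only affects later layers.
        sa *= stride
```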

3 Comparison to Cheng et al. [1]

Quantitative comparisons to Cheng et al. [1] are not possible, as the authors have not released their code or test set results. We provide qualitative comparisons to the 23 test images from [1] on the attached website, which we obtained by manually cropping them from the paper. Our results are of roughly the same qualitative level as those of [1]. Note that Cheng et al. [1] have several advantages in this setting: (1) the test images are from the SUN dataset [3], which we did not train on, and (2) the 23 images were hand-selected by the authors from a test set of 1,344 and are not necessarily representative of algorithm performance. We were unable to obtain the full 1,344 test set results through correspondence with the authors.

Additionally, we compare the methods on several important dimensions in Table 2: algorithm pipeline, learning, dataset, and run-time. Our method is faster, is more straightforward to train and understand, has fewer hand-tuned parameters and components, and has been demonstrated on a broader and more diverse set of test images than Cheng et al. [1].

            Cheng et al. [1]                      Ours
Algorithm   (1) Extract feature sets:             Feed-forward CNN
            (a) 7x7 patch, (b) DAISY,
            (c) FCN on 47 categories
            (2) 3-layer NN regressor
            (3) Joint-bilateral filter
Learning    Extract features. Train FCN [4]       Train CNN from pixels to
            on pre-defined categories.            color distribution.
            Train 3-layer NN regressor.
            Tune single parameter on validation.
Dataset     2688/1344 images from SUN [3]         1.3M/10k images from
            for train/test. Limited variety,      ImageNet [2] for train/test.
            with only scenes.                     Broad and diverse set of
                                                  objects and scenes.
Run-time    4.9 s/image, Matlab                   100 ms/image in Caffe
            implementation                        on a K40 GPU

Table 2. Comparison to Cheng et al. [1]

4 VGG Evaluation

4.1 Classification Performance

In Section 4.1 of the paper, we investigated grayscale and re-colored images using the VGG classifier [5] on the last 48,000 images of the ImageNet validation set. For each category, we computed the top-5 classification accuracies on grayscale and recolorized images, $a^{gray}, a^{recolor} \in [0,1]^C$, where $C = 1000$ is the number of categories. We sorted the categories by $a^{recolor} - a^{gray}$ and plotted examples of some selected top and bottom classes in Figure 8 of the paper. The re-colored vs. grayscale performance per category is shown in Figure 1, with the top and bottom 50 categories highlighted. For the top example categories, the individual images are sorted by ascending rank of the correct classification of the recolored image, with ties broken by descending rank of the correct classification of the grayscale image. For the bottom example categories, the images are sorted in reverse, in order to highlight instances where recolorization results in an errant classification relative to the grayscale image.
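The per-category analysis can be summarized in a short sketch. The function below is our own minimal illustration with small synthetic stand-ins for the classifier scores; none of the names come from the released evaluation code.

```python
import numpy as np

def per_category_top5(scores, labels, num_classes=1000):
    """Entry c is the top-5 accuracy over images with true label c,
    giving a vector in [0, 1]^C as in Sec. 4.1."""
    top5 = np.argsort(-scores, axis=1)[:, :5]        # top-5 predicted classes
    hits = (top5 == labels[:, None]).any(axis=1)     # per-image top-5 correctness
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = hits[mask].mean()
    return acc

# Synthetic stand-ins (the real test used 48,000 VGG score vectors).
rng = np.random.default_rng(0)
labels = rng.integers(0, 1000, size=5_000)
scores_gray = rng.normal(size=(5_000, 1000))
scores_recolor = rng.normal(size=(5_000, 1000))

a_gray = per_category_top5(scores_gray, labels)
a_recolor = per_category_top5(scores_recolor, labels)
order = np.argsort(a_recolor - a_gray)   # bottom 50 = hurt most, top 50 = helped most
```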

Fig. 1. Performance of VGG top-5 classification on recolorized images vs. grayscale images, per category. The test was done on the last 48,000 images of the ImageNet validation set.

4.2 Common Confusions

To further investigate the biases in our system, we look at the common class confusions that often occur after image recolorization but not with the original ground truth image. We compute the top-5 confusion rates $C^{orig}, C^{recolor} \in [0,1]^{C \times C}$, with ground truth colors and after recolorization, respectively. A value of $C_{c,d} = 1$ means that every image in category c was classified as category d in the top-5. We find the class confusions added by recolorization by computing $A = C^{recolor} - C^{orig}$ and sorting the off-diagonal entries. Figure 2 shows all $C(C-1)$ off-diagonal entries of $C^{recolor}$ vs. $C^{orig}$, with the top 100 entries from $A$ highlighted. For each category pair (c, d), we extract the images that contain the confusion after recolorization but not with the original colors. We then sort the images in descending order of the classification score of the confused category. Examples for some of the top category pairs are shown in Figure 3. An image of a minibus is often colored yellow, leading to a misclassification as a school bus. Animal classes are sometimes colored differently than the ground truth, leading to misclassifications as related species. Note that the colorizations are often visually realistic, even though they lead to a misclassification.
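The confusion analysis admits a similarly short sketch, again our own illustration with synthetic inputs rather than the paper's code.

```python
import numpy as np

def top5_confusion(scores, labels, num_classes=1000):
    """C[c, d]: fraction of images with true label c whose top-5
    predictions include category d, as defined in Sec. 4.2."""
    C = np.zeros((num_classes, num_classes))
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    top5 = np.argsort(-scores, axis=1)[:, :5]
    for preds, c in zip(top5, labels):
        C[c, preds] += 1
    return C / np.maximum(counts[:, None], 1.0)

# Synthetic stand-ins for VGG scores on original-color and recolorized images.
rng = np.random.default_rng(0)
labels = rng.integers(0, 1000, size=5_000)
scores_orig = rng.normal(size=(5_000, 1000))
scores_recolor = rng.normal(size=(5_000, 1000))

A = top5_confusion(scores_recolor, labels) - top5_confusion(scores_orig, labels)
np.fill_diagonal(A, -np.inf)              # keep off-diagonal entries only
flat = np.argsort(-A, axis=None)[:100]    # top 100 confusions added by recolorization
top_pairs = [np.unravel_index(i, A.shape) for i in flat]
```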

Fig. 2. Top-5 confusion rates with recolorizations vs. with the original colors. The test was done on the last 48,000 images of the ImageNet validation set [2].

Fig. 3. Examples of some of the most-confused categories. Top rows show the ground truth images; bottom rows show the recolorized images. The rank of the common confusion is given in parentheses, and the ground truth and confused categories after recolorization are labeled.

References

1. Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: Proceedings of the IEEE International Conference on Computer Vision (2015) 415-423
2. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3) (2015) 211-252
3. Patterson, G., Hays, J.: SUN attribute database: Discovering, annotating, and recognizing scene attributes. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE (2012) 2751-2758
4. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) 3431-3440
5. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)