Convolutional Networks for Image Segmentation: U-Net 1, DeconvNet 2, and SegNet 3

1 Olaf Ronneberger, Philipp Fischer, Thomas Brox (Freiburg, Germany)
2 Hyeonwoo Noh, Seunghoon Hong, Bohyung Han (POSTECH, Korea)
3 Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla (Cambridge, U.K.)

12 January 2018
Presented by: Gregory P. Spell

Outline
1. Image Segmentation
2. U-Net
3. DeconvNet
4. SegNet

Image Segmentation
- The goal is to perform pixel-wise classification on images.
- Useful for scene understanding (as in autonomous driving).
- Modern methods adopt deep architectures for image classification and extend them to pixel-wise labelling.

General Segmentation Architecture
- Encoder network: extracts image features using a deep convolutional network.
  - Each layer is a bank of trainable convolutional filters, followed by ReLUs and max-pooling to downsample the image features.
- Decoder network: upsamples the feature map back to image resolution, with the final output having as many channels as there are pixel classes.
  - This is where the methods differ most dramatically.
  - The decoder mirrors the encoder network.
- Training: a pixel-wise softmax over the final feature map and a cross-entropy loss, optimized with SGD.
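The training objective on this slide can be illustrated with a minimal sketch. It uses plain Python lists instead of a deep learning framework, and the names `pixelwise_softmax` and `cross_entropy` as well as the toy scores are assumptions made for illustration, not code from any of the three papers.

```python
import math

def pixelwise_softmax(scores):
    """scores[c][y][x] -> probs[c][y][x], normalised over the class axis c."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(n_classes)]
    for y in range(height):
        for x in range(width):
            pixel = [scores[c][y][x] for c in range(n_classes)]
            peak = max(pixel)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in pixel]
            total = sum(exps)
            for c in range(n_classes):
                probs[c][y][x] = exps[c] / total
    return probs

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class over all pixels."""
    height, width = len(labels), len(labels[0])
    loss = 0.0
    for y in range(height):
        for x in range(width):
            loss -= math.log(probs[labels[y][x]][y][x])
    return loss / (height * width)

# Toy decoder output (hypothetical values): 2 classes, 1x2 image.
scores = [[[2.0, 0.0]], [[0.0, 2.0]]]
labels = [[0, 1]]  # ground-truth class index per pixel
probs = pixelwise_softmax(scores)
loss = cross_entropy(probs, labels)
```

In practice the loss would be computed on mini-batches and minimized with SGD; the sketch only shows how the per-pixel class probabilities and the averaged loss are formed.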

Encoder Schemes
- Both SegNet and DeconvNet use the convolutional layers of VGG16, a network designed for image classification.
- DeconvNet keeps two fully-connected layers from VGG16.
- SegNet discards the fully-connected layers to decrease the number of parameters.
- U-Net uses a shallower network and no fully-connected layers.

Decoder Networks: Upsampling
- Upsampling is needed to return the feature map to a higher resolution for pixel classification.
- Pooling discards spatial information that is needed for precise localization.
- To reconstruct it (partially): store the max-pooling indices from the encoder and place each activation back at its original pooled location.
- Fill all other locations with zeros.
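The index-based unpooling described above can be sketched on a single 2-D feature map. This is an illustrative pure-Python version; the function names are hypothetical, and it assumes non-overlapping windows and a map whose height and width are divisible by the window size.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max-pooling that also records where each maximum was."""
    pooled, indices = [], []
    for top in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for left in range(0, len(fmap[0]), size):
            window = [(fmap[r][c], (r, c))
                      for r in range(top, top + size)
                      for c in range(left, left + size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def max_unpool(pooled, indices, height, width):
    """Place each pooled activation back at its stored location; zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out

fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]
pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, 4, 4)
# pooled   -> [[4.0, 5.0], [6.0, 7.0]]
# restored -> [[0.0, 0.0, 0.0, 0.0],
#              [4.0, 0.0, 0.0, 5.0],
#              [0.0, 0.0, 7.0, 0.0],
#              [6.0, 0.0, 0.0, 0.0]]
```

The restored map has the original resolution but is sparse: only one entry per pooling window is non-zero.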

Decoder Networks: Deconvolution
- Upsampling produces sparse feature maps.
- Trainable (de)convolution filters are used to densify these maps.
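A rough sketch of how a (de)convolution filter densifies a sparse map: each non-zero activation stamps a scaled copy of the kernel onto the output. The function name `deconvolve`, the stride of 1, the "full" output size, and the all-ones kernel are assumptions for illustration; in the actual networks the filters are learned and the output size is controlled by padding or cropping.

```python
def deconvolve(sparse, kernel):
    """Stride-1 transposed convolution of a 2-D map with a square kernel."""
    k = len(kernel)
    height, width = len(sparse), len(sparse[0])
    # "Full" output: every input position can spread over a k x k patch.
    out = [[0.0] * (width + k - 1) for _ in range(height + k - 1)]
    for r in range(height):
        for c in range(width):
            if sparse[r][c] == 0.0:
                continue  # zero activations contribute nothing
            for i in range(k):
                for j in range(k):
                    out[r + i][c + j] += sparse[r][c] * kernel[i][j]
    return out

sparse = [[0.0, 0.0],
          [0.0, 2.0]]          # one surviving activation after unpooling
kernel = [[1.0] * 3 for _ in range(3)]  # hypothetical 3x3 filter
dense = deconvolve(sparse, kernel)
nonzero = sum(1 for row in dense for value in row if value != 0.0)
# The single activation now covers a 3x3 patch of the 4x4 output.
```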

Decoder Analysis
- Unpooling captures example-specific structures.
- Deconvolution captures class-specific shapes.
- The hierarchical structure reconstructs shape details from coarse to fine.

U-Net Specifics
- Designed for biomedical image processing, specifically cell segmentation.
- Data augmentation by applying elastic deformations, which is natural because deformation is a common variation in tissue.
- Features from the encoder network are concatenated with the corresponding stage of the decoder network, instead of reusing pooling indices.
- A weight map is introduced to compensate for the class imbalance of pixels and to force the network to learn the borders between touching cells.
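The slide does not give the form of the weight map. The sketch below follows the formula as it appears in the U-Net paper, w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), where d1 and d2 are the distances from a pixel to the nearest and second-nearest cell border. The function name and the default values w0 = 10 and sigma = 5 pixels are taken from that paper as recalled here and should be treated as assumptions.

```python
import math

def unet_pixel_weight(class_weight, d1, d2, w0=10.0, sigma=5.0):
    """Loss weight for one pixel.

    class_weight: term that balances class frequencies (w_c in the paper).
    d1, d2: distances (in pixels) to the nearest and second-nearest cell border.
    """
    border_term = w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
    return class_weight + border_term

# A pixel squeezed between two touching cells gets a large weight,
# while a pixel far from any border keeps roughly its class weight.
on_border = unet_pixel_weight(1.0, 0.0, 0.0)
far_away = unet_pixel_weight(1.0, 20.0, 20.0)
```

The per-pixel weight multiplies that pixel's cross-entropy term, so errors on thin separating borders cost more than errors in large uniform regions.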

U-Net Architecture

DeconvNet Specifics
- Instance-wise segmentation: use the Edge Boxes algorithm (C. L. Zitnick and P. Dollár, "Edge Boxes: Locating Object Proposals from Edges", ECCV 2014) to generate object proposals from which to predict pixel classes.
- Aggregate all proposal outputs for an image via a pixel-wise max (or average).
- Two-stage training: train first on easy examples (cropped bounding boxes centered on a single object) and then on more difficult examples (proposals from Edge Boxes).
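The pixel-wise max aggregation can be sketched for a single class as follows. The function name, the proposal representation (top-left corner plus a score map), and the toy scores are hypothetical; the sketch also assumes non-negative scores and proposals that lie fully inside the image.

```python
def aggregate_proposals(height, width, proposals):
    """Merge per-proposal score maps into one image-sized map via pixel-wise max.

    proposals: list of (top, left, score_map), where score_map is a 2-D list
    of scores for one class inside that proposal's bounding box.
    """
    canvas = [[0.0] * width for _ in range(height)]  # zero where no proposal fires
    for top, left, score_map in proposals:
        for r, row in enumerate(score_map):
            for c, score in enumerate(row):
                y, x = top + r, left + c
                canvas[y][x] = max(canvas[y][x], score)
    return canvas

proposals = [
    (0, 0, [[0.2, 0.9], [0.4, 0.1]]),  # proposal covering the top-left 2x2 block
    (1, 1, [[0.8, 0.3], [0.5, 0.6]]),  # overlapping proposal shifted by one pixel
]
merged = aggregate_proposals(3, 3, proposals)
# merged -> [[0.2, 0.9, 0.0],
#            [0.4, 0.8, 0.3],
#            [0.0, 0.5, 0.6]]
```

Replacing `max` with a running sum and a per-pixel count would give the averaging variant mentioned on the slide.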

DeconvNet Results
- Evaluated on the PASCAL VOC 2012 benchmark with the Intersection-over-Union (IoU) metric between ground-truth and predicted segmentations.
- E denotes an ensemble with Fully Convolutional Networks (FCNs, an earlier framework), and CRF denotes the use of conditional random field post-processing (P. Krähenbühl and V. Koltun, "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials", NIPS 2011).
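For reference, the IoU metric can be computed as in the sketch below. It works on a single pair of label maps, whereas the benchmark accumulates counts over the whole test set; the function names and the toy label maps are illustrative assumptions.

```python
def class_iou(pred, truth, cls):
    """IoU for one class: |pred AND truth| / |pred OR truth| over all pixels."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = (p == cls), (t == cls)
            intersection += in_pred and in_truth
            union += in_pred or in_truth
    return intersection / union if union else None  # class absent from both maps

def mean_iou(pred, truth, classes):
    """Average the per-class IoU over the classes that actually occur."""
    scores = [class_iou(pred, truth, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)

truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
iou_background = class_iou(pred, truth, 0)  # 2 shared pixels out of 3 in the union
iou_object = class_iou(pred, truth, 1)      # 3 shared pixels out of 4 in the union
miou = mean_iou(pred, truth, [0, 1])
```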

SegNet Architecture

SegNet Results
- CamVid Dataset: 3433 training road scenes
- SUN-RGB-D Dataset: 5250 training indoor scenes

Road Scene Examples

Indoor Scenes Examples